Reference Files

Genome Reference

The 4DN Data Portal currently hosts genome references for five species (human, mouse, chicken, fruit fly, and zebrafish) in various formats. The human and mouse reference genomes are identical to those used by the ENCODE consortium.

species    genome version               fasta     BWA index   Bowtie2 index
human      GRCh38 no alt v15 (ENCODE)   fasta.gz  tgz / tar   tar
mouse      mm10/GRCm38 no alt (ENCODE)  fasta.gz  tgz / tar   tar
fruit fly  dm6                          fasta.gz  tgz
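
The fasta.gz files are ordinary gzip-compressed FASTA files. As a minimal sketch (assuming standard FASTA with `>`-prefixed header lines and possibly wrapped sequence lines; `fasta_lengths` and `fasta_gz_lengths` are hypothetical helper names, not part of any portal API), the Python below streams such a file and counts the bases in each sequence, which is the same quantity a chrom sizes file reports:

```python
import gzip
import io
from typing import Dict, Optional, TextIO


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Return {sequence name: number of bases} for every record in a FASTA stream."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence lines may be wrapped
    return lengths


def fasta_gz_lengths(path) -> Dict[str, int]:
    """Open a gzip-compressed FASTA (a path or a binary file object) in text mode."""
    with gzip.open(path, "rt") as handle:
        return fasta_lengths(handle)


if __name__ == "__main__":
    demo = io.StringIO(">chr1 example\nACGT\nAC\n>chr2\nGGG\n")
    print(fasta_lengths(demo))  # {'chr1': 6, 'chr2': 3}
```

Because the file is streamed line by line, memory use stays small even for a full mammalian genome; only one running total per sequence is kept.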

Chrom Sizes

The portal also hosts chrom sizes files, which are tab-delimited text files in the following format. The "main only" files list the chromosomes alone and omit contigs.

  <chr_1>  <length_of_chr_1>
  <chr_2>  <length_of_chr_2>

species    chrom sizes
human      main only / all
mouse      main only / all
chicken    main only / extended
fruit fly  main only
zebrafish  main only
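
A minimal Python sketch for loading one of these files, assuming exactly the two tab-separated columns shown above. `read_chrom_sizes` and `bins_per_chrom` are hypothetical helper names; the second illustrates one common use of chrom sizes, counting the fixed-size bins needed to tile each chromosome (for example, at a given contact-matrix resolution).

```python
import io
from typing import Dict, TextIO


def read_chrom_sizes(handle: TextIO) -> Dict[str, int]:
    """Parse '<chrom><TAB><length>' lines into a dict that keeps file order."""
    sizes: Dict[str, int] = {}
    for lineno, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected tab-separated fields, got {line!r}")
        chrom, length = fields[0], int(fields[1])
        if chrom in sizes:
            raise ValueError(f"line {lineno}: duplicate chromosome {chrom!r}")
        sizes[chrom] = length
    return sizes


def bins_per_chrom(sizes: Dict[str, int], binsize: int) -> Dict[str, int]:
    """Bins of width binsize needed to tile each chromosome (last bin may be partial)."""
    # -(-a // b) is ceiling division for positive integers
    return {chrom: -(-length // binsize) for chrom, length in sizes.items()}


if __name__ == "__main__":
    sizes = read_chrom_sizes(io.StringIO("chr1\t1000\nchr2\t250\n"))
    print(sum(sizes.values()))         # 1250
    print(bins_per_chrom(sizes, 100))  # {'chr1': 10, 'chr2': 3}
```

Raising on a line without a tab catches files that were accidentally converted to space-delimited text.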

Gene Annotation

Consistent with ENCODE, we host GENCODE-based gene annotations for human and mouse. All data processing on the portal that involves gene annotations uses these annotation files.

species  annotation version  GTF/GFF  transcript-gene mapping  STAR index  RSEM index  HiGlass
human    GENCODE V29         ENCODE   tsv                      tgz         tgz         beddb
mouse    GENCODE M21         ENCODE   tsv                      tgz         tgz         beddb
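
The portal provides a ready-made transcript-gene mapping (tsv) whose column layout is not described here. To illustrate what such a mapping contains, the sketch below derives it from a GTF file instead, relying only on the standard GTF layout: nine tab-separated columns, `transcript` in the third column for transcript records, and `key "value";` attribute pairs in the ninth. The function name is hypothetical, and the record in the demo is made up, with placeholder IDs.

```python
import io
import re
from typing import Dict, TextIO

# GTF attribute pairs look like: gene_id "GENE0001.1"; transcript_id "TX0001.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf: TextIO) -> Dict[str, str]:
    """Map transcript_id -> gene_id using the 'transcript' records of a GTF stream."""
    mapping: Dict[str, str] = {}
    for line in gtf:
        if line.startswith("#") or not line.strip():
            continue  # header/comment or blank line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only transcript-level records
        attrs = dict(ATTR_RE.findall(fields[8]))
        mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


if __name__ == "__main__":
    # A made-up, abbreviated record in GTF layout (IDs are placeholders)
    record = ("chr1\tEXAMPLE\ttranscript\t100\t900\t.\t+\t.\t"
              'gene_id "GENE0001.1"; transcript_id "TX0001.1";\n')
    print(transcript_to_gene(io.StringIO("##comment\n" + record)))
    # {'TX0001.1': 'GENE0001.1'}
```

Filtering on the feature column avoids counting the same transcript once per exon, since exon records in a GTF also carry a `transcript_id`.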

Restriction Enzyme Sites

The genomic positions of restriction enzyme sites are listed in the following format, which the Juicer program recognizes.

  <chr_1> <position_1> <position_2> <position_3> ...
  <chr_2> <position_1> <position_2> <position_3> ...

For example, the first few fields of the first line look like this:

  chr1 3004105 3005820 3008316 3008894 3009813
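
Because each line is a chromosome name followed by whitespace-separated positions, the file is straightforward to load. A minimal sketch in Python, assuming integer positions as in the example above; `read_site_positions` and `fragment_index` are hypothetical names, and the fragment-numbering convention in `fragment_index` is this sketch's own choice, not necessarily the one Juicer uses internally:

```python
import bisect
import io
from typing import Dict, List, TextIO


def read_site_positions(handle: TextIO) -> Dict[str, List[int]]:
    """Parse '<chr> <pos_1> <pos_2> ...' lines into {chr: sorted positions}."""
    sites: Dict[str, List[int]] = {}
    for line in handle:
        fields = line.split()  # split on any run of whitespace
        if not fields:
            continue
        sites[fields[0]] = sorted(int(p) for p in fields[1:])
    return sites


def fragment_index(sites: Dict[str, List[int]], chrom: str, pos: int) -> int:
    """Index of the restriction fragment containing pos.

    Convention (an assumption of this sketch): fragment i covers positions
    greater than sites[i-1] and up to and including sites[i]; fragment 0
    starts at the beginning of the chromosome.
    """
    return bisect.bisect_left(sites[chrom], pos)


if __name__ == "__main__":
    sites = read_site_positions(
        io.StringIO("chr1 3004105 3005820 3008316 3008894 3009813\n"))
    print(fragment_index(sites, "chr1", 3006000))  # 2
```

`bisect_left` keeps each lookup logarithmic in the number of sites on the chromosome, which matters for frequent cutters with many sites per chromosome.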

The files can be obtained from the links below.

species    HindIII, MboI, DpnII, NcoI, MspI, NcoI-MspI-BspHI mix, AluI, DdeI, DdeI-DpnII mix
fruit fly  txt