Guide to Uploading Processed Results


Summary

4DN members may want to submit results of their analyses to the data portal for a variety of reasons. These may include:

Sharing preliminary results with other 4DN members while collaborating on projects or preparing manuscripts.
Providing results to 4DN members for datasets that the 4DN-DCIC does not yet process with a standardized pipeline.

Quick Start

If you don’t already have an assigned data submitter for your group, write to us at support@4dnucleome.org so that we can give them write access to the portal.
Prepare an Excel worksheet with minimal metadata (see below for details).
E-mail your worksheet to your data wrangler (or to support@4dnucleome.org).
Install the Submit4DN Python package (pip install Submit4DN) on a machine that has access to your files (more info).
Generate access keys for authentication by following these instructions, and copy them to a file named keypairs.json in your home directory.
Submit your metadata and upload the files to the portal (more info).
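If you prefer to create keypairs.json from a script rather than by hand, the sketch below writes it in place. It assumes the layout described in the Submit4DN README (a "default" entry holding key, secret and server); check the package documentation for the current format before relying on it. The function name write_keypairs is hypothetical, and the key and secret values are placeholders for the access keys you generated.

```python
import json
from pathlib import Path
from typing import Optional


def write_keypairs(key: str, secret: str,
                   server: str = "https://data.4dnucleome.org/",
                   home: Optional[Path] = None) -> Path:
    """Write keypairs.json to the home directory (or to `home`, if given).

    Assumes the {"default": {"key", "secret", "server"}} layout shown in the
    Submit4DN README; check the package documentation before relying on it.
    """
    target_dir = Path.home() if home is None else Path(home)
    path = target_dir / "keypairs.json"
    payload = {"default": {"key": key, "secret": secret, "server": server}}
    path.write_text(json.dumps(payload, indent=2))  # overwrites any existing file
    path.chmod(0o600)  # the secret should be readable only by you
    return path
```

For example, write_keypairs("YOUR_ACCESS_KEY_ID", "YOUR_SECRET_KEY") replaces any existing keypairs.json in your home directory, so back up the old file first if it holds other keys.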

To validate your spreadsheet:

import_data <metadata.xlsx>

To initiate submission and file upload:

import_data --update <metadata.xlsx>
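The two commands above are meant to be run in order: validate first, then submit with --update. A minimal sketch of that workflow from Python, using only the two invocations shown above, might look like the following. The helper names are hypothetical, and rather than assuming import_data reports validation problems through its exit status, it asks you to read the validation output and confirm before uploading.

```python
import subprocess


def import_data_command(metadata_path: str, update: bool = False) -> list[str]:
    """Build one of the two import_data invocations shown above."""
    cmd = ["import_data"]
    if update:
        cmd.append("--update")  # submit metadata and upload files
    cmd.append(metadata_path)
    return cmd


def validate_then_submit(metadata_path: str) -> None:
    """Validate first; submit only after you have read the output and confirmed."""
    # First pass without --update: validation only.
    # check=True raises CalledProcessError if import_data exits with an error.
    subprocess.run(import_data_command(metadata_path), check=True)
    answer = input("Submit and upload now? [y/N] ")
    if answer.strip().lower() == "y":
        subprocess.run(import_data_command(metadata_path, update=True), check=True)
```

Calling validate_then_submit("metadata.xlsx") runs the validation pass and waits for your answer before running the --update pass.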

Metadata Preparation

Minimal metadata is required for each file that is to be uploaded. To prepare this metadata, enter it into a FileProcessed worksheet (available for download here).

Each row represents one file.

The fields are:

aliases -- enter your own identifier, which you can use to reference this file in the future and on the accompanying sheet. An alias must take the form pi-name-lab:identifier_here, e.g. bing-ren-lab:tad_calls_1.
description -- a brief description of the file, e.g. TAD calls for H1 cells by TopDom.
file_format -- e.g. bam, bedGraph, bigWig; see below for a list of current formats and their designations. New formats can be added on request.
file_type -- the type of file, based on the information it contains, e.g. alignments; see below for suggested file_type values.
genome_assembly -- the assembly upon which the analysis was done - valid options are GRCh38, GRCm38, dm6, or galGal5.
filename -- this field must contain the full local path to the file on your system.

NOTE: file names must end with standard extensions. Some files are expected to be compressed with gzip and must carry a .gz extension. See below for allowable extensions.

Filling in this field is what triggers the file upload when you run the Submit4DN import_data program.

produced_from -- (optional) fill in this field with one or more aliases of the files used directly to generate the file described in this row.

For example, list the aliases of the fastq files that were aligned to generate the bam file in this row. This information can be used to build a provenance graph showing how files are produced in a processing pipeline (see https://data.4dnucleome.org/experiment-set-replicates/4DNESRJ8KV4Q/#graph-section).

availability -- ‘public’ or ‘internal’: should the results be made available to the public or only within the 4DN Network?
linked_datasets -- the 4DN accessions of the experiment set(s) or publications with which these files should be associated, e.g. 4DNES2M5JIGV (the accession for the Dekker lab in situ Hi-C on H1 cells).
comments -- any other information for the 4DN-DCIC. For example, if a file is more appropriately linked to an existing portal page, such as the Joint Analysis page, you can indicate that here.

Note that the values in the last three columns (availability, linked_datasets and comments) are not submitted directly; the DCIC uses them to make the appropriate links and set access permissions for the submitted files.
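Before running import_data, you may want a quick local pre-check of each worksheet row. The sketch below is hypothetical and mirrors only the rules stated above: the alias form pi-name-lab:identifier, the required fields, and the allowed genome assemblies. It assumes every field not marked optional is required and that produced_from holds comma-separated aliases; the portal's own validation (import_data without --update) remains authoritative.

```python
import re

# Hypothetical pattern for the alias form pi-name-lab:identifier_here.
ALIAS_PATTERN = re.compile(r"[a-z0-9-]+-lab:[^\s:]+")
REQUIRED_FIELDS = ("aliases", "description", "file_format",
                   "file_type", "genome_assembly", "filename")
VALID_ASSEMBLIES = {"GRCh38", "GRCm38", "dm6", "galGal5"}


def check_row(row: dict[str, str]) -> list[str]:
    """Return the problems found in one FileProcessed row (empty list if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not row.get(field, "").strip():
            problems.append(f"missing {field}")
    alias = row.get("aliases", "").strip()
    if alias and not ALIAS_PATTERN.fullmatch(alias):
        problems.append(f"alias {alias!r} is not of the form pi-name-lab:identifier")
    assembly = row.get("genome_assembly", "").strip()
    if assembly and assembly not in VALID_ASSEMBLIES:
        problems.append(f"unsupported genome_assembly {assembly!r}")
    # produced_from is optional; assumed to hold comma-separated aliases.
    parents = (a.strip() for a in row.get("produced_from", "").split(","))
    for parent in filter(None, parents):
        if not ALIAS_PATTERN.fullmatch(parent):
            problems.append(f"produced_from entry {parent!r} does not look like an alias")
    return problems
```

A row built from the examples above (alias bing-ren-lab:tad_calls_1, assembly GRCh38 and so on) returns an empty list, while an alias without the lab prefix or an assembly such as hg19 is reported.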

Additional information

Data Processing Standards

Required: All data processing should be based on the following genome assemblies.

Human: GRCh38
Mouse: GRCm38
Fruit fly: dm6
Chicken: galGal5

Recommended: standard resolutions of 1kb, 5kb, 10kb, 25kb, 50kb and 100kb.
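Assuming these resolutions refer to contact-matrix bin sizes, you could compare the bin sizes in a multi-resolution file (such as an .mcool) against the recommended list. The sketch below is a hypothetical helper that only does the comparison; reading the bin sizes out of the file (for example with the cooler library) is left out. It also records the required assembly for each organism listed above.

```python
# Required assembly per organism, from the list above.
REQUIRED_ASSEMBLY = {
    "human": "GRCh38",
    "mouse": "GRCm38",
    "fruit fly": "dm6",
    "chicken": "galGal5",
}
RECOMMENDED_RESOLUTIONS = ["1kb", "5kb", "10kb", "25kb", "50kb", "100kb"]


def to_bp(label: str) -> int:
    """Convert a resolution label such as '25kb' to base pairs (25000)."""
    label = label.strip().lower()
    if label.endswith("kb"):
        return int(label[:-2]) * 1_000
    if label.endswith("mb"):
        return int(label[:-2]) * 1_000_000
    if label.endswith("bp"):
        return int(label[:-2])
    return int(label)  # a bare number is taken as base pairs


def missing_resolutions(available_bp: list[int]) -> list[str]:
    """List the recommended resolutions that are absent from a file's bin sizes."""
    have = set(available_bp)
    return [r for r in RECOMMENDED_RESOLUTIONS if to_bp(r) not in have]
```

For a file containing only 5kb and 10kb bins, missing_resolutions([5000, 10000]) returns ["1kb", "25kb", "50kb", "100kb"].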

Supported File formats

HiGlass (Visualization) compatible file formats:

Bed (sorted gzipped)
Bedgraph (sorted gzipped)
Bigwig
Bigbed
Mcool
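For the Bed and Bedgraph entries, "sorted gzipped" can be checked locally before upload. The sketch below is a hypothetical helper that assumes "sorted" means the order produced by LC_ALL=C sort -k1,1 -k2,2n (chromosome names compared as plain strings, start positions numerically); the portal or HiGlass may accept other orders. Opening a file that is not actually gzip-compressed raises an error.

```python
import gzip


def is_sorted_bed_gz(path: str) -> bool:
    """Return True if a gzipped BED/bedGraph file is sorted by chrom, then start.

    Assumes the order of `LC_ALL=C sort -k1,1 -k2,2n`; blank, comment,
    track and browser lines are skipped.
    """
    previous = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start = line.split("\t", 2)[:2]
            key = (chrom, int(start))  # numeric comparison of start positions
            if previous is not None and key < previous:
                return False
            previous = key
    return True
```

If the check fails, re-sorting with the sort command above and compressing again with gzip should produce a file in the assumed order.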

Other currently supported file formats.

Please contact us if you would like to submit a file in a format that is not listed above. We can also work with you to convert other 2-way or multi-way contact lists into a cooler contact matrix file.

Filename extensions

Filename extensions are standardized, though variations are allowed in some common cases.

If a file is compressed with gzip, the filename should end with .gz after the usual extension. These cases include:

bed - bed.gz
bedpe - bedpe.gz
bedGraph - bedGraph.gz
clusters - cluster.gz
compressed_fasta - fasta.gz
compressed text - txt.gz
normvector_juicerformat - normvector.juicerformat.gz
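The gzip cases listed above can be checked mechanically against the filename column. The sketch below is hypothetical: it covers only the formats listed here, not the full list of allowable extensions, and it is deliberately strict and case-sensitive, so the allowed variations mentioned above are not modelled. A False result means "check the full list", not "rejected".

```python
# Gzip-compressed formats and their standard filename endings, from the list above.
GZ_EXTENSIONS = {
    "bed": "bed.gz",
    "bedpe": "bedpe.gz",
    "bedGraph": "bedGraph.gz",
    "clusters": "cluster.gz",
    "compressed_fasta": "fasta.gz",
    "compressed text": "txt.gz",
    "normvector_juicerformat": "normvector.juicerformat.gz",
}


def has_expected_extension(filename: str, file_format: str) -> bool:
    """Return True if `filename` ends with the standard gzip extension for `file_format`."""
    expected = GZ_EXTENSIONS.get(file_format)
    if expected is None:
        raise ValueError(f"no gzip extension listed here for {file_format!r}")
    return filename.endswith("." + expected)  # case-sensitive comparison
```

For example, /data/tads.bed.gz passes for bed, whereas signal.bedgraph.gz is flagged for bedGraph because of the lowercase g.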

Other standard and allowable extensions can be found here.

File types

The following file types already exist in the portal. Please use the same (or a similar) short descriptive title for your files:

read pairs
alignments
unfiltered alignments
contact list
contact list-replicate
contact list-combined
contact matrix
normalized contact matrix
long range chromatin interactions
intensity values
peaks
image
locus distances submitter format
dot calls
compartments
insulation score - diamond
insulation score - potential
domain calls
boundaries
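To help pick the same or a similar title, a hypothetical helper could rank the existing types by string similarity using the standard library's difflib.get_close_matches. Similar wording is not the same as similar content, so treat the suggestions as prompts only and check with your data wrangler if unsure.

```python
import difflib

# Existing file types, copied from the list above.
EXISTING_FILE_TYPES = [
    "read pairs", "alignments", "unfiltered alignments", "contact list",
    "contact list-replicate", "contact list-combined", "contact matrix",
    "normalized contact matrix", "long range chromatin interactions",
    "intensity values", "peaks", "image", "locus distances submitter format",
    "dot calls", "compartments", "insulation score - diamond",
    "insulation score - potential", "domain calls", "boundaries",
]


def suggest_file_types(proposed: str, n: int = 3) -> list[str]:
    """Return up to n existing file types that closely resemble `proposed`, best first."""
    return difflib.get_close_matches(proposed.lower(), EXISTING_FILE_TYPES, n=n, cutoff=0.6)
```

For example, suggest_file_types("domain call") puts "domain calls" first, which suggests reusing that existing type.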
