The 4DN Repli-seq data processing pipeline includes read clipping, alignment, filtering, and aggregation. Downstream normalization, smoothing and replicate merging steps will be implemented in the near future.
Adaptor sequences are clipped from Repli-seq reads using cutadapt version 1.14. Specifically, we run:
cutadapt -q 0 -O 1 -m 0 -a <adaptor> <fastq>
-q 0 is used to turn off low-quality base removal before adaptor searching.
-O 1 sets the minimum required overlap length between the read end and the adaptor to 1 (the default is 3), in case the adaptor sequence only partially overlaps the end of the read rather than being fully contained in it.
-m 0 means that empty reads are kept and will appear in the output.
AGATCGGAAGAGCACACGTCTG is used as the adaptor sequence.
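To make the effect of the overlap and length options concrete, here is a minimal shell sketch of 3' adaptor clipping. It is not cutadapt: it assumes exact matching only (cutadapt also tolerates mismatches), it works on bare sequences rather than FASTQ records, and the function name clip_adaptor is hypothetical.

```sh
# Illustrative only: exact-match 3' adaptor clipping on one sequence per line.
# usage: clip_adaptor <adaptor> <min_overlap>   (sequences on stdin)
clip_adaptor() {
  awk -v a="$1" -v min="$2" '{
    seq = $0
    # Case 1: the adaptor is fully contained in the read; cut at its first occurrence.
    i = index(seq, a)
    if (i > 0) { print substr(seq, 1, i - 1); next }
    # Case 2: a prefix of the adaptor overlaps the end of the read.
    # Try the longest possible overlap first, down to the minimum overlap.
    n = length(seq); m = length(a)
    maxk = (n < m) ? n : m
    for (k = maxk; k >= min; k--) {
      if (substr(seq, n - k + 1) == substr(a, 1, k)) {
        print substr(seq, 1, n - k)
        next
      }
    }
    # No adaptor found: the read is printed unchanged.
    print seq
  }'
}
```

With a minimum overlap of 1, the read ACGTA loses its final A because that base matches the first base of the adaptor. With a minimum overlap of 3, the same read is left untouched. A read that consists only of the adaptor becomes an empty line and is still printed, which mirrors the behaviour described for -m 0.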
For filtering valid Repli-seq alignments, we use samtools. Specifically, the filtering workflow consists of the following steps:
- MAPQ filtering: the samtools view command with -q 20 was used to skip alignments with MAPQ smaller than 20.
- Sorting: the samtools sort command was used to sort alignments by genomic coordinates.
- Removal of PCR duplicates: the samtools rmdup command was used to remove duplicate alignments.
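The MAPQ threshold can be illustrated without samtools. The sketch below applies the same rule to SAM text: header lines pass through, and alignment records are kept only when the MAPQ field (column 5) is at least the threshold. The function name filter_mapq is hypothetical, and the sketch covers only the MAPQ step, not sorting or duplicate removal.

```sh
# Illustrative only: keep SAM header lines and records with MAPQ >= threshold.
# usage: filter_mapq <min_mapq>   (SAM text on stdin, filtered SAM on stdout)
filter_mapq() {
  awk -v min="$1" -F '\t' '
    /^@/       { print; next }   # header lines are always kept
    $5 >= min  { print }         # column 5 of a SAM record is MAPQ
  '
}
```

For example, `filter_mapq 20 < in.sam > out.sam` drops a record with MAPQ 3 and keeps records with MAPQ 20 or 42, matching the description of skipping alignments with MAPQ smaller than 20.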
Binning and Aggregation
Filtered reads were aggregated for each 5 kb window using bedtools coverage. Specifically, the following command was used:
bedtools coverage -counts -sorted -a <BINFILE> -b <INPUT_BAM>
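The text does not say how the bin file is produced. As one possible approach, the sketch below builds fixed-size, coordinate-ordered windows in BED format from a two-column chromosome sizes file (name, length). The function name make_bins and the input layout are assumptions for illustration, not part of the pipeline.

```sh
# Illustrative only: emit fixed-size BED windows for each chromosome.
# usage: make_bins <window_size>   (lines of "<chrom> <length>" on stdin)
make_bins() {
  awk -v w="$1" '{
    # Walk along the chromosome in steps of w; the last window is truncated
    # at the chromosome end.
    for (s = 0; s < $2; s += w) {
      e = s + w
      if (e > $2) e = $2
      printf "%s\t%d\t%d\n", $1, s, e
    }
  }'
}
```

Calling `make_bins 5000 < chrom.sizes > bins.bed` on a 12,000 bp chromosome yields three windows: 0 to 5000, 5000 to 10000, and 10000 to 12000. Because -sorted expects both inputs in the same chromosome order, the sizes file should list chromosomes in the order used by the sorted BAM.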
Output is provided in both gzipped bedgraph and bigwig formats and can be viewed using HiGlass.
As of v16.1, the pipeline output includes a raw counts file in addition to the default scaled counts (RPKM).
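For reference, RPKM scaling divides each bin count by the bin length in kilobases and by the total read count in millions. The sketch below applies that formula to a four-column counts file (chrom, start, end, count). It assumes the total is the sum of counts over all bins, which may differ from how the pipeline defines it, and the function name counts_to_rpkm is hypothetical.

```sh
# Illustrative only: convert per-bin raw counts to RPKM.
# usage: counts_to_rpkm <counts.bed>   (columns: chrom, start, end, count)
counts_to_rpkm() {
  awk -F '\t' '
    # First pass over the file: sum the counts of all bins.
    NR == FNR { total += $4; next }
    # Second pass: RPKM = count * 1e9 / (bin length in bp * total count).
    {
      len = $3 - $2
      rpkm = (total > 0 && len > 0) ? $4 * 1000000000 / (len * total) : 0
      printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, rpkm
    }
  ' "$1" "$1"
}
```

The file is passed twice so that awk can compute the total before scaling, which means the input must be a regular file rather than a pipe.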
The pipeline components are pre-installed in a publicly available Docker image on Docker Hub. The source code for the Docker image and the pipeline description in Common Workflow Language (CWL) can be found on GitHub:
- Latest version (v16.1)
  - Workflow metadata : https://data.4dnucleome.org/workflows/622bdf75-2dd1-457f-ad78-d4cd128f8f5b/
  - CWL : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1/cwl
  - Docker : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16.1
- Older versions
  - v16
    - Workflow metadata : https://data.4dnucleome.org/workflows/2a6807f1-93db-4c7b-b148-672534193974/
    - CWL : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16/cwl
    - Docker : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v16
  - v14
    - Workflow metadata : https://data.4dnucleome.org/workflows/4459a4d8-1bd8-4b6a-b2cc-2506f4270a34/
    - CWL : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14/cwl
    - Docker : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v14
  - v13.1
    - Workflow metadata : https://data.4dnucleome.org/workflows/146da22a-502d-4500-bf57-a7cf0b4b2364/
    - CWL : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1/cwl
    - Docker : https://github.com/4dn-dcic/docker-4dn-repliseq/tree/v13.1