Adaptor sequences are clipped from Repli-seq reads using
cutadapt version 1.14. Specifically, we run:
cutadapt -q 0 -O 1 -m 0 -a <adaptor> <fastq>
-q 0 is used to turn off low-quality base removal before adaptor searching.
-O 1 sets the minimum required overlap length between the read end and the adaptor to 1 (default is 3), in case the adaptor sequence only partially overlaps the end of the read rather than being fully contained in it.
-m 0 means that empty reads are kept and will appear in the output.
AGATCGGAAGAGCACACGTCTG is used as adaptor sequence.
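As a minimal sketch, the clipping command above could be wrapped in a small shell function with the adaptor filled in. The function name `clip_adaptor`, its two arguments, and the redirection to an output file are illustrative assumptions, not part of the pipeline; only the cutadapt flags and the adaptor sequence come from the text.

```shell
# Adaptor sequence quoted in the text above.
ADAPTOR=AGATCGGAAGAGCACACGTCTG

# Hypothetical wrapper (name and arguments are illustrative).
#   $1: input FASTQ file
#   $2: output FASTQ file (assumption: clipped reads are captured from stdout)
clip_adaptor() {
    # -q 0: no quality trimming; -O 1: minimum overlap of 1; -m 0: keep empty reads
    cutadapt -q 0 -O 1 -m 0 -a "$ADAPTOR" "$1" > "$2"
}
```

A call such as `clip_adaptor sample.fastq sample.clipped.fastq` would then run the same command on one file; the file names here are placeholders.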
For filtering valid Repli-seq alignments, we use
Specifically, the filtering workflow consists of the following steps:
- MAPQ filtering: the samtools view command with -q 20 was used to skip alignments with MAPQ smaller than 20.
- Sorting: the samtools sort command was used to sort alignments by genomic coordinates.
- Removal of PCR duplicates: the samtools rmdup command was used to remove duplicate alignments.
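The three filtering steps can be sketched as one shell function. This is an illustration under stated assumptions rather than the pipeline's actual script: the function name `filter_alignments`, the intermediate file names, and the use of `-b` and `-o` to write BAM files are all hypothetical. Options the text does not specify, such as `-s` for single-end data in `samtools rmdup`, are deliberately left out.

```shell
# Hypothetical wrapper; file naming is illustrative only.
#   $1: input BAM file
#   $2: prefix for the intermediate and final output files
filter_alignments() {
    # Step 1: skip alignments with MAPQ smaller than 20.
    samtools view -b -q 20 -o "$2.q20.bam" "$1" &&
    # Step 2: sort the remaining alignments by genomic coordinates.
    samtools sort -o "$2.q20.sorted.bam" "$2.q20.bam" &&
    # Step 3: remove duplicate alignments from the sorted file.
    samtools rmdup "$2.q20.sorted.bam" "$2.q20.sorted.rmdup.bam"
}
```

Chaining the steps with `&&` stops the workflow as soon as one command fails, so a later step never runs on a missing or partial file.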
Binning and Aggregation
Filtered reads were aggregated for each 5 kb window using
bedtools coverage. Specifically, the following command was used:
bedtools coverage -counts -sorted -a <BINFILE> -b <INPUT_BAM>
Output is provided in both gzipped
bigwig formats and can be viewed using HiGlass.
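The text does not say how the <BINFILE> passed to bedtools coverage is built. As a hedged sketch, a file of 5 kb windows could be generated from a two-column chromosome sizes file (name, length) as shown here. The function name `make_windows` and its input format are assumptions for illustration. `bedtools makewindows -g <chrom sizes> -w 5000` provides comparable functionality.

```shell
# Hypothetical helper: print fixed-width windows in BED format
# (0-based start, exclusive end) for every chromosome in a sizes file.
#   $1: tab- or space-separated file with "name length" per line
#   $2: window width in bp (defaults to 5000)
make_windows() {
    awk -v w="${2:-5000}" 'BEGIN { OFS = "\t" }
        {
            for (start = 0; start < $2; start += w) {
                # The last window is truncated at the chromosome end.
                end = (start + w < $2) ? start + w : $2
                print $1, start, end
            }
        }' "$1"
}
```

Because the windows are emitted in the order of the sizes file and by increasing start position, the output is already position-sorted within each chromosome, which matters when the file is used with the `-sorted` option.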
The pipeline components are pre-installed in a publicly
available Docker image (
Docker Hub. The source code for the Docker image and pipeline
description in Common Workflow Language (CWL) can be found on
- Latest version (v16)
- Older versions