released
October 4th, 2021 at 3:19pm
Parsing
Interaction pairs are parsed from the bam files using pairtools version 0.2.2. Filtering consists of several commands:
- pairtools parse - Produces a pairsam file from an input bam file.
  - The pairsam file is a pairs file, listing one read pair per line, with additional columns to track the sam-file lines, and a pairtools read classification.
  - These classifications include information on whether the read aligned to 0, 1, or multiple places in the genome and whether it aligned end-to-end or if it was clipped.
  - This tool also upper-triangularizes the reads, i.e. if the coordinate of the first read is higher than that of the second, the reads are flipped so that the lower coordinate comes first.
  - For more details, see the pairtools documentation.
- pairtools sort - Produces a sorted pairsam file from an input pairsam file.
  - Note that the flipping order and the sort order of chromosomes are not identical. See the docs for more details.
- pairtools dedup --mark-dups (equivalent to pairtools markasdup) - Identify duplicate alignments.
  - Duplicate status is assigned arbitrarily between the two duplicate alignments.
- pairtools select - Remove duplicates, multi-mapped reads, and reads non-uniquely mapped at the 5' end.
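The four steps above can be illustrated with a small pure-Python toy. This is only a sketch of the concepts (flip, sort, mark duplicates, select), not the pairtools implementation and not the commands the workflow runs. The `Pair` record, the helper functions, the chromosome order, and the example reads are all hypothetical. The sketch assumes the two-letter pair types "UU" (both sides uniquely mapped), "MU" (one side multi-mapped) and "DD" (duplicate), and it ignores strands, the sam columns, and any position tolerance when comparing duplicates.

```python
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical, simplified stand-in for one line of a pairs file."""
    read_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    pair_type: str  # assumed codes: "UU", "MU", "DD"


def flip(pair: Pair, chrom_order: Dict[str, int]) -> Pair:
    """Upper-triangularize: put the side with the lower (chrom, pos) first."""
    side1 = (chrom_order[pair.chrom1], pair.pos1)
    side2 = (chrom_order[pair.chrom2], pair.pos2)
    if side1 > side2:
        return pair._replace(chrom1=pair.chrom2, pos1=pair.pos2,
                             chrom2=pair.chrom1, pos2=pair.pos1)
    return pair


def sort_pairs(pairs: List[Pair]) -> List[Pair]:
    """Toy sort key: chromosome names, then positions.

    The name-based order used here need not match the order used for flipping.
    """
    return sorted(pairs, key=lambda p: (p.chrom1, p.chrom2, p.pos1, p.pos2))


def mark_duplicates(sorted_pairs: List[Pair]) -> List[Pair]:
    """Relabel repeated "UU" pairs as "DD"; the first one seen is kept."""
    seen = set()
    out = []
    for p in sorted_pairs:
        key = (p.chrom1, p.pos1, p.chrom2, p.pos2)
        if p.pair_type != "UU":
            out.append(p)  # not checked for duplication in this toy
        elif key in seen:
            out.append(p._replace(pair_type="DD"))
        else:
            seen.add(key)
            out.append(p)
    return out


def select_unique(pairs: List[Pair]) -> List[Pair]:
    """Drop duplicates and multi-mapped pairs by keeping only "UU"."""
    return [p for p in pairs if p.pair_type == "UU"]


# Hypothetical flipping order and example reads.
chrom_order = {"chr1": 0, "chr2": 1}
pairs = [
    Pair("r1", "chr2", 500, "chr1", 100, "UU"),  # flipped across chromosomes
    Pair("r2", "chr1", 100, "chr2", 500, "UU"),  # same contact as r1 after flipping
    Pair("r3", "chr1", 300, "chr1", 200, "UU"),  # flipped within a chromosome
    Pair("r4", "chr1", 50, "chr2", 70, "MU"),    # one side multi-mapped
]

flipped = [flip(p, chrom_order) for p in pairs]
marked = mark_duplicates(sort_pairs(flipped))
kept = select_unique(marked)
print([p.read_id for p in kept])  # ['r3', 'r1']
```

In this toy, r2 is the one labeled as the duplicate only because r1 happens to come first after the stable sort, which mirrors the arbitrary designation described above.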
Source files (v1.1.1_dcic_4):
- Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPB1/
- CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-bam.cwl