Released October 4th, 2021 at 3:19pm

Parsing


Interaction pairs are parsed from the BAM files using pairtools version 0.2.2. Parsing and filtering consist of the following commands:

  • pairtools parse

  • Produces a pairsam file from an input BAM file.
  • The pairsam file is a pairs file that lists one read pair per line, with additional columns that carry the corresponding SAM lines and a pairtools classification of the pair.
  • These classifications record whether each read aligned to zero, one, or multiple places in the genome, and whether it aligned end-to-end or was clipped.
  • This tool also upper-triangularizes the pairs: if the coordinate of the first read is higher than that of the second, the two reads are flipped so that the lower coordinate is listed first.
  • For more details, see the pairtools documentation.

  • pairtools sort

  • Produces a sorted pairsam file from an input pairsam file.
  • Note that the chromosome order used for flipping is not identical to the chromosome order used for sorting. See the pairtools documentation for details.

  • pairtools dedup --mark-dups

  • (equivalent to pairtools markasdup)
  • Identifies duplicate alignments.
  • When two alignments are duplicates, which of the two is marked as the duplicate is chosen arbitrarily.

  • pairtools select

  • Removes duplicates, multi-mapped reads, and reads that are not uniquely mapped at the 5' end.
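The upper-triangular flip done by pairtools parse can be sketched as follows. This is a minimal illustration, not pairtools' own code. The Side record and the chrom_order mapping are hypothetical; chrom_order stands in for whatever chromosome ranking the tool uses. Each pair is compared first by chromosome rank and then by position, and the two sides are swapped when the first one comes later.

```python
from typing import Dict, NamedTuple, Tuple


class Side(NamedTuple):
    """One end of a read pair (hypothetical minimal record)."""
    chrom: str
    pos: int
    strand: str


def flip_upper_triangular(side1: Side, side2: Side,
                          chrom_order: Dict[str, int]) -> Tuple[Side, Side]:
    """Return the two sides ordered so that side1 <= side2.

    Sides are compared by (chromosome rank, position); if side1 sorts
    after side2, the two sides are swapped.
    """
    key1 = (chrom_order[side1.chrom], side1.pos)
    key2 = (chrom_order[side2.chrom], side2.pos)
    if key1 > key2:
        return side2, side1
    return side1, side2


# Hypothetical chromosome ranking used for flipping.
order = {"chr1": 0, "chr2": 1, "chr10": 2}
a = Side("chr10", 500, "+")
b = Side("chr2", 1000, "-")
print(flip_upper_triangular(a, b, order))  # chr2 side is listed first
```

A plain string sort on chromosome names would place chr10 before chr2, which is the opposite of the ranking above. A mismatch of this kind is one way the flipping order and the sort order can differ.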
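Duplicate marking can be illustrated with the sketch below. This is not pairtools' implementation. The Pair record is hypothetical, the 3 bp tolerance (max_mismatch) is an assumed value, and duplicates are relabelled with the pair type "DD", which is assumed here to match the pairtools duplicate label. In this sketch, two pairs count as duplicates when both ends lie on the same chromosomes and strands and each position differs by at most max_mismatch. The first copy encountered is kept, which stands in for the arbitrary choice between duplicates.

```python
from typing import List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical minimal pair record: both sides plus a pair_type label."""
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    pair_type: str


def is_duplicate(p: Pair, q: Pair, max_mismatch: int) -> bool:
    """Same chromosomes and strands, and both positions within max_mismatch bp."""
    return (p.chrom1 == q.chrom1 and p.chrom2 == q.chrom2
            and p.strand1 == q.strand1 and p.strand2 == q.strand2
            and abs(p.pos1 - q.pos1) <= max_mismatch
            and abs(p.pos2 - q.pos2) <= max_mismatch)


def mark_duplicates(pairs: List[Pair], max_mismatch: int = 3) -> List[Pair]:
    """Relabel duplicates as 'DD', keeping the first copy in input order.

    Quadratic for clarity; a real tool would exploit the sorted input.
    """
    kept: List[Pair] = []
    out: List[Pair] = []
    for p in pairs:
        if any(is_duplicate(p, k, max_mismatch) for k in kept):
            out.append(p._replace(pair_type="DD"))
        else:
            kept.append(p)
            out.append(p)
    return out


pairs = [
    Pair("chr1", 100, "+", "chr2", 5000, "-", "UU"),
    Pair("chr1", 102, "+", "chr2", 4999, "-", "UU"),  # within 3 bp of the first
    Pair("chr1", 100, "-", "chr2", 5000, "-", "UU"),  # different strand on side 1
]
print([p.pair_type for p in mark_duplicates(pairs)])  # → ['UU', 'DD', 'UU']
```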
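The selection step amounts to filtering pairs on their classification. pairtools select itself evaluates a user-supplied condition over the pair columns. The sketch below only mimics the effect. It assumes records reduced to hypothetical (read_id, pair_type) tuples and keeps only "UU" (unique-unique) pairs. Both the record shape and the KEEP_TYPES set are assumptions; the exact condition this workflow uses is defined in the source files listed below.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical set of pair types to keep: unique-unique pairs only.
KEEP_TYPES: Set[str] = {"UU"}


def select_pairs(records: Iterable[Tuple[str, str]],
                 keep_types: Set[str] = KEEP_TYPES) -> List[Tuple[str, str]]:
    """Keep (read_id, pair_type) records whose pair_type is in keep_types.

    Duplicates ('DD'), multi-mapped pairs ('MM'), unmapped pairs ('NN'),
    and any other type outside keep_types are dropped.
    """
    return [r for r in records if r[1] in keep_types]


records = [("r1", "UU"), ("r2", "DD"), ("r3", "MM"), ("r4", "UU"), ("r5", "NN")]
print(select_pairs(records))  # → [('r1', 'UU'), ('r4', 'UU')]
```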

Source files (v1.1.1_dcic_4):

  • Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPB1/
  • CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-bam.cwl