{"name": "resources/data-analysis/imargi-pipeline", "title": "iMARGI Processing Pipeline", "status": "released", "aliases": ["4dn-dcic-lab:imargi-pipeline"], "content": [{"lab": {"@type": ["Lab", "Item"], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "status": "current", "@id": "/labs/4dn-dcic-lab/", "display_title": "4DN DCIC, HMS", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "MARGI is a protocol for mapping RNA-DNA contacts on a genome-wide scale, analogous to Hi-C methods which map DNA-DNA contacts. In situ MARGI (iMARGI) is the successor of the MARGI technique, requiring fewer input cells and less time than required by MARGI. \n\nThe 4DN iMARGI data processing pipeline is adapted from the [Zhong iMARGI pipeline](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker). Its primary components are cleaning and alignment of reads, parsing of alignments into pairs, and merging and aggregation of pairs. 
To learn more about the original pipeline or experimental protocol, please reference the [iMARGI Pipeline documentation](http://sysbiocomp.ucsd.edu/public/frankyan/imargi_pipeline/).\n\nThe primary modifications are:\n\n* An additional output pairs file from the parsing step\n* The addition of 4DN standard resolutions and modification of flag usage in the creation of cool files\n* Swapping of column order for DNA and RNA in `cooler cload pairs`, and\n* Additional test files and CWLs for running the pipeline.\n\nThe iMARGI Docker, used in all steps of the pipeline, can be found at https://hub.docker.com/r/4dndcic/imargi/v1.1.1_dcic_4", "name": "resources.data-analysis.imargi-processing-pipeline.overview", "award": {"@type": ["Award", "Item"], "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "status": "current", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@id": "/awards/2U01CA200059-06/", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "Overview", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.overview"], "options": {"filetype": "md", "collapsible": false, "default_open": true}, "date_created": "2021-10-04T13:11:41.638504+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2021-10-12T16:39:45.215001+00:00"}, "schema_version": "2", "@id": "/static-sections/25cdadf9-8d2a-44a7-a360-634b1cf1b5d8/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "25cdadf9-8d2a-44a7-a360-634b1cf1b5d8", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Overview", "external_references": [], "content": "MARGI is a protocol for mapping RNA-DNA contacts on a genome-wide scale, analogous to Hi-C methods which map DNA-DNA 
contacts. In situ MARGI (iMARGI) is the successor of the MARGI technique, requiring fewer input cells and less time than required by MARGI. \n\nThe 4DN iMARGI data processing pipeline is adapted from the [Zhong iMARGI pipeline](https://github.com/Zhong-Lab-UCSD/iMARGI-Docker). Its primary components are cleaning and alignment of reads, parsing of alignments into pairs, and merging and aggregation of pairs. To learn more about the original pipeline or experimental protocol, please reference the [iMARGI Pipeline documentation](http://sysbiocomp.ucsd.edu/public/frankyan/imargi_pipeline/).\n\nThe primary modifications are:\n\n* An additional output pairs file from the parsing step\n* The addition of 4DN standard resolutions and modification of flag usage in the creation of cool files\n* Swapping of column order for DNA and RNA in `cooler cload pairs`, and\n* Additional test files and CWLs for running the pipeline.\n\nThe iMARGI Docker, used in all steps of the pipeline, can be found at https://hub.docker.com/r/4dndcic/imargi/v1.1.1_dcic_4", "filetype": "md", "content_as_html": "<div class=\"markdown-container\"><p>MARGI is a protocol for mapping RNA-DNA contacts on a genome-wide scale, analogous to Hi-C methods which map DNA-DNA contacts. In situ MARGI (iMARGI) is the successor of the MARGI technique, requiring fewer input cells and less time than required by MARGI. </p>\n<p>The 4DN iMARGI data processing pipeline is adapted from the <a href=\"https://github.com/Zhong-Lab-UCSD/iMARGI-Docker\" rel=\"noopener noreferrer\" target=\"_blank\">Zhong iMARGI pipeline</a>. Its primary components are cleaning and alignment of reads, parsing of alignments into pairs, and merging and aggregation of pairs. 
To learn more about the original pipeline or experimental protocol, please reference the <a href=\"http://sysbiocomp.ucsd.edu/public/frankyan/imargi_pipeline/\" rel=\"noopener noreferrer\" target=\"_blank\">iMARGI Pipeline documentation</a>.</p>\n<p>The primary modifications are:</p>\n<ul>\n<li>An additional output pairs file from the parsing step</li>\n<li>The addition of 4DN standard resolutions and modification of flag usage in the creation of cool files</li>\n<li>Swapping of column order for DNA and RNA in <code>cooler cload pairs</code>, and</li>\n<li>Additional test files and CWLs for running the pipeline.</li>\n</ul>\n<p>The iMARGI Docker, used in all steps of the pipeline, can be found at https://hub.docker.com/r/4dndcic/imargi/v1.1.1_dcic_4</p></div>"}, {"lab": {"@type": ["Lab", "Item"], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "status": "current", "@id": "/labs/4dn-dcic-lab/", "display_title": "4DN DCIC, HMS", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "In iMARGI experiments, two random bases initiate each RNA end read. Thus to improve mapping, R1 reads are cleaned using `seqtk` version 1.3. The command ::\n\n    seqtk trimfq -b 2\n\nremoves two bases (``-b 2``) from the left end of each read.\n\nReads are then mapped to the `GRCh38 <https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/>`_ (human) or `mm10 <https://data.4dnucleome.org/files-reference/4DNFI823LSI8/>`_ (mouse) reference genome using `bwa` version 0.7.17. In particular, we run: ::\n\n    bwa mem -t <nthreads> -SP5M <genome_index> <fastq1> <fastq2>\n\n* The ``-SP`` option is used to ensure the results are equivalent to that obtained by running ``bwa mem`` on each mate separately, while retaining the right formatting for paired-end reads. 
This option skips a step in ``bwa mem`` that forces alignment of a poorly aligned read given an alignment of its mate with the assumption that the two mates are part of a single genomic segment.\n* The ``-5`` option is used to report the 5' portion of chimeric alignments as the primary alignment. For chimeric alignments, ``bwa mem`` reports two alignments: one of them is annotated as primary and soft-clipped, retaining the full-length of the original sequence. The other end is annotated as hard-clipped and marked as either 'supplementary' or 'secondary'.\n* The ``-M`` option is used to annotate the secondary/supplementary clipped reads as *secondary* rather than *supplementary*, for compatibility with some public software tools such as ``picard MarkDuplicates``.\n* The ``-t`` option is used for multi-threading and should not affect the result.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: `https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/`\n* CWL: `https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl`\n", "name": "resources.data-analysis.imargi-processing-pipeline.fastq", "award": {"@type": ["Award", "Item"], "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "status": "current", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@id": "/awards/2U01CA200059-06/", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "Cleaning and Alignment", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.sources"], "options": {"filetype": "rst", "collapsible": false, "default_open": true, "convert_ext_links": true}, "date_created": "2021-10-04T14:36:12.617939+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-08-19T23:34:13.588295+00:00"}, "schema_version": "2", "@id": 
"/static-sections/f4b4589f-a631-4aba-8d2a-924aac55169f/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "f4b4589f-a631-4aba-8d2a-924aac55169f", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Cleaning and Alignment", "external_references": [], "content": "In iMARGI experiments, two random bases initiate each RNA end read. Thus to improve mapping, R1 reads are cleaned using `seqtk` version 1.3. The command ::\n\n    seqtk trimfq -b 2\n\nremoves two bases (``-b 2``) from the left end of each read.\n\nReads are then mapped to the `GRCh38 <https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/>`_ (human) or `mm10 <https://data.4dnucleome.org/files-reference/4DNFI823LSI8/>`_ (mouse) reference genome using `bwa` version 0.7.17. In particular, we run: ::\n\n    bwa mem -t <nthreads> -SP5M <genome_index> <fastq1> <fastq2>\n\n* The ``-SP`` option is used to ensure the results are equivalent to that obtained by running ``bwa mem`` on each mate separately, while retaining the right formatting for paired-end reads. This option skips a step in ``bwa mem`` that forces alignment of a poorly aligned read given an alignment of its mate with the assumption that the two mates are part of a single genomic segment.\n* The ``-5`` option is used to report the 5' portion of chimeric alignments as the primary alignment. For chimeric alignments, ``bwa mem`` reports two alignments: one of them is annotated as primary and soft-clipped, retaining the full-length of the original sequence. 
The other end is annotated as hard-clipped and marked as either 'supplementary' or 'secondary'.\n* The ``-M`` option is used to annotate the secondary/supplementary clipped reads as *secondary* rather than *supplementary*, for compatibility with some public software tools such as ``picard MarkDuplicates``.\n* The ``-t`` option is used for multi-threading and should not affect the result.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: `https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/`\n* CWL: `https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl`\n", "filetype": "rst", "content_as_html": "<div class=\"rst-container\"><p>In iMARGI experiments, two random bases initiate each RNA end read. Thus to improve mapping, R1 reads are cleaned using <cite>seqtk</cite> version 1.3. The command</p><pre class=\"literal-block\">\nseqtk trimfq -b 2\n</pre><p>removes two bases (<code>-b 2</code>) from the left end of each read.</p><p>Reads are then mapped to the <a class=\"reference external\" href=\"https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/\" rel=\"noopener noreferrer\" target=\"_blank\">GRCh38</a> (human) or <a class=\"reference external\" href=\"https://data.4dnucleome.org/files-reference/4DNFI823LSI8/\" rel=\"noopener noreferrer\" target=\"_blank\">mm10</a> (mouse) reference genome using <cite>bwa</cite> version 0.7.17. In particular, we run:</p><pre class=\"literal-block\">\nbwa mem -t &lt;nthreads&gt; -SP5M &lt;genome_index&gt; &lt;fastq1&gt; &lt;fastq2&gt;\n</pre><ul class=\"simple\"><li>The <code>-SP</code> option is used to ensure the results are equivalent to that obtained by running <code>bwa mem</code> on each mate separately, while retaining the right formatting for paired-end reads. 
This option skips a step in <code>bwa mem</code> that forces alignment of a poorly aligned read given an alignment of its mate with the assumption that the two mates are part of a single genomic segment.</li><li>The <code>-5</code> option is used to report the 5' portion of chimeric alignments as the primary alignment. For chimeric alignments, <code>bwa mem</code> reports two alignments: one of them is annotated as primary and soft-clipped, retaining the full-length of the original sequence. The other end is annotated as hard-clipped and marked as either 'supplementary' or 'secondary'.</li><li>The <code>-M</code> option is used to annotate the secondary/supplementary clipped reads as <em>secondary</em> rather than <em>supplementary</em>, for compatibility with some public software tools such as <code>picard MarkDuplicates</code>.</li><li>The <code>-t</code> option is used for multi-threading and should not affect the result.</li></ul><p>Source files (v1.1.1_dcic_4):</p><ul class=\"simple\"><li>Workflow: <cite>https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/</cite></li><li>CWL: <cite>https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl</cite></li></ul></div>"}, {"lab": {"@type": ["Lab", "Item"], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "status": "current", "@id": "/labs/4dn-dcic-lab/", "display_title": "4DN DCIC, HMS", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "Interaction pairs are parsed from the `bam` files using [`pairtools`](https://github.com/mirnylab/pairtools) version 0.2.2. 
Filtering consists of several commands:\n\n* `pairtools parse`\n   * Produces a [pairsam](https://pairsamtools.readthedocs.io/en/latest/pairsam.html) file from an input `bam` file.\n   * The pairsam file is a pairs file, listing one read pair per line, with additional columns to track the sam-file lines, and a pairtools read classification.\n   * These classifications include information on whether the read aligned to 0, 1, or multiple places in the genome and whether it aligned end-to-end or if it was clipped.\n   * This tool also upper-triangularizes the reads, i.e. if the coordinate of second read is higher than the first, the reads are flipped.\n   * For more details, see the [pairtools documentation](https://pairtools.readthedocs.io/en/latest/parsing.html).\n\n* `pairtools sort`\n   * Produces a sorted `pairsam` file from an input `pairsam` file.\n   * Note that the flipping order and sort order of chromosomes is not identical. See [the docs](https://pairtools.readthedocs.io/en/latest/sorting.html#chromosomal-order-for-sorting-and-flipping) for more details.\n\n* `pairtools dedup --mark-dups`\n   * (equivalent to `pairtools markasdup`)\n   * Identify duplicate alignments.\n   * Arbitrarily designate the duplicate status among the two duplicate alignments.\n\n* `pairtools select`\n   * Remove duplicates, multi-mapped reads, and reads non-uniquely mapped at the 5' end.\n\nSource files (v1.1.1\\_dcic\\_4): \n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPB1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-bam.cwl", "name": "resources.data-analysis.imargi-processing-pipeline.bam", "award": {"@type": ["Award", "Item"], "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "status": "current", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@id": "/awards/2U01CA200059-06/", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": 
"Parsing", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.bam"], "options": {"filetype": "md", "collapsible": false, "default_open": true, "convert_ext_links": true}, "date_created": "2021-10-04T15:19:02.775182+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-03-26T18:52:41.840887+00:00"}, "schema_version": "2", "@id": "/static-sections/7d3dfe7b-f35f-4681-aa3e-46bdf3ecde54/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "7d3dfe7b-f35f-4681-aa3e-46bdf3ecde54", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Parsing", "external_references": [], "content": "Interaction pairs are parsed from the `bam` files using [`pairtools`](https://github.com/mirnylab/pairtools) version 0.2.2. Filtering consists of several commands:\n\n* `pairtools parse`\n   * Produces a [pairsam](https://pairsamtools.readthedocs.io/en/latest/pairsam.html) file from an input `bam` file.\n   * The pairsam file is a pairs file, listing one read pair per line, with additional columns to track the sam-file lines, and a pairtools read classification.\n   * These classifications include information on whether the read aligned to 0, 1, or multiple places in the genome and whether it aligned end-to-end or if it was clipped.\n   * This tool also upper-triangularizes the reads, i.e. if the coordinate of second read is higher than the first, the reads are flipped.\n   * For more details, see the [pairtools documentation](https://pairtools.readthedocs.io/en/latest/parsing.html).\n\n* `pairtools sort`\n   * Produces a sorted `pairsam` file from an input `pairsam` file.\n   * Note that the flipping order and sort order of chromosomes is not identical. 
See [the docs](https://pairtools.readthedocs.io/en/latest/sorting.html#chromosomal-order-for-sorting-and-flipping) for more details.\n\n* `pairtools dedup --mark-dups`\n   * (equivalent to `pairtools markasdup`)\n   * Identify duplicate alignments.\n   * Arbitrarily designate the duplicate status among the two duplicate alignments.\n\n* `pairtools select`\n   * Remove duplicates, multi-mapped reads, and reads non-uniquely mapped at the 5' end.\n\nSource files (v1.1.1\\_dcic\\_4): \n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPB1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-bam.cwl", "filetype": "md", "content_as_html": "<div class=\"markdown-container\"><p>Interaction pairs are parsed from the <code>bam</code> files using <a href=\"https://github.com/mirnylab/pairtools\" rel=\"noopener noreferrer\" target=\"_blank\"><code>pairtools</code></a> version 0.2.2. Filtering consists of several commands:</p>\n<ul>\n<li><code>pairtools parse</code></li>\n<li>Produces a <a href=\"https://pairsamtools.readthedocs.io/en/latest/pairsam.html\" rel=\"noopener noreferrer\" target=\"_blank\">pairsam</a> file from an input <code>bam</code> file.</li>\n<li>The pairsam file is a pairs file, listing one read pair per line, with additional columns to track the sam-file lines, and a pairtools read classification.</li>\n<li>These classifications include information on whether the read aligned to 0, 1, or multiple places in the genome and whether it aligned end-to-end or if it was clipped.</li>\n<li>This tool also upper-triangularizes the reads, i.e. 
if the coordinate of second read is higher than the first, the reads are flipped.</li>\n<li>\n<p>For more details, see the <a href=\"https://pairtools.readthedocs.io/en/latest/parsing.html\" rel=\"noopener noreferrer\" target=\"_blank\">pairtools documentation</a>.</p>\n</li>\n<li>\n<p><code>pairtools sort</code></p>\n</li>\n<li>Produces a sorted <code>pairsam</code> file from an input <code>pairsam</code> file.</li>\n<li>\n<p>Note that the flipping order and sort order of chromosomes is not identical. See <a href=\"https://pairtools.readthedocs.io/en/latest/sorting.html#chromosomal-order-for-sorting-and-flipping\" rel=\"noopener noreferrer\" target=\"_blank\">the docs</a> for more details.</p>\n</li>\n<li>\n<p><code>pairtools dedup --mark-dups</code></p>\n</li>\n<li>(equivalent to <code>pairtools markasdup</code>)</li>\n<li>Identify duplicate alignments.</li>\n<li>\n<p>Arbitrarily designate the duplicate status among the two duplicate alignments.</p>\n</li>\n<li>\n<p><code>pairtools select</code></p>\n</li>\n<li>Remove duplicates, multi-mapped reads, and reads non-uniquely mapped at the 5' end.</li>\n</ul>\n<p>Source files (v1.1.1_dcic_4): </p>\n<ul>\n<li>Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPB1/</li>\n<li>CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-bam.cwl</li>\n</ul></div>"}, {"lab": {"@type": ["Lab", "Item"], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "status": "current", "@id": "/labs/4dn-dcic-lab/", "display_title": "4DN DCIC, HMS", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "Pairs are merged and aggregated with `pairix` version 0.3.3. Pairs files are then converted to `mcool` via `cooler` version 0.8.5.\n\nMerging:\n\n* There is no merging of sequencing replicates. 
Processing is performed separately for each sequencing replicate.\n* Biological replicates are merged using the same method as used by the Hi-C processing pipeline. That is,\n  * Biological replicates are merged after the duplicate removal step, since PCR duplication events happen independently in each replicate.\n  * Merging is performed on `pairs` files using [run-merge-pairs.sh](https://github.com/4dn-dcic/docker-4dn-hic/blob/master/scripts/run-merge-pairs.sh).\n  * 4DN DCIC provides a merged output as a merged `pairs` file.\n\nFile Format Conversion:\n\n* `mcool` files are contact matrices containing multiple resolutions which can be visualized in [HiGlass](http://higlass.io/).\n* The 4DN standard resolutions for `mcool` files are: 1kb, 2kb, 5kb, 10kb, 25kb, 50kb, 100kb, 250kb, 500kb, 1Mb, 2.5Mb, 5Mb, 10Mb.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPC1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-pairs.cwl\n* Docker (for merging only): https://github.com/4dn-dcic/docker-4dn-hic/tree/v43", "name": "resources.data-analysis.imargi-processing-pipeline.aggregation", "award": {"@type": ["Award", "Item"], "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "status": "current", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@id": "/awards/2U01CA200059-06/", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "Aggregation", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.aggregation"], "options": {"filetype": "md", "collapsible": false, "default_open": true, "convert_ext_links": true}, "date_created": "2021-10-04T15:29:54.387430+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-03-26T18:53:39.345240+00:00"}, 
"schema_version": "2", "@id": "/static-sections/e504cbe6-5440-4c5b-8ecb-3d40bd2d8ff8/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "e504cbe6-5440-4c5b-8ecb-3d40bd2d8ff8", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Aggregation", "external_references": [], "content": "Pairs are merged and aggregated with `pairix` version 0.3.3. Pairs files are then converted to `mcool` via `cooler` version 0.8.5.\n\nMerging:\n\n* There is no merging of sequencing replicates. Processing is performed separately for each sequencing replicate.\n* Biological replicates are merged using the same method as used by the Hi-C processing pipeline. That is,\n  * Biological replicates are merged after the duplicate removal step, since PCR duplication events happen independently in each replicate.\n  * Merging is performed on `pairs` files using [run-merge-pairs.sh](https://github.com/4dn-dcic/docker-4dn-hic/blob/master/scripts/run-merge-pairs.sh).\n  * 4DN DCIC provides a merged output as a merged `pairs` file.\n\nFile Format Conversion:\n\n* `mcool` files are contact matrices containing multiple resolutions which can be visualized in [HiGlass](http://higlass.io/).\n* The 4DN standard resolutions for `mcool` files are: 1kb, 2kb, 5kb, 10kb, 25kb, 50kb, 100kb, 250kb, 500kb, 1Mb, 2.5Mb, 5Mb, 10Mb.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPC1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-pairs.cwl\n* Docker (for merging only): https://github.com/4dn-dcic/docker-4dn-hic/tree/v43", "filetype": "md", "content_as_html": "<div class=\"markdown-container\"><p>Pairs are merged and aggregated with <code>pairix</code> version 0.3.3. 
Pairs files are then converted to <code>mcool</code> via <code>cooler</code> version 0.8.5.</p>\n<p>Merging:</p>\n<ul>\n<li>There is no merging of sequencing replicates. Processing is performed separately for each sequencing replicate.</li>\n<li>Biological replicates are merged using the same method as used by the Hi-C processing pipeline. That is,</li>\n<li>Biological replicates are merged after the duplicate removal step, since PCR duplication events happen independently in each replicate.</li>\n<li>Merging is performed on <code>pairs</code> files using <a href=\"https://github.com/4dn-dcic/docker-4dn-hic/blob/master/scripts/run-merge-pairs.sh\" rel=\"noopener noreferrer\" target=\"_blank\">run-merge-pairs.sh</a>.</li>\n<li>4DN DCIC provides a merged output as a merged <code>pairs</code> file.</li>\n</ul>\n<p>File Format Conversion:</p>\n<ul>\n<li><code>mcool</code> files are contact matrices containing multiple resolutions which can be visualized in <a href=\"http://higlass.io/\" rel=\"noopener noreferrer\" target=\"_blank\">HiGlass</a>.</li>\n<li>The 4DN standard resolutions for <code>mcool</code> files are: 1kb, 2kb, 5kb, 10kb, 25kb, 50kb, 100kb, 250kb, 500kb, 1Mb, 2.5Mb, 5Mb, 10Mb.</li>\n</ul>\n<p>Source files (v1.1.1_dcic_4):</p>\n<ul>\n<li>Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPC1/</li>\n<li>CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-pairs.cwl</li>\n<li>Docker (for merging only): https://github.com/4dn-dcic/docker-4dn-hic/tree/v43</li>\n</ul></div>"}, {"lab": {"@type": ["Lab", "Item"], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "status": "current", "@id": "/labs/4dn-dcic-lab/", "display_title": "4DN DCIC, HMS", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "The 4DN version of the iMARGI pipeline also contains an output QC report and summary statistics generated on output 
`pairs` files. See an example report [here](https://data.4dnucleome.org/quality-metrics-margi/8a16a1a0-d16a-4cb8-a1c0-1098123ed93d).\n\nSource files (v1.1.1\\_dcic\\_4)\n\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi_qc.cwl\n* Script: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/scripts/imargi_stats.sh", "name": "resources.data-analysis.imargi-processing-pipeline.qc", "award": {"@type": ["Award", "Item"], "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "status": "current", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@id": "/awards/2U01CA200059-06/", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "QC", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.qc"], "options": {"filetype": "md", "collapsible": false, "default_open": true}, "date_created": "2021-10-05T13:59:47.515513+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2021-10-12T16:40:38.619162+00:00"}, "schema_version": "2", "@id": "/static-sections/7d2cd1c1-4c31-4e7d-bc9c-bd2efe7dcb42/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "7d2cd1c1-4c31-4e7d-bc9c-bd2efe7dcb42", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "QC", "external_references": [], "content": "The 4DN version of the iMARGI pipeline also contains an output QC report and summary statistics generated on output `pairs` files. 
See an example report [here](https://data.4dnucleome.org/quality-metrics-margi/8a16a1a0-d16a-4cb8-a1c0-1098123ed93d).\n\nSource files (v1.1.1\\_dcic\\_4)\n\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi_qc.cwl\n* Script: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/scripts/imargi_stats.sh", "filetype": "md", "content_as_html": "<div class=\"markdown-container\"><p>The 4DN version of the iMARGI pipeline also contains an output QC report and summary statistics generated on output <code>pairs</code> files. See an example report <a href=\"https://data.4dnucleome.org/quality-metrics-margi/8a16a1a0-d16a-4cb8-a1c0-1098123ed93d\" rel=\"noopener noreferrer\" target=\"_blank\">here</a>.</p>\n<p>Source files (v1.1.1_dcic_4)</p>\n<ul>\n<li>CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi_qc.cwl</li>\n<li>Script: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/scripts/imargi_stats.sh</li>\n</ul></div>"}], "date_created": "2021-10-04T15:50:03.601963+00:00", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-08-19T23:34:24.829003+00:00"}, "schema_version": "3", "content_location": "bottom", "table-of-contents": {"enabled": false, "expanded": true, "skip-depth": 1, "header-depth": 4, "include-top-link": false}, "@id": "/pages/9d7c7eab-9c15-4d46-860a-4abdc64d825c/", "@type": ["Page", "Item"], "uuid": "9d7c7eab-9c15-4d46-860a-4abdc64d825c", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "iMARGI Processing Pipeline", "external_references": [], "@context": "/terms/", "aggregated-items": {}, "validation-errors": []}