{"lab": {"correspondence": [{"contact_email": "cGV0ZXJfcGFya0BobXMuaGFydmFyZC5lZHU=", "@id": "/users/fb287a31-e765-41c5-8c1d-665f8e9f025b/", "display_title": "Peter Park"}], "status": "current", "display_title": "4DN DCIC, HMS", "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "title": "4DN DCIC, HMS", "@type": ["Lab", "Item"], "@id": "/labs/4dn-dcic-lab/", "pi": {"error": "no view permissions"}, "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "In iMARGI experiments, each RNA-end read begins with two random bases. To improve mapping, the R1 (RNA-end) reads are therefore trimmed using ``seqtk`` version 1.3. The command ::\n\n    seqtk trimfq -b 2\n\nremoves two bases (``-b 2``) from the left end of each read.\n\nReads are then mapped to the `GRCh38 <https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/>`_ (human) or `mm10 <https://data.4dnucleome.org/files-reference/4DNFI823LSI8/>`_ (mouse) reference genome using ``bwa`` version 0.7.17. In particular, we run: ::\n\n    bwa mem -t <nthreads> -SP5M <genome_index> <fastq1> <fastq2>\n\n* The ``-SP`` option ensures that the results are equivalent to those obtained by running ``bwa mem`` on each mate separately, while retaining the proper formatting for paired-end reads. This option skips a step in ``bwa mem`` that forces the alignment of a poorly aligned read based on the alignment of its mate, under the assumption that the two mates are part of a single genomic segment.\n* The ``-5`` option reports the 5' portion of a chimeric alignment as the primary alignment. For chimeric alignments, ``bwa mem`` reports two alignments: one is annotated as primary and soft-clipped, retaining the full length of the original sequence. 
The other alignment is hard-clipped and marked as either 'supplementary' or 'secondary'.\n* The ``-M`` option marks these hard-clipped alignments as *secondary* rather than *supplementary*, for compatibility with some public tools such as ``picard MarkDuplicates``.\n* The ``-t`` option sets the number of threads and should not affect the result.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl\n", "name": "resources.data-analysis.imargi-processing-pipeline.fastq", "award": {"description": "DCIC: The goals of the 4D Nucleome (4DN) Data Coordination and Integration Center (DCIC) are to collect, store, curate, display, and analyze data generated in the 4DN Network. We have assembled a team of investigators, staff scientists, and developers with a strong track record in analysis of chromatin interaction data, image processing, data visualization, integrative analysis of genomic and epigenomic data, data portal development, large-scale computing, and development of secure and flexible cloud technologies. In the first phase of the 4DN Project, we have developed the 4DN Data Portal as a central resource with tools for data submission, curation, analysis and quality control, visualization, exploration, and download. The portal provides an easy-to-navigate interface for accessing raw and intermediate data files, allows for programmatic access via APIs, and incorporates novel analysis and visualization tools developed by DCIC as well as other Network members. In the second phase of the 4DN Project, we will continue to support the research activities of the 4DN Network and to lead the creation of a well-curated 4DN data resource for the scientific community. 
At the same time, we propose to enhance the utility of the 4DN Scientific Data and the Data Portal in multiple ways: i. We will create a platform to integrate imaging and sequencing data and support the creation of reference nuclear maps in a common coordinate system; ii. We will provide support for 4DN Projects on Human Health and Disease with customized ontology applications and protected data management; iii. We will develop new cloud platform capabilities to bring user analyses to the 4DN Data Portal, and apply cost-efficiency improvements to support increasing data volumes; iv. We will perform regular outreach activities to raise awareness about the data and tools generated by the Network and DCIC. Overall, we will ensure that the data generated in 4DN will have maximal impact for the scientific community.", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "@type": ["Award", "Item"], "status": "current", "name": "2U01CA200059-06", "@id": "/awards/2U01CA200059-06/", "center_title": "DCIC - Park", "project": "4DN", "pi": {"error": "no view permissions"}, "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "Cleaning and Alignment", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.imargi-processing-pipeline.sources"], "options": {"filetype": "rst", "collapsible": false, "default_open": true, "convert_ext_links": true}, "date_created": "2021-10-04T14:36:12.617939+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-08-19T23:34:13.588295+00:00"}, "schema_version": "2", "@id": "/static-sections/f4b4589f-a631-4aba-8d2a-924aac55169f/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "f4b4589f-a631-4aba-8d2a-924aac55169f", "principals_allowed": {"view": ["system.Everyone"], 
"edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Cleaning and Alignment", "external_references": [], "content": "In iMARGI experiments, each RNA-end read begins with two random bases. To improve mapping, the R1 (RNA-end) reads are therefore trimmed using ``seqtk`` version 1.3. The command ::\n\n    seqtk trimfq -b 2\n\nremoves two bases (``-b 2``) from the left end of each read.\n\nReads are then mapped to the `GRCh38 <https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/>`_ (human) or `mm10 <https://data.4dnucleome.org/files-reference/4DNFI823LSI8/>`_ (mouse) reference genome using ``bwa`` version 0.7.17. In particular, we run: ::\n\n    bwa mem -t <nthreads> -SP5M <genome_index> <fastq1> <fastq2>\n\n* The ``-SP`` option ensures that the results are equivalent to those obtained by running ``bwa mem`` on each mate separately, while retaining the proper formatting for paired-end reads. This option skips a step in ``bwa mem`` that forces the alignment of a poorly aligned read based on the alignment of its mate, under the assumption that the two mates are part of a single genomic segment.\n* The ``-5`` option reports the 5' portion of a chimeric alignment as the primary alignment. For chimeric alignments, ``bwa mem`` reports two alignments: one is annotated as primary and soft-clipped, retaining the full length of the original sequence. 
The other alignment is hard-clipped and marked as either 'supplementary' or 'secondary'.\n* The ``-M`` option marks these hard-clipped alignments as *secondary* rather than *supplementary*, for compatibility with some public tools such as ``picard MarkDuplicates``.\n* The ``-t`` option sets the number of threads and should not affect the result.\n\nSource files (v1.1.1\\_dcic\\_4):\n\n* Workflow: https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/\n* CWL: https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl\n", "filetype": "rst", "content_as_html": "<div class=\"rst-container\"><p>In iMARGI experiments, each RNA-end read begins with two random bases. To improve mapping, the R1 (RNA-end) reads are therefore trimmed using <code>seqtk</code> version 1.3. The command</p><pre class=\"literal-block\">\nseqtk trimfq -b 2\n</pre><p>removes two bases (<code>-b 2</code>) from the left end of each read.</p><p>Reads are then mapped to the <a class=\"reference external\" href=\"https://data.4dnucleome.org/files-reference/4DNFIZQZ39L9/\" rel=\"noopener noreferrer\" target=\"_blank\">GRCh38</a> (human) or <a class=\"reference external\" href=\"https://data.4dnucleome.org/files-reference/4DNFI823LSI8/\" rel=\"noopener noreferrer\" target=\"_blank\">mm10</a> (mouse) reference genome using <code>bwa</code> version 0.7.17. In particular, we run:</p><pre class=\"literal-block\">\nbwa mem -t &lt;nthreads&gt; -SP5M &lt;genome_index&gt; &lt;fastq1&gt; &lt;fastq2&gt;\n</pre><ul class=\"simple\"><li>The <code>-SP</code> option ensures that the results are equivalent to those obtained by running <code>bwa mem</code> on each mate separately, while retaining the proper formatting for paired-end reads. 
This option skips a step in <code>bwa mem</code> that forces the alignment of a poorly aligned read based on the alignment of its mate, under the assumption that the two mates are part of a single genomic segment.</li><li>The <code>-5</code> option reports the 5' portion of a chimeric alignment as the primary alignment. For chimeric alignments, <code>bwa mem</code> reports two alignments: one is annotated as primary and soft-clipped, retaining the full length of the original sequence. The other alignment is hard-clipped and marked as either 'supplementary' or 'secondary'.</li><li>The <code>-M</code> option marks these hard-clipped alignments as <em>secondary</em> rather than <em>supplementary</em>, for compatibility with some public tools such as <code>picard MarkDuplicates</code>.</li><li>The <code>-t</code> option sets the number of threads and should not affect the result.</li></ul><p>Source files (v1.1.1_dcic_4):</p><ul class=\"simple\"><li>Workflow: <a class=\"reference external\" href=\"https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/\" rel=\"noopener noreferrer\" target=\"_blank\">https://data.4dnucleome.org/workflows/4DNWFMRGIPA1/</a></li><li>CWL: <a class=\"reference external\" href=\"https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl\" rel=\"noopener noreferrer\" target=\"_blank\">https://github.com/4dn-dcic/iMARGI-Docker/blob/v1.1.1_dcic_4/src/cwl/imargi-processing-fastq.cwl</a></li></ul></div>", "@context": "/terms/", "aggregated-items": {}, "validation-errors": []}