{"lab": {"title": "4DN DCIC, HMS", "@type": ["Lab", "Item"], "correspondence": [{"contact_email": "cGV0ZXJfcGFya0BobXMuaGFydmFyZC5lZHU=", "@id": "/users/fb287a31-e765-41c5-8c1d-665f8e9f025b/", "display_title": "Peter Park"}], "uuid": "828cd4fe-ebb0-4b36-a94a-d2e3a36cc989", "@id": "/labs/4dn-dcic-lab/", "status": "current", "display_title": "4DN DCIC, HMS", "pi": {"error": "no view permissions"}, "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.lab_submitter", "submits_for.828cd4fe-ebb0-4b36-a94a-d2e3a36cc989"]}}, "body": "Alignments in ``bam`` format are sorted and duplicates marked with ``Picard`` version 2.20.7. This consists of two steps:\n\n* Sorting bams with ``SortSam`` with ``SORT_ORDER=coordinate`` to specify that the input should be sorted by coordinate (used in the second step).\n* Marking duplicates for removal with ``MarkDuplicates``.\n\nIn both steps, the flag ``VALIDATION_STRINGENCY=LENIENT`` is used to specify a more relaxed validation stringency (relative to ``STRICT``). Refer to the `Picard documentation <https://broadinstitute.github.io/picard/command-line-overview.html>`_ for further details.\n\nDuplicates are then removed using ``samtools`` version 1.9. 
Specifically, the command is: ::\n\n    samtools view -F 1024 -f 2 -b <input.bam>\n\n* ``-F 1024`` omits reads marked as PCR or optical duplicates.\n* ``-f 2`` restricts the results to reads mapped in a proper pair.\n\nThe alignments are converted into ``bedpe`` format using ``bedtools`` version 2.29.0: ::\n\n    bedtools bamtobed -i <input.bam> -bedpe > <input.bedpe>\n\nFinally, the files are cleaned and sorted as recommended for peak calling with SEACR (see section below): ::\n\n    awk '$1==$4 && $1!=\".\" && $6-$2 < 1000 {print $0}' <input.bedpe> | cut -f 1-6 | sort -k1,1 -k2,2n -k3,3n\n\n* ``$1==$4`` specifies that mates must be located on the same chromosome,\n* ``$1!=\".\"`` specifies that mates cannot be null,\n* ``$6-$2 < 1000`` specifies that the span from the start of the first mate to the end of the second must be less than 1000 bases, and\n* ``cut -f 1-6`` retains only the first six columns of the bedpe file.\n", "name": "resources.data-analysis.cut-and-run-pipeline.filtering", "award": {"status": "current", "display_title": "4D NUCLEOME NETWORK DATA COORDINATION AND INTEGRATION CENTER - PHASE II", "name": "2U01CA200059-06", "description": "DCIC: The goals of the 4D Nucleome (4DN) Data Coordination and Integration Center (DCIC) are to collect, store, curate, display, and analyze data generated in the 4DN Network. We have assembled a team of investigators, staff scientists, and developers with a strong track record in analysis of chromatin interaction data, image processing, data visualization, integrative analysis of genomic and epigenomic data, data portal development, large-scale computing, and development of secure and \ufb02exible cloud technologies. In the \ufb01rst phase of the 4DN Project, we have developed the 4DN Data Portal as a central resource with tools for data submission, curation, analysis and quality control, visualization, exploration, and download. 
The portal provides an easy-to-navigate interface for accessing raw and intermediate data \ufb01les, allows for programmatic access via APIs, and incorporates novel analysis and visualization tools developed by DCIC as well as other Network members. In the second phase of the 4DN Project, we will continue to support the research activities by the 4DN Network, and to lead the creation of a well curated 4DN data resource for the scienti\ufb01c community. At the same time, we propose to enhance the utility of the 4DN Scienti\ufb01c Data and the Data Portal in multiple ways: i. We will create a platform to integrate imaging and sequencing data and support the creating of reference nuclear maps in a common coordinate system; ii. We will provide support for 4DN Projects on Human Health and Disease with customized ontology applications and protected data management; iii. We will develop new cloud platform capabilities to bring user analyses to the 4DN Data Portal, and apply cost-ef\ufb01ciency improvements to support increasing data volumes; iv. We will perform regular outreach activities to raise awareness about the data and tools generated by the Network and DCIC. 
Overall, we will ensure that the data generated in 4DN will have maximal impact for the scienti\ufb01c community.", "uuid": "71171a4e-dca1-44cb-8375-fafd896c6923", "@type": ["Award", "Item"], "@id": "/awards/2U01CA200059-06/", "center_title": "DCIC - Park", "project": "4DN", "pi": {"error": "no view permissions"}, "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin"]}}, "title": "Filtering", "status": "released", "aliases": ["4dn-dcic-lab:resources.data-analysis.cut-and-run-pipeline.filtering"], "options": {"filetype": "rst", "collapsible": false, "default_open": true, "convert_ext_links": true}, "date_created": "2021-10-06T20:17:22.831092+00:00", "section_type": "Page Section", "submitted_by": {"error": "no view permissions"}, "last_modified": {"modified_by": {"error": "no view permissions"}, "date_modified": "2024-08-19T23:28:21.788461+00:00"}, "schema_version": "2", "@id": "/static-sections/b00f516f-96ac-4a60-bc37-90a0652591f6/", "@type": ["StaticSection", "UserContent", "Item"], "uuid": "b00f516f-96ac-4a60-bc37-90a0652591f6", "principals_allowed": {"view": ["system.Everyone"], "edit": ["group.admin", "role.owner", "userid.545f1931-792c-4a7e-83b3-3e91baea4e30"]}, "display_title": "Filtering", "external_references": [], "content": "Alignments in ``bam`` format are sorted and duplicates marked with ``Picard`` version 2.20.7. This consists of two steps:\n\n* Sorting bams with ``SortSam`` with ``SORT_ORDER=coordinate`` to specify that the input should be sorted by coordinate (used in the second step).\n* Marking duplicates for removal with ``MarkDuplicates``.\n\nIn both steps, the flag ``VALIDATION_STRINGENCY=LENIENT`` is used to specify a more relaxed validation stringency (relative to ``STRICT``). Refer to the `Picard documentation <https://broadinstitute.github.io/picard/command-line-overview.html>`_ for further details.\n\nDuplicates are then removed using ``samtools`` version 1.9. 
Specifically, the command is: ::\n\n    samtools view -F 1024 -f 2 -b <input.bam>\n\n* ``-F 1024`` omits reads marked as PCR or optical duplicates.\n* ``-f 2`` restricts the results to reads mapped in a proper pair.\n\nThe alignments are converted into ``bedpe`` format using ``bedtools`` version 2.29.0: ::\n\n    bedtools bamtobed -i <input.bam> -bedpe > <input.bedpe>\n\nFinally, the files are cleaned and sorted as recommended for peak calling with SEACR (see section below): ::\n\n    awk '$1==$4 && $1!=\".\" && $6-$2 < 1000 {print $0}' <input.bedpe> | cut -f 1-6 | sort -k1,1 -k2,2n -k3,3n\n\n* ``$1==$4`` specifies that mates must be located on the same chromosome,\n* ``$1!=\".\"`` specifies that mates cannot be null,\n* ``$6-$2 < 1000`` specifies that the span from the start of the first mate to the end of the second must be less than 1000 bases, and\n* ``cut -f 1-6`` retains only the first six columns of the bedpe file.\n", "filetype": "rst", "content_as_html": "<div class=\"rst-container\"><p>Alignments in <code>bam</code> format are sorted and duplicates marked with <code>Picard</code> version 2.20.7. This consists of two steps:</p><ul class=\"simple\"><li>Sorting bams with <code>SortSam</code> with <code>SORT_ORDER=coordinate</code> to specify that the input should be sorted by coordinate (used in the second step).</li><li>Marking duplicates for removal with <code>MarkDuplicates</code>.</li></ul><p>In both steps, the flag <code>VALIDATION_STRINGENCY=LENIENT</code> is used to specify a more relaxed validation stringency (relative to <code>STRICT</code>). Refer to the <a class=\"reference external\" href=\"https://broadinstitute.github.io/picard/command-line-overview.html\" rel=\"noopener noreferrer\" target=\"_blank\">Picard documentation</a> for further details.</p><p>Duplicates are then removed using <code>samtools</code> version 1.9. 
Specifically, the command is:</p><pre class=\"literal-block\">\nsamtools view -F 1024 -f 2 -b &lt;input.bam&gt;\n</pre><ul class=\"simple\"><li><code>-F 1024</code> omits reads marked as PCR or optical duplicates.</li><li><code>-f 2</code> restricts the results to reads mapped in a proper pair.</li></ul><p>The alignments are converted into <code>bedpe</code> format using <code>bedtools</code> version 2.29.0:</p><pre class=\"literal-block\">\nbedtools bamtobed -i &lt;input.bam&gt; -bedpe &gt; &lt;input.bedpe&gt;\n</pre><p>Finally, the files are cleaned and sorted as recommended for peak calling with SEACR (see section below):</p><pre class=\"literal-block\">\nawk '$1==$4 &amp;&amp; $1!=\".\" &amp;&amp; $6-$2 &lt; 1000 {print $0}' &lt;input.bedpe&gt; | cut -f 1-6 | sort -k1,1 -k2,2n -k3,3n\n</pre><ul class=\"simple\"><li><code>$1==$4</code> specifies that mates must be located on the same chromosome,</li><li><code>$1!=\".\"</code> specifies that mates cannot be null,</li><li><code>$6-$2 &lt; 1000</code> specifies that the span from the start of the first mate to the end of the second must be less than 1000 bases, and</li><li><code>cut -f 1-6</code> retains only the first six columns of the bedpe file.</li></ul></div>", "@context": "/terms/", "aggregated-items": {}, "validation-errors": []}