manta FILTER==PASS and any cnv with 90% overlap #77

Merged: 7 commits merged Aug 11, 2021
14 changes: 8 additions & 6 deletions analyses/copy_number_consensus_call/README.md
@@ -53,12 +53,14 @@ The per-sample pipeline revolves around the use of Snakemake to run analysis for
 4) Run the Snakemake pipeline to perform analysis **per sample**.
 5) Filter for any CNVs that are over a certain **SIZE_CUTOFF** (default 3000 bp)
 6) Filter for any **significant** CNVs called by Freec (default pval = 0.01)
-7) Filter out any CNVs that overlap 50% or more with **Immunoglobulin, telomeric, centromeric, seg_dup regions** as found in the file `ref/cnv_excluded.bed`
-8) Merge any CNVs of the same sample and call method if they **overlap or within 10,000 bp** (We consider CNV calls within 10,000 bp the same CNV)
-9) Reformat the columns of the files (So the info are easier to read)
-10) **Call consensus** by comparing CNVs from 2 call methods at a time.
-
-Since there are 3 callers, there were 3 comparisons: `manta-cnvkit`, `manta-freec`, and `cnvkit-freec`. If a CNV from 1 caller **overlaps 50% or more** with at least 1 CNV from another caller, the common region of the overlapping CNV would be the new CONSENSUS CNV.
+7) Filter to keep manta calls that PASS all filters
+8) Filter out any CNVs that overlap 50% or more with **Immunoglobulin, telomeric, centromeric, seg_dup regions** as found in the file `ref/cnv_excluded.bed`
+9) Merge any CNVs of the same sample and call method if they **overlap or are within 10,000 bp** of each other (we consider CNV calls within 10,000 bp the same CNV)
+10) Reformat the columns of the files (so the information is easier to read)
+11) **Call consensus** by comparing CNVs from 2 call methods at a time.
+
+Since there are 3 callers, there are 3 comparisons: `manta-cnvkit`, `manta-freec`, and `cnvkit-freec`. If a CNV from 1 caller has a **reciprocal overlap of 50% or more** with at least 1 CNV from another caller,
+**OR if any CNV from 1 caller** overlaps 90% or more with a CNV from another caller (to capture focal CNV calls), the common region of the overlapping CNVs becomes the new CONSENSUS CNV.

 11) **Sort and merge** the CNVs from the comparison pairs (`manta-cnvkit`, `manta-freec`, `cnvkit-freec`) together into 1 file
 12) Resolve overlapping segments where duplications are embedded within larger deletion segments, or deletions within duplications.
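The pairwise consensus rule in the updated README can be sketched in Python as below. This is a minimal illustration, not the pipeline's own code: it assumes half-open `[start, end)` coordinates and reads "overlaps 90% or more" as "the shared region covers at least 90% of either CNV" (for example, a focal call sitting inside a larger one). The names `CNV`, `overlap_len`, `consensus_region` and `pairwise_consensus` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class CNV(NamedTuple):
    """A CNV segment; coordinates assumed half-open [start, end)."""
    chrom: str
    start: int
    end: int


def overlap_len(a: CNV, b: CNV) -> int:
    """Number of bases shared by a and b (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def consensus_region(a: CNV, b: CNV,
                     reciprocal: float = 0.5,
                     one_sided: float = 0.9) -> Optional[CNV]:
    """Return the shared region if a and b satisfy either rule, else None.

    Rule 1: the overlap covers >= `reciprocal` of BOTH segments.
    Rule 2 (assumed reading of the 90% rule): the overlap covers
    >= `one_sided` of EITHER segment, e.g. a focal call inside a larger one.
    """
    ov = overlap_len(a, b)
    if ov == 0:
        return None  # also avoids dividing by a zero-length segment
    frac_a = ov / (a.end - a.start)
    frac_b = ov / (b.end - b.start)
    if (frac_a >= reciprocal and frac_b >= reciprocal) or max(frac_a, frac_b) >= one_sided:
        # The consensus CNV is the common (intersected) region.
        return CNV(a.chrom, max(a.start, b.start), min(a.end, b.end))
    return None


def pairwise_consensus(calls_1: List[CNV], calls_2: List[CNV]) -> List[CNV]:
    """Compare every call from caller 1 with every call from caller 2."""
    out = []
    for a in calls_1:
        for b in calls_2:
            region = consensus_region(a, b)
            if region is not None:
                out.append(region)
    # De-duplicate and sort by chromosome, start, end.
    return sorted(set(out))
```

For instance, `consensus_region(CNV("chr1", 1000, 11000), CNV("chr1", 5000, 6000))` fails the 50% reciprocal test (the overlap is only 10% of the larger call) but passes the 90% rule, returning `CNV("chr1", 5000, 6000)`. The all-pairs loop is quadratic; a sorted sweep or interval tree would scale better for large call sets.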
5 changes: 2 additions & 3 deletions analyses/copy_number_consensus_call/Snakefile
@@ -98,14 +98,13 @@ rule manta_filter:
 ## the first awk also filters by CNV length
 ## The sort command sorts by chromosome (field 1), then numerically by start position (field 2)
 ## The last pipe converts blanks into tabs and writes the output file.
-"""awk '$6~/DEL/ {{if ($5 > {params.SIZE_CUTOFF}) {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
+"""awk '$6~/DEL/ {{if ($5 > {params.SIZE_CUTOFF} && $11 == "PASS") {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
 """ | sort -k1,1 -k2,2n """
 """ | tr [:blank:] '\t' > {output.manta_del} && """
-"""awk '$6~/DUP/ {{if ($5 > {params.SIZE_CUTOFF}) {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
+"""awk '$6~/DUP/ {{if ($5 > {params.SIZE_CUTOFF} && $11 == "PASS") {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input} """
 """ | sort -k1,1 -k2,2n """
 """ | tr [:blank:] '\t' > {output.manta_dup}"""


 rule generate_excluded:
 # Combine the sets of regions that are not well called by CNV algorithms for exclusion
 input:
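For readers less familiar with awk, the `manta_filter` shell command can be approximated in Python as below. This is a hypothetical sketch (`manta_filter_sketch` is not part of the pipeline) that assumes whitespace-separated input in which, as the awk code implies, `$2` to `$6` hold chromosome, start, end, length and SV type, and `$11` holds Manta's FILTER value. One detail the sketch sidesteps: inside the single-quoted awk program, the string literal must use double quotes (`$11 == "PASS"`). Writing `'PASS'` closes the shell quote early, so awk receives `$11 == PASS` and compares the field with an uninitialized variable (the empty string), dropping genuine PASS calls.

```python
from typing import Iterable, List


def manta_filter_sketch(lines: Iterable[str], sv_type: str,
                        size_cutoff: int = 3000) -> List[str]:
    """Hypothetical Python version of the awk | sort | tr pipeline.

    Assumes whitespace-separated columns where (1-based, as in awk) $2 is the
    chromosome, $3/$4 the start/end, $5 the CNV length, $6 the SV type and
    $11 the FILTER value.
    """
    rows = []
    for line in lines:
        f = line.split()
        # awk fields are 1-based: $6 -> f[5], $11 -> f[10].
        # `$6~/DEL/` is a substring/regex match, hence `in`.
        if len(f) < 11 or sv_type not in f[5]:
            continue
        try:
            length = float(f[4])
        except ValueError:
            continue  # skip non-numeric lengths, e.g. a header line
        if length > size_cutoff and f[10] == "PASS":
            rows.append(["chr" + f[1], f[2], f[3], f[4], "NA", "NA", "NA", f[5]])
    # Approximates `sort -k1,1 -k2,2n`: chromosome as text (code-point order,
    # like sort in the C locale), then start position as a number.
    rows.sort(key=lambda r: (r[0], float(r[1])))
    # Approximates `tr [:blank:] '\t'`: emit tab-separated lines.
    return ["\t".join(r) for r in rows]
```

Calling it once with `"DEL"` and once with `"DUP"` mirrors the two awk commands that write `{output.manta_del}` and `{output.manta_dup}`.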
Binary file not shown.
19,877 changes: 10,079 additions & 9,798 deletions analyses/copy_number_consensus_call/results/cnv_consensus.tsv

Large diffs are not rendered by default.
