This repository has been archived by the owner on Jun 21, 2023. It is now read-only.
CNV consensus (6 of 6): Merge consensus files and name columns #403
Merged: jaclyn-taroni merged 55 commits into `AlexsLemonade:master` from `nhatduongnn:merge_consensus_files` on Jan 8, 2020
3f55855
add to Snakefile
9a923a3
resolve conflict
d38289c
Merge remote-tracking branch 'upstream/master'
3a20aa0
updating fork
76ebc97
add final consensus merging step
5cbad49
add column naming
4804bac
add column naming
217232c
add column naming
797bf53
change so that output files doesn't have extra indentation at the end
98714fa
change so that output files doesn't have extra indentation at the end
60a143f
Merge branch 'master' into merge_consensus_files
nhatduongnn 9467ad5
changed output path and name
b52693f
changed output path and name
9a3ed65
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn 7c2a454
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn e3ad43e
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn 8979e04
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn dd15b95
Update analyses/copy_number_consensus_call/src/scripts/compare_varian…
nhatduongnn 8a2e7aa
Update analyses/copy_number_consensus_call/src/scripts/compare_varian…
nhatduongnn be894fe
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn 96ddea1
changed Snakemake, reduced redundancy
ab76a58
changed Snakemake, reduced redundancy
0d05ad7
changed minor error
d8dc0e4
update Snakefile
d3fba93
add README.md
0e57d91
Merge branch 'master' into merge_consensus_files
nhatduongnn 20d5cff
Update README.md
nhatduongnn 3f7aabc
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 4899a5d
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 7c7f99b
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 7ca720c
Update analyses/copy_number_consensus_call/README.md
nhatduongnn e4525bd
Update analyses/copy_number_consensus_call/README.md
nhatduongnn f7ae54a
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 77f9b09
Update analyses/copy_number_consensus_call/README.md
nhatduongnn b88d54d
Update README.md
nhatduongnn 399f910
Update README.md
nhatduongnn cf979a5
Update README.md
nhatduongnn 93dcb9f
add result consensus file
cdb818f
make changes to analyses/README.md
ab1ec46
Update analyses/README.md
nhatduongnn 39d2f26
Update analyses/copy_number_consensus_call/README.md
nhatduongnn c50da32
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 2b5c5bd
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 4b504f2
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 97ed47f
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 08d610d
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 058be07
Update analyses/copy_number_consensus_call/Snakefile
nhatduongnn ac1b0e7
Update analyses/copy_number_consensus_call/README.md
nhatduongnn 5c3e70b
cut the last column
007599d
Merge branch 'merge_consensus_files' of https://github.com/fingerfen/…
61eb681
add the result file with out the last column
4e14e9d
Merge branch 'master' into merge_consensus_files
nhatduongnn 10586db
Merge branch 'master' into merge_consensus_files
nhatduongnn 52dc3e2
Merge branch 'master' into merge_consensus_files
nhatduongnn 9e35786
Merge branch 'master' into merge_consensus_files
nhatduongnn
@@ -0,0 +1,61 @@
# Copy Number Consensus Call

## Overview

The PBTA data set contains CNVs called by different callers, i.e., Manta, CNVkit, and Freec.
The goal is to use all of these callers to reduce false positives and produce a final consensus list of CNVs.
This analysis uses information from the following files generated by the 3 callers:

* `pbta-cnv-cnvkit.seg.gz`
* `pbta-cnv-controlfreec.tsv.gz`
* `pbta-sv-manta.tsv.gz`

## Running the pipeline

To run the entire pipeline, make sure you have the latest release of the three input files listed in the Overview section.
Then go to `OpenPBTA-analysis/analyses/copy_number_consensus_call` and run `bash run_consensus_call.sh`.

## Methods

This pipeline uses Snakemake to run the analysis for each patient sample. The steps are as follows:

1) Parse the 3 input files and put CNVs of the **same caller and sample** in the same file.
2) Remove any sample/caller combination files with **more than 2500** CNVs called.
We believe these to be noisy/poor-quality samples (the cutoff comes from what GISTIC uses for noisy samples).
3) Create a `config_snakemake.yaml` that contains all of the sample names needed to run the Snakemake pipeline.
4) Run the Snakemake pipeline to perform the analysis **per sample**.
5) Filter for any CNVs that are over a certain **SIZE_CUTOFF** (default 3000 bp).
6) Filter for any **significant** CNVs called by Freec (default pval = 0.01).
7) Filter out any CNVs that overlap 50% or more with **IGLL, telomeric, centromeric, or seg_dup regions**.
8) Merge any CNVs of the same sample and call method if they **overlap or are within 10,000 bp** of each other (we consider CNV calls within 10,000 bp to be the same CNV).
9) Reformat the columns of the files so that the information is easier to read.
10) **Call consensus** by comparing CNVs from 2 call methods at a time.

Since there are 3 callers, there are 3 comparisons: `manta-cnvkit`, `manta-freec`, and `cnvkit-freec`. If a CNV from 1 caller **overlaps 50% or more** with at least 1 CNV from another caller, the common region of the overlapping CNVs becomes the new CONSENSUS CNV.

11) **Sort and merge** the CNVs from the comparison pairs (`manta-cnvkit`, `manta-freec`, and `cnvkit-freec`) into 1 file.
12) After every sample's consensus CNVs have been called, **combine all merged files** from step 11 and output to `results/cnv_consensus.tsv`.

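As a rough illustration of steps 8 and 10, the Python sketch below merges nearby calls from one caller and then takes the shared region of two callers' CNVs. It is not the pipeline's actual code: the function names `merge_nearby` and `consensus` are hypothetical, intervals are plain `(start, end)` tuples on a single chromosome, and the 50% overlap is assumed to be measured against the length of the first caller's CNV, which the text above does not specify.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on a single chromosome

MERGE_DISTANCE = 10_000  # step 8: calls this close are treated as one CNV
MIN_OVERLAP = 0.5        # step 10: required overlap fraction


def merge_nearby(cnvs: List[Interval], max_gap: int = MERGE_DISTANCE) -> List[Interval]:
    """Merge intervals that overlap or lie within max_gap bp of each other."""
    merged: List[Interval] = []
    for start, end in sorted(cnvs):
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def consensus(a: List[Interval], b: List[Interval],
              min_overlap: float = MIN_OVERLAP) -> List[Interval]:
    """Return the shared region of every pair whose overlap covers at least
    min_overlap of the CNV from caller `a` (assumed denominator)."""
    result: List[Interval] = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if end > start and (end - start) / (a_end - a_start) >= min_overlap:
                result.append((start, end))
    return sorted(result)


# Coordinates taken from the chr14 row of the example output in this README.
cnvkit = [(103515996, 103563363)]
freec = merge_nearby([(103511784, 103541532), (103543140, 103563240)])
print(consensus(cnvkit, freec))  # [(103515996, 103563240)]
```

With the chr14 coordinates from the example output, merging the two Freec calls first and then intersecting with the CNVkit call gives the same start and end as the consensus row, which suggests the sketch is at least consistent with the documented behavior.
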
## Example Output File

```
chrom start end manta_CNVs cnvkit_CNVs freec_CNVs CNV_type Biospecimen file_names
chr11 771036 866778 NULL 770516:866778:3 771036:871536:3 DUP BS_007JTNB8 BS_007JTNB8.cnvkit_freec.dup.bed
chr13 99966948 99991872 NULL 99954829:99994557:3 99966948:99991872:3 DUP BS_007JTNB8 BS_007JTNB8.cnvkit_freec.dup.bed
chr14 103515996 103563240 NULL 103515996:103563363:3 103511784:103541532:3,103543140:103563240:3 DUP BS_007JTNB8 BS_007JTNB8.cnvkit_freec.dup.bed
```

* The 1st line of the file is the header, which contains the column names. There are 9 columns in total.
* The 2nd line is the first CNV of the file.
* Column 1 is the **consensus** CNV chromosome.
* Column 2 is the **consensus** CNV start location.
* Column 3 is the **consensus** CNV end location.
* Columns 4, 5, and 6 contain the calls from Manta, CNVkit, and Freec that make up the **consensus** CNV described in columns 1, 2, and 3.
* i.e., if there is information in column 4, one or more CNVs called by Manta made up the current **consensus** CNV described in columns 1, 2, and 3.
* Columns 4, 5, and 6 have the following format: `START:END:COPY_NUMBER,START:END:COPY_NUMBER`
* Note that if more than one original CNV call from a given caller corresponds to a given consensus CNV, the information for each of those calls is comma separated.
* In the example output above, column 6 of line 4 contains `103511784:103541532:3,103543140:103563240:3`, which means 2 CNVs called by Freec helped to make up the **consensus** CNV on line 4.
One has the start and end coordinates `103511784:103541532` **on the same chromosome** and a copy number of `3`; the other has the coordinates `103543140:103563240` and a copy number of `3`.
* Column 7 is the CNV type. This will be either DUP or DEL, corresponding to duplications or deletions, respectively. Note that this does not describe the number of copies, only the direction of the copy number change.
* Column 8 is the sample name (`Biospecimen`).
* Column 9 contains the names of the two-caller consensus files (`manta-cnvkit`, `manta-freec`, `cnvkit-freec`) that made up the final **consensus** CNV.
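
To make the column layout concrete, here is a minimal Python sketch that reads a consensus file and splits columns 4, 5, and 6 into individual calls. It rests on a few assumptions that the description above does not state outright: the file is tab-separated, `NULL` means the caller contributed no call (inferred from the example), and the copy number is kept as a string because its exact type is not documented. The helper names `parse_calls` and `read_consensus` are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

CALLER_COLUMNS = ("manta_CNVs", "cnvkit_CNVs", "freec_CNVs")
Call = Tuple[int, int, str]  # (start, end, copy_number as written in the file)


def parse_calls(field: str) -> List[Call]:
    """Split 'START:END:COPY_NUMBER,...' into a list of calls."""
    if field == "NULL":  # assumption: NULL marks a caller with no call
        return []
    calls: List[Call] = []
    for item in field.split(","):
        start, end, copy_number = item.split(":")
        calls.append((int(start), int(end), copy_number))
    return calls


def read_consensus(handle) -> List[Tuple[str, int, int, Dict[str, List[Call]]]]:
    """Read a tab-separated consensus file into (chrom, start, end, calls) rows."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        calls = {col: parse_calls(row[col]) for col in CALLER_COLUMNS}
        rows.append((row["chrom"], int(row["start"]), int(row["end"]), calls))
    return rows


# The chr14 row from the example above, written with tabs as an assumed delimiter.
EXAMPLE = (
    "chrom\tstart\tend\tmanta_CNVs\tcnvkit_CNVs\tfreec_CNVs\tCNV_type\tBiospecimen\tfile_names\n"
    "chr14\t103515996\t103563240\tNULL\t103515996:103563363:3\t"
    "103511784:103541532:3,103543140:103563240:3\tDUP\tBS_007JTNB8\t"
    "BS_007JTNB8.cnvkit_freec.dup.bed\n"
)

rows = read_consensus(io.StringIO(EXAMPLE))
chrom, start, end, calls = rows[0]
print(chrom, start, end, len(calls["freec_CNVs"]))  # chr14 103515996 103563240 2
```

For a real run, the same `read_consensus` helper could be given an open handle to `results/cnv_consensus.tsv` instead of the in-memory example.
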
I'd suggest moving this section above the methods, as it is likely to be needed more often by people coming into the analysis/rerunning it.