manta FILTER==PASS and any cnv with 90% overlap #77

Merged
merged 7 commits into from
Aug 11, 2021


@kgaonkar6 kgaonkar6 commented Aug 10, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

  • Update scripts/merged_to_individual_files.py to create the Snakemake file and run the module only for tumor WGS, and generate 2/3 CNV caller consensus calls

  • Add a FILTER==PASS requirement for manta calls used in copy_number_consensus_call. I think this filter is necessary so that we use only the subset of high-confidence broad SVs called by manta.

  • After investigating the consensus calls made with only manta FILTER==PASS, we see CNV alterations in controlfreec and cnvkit that are missing from the consensus.
    We believe the requirement that CNVs have 50% reciprocal overlap might be too stringent. After "Update manta FILTER=='PASS' Part3 : oncoprint rerun" (AlexsLemonade/OpenPBTA-analysis#1116) we are missing some subtype-defining CNVs. For example, the chr19 amplification in BS_K07KNTFY is seen in both controlfreec and cnvkit but is left out of the consensus calls because the cnvkit region is only 11% of the controlfreec region. We therefore want to expand the overlap criterion to include any CNV that has 90% or more overlap with a CNV in another caller.

BS_K07KNTFY.cnvkit.dup.filtered3.bed: chr19	54138551	54427104
BS_K07KNTFY.freec.dup.filtered3.bed:  chr19	53641020	56141391
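
The figures for this example can be checked directly from the two BED intervals. The sketch below is illustrative only: the helper name `overlap_fractions` is hypothetical and not part of the module. It computes what fraction of each call is covered by the other, treating the coordinates as BED-style start/end pairs.

```python
def overlap_fractions(a, b):
    """Return (fraction of a covered by b, fraction of b covered by a).

    a and b are (chrom, start, end) tuples. Hypothetical helper for illustration.
    """
    if a[0] != b[0]:
        return 0.0, 0.0
    # Length of the shared region; zero if the intervals do not intersect.
    overlap = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return overlap / (a[2] - a[1]), overlap / (b[2] - b[1])


cnvkit = ("chr19", 54138551, 54427104)
freec = ("chr19", 53641020, 56141391)

cnvkit_cov, freec_cov = overlap_fractions(cnvkit, freec)
print(f"cnvkit covered by freec: {cnvkit_cov:.3f}")
print(f"freec covered by cnvkit: {freec_cov:.3f}")
```

The cnvkit call lies entirely inside the controlfreec call, but it spans only about 11.5% of it, so the pair fails a 50% reciprocal overlap test.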

What was your approach?

Added

awk '$6~/DEL/ {{if ($5 > {params.SIZE_CUTOFF} && $11 == "PASS") {{print "chr"$2,$3,$4,$5,"NA","NA","NA",$6}}}}' {input}

since the 11th column is FILTER.
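
For readers less familiar with awk, the same filter can be sketched in Python. This is an illustrative equivalent, not the code used in the Snakemake rule. The function name `filter_manta_rows`, the whitespace-delimited input without a header line, and the example cutoff of 200 are all assumptions. Column positions follow the awk command (1-based: 2 = chromosome, 3 = start, 4 = end, 5 = size, 6 = SV type, 11 = FILTER).

```python
def filter_manta_rows(lines, size_cutoff, sv_type="DEL"):
    """Keep rows of the given SV type that exceed size_cutoff and have FILTER == PASS.

    Hypothetical helper; assumes whitespace-delimited rows with no header.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # skip rows that do not reach the FILTER column
        chrom, start, end, size = fields[1], fields[2], fields[3], fields[4]
        svtype, filt = fields[5], fields[10]
        if sv_type in svtype and float(size) > size_cutoff and filt == "PASS":
            kept.append(("chr" + chrom, start, end, size, "NA", "NA", "NA", svtype))
    return kept


# Made-up rows for illustration only; "." stands in for unused columns.
rows = [
    "s1 X 100 5000 4900 DEL . . . . PASS",
    "s1 X 100 5000 4900 DEL . . . . MinSomaticScore",
    "s1 2 100 150 50 DEL . . . . PASS",
    "s1 3 100 5000 4900 DUP . . . . PASS",
]
kept = filter_manta_rows(rows, size_cutoff=200)
print(kept)
```

Only the first row survives: the second fails the FILTER check, the third is below the size cutoff, and the fourth is not a deletion.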

Also added the logic below to include any CNV that has 90% or more overlap with a CNV in another caller:

if (coverage_list1 >= 0.5 and coverage_list2 >= 0.5) or (coverage_list1 >= 0.9 and coverage_list2 > 0) or (coverage_list1 > 0 and coverage_list2 >= 0.9):
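
To make the effect of this condition concrete, here is a minimal sketch that wraps it in a function. The function name `passes_overlap` is an assumption for illustration; only the boolean expression comes from this PR, and `coverage_list1` and `coverage_list2` are assumed to hold the fraction of each CNV that is covered by the other caller's CNV.

```python
def passes_overlap(coverage_list1, coverage_list2):
    """Return True if a pair of CNVs meets the overlap rule (hypothetical wrapper)."""
    return (
        # original rule: 50% reciprocal overlap
        (coverage_list1 >= 0.5 and coverage_list2 >= 0.5)
        # new rule: one CNV is at least 90% covered and the other overlaps at all
        or (coverage_list1 >= 0.9 and coverage_list2 > 0)
        or (coverage_list1 > 0 and coverage_list2 >= 0.9)
    )


# 50% reciprocal overlap: accepted by both the old and the new rule
print(passes_overlap(0.6, 0.7))
# small CNV fully inside a much larger one: accepted only by the new rule
print(passes_overlap(1.0, 0.115))
# partial, non-reciprocal overlap: still rejected
print(passes_overlap(0.4, 0.8))
```

The second case corresponds to the BS_K07KNTFY chr19 example, where the cnvkit call is fully contained in a much larger controlfreec call.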

What GitHub issue does your pull request address?

d3b-center/ticket-tracker-OPC#151
d3b-center/ticket-tracker-OPC#152

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Does the update in scripts/merged_to_individual_files.py look right? I believe we want to run the consensus CNV module only on WGS for OT; handling WXS can be added as an update in another PR.

Is there anything that you want to discuss further?

NA

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

This summary is specific to PBTA data.
Broad SV calls from manta that don't have FILTER=='PASS' are removed.
The manta FILTER values are distributed as follows:

                          MaxDepth 
                               2663 
                MaxDepth;MaxMQ0Frac 
                                636 
MaxDepth;MaxMQ0Frac;MinSomaticScore 
                                931 
           MaxDepth;MinSomaticScore 
                               7189 
                         MaxMQ0Frac 
                                752 
         MaxMQ0Frac;MinSomaticScore 
                               1954 
                    MinSomaticScore 
                             251720 
                               PASS 
                             108129 
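
As a quick check on how much the filter removes, the counts above can be summed to get the share of calls that pass. This is simple arithmetic on the table, not output from the module, and the variable names are hypothetical.

```python
# Counts copied from the FILTER distribution above.
filter_counts = {
    "MaxDepth": 2663,
    "MaxDepth;MaxMQ0Frac": 636,
    "MaxDepth;MaxMQ0Frac;MinSomaticScore": 931,
    "MaxDepth;MinSomaticScore": 7189,
    "MaxMQ0Frac": 752,
    "MaxMQ0Frac;MinSomaticScore": 1954,
    "MinSomaticScore": 251720,
    "PASS": 108129,
}

total = sum(filter_counts.values())
pass_fraction = filter_counts["PASS"] / total
print(f"{filter_counts['PASS']} of {total} calls pass ({pass_fraction:.1%})")
```

Roughly 29% of the manta calls are retained, and most of the removed calls fail on MinSomaticScore alone.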

Because a discrepancy in deep deletions on chrX led to this investigation, I'm pointing out the differences in deep deletions between the consensus CNV calls in this PR and those in master.

# consensus_cnv in this updated PR
> consensus_cnv[which(consensus_cnv$copy.num=="0"),"chrom"] %>% table()

 chr1 chr10 chr11 chr13 chr14 chr17  chr2 chr22  chr4  chr6 
    5     4     1     1     1     8    13     4     3     1 
 chr7  chr8  chr9  chrX  chrY 
    2     2     7     3    36 

# consensus_cnv from master
> consensus_cnv_master[which(consensus_cnv_master$copy.num=="0"),"chrom"] %>% table()
.
 chr1 chr10 chr11 chr13 chr14 chr16 chr17  chr2 chr22  chr3 
    6     6     1     1     1     2     8    14     6     1 
 chr4  chr5  chr6  chr7  chr8  chr9  chrX  chrY 
    4     3     4     2     2     8    44    36 

We see comparable calls in both versions of the consensus CNV calls on all chromosomes except chrX. The majority of the deep deletions on chrX are removed if we keep only calls that PASS all filters in manta, which suggests that most of the chrX deep deletions in the current master branch are low-confidence calls.

## Difference in CNV calls in the latest consensus cnv seg
> change_log_in_latest$copy.num %>% table()
.
   0    1    2    3    4    5    6    7    8   12   20 
  71 1177  342 1852  425  313   92   23    9    3    1 

> change_log_in_latest %>% mutate(size = abs(loc.start-loc.end)) %>% select(size) %>% summary() 
      size          
 Min.   :        1  
 1st Qu.:    75456  
 Median :   681020  
 Mean   :  6754822  
 3rd Qu.:  5921025  
 Max.   :158482914  

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@kgaonkar6 kgaonkar6 requested review from jharenza and logstar August 11, 2021 17:57
@jharenza jharenza removed the request for review from logstar August 11, 2021 19:06