Part1: Freec as default and neutral NA #1066

kgaonkar6 · 2021-05-12T12:32:58Z

⚠️ This is a re-do of #987

Purpose/implementation Section

What scientific question is your analysis addressing?

According to Bo's rationale, since ploidy is based on ControlFREEC, using its corresponding CN would probably give a better estimate of overall CN.

What was your approach?

Just a change in

OpenPBTA-analysis/analyses/copy_number_consensus_call/scripts/bed_to_segfile.R

Lines 119 to 128 in 2c1f5fa

    
    # Calculate summary stats from merged CNV calls.
    cnvs <- cnvs %>%
      dplyr::mutate(cnvkit_df = purrr::map(cnvkit_CNVs, segstrings_to_df),
                    freec_df = purrr::map(freec_CNVs, segstrings_to_df),
                    segmean = purrr::map_dbl(cnvkit_df, segmean_function),
                    cnvkit_cn = purrr::map_dbl(cnvkit_df, copies_wmedian),
                    freec_cn = purrr::map_dbl(freec_df, copies_wmedian),
                    copynum = ifelse(is.finite(cnvkit_cn), # use cnvkit if available
                                     cnvkit_cn, freec_cn), # otherwise use freec
                    num.mark = NA)

so that it uses controlfreec if available and otherwise falls back to cnvkit:

                copynum = ifelse(is.finite(freec_cn), # use freec if available
                                 freec_cn, cnvkit_cn), #otherwise use cnvkit
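
As a minimal sketch of the new fallback behaviour, here is the same `ifelse()` expression applied to toy vectors. The values are made up for illustration and are not project data. Note that `is.finite()` is `FALSE` for `NA`, `NaN` and `Inf`, so any of those in `freec_cn` triggers the fallback to `cnvkit_cn`.

```r
# Hypothetical per-segment copy numbers (illustrative values only);
# NA means the caller had no usable estimate for that segment.
freec_cn  <- c(2, NA, 4, NA)
cnvkit_cn <- c(3, 1, NA, NA)

# Prefer ControlFREEC when its value is finite, otherwise fall back to CNVkit.
copynum <- ifelse(is.finite(freec_cn), freec_cn, cnvkit_cn)

print(copynum)
```

Segments where neither caller has a finite value stay `NA`.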

What GitHub issue does your pull request address?

#964

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

NA, it is just a one-line code update.

Is there anything that you want to discuss further?

Note: I have not added any downstream modules (as I did in #987) to this PR, since those downstream analyses are just re-runs and can come in after this PR is merged to master.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

tables

What is your summary of the results?

Here I'm comparing (consensus_seg_annotated_cn_autosomes.tsv.gz + consensus_seg_annotated_cn_x_and_y.tsv.gz) across the following versions of the files:

controlfreec (default for copy number) + neutral calls as NA
cnvkit (default for copy number) + neutral calls as NA
v18 release

The changes between using controlfreec and cnvkit are mainly as follows:

BS_ZSH09N84 and BS_CBMAWSAR are initial samples from the same participant, PT_MNSEJCDM. controlfreec calls EGFR and MET gains in both samples; cnvkit calls them only in BS_ZSH09N84.
BS_JRFVST47 and BS_3Z40EZHD are initial samples from PT_9GKVQ9QS. controlfreec calls a MET gain in both samples; cnvkit calls it only in BS_3Z40EZHD.
Besides these missed calls, the copy number status also changes for the KIT, KDR and PDGFRA calls: for BS_JRFVST47, controlfreec calls amplification and cnvkit calls gain; for BS_3Z40EZHD, cnvkit calls amplification and controlfreec calls gain.

My understanding is that controlfreec helps identify consistent calls across multiple samples from the same participant, whereas cnvkit seems to miss the call in one of the two samples for the participants mentioned above.
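
A rough sketch of how this kind of per-participant consistency could be checked. The toy table only transcribes the EGFR/MET observations for PT_MNSEJCDM listed above; the column names are illustrative assumptions and do not match the consensus SEG files.

```r
# Toy table transcribed from the PT_MNSEJCDM observations above.
# Column names (participant, sample, caller, gene) are hypothetical.
calls <- data.frame(
  participant = "PT_MNSEJCDM",
  sample = c("BS_ZSH09N84", "BS_CBMAWSAR", "BS_ZSH09N84", "BS_CBMAWSAR",
             "BS_ZSH09N84", "BS_ZSH09N84"),
  caller = c("controlfreec", "controlfreec", "controlfreec", "controlfreec",
             "cnvkit", "cnvkit"),
  gene = c("EGFR", "EGFR", "MET", "MET", "EGFR", "MET"),
  stringsAsFactors = FALSE
)

# Number of distinct samples for this participant.
n_samples <- length(unique(calls$sample))

# For each caller and gene, count how many samples carry the call.
called_in <- aggregate(sample ~ caller + gene, data = calls,
                       FUN = function(x) length(unique(x)))
names(called_in)[names(called_in) == "sample"] <- "n_called"

# A call is "consistent" if it appears in every sample of the participant.
called_in$consistent <- called_in$n_called == n_samples

print(called_in)
```

With this toy input, the controlfreec rows are flagged as consistent and the cnvkit rows are not, which mirrors the observation above.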

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jaclyn-taroni

👍🏻 approving on the basis of discussion #1010 + the code does what is described in #964 and #1010! (I did not look at the SEG file.)

kgaonkar6 · 2021-05-12T12:36:57Z

I have not added any downstream modules (as I did in #987) to this PR, since those downstream analyses are just re-runs and can come in after this PR is merged to master.

Ubuntu and others added 2 commits May 12, 2021 03:04

freec as default and neutral NA

ff2e952

Update README.md

777af34

kgaonkar6 requested review from jharenza and jaclyn-taroni May 12, 2021 12:33

jaclyn-taroni approved these changes May 12, 2021


jaclyn-taroni added the merge next label May 12, 2021

kgaonkar6 mentioned this pull request May 12, 2021

Part2: Freec as default and neutral NA update to focal-cn-file-preparation #1067

Merged

5 tasks

jaclyn-taroni merged commit 9afff4a into AlexsLemonade:master May 12, 2021

This was referenced May 13, 2021

Updated analysis: Neutral region called as losses when compared to ploidy #1010

Closed

Updated analysis: Update CN consensus calls to use ControlFREEC CN as default? #964

Closed

kgaonkar6 deleted the cnv-update branch May 13, 2021 17:18

kgaonkar6 mentioned this pull request Aug 5, 2021

v20 CNV update part4 : Rerun gistic and molecular subtyping v20 #1127

Merged

5 tasks
