Proposed Analysis: Copy number consensus calls #128
Comments
We will use additional methods to get CNV calls with a balanced sensitivity and specificity for the cohort. |
Are you planning on tackling this @jharenza and @xiehongbo? If so, I will mark as |
We have a pipeline, count me in as well! |
Yeah, we are tackling this. |
@fingerfen - great! What is the pipeline and what inputs do you need? We may have to set you up on CAVATICA to run this. |
Let's talk about it today during our meeting. |
Hi @jharenza @xiehongbo @fingerfen, Do you have an idea of when we should expect the first pull request for this issue? I am also wondering if we know what the format of the output of this analysis will be now that the two callers have different file formats. This information will help us in development for issues like #6 and #186. |
Hi @fingerfen and @xiehongbo - we were able to finish the data releases to include the new CNVkit and ControlFreeC files, so now you are able to submit a pull request with your analysis. Is it possible to do this next week? @fingerfen can you also list here the columns you will have in your final file for @jaclyn-taroni ? Thanks! |
We assume that sample QC has already been done based on the sample noise level.
|
I do not understand what this is referring to. Looking at the manuscript, I am not clear on what QC steps were performed on CNV calls, if any. Are there standard QC steps that should be added to the CNV results, and/or documented in the manuscript? Perhaps @jharenza or @yuankunzhu can provide some insight here? |
@jharenza I think we had this discussion before. Do you want to remove samples with extra-high noise levels, or keep every sample regardless? If we do want to remove noisy samples, where a "noisy sample" is one with a high SD of depth of coverage, we can do so. Otherwise, we can report CNVs from ALL samples. It's up to you guys; I am fine either way. |
@jashapiro when you recover CNVs from given samples, do you remove "noisy" samples? If so, what is your practice to do so? |
@hongboxie That makes sense. As I do not have the raw data, I can't see the raw coverage metrics, but after you brought up such QC, I went to look in the manuscript for details, and didn't find them mentioned. This is not my area of expertise, so I do not know what the standard practices are; I am just trying to understand the data as we work on some of the downstream analysis. |
@jashapiro no worries! I am learning this topic from everyone as well. I am open to any suggestions. |
@hongboxie sorry, just reading this now... We discussed with @fingerfen removing any samples that showed whole-genome gain, since that was a sign of inaccurate CN calling. I think we also chose to use a cutoff of >2500 segments for noise, to follow what the arrays used. However, I also recall that ControlFreeC might not have smoothed its segments as CNVkit did (i.e., collapsed multiple segments into one), so it may have a larger number of segments than expected, and this cutoff may not be good. @yuankunzhu do you remember whether the ControlFreeC TSV file was smoothed when we redid it? No samples were removed when we provided the data. This was being done via #128. When I checked out the CNVkit seg file, the sample call quality all looked reasonable in IGV, so this may only apply to ControlFreeC. |
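For reference, the segment-count cutoff discussed above could be sketched as follows. This is only an illustration, not the pipeline's actual QC step: the tuple layout (sample ID first) and the function name `flag_noisy_samples` are assumptions, and as noted above, the 2500 threshold may not suit unsmoothed ControlFreeC segments.

```python
from collections import Counter


def flag_noisy_samples(segments, max_segments=2500):
    """Return sample IDs whose segment count exceeds max_segments.

    `segments` is an iterable of tuples whose first element is the sample ID
    (hypothetical layout, e.g. rows read from a SEG-like file).
    """
    counts = Counter(row[0] for row in segments)
    return sorted(sample for sample, n in counts.items() if n > max_segments)


# Toy example with a tiny cutoff so the behavior is easy to see.
toy_segments = [
    ("sample_A", "chr1", 100, 200),
    ("sample_A", "chr1", 300, 400),
    ("sample_A", "chr2", 100, 200),
    ("sample_B", "chr1", 100, 900),
]
noisy = flag_noisy_samples(toy_segments, max_segments=2)
```

Whether flagged samples should be dropped or only reported is the open question in this thread; the sketch just identifies them.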
We are perplexed by the Manta output. Multiple CNVs overlap the same region. For now, we have decided to merge all overlapping CNVs into one single consensus event. There is something strange about Manta's somatic CNV output. |
We haven't had a chance to dig into the root of the problem. |
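As a rough illustration of merging overlapping calls into a single event, here is a minimal interval-merging sketch. It assumes each call is a `(chrom, start, end)` tuple and ignores the call type (gain vs. loss) and any Manta-specific fields; the function name `merge_overlapping` is hypothetical and this is not the pipeline's actual code.

```python
def merge_overlapping(calls):
    """Merge overlapping (chrom, start, end) calls into single events."""
    merged = []
    # Sorting puts calls on the same chromosome next to each other,
    # ordered by start position.
    for chrom, start, end in sorted(calls):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous event: extend its end if needed.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


toy_calls = [
    ("chr7", 150, 300),
    ("chr7", 100, 200),
    ("chr7", 400, 500),
    ("chr8", 100, 200),
]
merged_calls = merge_overlapping(toy_calls)
```

In practice, gains and losses would probably need to be merged separately so that opposite calls over the same region are not collapsed together.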
About Manta:
|
About consensus CNV:
|
Pipeline_Visual_Example_on_chr7.pptx @jharenza Attached is the ppt from Wednesday's presentation. Sorry for the delay. |
Documenting the outcomes of the in-person meeting this afternoon (@jaclyn-taroni @jashapiro @hongboxie @fingerfen):
|
Scientific goals
What are the scientific goals of the analysis?
Create consensus calls from ControlFreeC and CNVkit
Proposed methods
What methods do you plan to use to accomplish the scientific goals?
Breakpoints will not perfectly overlap between algorithms, so the analyst will likely have to define a window for overlap of copy number alterations to deem consensus calls.
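One possible way to express such a window is sketched below. It treats two calls as agreeing when they are on the same chromosome, have the same status, overlap, and have both breakpoints within `window` base pairs of each other. The tuple layout `(chrom, start, end, status)`, the function names, and the default window of 10,000 bp are all assumptions for illustration; the analyst would choose the actual criterion and threshold.

```python
def within_window(a, b, window=10_000):
    """True if two (chrom, start, end, status) calls agree within `window` bp."""
    return (
        a[0] == b[0]
        and a[3] == b[3]
        and abs(a[1] - b[1]) <= window
        and abs(a[2] - b[2]) <= window
    )


def consensus_calls(calls_a, calls_b, window=10_000):
    """Pair calls from two callers and report the region both support."""
    consensus = []
    for a in calls_a:
        for b in calls_b:
            start, end = max(a[1], b[1]), min(a[2], b[2])
            # Require a real overlap in addition to nearby breakpoints.
            if start < end and within_window(a, b, window):
                consensus.append((a[0], start, end, a[3]))
    return consensus


# Toy calls standing in for CNVkit and ControlFreeC segments.
cnvkit = [("chr7", 1000, 50000, "gain"), ("chr9", 100, 900, "loss")]
freec = [("chr7", 3000, 52000, "gain"), ("chr9", 100, 900, "gain")]
shared = consensus_calls(cnvkit, freec, window=5000)
```

The nested loop is quadratic in the number of calls, which is fine for a sketch; sorted intervals or an interval tree would scale better for a full cohort.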
Required input data
What input data will you use for this analysis?
SEG files
Proposed timeline
What is the timeline for the analysis?
2 weeks
Relevant literature
If there is relevant scientific literature, put links to those items here.