Planned Analysis: Integrated CNV and SV analyses and chromothripsis #27
Comments
After AlexsLemonade/OpenPBTA-manuscript#15 is approved and merged, can you write up the CNV methods and file a PR into that subsection so that we can link folks to the current version of the processing code? It may change in the future, but then we'll have an accurate manuscript-ready description of what was done. |
This machine learning publication may help us with CN true positives: |
Yes - will work on getting this filled in by the harmonization team. |
Integrated CNV and SV analyses and chromothripsis.

The proposed analyses broadly address the prevalence and functional impact of structural variation across brain tumors. It is important to note that copy number variations are essentially a subset of structural variants; as such, CNV and SV calls are highly overlapping and complementary and should be studied together. I am effectively proposing to merge issues #27 and #28.

To integrate CNV calls and SV calls, we focus on breakpoint co-localization; more details are in the manuscript: https://www.biorxiv.org/content/10.1101/572248v3

Chromothripsis is a catastrophic one-time event involving multiple breakpoints and rearrangements of localized regions in the genome, as opposed to chromoplexy, which involves gradually acquired structural variations. Chromothripsis can be identified by a pattern of oscillating copy number states and concomitant structural variants that allow walking through the newly formed chromosome. In practical terms, it can be identified as regions with an abnormally high number of CNVs and SVs.

The input formats for developing downstream analyses are:
CNV segmentation data:
Allele specific CNV (optional; defining regions of LoH and allelic imbalance)
SV calls file content: (already filtered by Somatic Score; no need to be annotated)

Some proposed readouts and output analyses:
Structural variation.
Chromothripsis: Survival analyses (probably addressed in issue #18) |
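The two ideas in the proposal above (breakpoint co-localization between CNV and SV calls, and flagging regions with an unusually high number of breakpoints) can be sketched roughly as follows. This is only an illustration, not the method from the linked manuscript or from ShatterSeek: the function names, the 10 kb matching tolerance, and the 1 Mb window size are all assumptions chosen for the example, and the coordinates are made up.

```python
from bisect import bisect_left
from collections import Counter


def colocalized(cnv_breaks, sv_breaks, tol=10_000):
    """Return CNV breakpoints with an SV breakpoint within `tol` bp on the same chromosome.

    Both inputs are lists of (chrom, pos) tuples. `tol` is an assumed tolerance.
    """
    by_chrom = {}
    for chrom, pos in sv_breaks:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = []
    for chrom, pos in cnv_breaks:
        positions = by_chrom.get(chrom, [])
        # First SV breakpoint at or after the lower bound of the tolerance window.
        i = bisect_left(positions, pos - tol)
        if i < len(positions) and positions[i] <= pos + tol:
            hits.append((chrom, pos))
    return hits


def breakpoint_density(breaks, window=1_000_000):
    """Count breakpoints per fixed-size window, keyed by (chrom, window_index)."""
    return Counter((chrom, pos // window) for chrom, pos in breaks)


# Toy coordinates, invented for illustration only.
cnv = [("chr1", 1_000_000), ("chr1", 5_000_000), ("chr2", 300_000)]
sv = [("chr1", 1_004_000), ("chr1", 9_000_000), ("chr2", 900_000)]

shared = colocalized(cnv, sv)
density = breakpoint_density(cnv + sv)
```

Windows whose counts stand out against the rest of the genome would be candidates for closer inspection; a real chromothripsis call would also need the oscillating copy number pattern described above.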
merged #27 and #28 here per @gonzolgarcia's request |
Issue with LUMPY data: While trying to filter somatic SVs from the table, I realized that the evidence columns "Tumor" and "Normal" are switched. In addition, there is no somatic score, and I haven't found many guidelines for somatic filtering of tumor/normal LUMPY results. I will be considering this: arq5x/lumpy-sv#268 |
Thanks, @gonzolgarcia! You are right, the T/N columns are swapped - we will fix this in V5 release coming next week. |
The Yang Lab will perform analysis on chromothripsis. |
Note that there are two callers each for CNV (CNVkit and ControlFreeC) and SV (Manta and LUMPY) |
@gonzolgarcia are you planning to generate SV consensus calls? |
Before getting a consensus, LUMPY requires somatic filtering. It would be nice to have this added to the next release |
@guru-yang - do you have any experience with somatic filtering of LUMPY SVs? The comment referred to here suggests the following:
|
We haven't used LUMPY at all. The filtering steps sound reasonable. Based on my experience, Manta alone might be good enough for SV calling. |
You're probably right and manta alone + cnvkit should be enough for Shatterseek? |
Should be enough. |
Great! @guru-yang and @gonzolgarcia - you can plan to use Manta + CNVkit for Shatterseek and then we can work on a filtered lumpy data file for release in the next few weeks for general recurrent SV analysis. |
@guru-yang and @gonzolgarcia as an update, we are going to remove LUMPY from the release. SVTyper processing is very long per sample (>10 hours) and will require some benchmarking for filtering, which we have de-prioritized in favor of benchmarking copy number. You have both said Manta is fine, so we will drop LUMPY. We will have a data release with new CN results coming next week (#146), so please let us know if you need help with creating PRs! |
@jharenza Thanks for the update. I am wondering how to get sample metadata. We are able to get gender, age at diagnosis, and tumor type from the Kids First data portal. To perform survival analysis, age at last follow-up would be needed. Do you know how to get that information? Is there any other information available for the patients or their parents, such as parental smoking or alcohol consumption? |
@guru-yang : have you examined the metadata available in the files associated with this project? Once you do, could you file a new issue noting anything that's missing that you'd need for your analysis? Thanks! |
@cgreene I am able to find overall survival in pedcbioportal. Thanks. |
Hi @guru-yang - overall survival, gender, age at diagnosis, and tumor type are all available in the We need people to use that file when putting together their analyses because that ensures that different contributors that are working independently are using the same information across their analyses (e.g., the same overall survival values). If there are additional fields you would like to see in the |
@guru-yang as @jaclyn-taroni mentioned, the survival is in the provided histologies file in the data download. It is better to use this file, as we have further categorized tumors and provided additional data not in the KF portal. We do not have age at last followup in the file currently, but it can be added in the release due next week. Can you please file an issue for that? We have no parental information available, but if there are other things you would like to see from patients, you can also ask in an issue and I can check whether we have the info available. |
@jaclyn-taroni @jharenza I see. Thanks a lot. What about smoking and alcohol usage for the probands? I don't expect smokers in a pediatric cohort. Just curious. |
@guru-yang : please file a new github issue with requests for metadata so that we can keep this issue, currently titled "Planned Analysis: Integrated CNV and SV analyses and chromothripsis" on that topic. Thanks! |
Hi @gonzolgarcia and @guru-yang! When do you think you will be able to file a pull request with either of your analyses? Thanks! |
@jharenza We have made some progress. Is there a regular conference call or similar to share results among the group? Or is everything done through GitHub? |
Hi @guru-yang, great to hear! We encourage you to file pull requests adding the code used to generate results as you have them. The analysis does not need to be complete before getting added to the repository. We have a pull request template with a section for summarizing results to facilitate discussion. You can join the Cancer Data Science Slack |
I will echo @jaclyn-taroni and @jharenza : please file pull requests adding code as you are writing it. It is much harder to integrate a large amount of code after it is entirely written. Thanks! |
@jharenza @jaclyn-taroni @cgreene Will try to do that soon. I am traveling this week. One quick question: we have seen quite a few patients with more than one tumor sequenced. When working on variants, is there a particular strategy for handling these tumors, such as randomly picking one? |
As of the v7 release, we now provide lists of independent specimens (one tumor per individual) that we would like analyses to use. These are randomly selected, as you suggest, but this allows everyone to use consistent sets. See the bottom of the Data Formats section of the README for descriptions of those files. |
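For anyone curious what "randomly selected, one tumor per individual" looks like in practice, here is a minimal sketch. It is not the code used to produce the released lists, and analyses should read the provided independent specimen files rather than regenerate them. The function name, the `(patient_id, specimen_id)` input shape, the seed, and the IDs below are all hypothetical.

```python
import random


def pick_independent(specimens, seed=0):
    """Pick one specimen per patient, reproducibly.

    `specimens` is an iterable of (patient_id, specimen_id) pairs (assumed shape).
    A fixed seed plus sorted inputs makes the choice independent of input order.
    """
    rng = random.Random(seed)
    by_patient = {}
    for patient_id, specimen_id in specimens:
        by_patient.setdefault(patient_id, []).append(specimen_id)
    # Iterate patients in sorted order so the random draws are stable across runs.
    return {p: rng.choice(sorted(ids)) for p, ids in sorted(by_patient.items())}


# Made-up identifiers for illustration only.
example = [
    ("PT_A", "BS_001"),
    ("PT_A", "BS_002"),
    ("PT_B", "BS_003"),
]
chosen = pick_independent(example)
```

Fixing the seed is what lets independent contributors arrive at the same set, which is the point of distributing a shared list.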
I noticed that in some samples the CNV calls from the two algorithms are quite different. I wonder what the plan is going forward. It seems to me that generating a consensus CNV call is not easy. |
Hi @guru-yang - have you taken a look at the copy number consensus issue: #128? |
Hello everyone, I wanted to apologize for my lack of contribution to this issue, which I proposed initially. Unfortunately, the requirements of my new position at Mount Sinai have left me with very little bandwidth. For the time being, I cannot guarantee that I will be contributing steadily to this issue. However, I'd be happy to provide support if still needed, as I am working on developing new tools for the integrated analysis of CNVs and structural variations. Best regards to everyone. |
We have generated CNV output from ControlFreeC and CNVKit, but are seeking individuals to determine consensus focal calls and/or identify additional algorithms we can run to instill high confidence in focal CNV calls from the WGS dataset.