Proposed Analysis: Generate hotspot and hot region lists #932
Comments
In addition to the oncogenes and TSGs from genereferencelist.tsv, we will also add the brain gene list: https://chopri.box.com/s/tqfq8whojsgg6htoz4begrgmpqyy3adt
Adding a filter for independent samples when checking for recurrence, from today's meeting:
Just wanted to clarify: for the kinase domain (and it seems the same applies to TSG gene body mutations), do we want to retain as outputs only the sites that are in neither the Cosmic Census nor the MSKCC hotspot list, so that these novel sites can be added to the hotspot list?
For these "regions", I do not envision them being added to the hotspot list because I think there will be a lot of uniqueness here, but rather we capture those for any sample. For instance, if we see a non-canonical kinase domain mutation in sample X, but it is only in 2/3 callers, we should scavenge that back. Does that make sense?
Also adding here that @adamcresnick suggests we add to the
But do we want to scavenge back any mutation in a domain region, or only novel ones (sites not overlapping MSKCC hotspots, in genes not in the Cosmic Census)? It seems to me we would already be using the MSKCC and Cosmic Census gene lists as hotspots to scavenge back any mutations.
Correct, you can make it a novel set, but my point was that I don't think we need to make a list, but rather define a region in which we look for deleterious mutations that, if not captured in previous steps (i.e., MSKCC/Cosmic), we would scavenge.
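To make the rule discussed above concrete, here is a minimal sketch of the per-mutation decision. Everything in it is an assumption for illustration: the function name, the argument names, and the `consensus_threshold` value are hypothetical and are not taken from the OpenPBTA code.

```python
def should_scavenge(n_callers, consensus_threshold, in_hot_region,
                    is_deleterious, captured_by_hotspot_step):
    """Decide whether a sub-consensus mutation should be scavenged back.

    All arguments are hypothetical names for this sketch:
      n_callers                -- number of callers that reported the mutation
      consensus_threshold      -- callers required for the mutation to already
                                  be in the consensus set (assumed value)
      in_hot_region            -- mutation falls in a kinase domain or TSG body
      is_deleterious           -- mutation is annotated as deleterious
      captured_by_hotspot_step -- already recovered via MSKCC/Cosmic hotspots
    """
    if n_callers >= consensus_threshold:
        return False  # already in the consensus set, nothing to recover
    if captured_by_hotspot_step:
        return False  # handled by the earlier hotspot step
    return in_hot_region and is_deleterious


# Hypothetical example: a deleterious kinase-domain call seen by 2 callers
# when 3 are assumed to be required for consensus.
print(should_scavenge(2, 3, True, True, False))
```

The hotspot check comes before the region check so that a mutation is never recovered twice.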
I am going to close this issue in favor of simplifying to the original issue here #819 - determining novel recurrent hotspots may be beyond the scope of this manuscript and require some bench validation since most of the VAFs are very low.
What are the scientific goals of the analysis?
Generate hotspot and hot region file for #819
What methods do you plan to use to accomplish the scientific goals?
After some discussion with David Wheeler (formerly BCM, now St. Jude), we have decided to derive a cancer hotspot (oncogene) and hot region file with which to recover potentially missed oncogenic DNA alterations. The rationale is that many oncogenes are mutated recurrently at the same nucleotide(s), whereas tumor suppressor genes (TSGs) frequently carry a non-recurrent mutation in a functional domain or a truncating alteration that inactivates the gene. In addition, kinase genes may have recurrent or non-recurrent alterations within the kinase domain that activate the kinase.
We will first search for deleterious recurrent point mutations within the OpenPBTA cohort (N>=2) within the annotated oncogene and TSG lists from fusion_filtering here to determine whether we are missing any pediatric brain-specific hotspots from the cancer hotspots v2 file found here. We will inspect those and potentially add them to the list of hotspots.
For hot regions, we will create a BED file of all kinase domain coordinates in which to scavenge back deleterious alterations.
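The recurrence search described above could be sketched roughly as follows. This is an illustration only: the record fields (`sample`, `gene`, `protein_position`, `deleterious`), the function name, and the toy gene and sample labels are all assumptions, not the actual OpenPBTA file columns or data. It counts each site once per independent sample, restricted to the genes of interest, and flags sites absent from a known hotspot set.

```python
from collections import defaultdict


def recurrent_sites(mutations, genes_of_interest, independent_samples,
                    known_hotspots, min_samples=2):
    """Return sites mutated in >= min_samples independent samples.

    Each mutation is a dict with hypothetical keys: 'sample', 'gene',
    'protein_position', and 'deleterious'. Sites are keyed by
    (gene, protein_position); each is flagged as novel when it is not
    in known_hotspots.
    """
    samples_by_site = defaultdict(set)
    for m in mutations:
        if m["gene"] not in genes_of_interest:
            continue  # keep only annotated oncogenes / TSGs
        if m["sample"] not in independent_samples:
            continue  # avoid counting related samples twice
        if not m["deleterious"]:
            continue
        samples_by_site[(m["gene"], m["protein_position"])].add(m["sample"])

    return {
        site: {"n_samples": len(samples), "novel": site not in known_hotspots}
        for site, samples in samples_by_site.items()
        if len(samples) >= min_samples
    }


# Toy, made-up input for illustration only.
toy_mutations = [
    {"sample": "S1", "gene": "GENE_A", "protein_position": 100, "deleterious": True},
    {"sample": "S2", "gene": "GENE_A", "protein_position": 100, "deleterious": True},
    {"sample": "S2b", "gene": "GENE_A", "protein_position": 100, "deleterious": True},
    {"sample": "S1", "gene": "GENE_B", "protein_position": 50, "deleterious": True},
    {"sample": "S3", "gene": "GENE_B", "protein_position": 50, "deleterious": True},
    {"sample": "S4", "gene": "GENE_C", "protein_position": 10, "deleterious": True},
    {"sample": "S5", "gene": "GENE_C", "protein_position": 10, "deleterious": True},
    {"sample": "S3", "gene": "GENE_A", "protein_position": 200, "deleterious": False},
]
toy_result = recurrent_sites(
    toy_mutations,
    genes_of_interest={"GENE_A", "GENE_B"},
    independent_samples={"S1", "S2", "S3", "S4", "S5"},
    known_hotspots={("GENE_B", 50)},
)
for site, info in sorted(toy_result.items()):
    print(site, info)
```

Keying on samples rather than raw mutation rows means a site reported more than once in the same sample still counts only once toward N.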
For TSGs, we will include the gene as the "region" and scavenge back deleterious alterations that occur within these genes. We may either add these to the region list or just add this as a step in #819.
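A minimal sketch of the region lookup, assuming the kinase domains (and, if we go that way, TSG gene bodies) are stored as tab-separated BED lines with a name in the fourth column. The function names, region names, and coordinates below are made up for illustration. The one detail worth getting right is that BED intervals are 0-based and half-open, while variant positions in VCF/MAF files are 1-based.

```python
def parse_bed(lines):
    """Parse tab-separated BED lines into (chrom, start, end, name) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and header lines
        chrom, start, end, *rest = line.split("\t")
        name = rest[0] if rest else ""
        regions.append((chrom, int(start), int(end), name))
    return regions


def region_for_position(chrom, pos_1based, regions):
    """Return the name of the first region containing a 1-based position.

    BED start is 0-based inclusive and end is exclusive, so a 1-based
    position p lies in the interval when start < p <= end.
    """
    for r_chrom, start, end, name in regions:
        if r_chrom == chrom and start < pos_1based <= end:
            return name
    return None


# Made-up coordinates and names, for illustration only.
toy_regions = parse_bed([
    "chr1\t100\t200\tGENE_A_kinase_domain",
    "chr2\t5000\t9000\tGENE_B_gene_body",
])
print(region_for_position("chr1", 150, toy_regions))
print(region_for_position("chr1", 100, toy_regions))
```

A linear scan is fine for a few hundred regions; a larger region set would probably warrant an interval tree or an existing tool such as bedtools.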
What input data are required for this analysis?
How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?
1 week, yes
Who will complete the analysis (please add a GitHub handle here if relevant)?
@kgaonkar6
What relevant scientific literature relates to this analysis?
Chang et al., 2016 and 2017