Proposed Analysis: add scavenging of cancer hotspots to consensus SNV calls #819
Comments
I think this is a good idea, but I would keep it as a separate analysis from the general consensus. In other words, I would not "scavenge" back mutations into the consensus, but rather include an entirely separate analysis that evaluates known mutations. This would keep the standards clear and separate de novo analysis from analysis with outside influence.
Ok - yeah I went back and forth on that. Thanks!
@kgaonkar6, after our internal discussion, I think we need to first determine our hotspot list:
If that makes sense, I think that can be the first PR for this series. Thanks!
We are having a call on Thursday Jan 28 with David Wheeler (St Jude, formerly BCM), who has done this sort of thing while leading the BCM Genomics Lab. We might also want to add pediatric-specific genes such as those from Ma, 2018 and Grobner, 2018.
I don't think the MAF format has annotations that would let us filter using the information in the paper describing the TERT promoter variants. Should I instead filter on the exact genomic sites? I believe chr5 | 1295113 | 1295113, which is 66 bp away from the TSS and is annotated as existing_variant rs1242535815,COSM1716563,COSM1716558, is the site we are looking for corresponding to C228T, and chr5 | 1295135 | 1295135, which is 88 bp away from the TSS and is the COSM1716559 variant, corresponds to the C250T promoter variant (from my Google searches :D). As a check, I looked at Strelka for upstream variants, and we have both of these sites (along with others):
We still want to filter by IMPACT == 'HIGH|MODERATE|MODIFIER' to remove any LOW impact mutations (like silent mutations) at the given amino acid position in the hotspot database, right?
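A minimal sketch of the IMPACT filter asked about above, assuming each MAF row is available as a dict with a VEP-style `IMPACT` column. The function name `filter_by_impact` and the toy records are hypothetical and only illustrate the idea. Note that the `'HIGH|MODERATE|MODIFIER'` shorthand means membership in that set of values rather than equality with the literal string.

```python
# Hypothetical sketch: drop LOW-impact calls, keep everything else of interest.
KEEP_IMPACTS = {"HIGH", "MODERATE", "MODIFIER"}


def filter_by_impact(records, keep=KEEP_IMPACTS):
    """Return the MAF-like records (dicts) whose IMPACT value is in `keep`."""
    return [r for r in records if r.get("IMPACT") in keep]


# Toy records; gene names and positions are placeholders, not real calls.
calls = [
    {"Hugo_Symbol": "GENE_A", "Protein_position": "27", "IMPACT": "MODERATE"},
    {"Hugo_Symbol": "GENE_A", "Protein_position": "27", "IMPACT": "LOW"},
    {"Hugo_Symbol": "GENE_B", "Protein_position": "600", "IMPACT": "HIGH"},
]
kept = filter_by_impact(calls)
```

Here `kept` retains the MODERATE and HIGH rows and drops the LOW (for example, silent) call at the same amino acid position.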
Are you saying there are low-impact mutations on the MSK list? I would assume they would not be low.
This looks right to me, and the nucleotides are reversed because TERT is on the reverse strand. So, I think we should use the genomic coordinates here plus the nucleotides.
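Matching on genomic coordinates plus nucleotides could look roughly like the sketch below. The two chr5 positions are copied from the comment above and are not independently verified here. The standard MAF column names (`Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2`) are assumed, and the function names are hypothetical. Because TERT is on the reverse strand, the C>T change in the promoter naming is complemented to G>A for comparison against forward-strand MAF alleles.

```python
# Assumed sites, taken from the discussion above (not verified here).
TERT_PROMOTER_SITES = {
    ("chr5", 1295113): "C228T",
    ("chr5", 1295135): "C250T",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def forward_strand_alleles(ref, alt):
    """Complement single-base reverse-strand alleles to forward-strand bases."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


def label_tert_promoter(record):
    """Return 'C228T' or 'C250T' if both site and alleles match, else None."""
    key = (record["Chromosome"], int(record["Start_Position"]))
    name = TERT_PROMOTER_SITES.get(key)
    if name is None:
        return None
    # C>T on the reverse strand corresponds to G>A on the forward strand.
    expected = forward_strand_alleles("C", "T")
    observed = (record["Reference_Allele"], record["Tumor_Seq_Allele2"])
    return name if observed == expected else None
```

A row at chr5:1295113 with reference `G` and tumor allele `A` would be labeled `C228T`; the same position with any other allele pair returns `None`.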
There were a few instances that the hotspot amino acid site had |
* recurrence strelka * n>-2 * add Protein_position * combined snv * snv-recurrence * re-run filter more than 2 * removeing old folder * removing unused functions * add a readme * Update README.md * combine types * combine types * uniq * Update README.md * Update README.md * Update README.md * comment edits * update brain-goi * updating cols to use * adding plots * add uniq hits plots * added dbSNP_RS * adding upset plots for each type of calls * update to snv-caller path * adding comments for maf creation * remove swp files * adding upset function * independent samples in recurrence * adding Ref_Allele Tumor_Allele to recurrence * Update analyses/hotspots-detection/01-reccurence-hotspot-overlap.Rmd Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com> * Update analyses/hotspots-detection/01-reccurence-hotspot-overlap.Rmd Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com> * Update analyses/hotspots-detection/01-reccurence-hotspot-overlap.Rmd Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com> * combine and filter * re-dp script 1 * filter for known hotspots only * Delete brain-goi-list-new.txt * add html * add images * add comments * adding all maf columns in filtered file * run script * add per caller filters * add per caller filters * Delete 01-combine-maf.Rmd * Delete 01-combine-maf.nb.html * Delete combined_maf_hotspots.RDS * adding vardict subset * files un-committed * Update analyses/hotspots-detection/README.md Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/README.md Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/prepMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/prepMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/prepMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/prepMaf.R Co-authored-by: 
jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/run_overlaps_hotspot.sh Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/run_overlaps_hotspot.sh Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/run_overlaps_hotspot.sh Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/00-subset-maf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * updates * add just strelka * update grep genes before R filtering * styling * removing genes as input param * subsetByOverlap * read_tsv seems to assign the columns accurately compared to fread * add to ci * add task name * Update README.md * splice and indel * adding indels with ? and filter for canonical transcripts * fixing fail because updated file was not committed * Update analyses/hotspots-detection/utils/filterMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/filterMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/run_overlaps_hotspot.sh Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/filterMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/00-subset-maf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * description update * asuggested changes from review;comments update * add error if MSKCC hotspot is not complete * uniq gene list * Update analyses/hotspots-detection/utils/filterMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> * Update analyses/hotspots-detection/utils/filterMaf.R Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org> Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com> Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org>
* update col types * script to add col types Co-authored-by: jashapiro <josh.shapiro@ccdatalab.org>
Closed with #819
What analysis are you proposing and why?
Create a new MAF that contains the consensus SNV calls plus the cancer hotspot calls missed by the consensus, as noted below.
We previously noticed that by taking a 3/3 approach for consensus calls, we are inevitably missing some cancer hotspot mutations. We got around that for one specific cancer (DMGs) because we have clinical reports containing histone variant calls that we can add into the molecular subtyping pathology module (#735 and #751). However, we are likely still missing some cancer hotspot mutations, and I propose that we add a final step in which we scavenge back cancer hotspot mutations using a well-curated, downloadable list of them.
What changes need to be made? Please provide enough detail for another participant to make the update.
The next step would be to assess whether any of these hotspot mutations are being missed by the 3/3 method and then determine a set of rules for adding these mutations back to the consensus SNV file. For example:
Perhaps the new file can be called
pbta-consensus-snvs-plus-hotspot.maf.gz
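One possible shape for the scavenging step is sketched below. The rules themselves are not settled in this issue, so everything here is an assumption: the `min_callers` threshold, the `is_hotspot` predicate, and the function names are hypothetical, and the caller names and positions in the toy data are placeholders. The sketch keeps every consensus call and adds hotspot calls that the consensus missed but at least `min_callers` callers reported.

```python
def variant_key(record):
    """Identify a call by sample, site, and alleles (standard MAF columns)."""
    return (
        record["Tumor_Sample_Barcode"],
        record["Chromosome"],
        record["Start_Position"],
        record["Reference_Allele"],
        record["Tumor_Seq_Allele2"],
    )


def scavenge_hotspots(consensus, per_caller_calls, is_hotspot, min_callers=1):
    """Return consensus calls plus hotspot calls missed by the consensus.

    consensus: list of MAF-like dicts.
    per_caller_calls: dict mapping caller name -> list of MAF-like dicts.
    is_hotspot: predicate deciding whether a record hits a known hotspot.
    """
    seen = {variant_key(r) for r in consensus}
    support = {}
    for caller, records in per_caller_calls.items():
        for r in records:
            key = variant_key(r)
            if key in seen or not is_hotspot(r):
                continue
            entry = support.setdefault(key, {"record": r, "callers": set()})
            entry["callers"].add(caller)
    rescued = [e["record"] for e in support.values()
               if len(e["callers"]) >= min_callers]
    return consensus + rescued


def toy_call(pos):
    """Build a placeholder record; values are illustrative only."""
    return {"Tumor_Sample_Barcode": "S1", "Chromosome": "chr1",
            "Start_Position": pos, "Reference_Allele": "G",
            "Tumor_Seq_Allele2": "A"}


consensus = [toy_call(100)]
per_caller = {"strelka": [toy_call(100), toy_call(200), toy_call(300)],
              "vardict": [toy_call(200)]}
combined = scavenge_hotspots(consensus, per_caller,
                             lambda r: r["Start_Position"] == 200)
```

If the separate-analysis route suggested in the comments is preferred, the same function could return only the `rescued` list instead of appending it to the consensus.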
What input data should be used? Which data were used in the version being updated?
Cancer hotspots table, downloadable here: https://www.cancerhotspots.org/#/download plus TERT promoter mutations, noted from this paper.
When do you expect the analysis will be completed?
not sure
Who will complete the updated analysis?
@migbro @kgaonkar6