Fusion Filtering ("other" reading frame ) edits #553
Conversation
@kgaonkar6, checking my understanding: put a slightly different way, it's alright for putative oncogenic fusions to be neither in-frame nor frameshift gene fusions, is that correct?
Yes. The in-frame/frameshift reading-frame annotations are only predictions from the algorithms; when a prediction cannot be made, the callers report ".", which we reannotated as "other". So, from my understanding, a fusion involving an oncogene can still be real, as in the case of IGH-MYC, which is a known fusion (@jharenza mentioned that it is reported as in-frame/frameshift in the literature) but for which StarFusion couldn't predict the frame for the sample.
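The reannotation described in this reply can be sketched as follows. This is a minimal Python illustration, not the module's R code: the function name `recode_frame`, the accepted input spellings, and the placeholder fusion `GENEA--GENEB` are assumptions made for the example.

```python
def recode_frame(frame):
    """Map a caller's reading-frame prediction to a standardized label.

    Assumption: callers report "." when no frame could be predicted,
    and may spell the other two classes in slightly different ways.
    """
    label = frame.strip().lower()
    if label in {"inframe", "in-frame"}:
        return "in-frame"
    if label in {"frameshift", "out-of-frame"}:
        return "frameshift"
    # "." (or any unrecognized value) means no prediction was possible.
    return "other"


# Illustrative calls; GENEA--GENEB is a placeholder, not real data.
calls = [("IGH-@--MYC", "."), ("GENEA--GENEB", "INFRAME")]
recoded = {fusion: recode_frame(frame) for fusion, frame in calls}
```

Under this scheme a fusion keeps its "other" label through the putative-oncogene filter instead of being dropped for lacking a frame prediction.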
The reordering implemented here appears to match what is described. I would like to see the TODO added to the shell script before merging. I also had a comment about the handling of the inconsistent gene symbols, but I think that can be addressed in a future pull request.
@@ -6752,3 +6752,7 @@
"6751" "GAK" "PfamKinase" "Kinase"
"6752" "PIK3C2B" "PfamKinase" "Kinase"
"6753" "HUNK" "PfamKinase" "Kinase"
"6754" "IGH@" "addedToallOnco_Feb2017.tsv" "Oncogene"
I think handling it this way is fine for now. However, the better solution/design decision is to deal with these atypical or inconsistent gene symbols in the standardization steps. I can imagine a situation where you have some reference file that essentially contains the genes that need to be recoded and what they should be standardized to for each caller you support.
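One way such a reference file could work is sketched below in Python. Everything here is hypothetical: the column names, the caller names `caller_a` and `caller_b`, the choice of `IGH@`/`IGL@` as target symbols, and the helper functions are assumptions for illustration, not part of the current module.

```python
import csv
import io

# Hypothetical reference table: one row per caller-specific symbol to recode.
# The caller names and target symbols are illustrative only.
RECODE_TSV = """caller\traw_symbol\tstandard_symbol
caller_a\tIGH-@\tIGH@
caller_a\tIGL-@\tIGL@
"""


def load_recode_table(text):
    """Parse the TSV into a {(caller, raw_symbol): standard_symbol} lookup."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {(r["caller"], r["raw_symbol"]): r["standard_symbol"] for r in reader}


def standardize_fusion(caller, fusion_name, table):
    """Recode each partner gene of a 'GeneA--GeneB' fusion name if listed."""
    genes = fusion_name.split("--")
    return "--".join(table.get((caller, gene), gene) for gene in genes)


table = load_recode_table(RECODE_TSV)
standardized = standardize_fusion("caller_a", "IGH-@--MYC", table)
```

Symbols without an entry for a given caller pass through unchanged, so the table only needs to list the exceptions.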
Thanks for reviewing! Yes, I agree that would be a better idea going forward. I will add that as a future PR.
add TODO Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
@jaclyn-taroni I asked @kgaonkar6 to dig into why the algorithms could not predict the frames. However, some of the fusions in this category are canonical fusions, for example:
@jaclyn-taroni and @jharenza, from going through the code for arriba (STAR-Fusion seemed a little too complicated, with Perl and multiple utility scripts), it looks to me like the reading frame is determined from specific features of the pileup of the aligned chimeric DNA sequence and then from the predicted peptide. There seem to be many conditions under which the tools cannot determine the frame. The first step is to identify whether any coding exons can be predicted between the transcript and the breakpoint. If no protein-coding region is detected, the sequence is reported as ".", which means no frame information can be predicted either.
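To make the "no coding region, no frame" condition concrete, here is a toy Python sketch. It is not arriba's or STAR-Fusion's implementation; the function `predict_frame` and its two inputs are assumptions that reduce the problem to codon arithmetic at the breakpoint.

```python
def predict_frame(coding_bases_5p, phase_3p):
    """Toy frame call at a fusion breakpoint (not any caller's real logic).

    coding_bases_5p: number of coding bases retained from the 5' gene, or
        None if no coding exon lies between the transcript start and the
        breakpoint.
    phase_3p: 0-based position within a codon (0, 1 or 2) at which the
        3' gene resumes, or None if the breakpoint is outside its coding
        region.
    """
    if coding_bases_5p is None or phase_3p is None:
        # No coding sequence on one side: nothing to translate.
        return "."
    if coding_bases_5p % 3 == phase_3p:
        # The 3' gene continues exactly where the 5' codon left off.
        return "in-frame"
    return "out-of-frame"


no_call = predict_frame(None, 0)
kept_frame = predict_frame(4, 1)
shifted = predict_frame(4, 0)
```

In this simplification a "." result says nothing about whether the fusion is real; it only says that one side offered no coding sequence to evaluate.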
Purpose/implementation Section
Fusion filtering steps that affect the putative_oncogene fusions need to be updated through this PR because of the following issues:
What scientific question is your analysis addressing?
Identify all inframe/frameshift/other putative fusions that pass QC and expression based filtering.
What was your approach?
And then remove the "other" fusions while scavenging for recurrent non-oncogenic fusions, at this step only:
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/155045684646a57920d684577db6148aaeec3b6d/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L146
Corrected the spelling in the run script here: https://github.com/kgaonkar6/OpenPBTA-analysis/blob/155045684646a57920d684577db6148aaeec3b6d/analyses/fusion_filtering/run_fusion_merged.sh#L53
IGH-@, IGH@, IGL-@ and IGL@ need to be added to the reference list as oncogenic genes.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/add_other_fusion/analyses/fusion_filtering/references/genelistreference.txt
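A quick way to confirm that the four symbols are present is sketched below in Python. It assumes the reference list uses the quoted, whitespace-separated layout visible in the diff (row number, gene symbol, source file, type); the function name `missing_oncogenes` is hypothetical.

```python
import shlex

# Symbols this PR says must be listed as oncogenes.
REQUIRED = ["IGH-@", "IGH@", "IGL-@", "IGL@"]


def missing_oncogenes(lines, required=REQUIRED):
    """Return the required symbols that have no 'Oncogene' row in `lines`.

    Assumes each row is: "<row>" "<symbol>" "<source file>" "<type>".
    """
    present = set()
    for line in lines:
        fields = shlex.split(line)  # strips the quotes, splits on whitespace
        if len(fields) >= 4 and fields[3] == "Oncogene":
            present.add(fields[1])
    return [symbol for symbol in required if symbol not in present]


# Rows copied from the diff hunk in this PR.
rows = [
    '"6752" "PIK3C2B" "PfamKinase" "Kinase"',
    '"6753" "HUNK" "PfamKinase" "Kinase"',
    '"6754" "IGH@" "addedToallOnco_Feb2017.tsv" "Oncogene"',
]
still_missing = missing_oncogenes(rows)
```

Run against only the rows shown in the diff, the check reports the three symbols that are not visible there.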
What GitHub issue does your pull request address?
Updated analysis: Fusion Filtering
#552
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
@jaclyn-taroni @jharenza
Which areas should receive a particularly close look?
Code review of project specific filtering because of the "other" inclusion for putative-oncogene fusions.
Is there anything that you want to discuss further?
Please review the filtering process and its implementation:
Putative Driver:
Filtering for general cancer-specific genes (after QC + expression filtering and removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs)
Fusions with genes annotated as onco by the 02 script in columns Gene1A_anno, Gene1B_anno, Gene2A_anno, Gene2B_anno
Scavenge back filtered fusions to add to putative oncogenic fusions (after QC + expression filtering and removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs):
In-frame/frameshift fusions called in at least 2 samples per histology OR
In-frame/frameshift fusions called by at least 2 callers
AND
Remove filtered fusions found in more than 1 histology OR
Remove filtered fusions with multi-fused genes (genes fused more than 5 times in a sample)
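The scavenging rules above can be sketched as follows. This is a minimal Python illustration, not the code in 04-project-specific-filtering.Rmd: the record keys, the helper names, and the placeholder gene, sample and histology labels are assumptions, and the counts are computed over in-frame/frameshift calls only, which is one reading of the description.

```python
from collections import defaultdict

CODING_FRAMES = {"in-frame", "frameshift"}


def rec(fusion, sample, histology, caller, frame):
    """Build one hypothetical fusion call from a 'GeneA--GeneB' name."""
    gene1, gene2 = fusion.split("--")
    return dict(fusion=fusion, gene1=gene1, gene2=gene2, sample=sample,
                histology=histology, caller=caller, frame=frame)


def scavenge(calls, min_samples=2, min_callers=2, max_fused=5):
    """Keep calls that are recurrent or multi-caller, minus promiscuous ones."""
    coding = [c for c in calls if c["frame"] in CODING_FRAMES]
    samples = defaultdict(set)      # (fusion, histology) -> samples
    callers = defaultdict(set)      # (fusion, sample) -> callers
    histologies = defaultdict(set)  # fusion -> histologies
    fused = defaultdict(set)        # (sample, gene) -> fusions
    for c in coding:
        samples[(c["fusion"], c["histology"])].add(c["sample"])
        callers[(c["fusion"], c["sample"])].add(c["caller"])
        histologies[c["fusion"]].add(c["histology"])
        for gene in (c["gene1"], c["gene2"]):
            fused[(c["sample"], gene)].add(c["fusion"])
    kept = []
    for c in coding:
        recurrent = len(samples[(c["fusion"], c["histology"])]) >= min_samples
        multi_caller = len(callers[(c["fusion"], c["sample"])]) >= min_callers
        multi_hist = len(histologies[c["fusion"]]) > 1
        multi_fused = any(len(fused[(c["sample"], g)]) > max_fused
                          for g in (c["gene1"], c["gene2"]))
        if (recurrent or multi_caller) and not (multi_hist or multi_fused):
            kept.append(c)
    return kept


calls = [
    rec("A--B", "s1", "h1", "arriba", "in-frame"),
    rec("A--B", "s2", "h1", "arriba", "in-frame"),        # 2 samples in h1
    rec("C--D", "s1", "h1", "arriba", "frameshift"),
    rec("C--D", "s1", "h1", "starfusion", "frameshift"),  # 2 callers
    rec("E--F", "s1", "h1", "arriba", "in-frame"),
    rec("E--F", "s1", "h1", "starfusion", "in-frame"),
    rec("E--F", "s3", "h2", "arriba", "in-frame"),        # 2 histologies
    rec("G--H", "s1", "h1", "arriba", "other"),           # no frame call
]
kept_names = sorted({c["fusion"] for c in scavenge(calls)})
```

Here `A--B` and `C--D` survive, `E--F` is dropped for appearing in two histologies, and `G--H` is never considered because its frame is "other".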
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Results
What types of results are included (e.g., table, figure)?
results/
FilteredFusion.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv
What is your summary of the results?
4354 fusions in pbta-fusion-putative-oncogenic.tsv
Also, IGH-@--MYC, which is a known fusion, is now captured in pbta-fusion-putative-oncogenic.tsv.
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.