Fusion Filtering ("other" reading frame ) edits #553

kgaonkar6 · 2020-02-21T14:34:18Z

Purpose/implementation Section

Fusion filtering steps that affect the putative_oncogene fusions need to be updated through this PR because of the following issues:

During project-specific filtering, I mistakenly removed "other" fusions when filtering the putative-oncogene fusion list on the Fusion_Type column.
There is also a spelling mistake in the run script.
IGH-@, IGH@, IGL-@ and IGL@ need to be added to the reference list as oncogenic genes.

What scientific question is your analysis addressing?

Identify all inframe/frameshift/other putative fusions that pass QC and expression based filtering.

What was your approach?

I re-ordered the filtering for putative-oncogene fusions to this chunk https://github.com/kgaonkar6/OpenPBTA-analysis/blob/155045684646a57920d684577db6148aaeec3b6d/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L110

The "other" fusions are now removed only here, while scavenging for recurrent non-oncogenic fusions:
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/155045684646a57920d684577db6148aaeec3b6d/analyses/fusion_filtering/04-project-specific-filtering.Rmd#L146

Corrected the spelling in the run script here: https://github.com/kgaonkar6/OpenPBTA-analysis/blob/155045684646a57920d684577db6148aaeec3b6d/analyses/fusion_filtering/run_fusion_merged.sh#L53
Added IGH-@, IGH@, IGL-@ and IGL@ to the reference list as oncogenic genes:
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/add_other_fusion/analyses/fusion_filtering/references/genelistreference.txt

What GitHub issue does your pull request address?

Updated analysis: Fusion Filtering
#552

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

@jaclyn-taroni @jharenza

Which areas should receive a particularly close look?

Code review of project specific filtering because of the "other" inclusion for putative-oncogene fusions.

Is there anything that you want to discuss further?

Please review the filtering process and its implementation:

Putative Driver:

Filtering for general cancer-specific genes (after QC+expression_filtering and removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs)
Fusions with a gene annotated as onco by the 02 script in any of the columns Gene1A_anno, Gene1B_anno, Gene2A_anno, Gene2B_anno

Scavenge back filtered fusions to add to putative oncogenic fusions (after QC+expression_filtering and removing LOCAL_REARRANGEMENT|LOCAL_INVERSION as potential read-throughs):

In-frame/frameshift fusions called in at least 2 samples per histology OR
In-frame/frameshift fusions called by at least 2 callers
AND
Remove filtered fusions found in more than 1 histology OR
Remove filtered fusions involving a multi-fused gene (a gene fused more than 5 times in a sample)
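
The filtering order described above can be sketched roughly as follows. This is a minimal Python illustration, not the module's R code: the record fields (`fusion_name`, `fusion_type`, `sample`, `histology`, `callers`, `onco_annotated`) and the function names are assumptions made for the example, and counting recurrence over the scavenging candidates only is also an assumption. Only the thresholds come from the description.

```python
from collections import defaultdict

FRAMED = {"in-frame", "frameshift"}


def putative_oncogenic(fusions):
    """Keep every fusion with an onco-annotated gene, whatever its Fusion_Type."""
    return [f for f in fusions if f["onco_annotated"]]


def scavenge_recurrent(fusions, multi_fused_cutoff=5):
    """Recover recurrent non-onco fusions; "other" is excluded in this step only."""
    candidates = [
        f for f in fusions
        if not f["onco_annotated"] and f["fusion_type"] in FRAMED
    ]

    samples = defaultdict(set)      # (fusion, histology) -> samples
    histologies = defaultdict(set)  # fusion -> histologies
    fused_with = defaultdict(set)   # (gene, sample) -> fusions involving the gene
    for f in candidates:
        name = f["fusion_name"]
        samples[(name, f["histology"])].add(f["sample"])
        histologies[name].add(f["histology"])
        for gene in name.split("--"):
            fused_with[(gene, f["sample"])].add(name)

    kept = []
    for f in candidates:
        name = f["fusion_name"]
        # At least 2 samples per histology OR at least 2 callers
        recurrent = (
            len(samples[(name, f["histology"])]) >= 2 or len(f["callers"]) >= 2
        )
        # Found in more than 1 histology OR involves a multi-fused gene
        multi_histology = len(histologies[name]) > 1
        multi_fused = any(
            len(fused_with[(gene, f["sample"])]) > multi_fused_cutoff
            for gene in name.split("--")
        )
        if recurrent and not (multi_histology or multi_fused):
            kept.append(f)
    return kept


def filter_fusions(fusions):
    """Oncogene step first, then the scavenging step."""
    return putative_oncogenic(fusions) + scavenge_recurrent(fusions)
```

The point of the ordering is that the Fusion_Type restriction lives inside `scavenge_recurrent`, so an "other" fusion such as IGH-@--MYC is still kept by the oncogene step.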

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

results/
FilteredFusion.tsv
pbta-fusion-putative-oncogenic.tsv
pbta-fusion-recurrent-fusion-byhistology.tsv
pbta-fusion-recurrent-fusion-bysample.tsv
pbta-fusion-recurrently-fused-genes-byhistology.tsv
pbta-fusion-recurrently-fused-genes-bysample.tsv

What is your summary of the results?

pbta-fusion-putative-oncogenic.tsv contains 4354 fusions.
IGH-@--MYC, a known fusion, is now captured in pbta-fusion-putative-oncogenic.tsv.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jaclyn-taroni · 2020-02-21T15:56:17Z

@kgaonkar6 – checking my understanding – put a slightly different way, it's alright for putative oncogenic fusions to not be inframe or frameshift gene fusions, is that correct?

kgaonkar6 · 2020-02-21T16:05:01Z

Yes. The in-frame/frameshift reading-frame annotations are only predictions from the algorithm; when no prediction can be made, the caller reports ".", which we reannotated as "other". So, from my understanding, a fusion involving an oncogene can be real even without a frame call. IGH-MYC is an example: it is a known fusion (@jharenza mentioned that it is known to be inframe/frameshift in the literature), but StarFusion couldn't predict the frame for the sample.
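
As a small illustration of the reannotation mentioned in this comment, here is a hedged Python sketch. The function name is hypothetical and the module itself is written in R; the only behavior taken from the comment is that a "." frame call becomes "other".

```python
def reannotate_fusion_type(frame_call):
    """Return "other" when the caller could not predict a frame (reported as ".")."""
    return "other" if frame_call == "." else frame_call


# A fusion without a frame call keeps a usable label instead of "."
calls = ["in-frame", ".", "frameshift"]
fusion_types = [reannotate_fusion_type(c) for c in calls]
```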

jaclyn-taroni

The reordering implemented here appears to match what is described. I would like to see the TODO added to the shell script before merging. I also had a comment about the handling of the inconsistent gene symbols, but I think that can be addressed in a future pull request.

analyses/fusion_filtering/run_fusion_merged.sh

jaclyn-taroni · 2020-02-21T16:32:11Z

analyses/fusion_filtering/references/genelistreference.txt

@@ -6752,3 +6752,7 @@
 "6751"	"GAK"	"PfamKinase"	"Kinase"
 "6752"	"PIK3C2B"	"PfamKinase"	"Kinase"
 "6753"	"HUNK"	"PfamKinase"	"Kinase"
+"6754"	"IGH@"	"addedToallOnco_Feb2017.tsv"	"Oncogene"


I think handling this way is fine for now. However, the better solution/design decision is to deal with these atypical or inconsistent gene symbols in the standardization steps. I can imagine a situation where you have some reference file that essentially contains genes that need to be recoded and what they should be standardized to for each caller you support.

Thanks for reviewing! Yes, I agree that would be a better idea going forward, I will add that as a future PR.
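
One way the recoding reference file suggested above could look is sketched below in Python. Everything specific here is an assumption for illustration: the column names, the choice of `IGH`/`IGL` as the standardized symbols, and in particular which caller emits which alias, which is not stated in this pull request.

```python
import csv
import io

# Hypothetical recoding table; the caller/alias pairings are illustrative only.
RECODE_TSV = """caller\traw_symbol\tstandard_symbol
STARFUSION\tIGH-@\tIGH
STARFUSION\tIGL-@\tIGL
ARRIBA\tIGH@\tIGH
ARRIBA\tIGL@\tIGL
"""


def load_recode_table(handle):
    """Read a TSV of (caller, raw_symbol) -> standard_symbol mappings."""
    table = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        table[(row["caller"], row["raw_symbol"])] = row["standard_symbol"]
    return table


def standardize_symbol(symbol, caller, table):
    """Recode a symbol if the table lists it for this caller; otherwise keep it."""
    return table.get((caller, symbol), symbol)


def standardize_fusion_name(fusion_name, caller, table):
    """Recode each partner gene of a "GeneA--GeneB" fusion name."""
    genes = fusion_name.split("--")
    return "--".join(standardize_symbol(g, caller, table) for g in genes)


RECODE = load_recode_table(io.StringIO(RECODE_TSV))
```

With a table like this applied during standardization, the oncogene reference list would need only one entry per gene rather than one per alias.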

add TODO Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

jharenza · 2020-02-21T21:12:53Z

@jaclyn-taroni I asked @kgaonkar6 to dig into why the algorithms could not predict the frames. Some of the fusions in this category are canonical fusions, e.g. IGH-MYC, which in some cases are either inframe or frameshift but still oncogenic. This came up in our lymphoma sample, and this fusion is canonical in certain leukemias and lymphomas. I imagine that if we ran additional fusion algorithms, we might get a frame assigned, so I'm not sure why these are the way they are.

kgaonkar6 · 2020-02-24T15:28:49Z

@jaclyn-taroni and @jharenza, from going through the code for arriba (STAR-Fusion seemed a little too complicated, with Perl and multiple utils scripts etc.), it looks to me like the reading frame is detected by looking for specific features of the pileup of the aligned chimeric DNA sequence and then the predicted peptide. There seem to be many conditions under which the tools will not be able to detect the frame:

First, the tool checks whether any coding exons can be predicted between the transcript and the breakpoint. If no protein-coding region is detected, the sequence is ".", which means no frame information can be predicted either.
https://github.com/suhrig/arriba/blob/ca1d40b0575e958243fe2e7fd28acd54de038349/source/output_fusions.cpp#L201
If a 5' gene cannot be predicted, then the frame cannot be detected either, because the tool looks for a start codon in the 5' gene to predict the frame in the peptide sequence.
https://github.com/suhrig/arriba/blob/ca1d40b0575e958243fe2e7fd28acd54de038349/source/output_fusions.cpp#L803
If the breakpoints cannot be determined on contigs of the reference assembly:
https://github.com/suhrig/arriba/blob/ca1d40b0575e958243fe2e7fd28acd54de038349/source/output_fusions.cpp#L810
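
The three conditions listed above can be paraphrased as a short decision function. This is a Python sketch of the reading given in this comment, not arriba's actual logic: the class, field, and function names are hypothetical, and arriba itself is written in C++.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FusionEvidence:
    # Hypothetical flags summarizing the conditions described in the comment
    has_coding_region: bool                 # coding exons found between the transcript and the breakpoint
    five_prime_gene_known: bool             # a 5' gene could be predicted
    breakpoints_on_reference_contigs: bool  # breakpoints map to reference assembly contigs


def frame_prediction_blocker(ev: FusionEvidence) -> Optional[str]:
    """Return the first reason a frame cannot be predicted, or None if none applies."""
    if not ev.has_coding_region:
        return "no protein-coding region detected"
    if not ev.five_prime_gene_known:
        return "no 5' gene, so no start codon to anchor the frame"
    if not ev.breakpoints_on_reference_contigs:
        return "breakpoints not on contigs of the reference assembly"
    return None


def reading_frame(ev: FusionEvidence, predicted_frame: str) -> str:
    """Report "." when any blocker applies, otherwise the predicted frame."""
    return "." if frame_prediction_blocker(ev) else predicted_frame
```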

kgaonkar6 added 2 commits February 21, 2020 09:14

edits as per issue AlexsLemonade#552

1550456

add IGH@/IGL@

0d44e2b

jaclyn-taroni self-requested a review February 21, 2020 14:49

kgaonkar6 changed the title ~~edits as per issue #552~~ Fusion Filtering ("other" reading frame ) edits as per issue #552 Feb 21, 2020

jaclyn-taroni mentioned this pull request Feb 21, 2020

Updated analysis: Fusion Filtering #552

Closed

kgaonkar6 changed the title ~~Fusion Filtering ("other" reading frame ) edits as per issue #552~~ Fusion Filtering ("other" reading frame ) edits Feb 21, 2020

kgaonkar6 mentioned this pull request Feb 21, 2020

Planned release: v15 #543

Closed

5 tasks

jaclyn-taroni approved these changes Feb 21, 2020


kgaonkar6 and others added 2 commits February 21, 2020 11:57

Update analyses/fusion_filtering/run_fusion_merged.sh

0d05722

add TODO Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

Merge branch 'master' into add_other_fusion

09be624

jaclyn-taroni merged commit 31fe795 into AlexsLemonade:master Feb 21, 2020

kgaonkar6 deleted the add_other_fusion branch December 8, 2020 22:49
