This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Recode Oncoprints #1009

Merged: 19 commits into master from cbethell/recode-oncoprints on May 11, 2021

@cbethell (Contributor) commented Apr 15, 2021

Purpose/implementation Section

What scientific question is your analysis addressing?

Per #981 (comment), this PR recodes the CNV gain and loss values as Amp and Del, respectively, so that we can rely on default maftools plotting behavior, which expects this coding.

What was your approach?

This PR recodes the CNV file's gain and loss values as Amp and Del within the 01-plot-oncoprint.R script of the oncoprint-landscape module.
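
As a rough illustration of the recoding described above, here is a minimal Python sketch. It is not the code in 01-plot-oncoprint.R (which is R); the function name, the `status` column name, and the example rows are all hypothetical. Only the gain to Amp and loss to Del mapping comes from this PR.

```python
# Mapping stated in the PR description: gain -> Amp, loss -> Del.
CNV_RECODE = {"gain": "Amp", "loss": "Del"}


def recode_cnv_status(records, column="status"):
    """Return copies of the records with gain/loss recoded as Amp/Del.

    `column` is a hypothetical column name. Values that are not keys of
    CNV_RECODE are passed through unchanged.
    """
    recoded = []
    for record in records:
        updated = dict(record)  # copy so the input rows are not mutated
        updated[column] = CNV_RECODE.get(record[column], record[column])
        recoded.append(updated)
    return recoded


# Made-up example rows, for illustration only.
cnv_rows = [
    {"sample": "S1", "gene": "GENE_A", "status": "gain"},
    {"sample": "S2", "gene": "GENE_B", "status": "loss"},
    {"sample": "S3", "gene": "GENE_C", "status": "neutral"},
]
recoded_rows = recode_cnv_status(cnv_rows)
```

Values outside the mapping are left untouched here; whether the real script should drop or keep them is a separate decision that this sketch does not address.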

What GitHub issue does your pull request address?

This PR closes #981 as it allows for better visualization of the Multi_Hit instances.

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The updated oncoprint plots in oncoprint-landscape/plots should receive a close look, probably from @jharenza, to ensure that this is the behavior we expect and want.

Is there anything that you want to discuss further?

Note that after recoding, I updated the oncoprint_color_palette.tsv file in figures/palettes to match the new coding. I also removed the amplification value from this file, as it did not appear in any of the previous oncoprints and would conflict with the new coding.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes.

Results

What types of results are included (e.g., table, figure)?

No new results; only the broad histology oncoprints in oncoprint-landscape/plots were updated.

What is your summary of the results?

Based on the example left in #981 (comment), the plots now appear to reflect the behavior we want.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

recode "loss" values as "Del"  and "gain" values as "Amp"
@cbethell cbethell requested a review from jharenza April 15, 2021 21:57
@jharenza
Collaborator

Hi @cbethell - I think this looks great for Amp/Dels. I did notice (especially for LGG KIAA1549--BRAF fused tumors) that we are still getting a Multi-Hit Fusion for these.

From this issue, I had added:

> I think that we need to separate out Multi-Hit from reciprocal fusions, and collapse these fusions, as a reciprocal fusion should only be counted once.

To clarify: the Multi-Hit Fusion category was mainly created within the PPTC project because there were some instances of a gene with promiscuous partners. Since this dataset has so many kinase fusions, which often have reciprocals (e.g., KIAA1549--BRAF and BRAF--KIAA1549), I think we should collapse these into one fusion event. That is, any reciprocal fusion (same genes, opposite direction) should just be noted as Fusion in the oncoplot. I think that will keep things clearer for those reading the plots.
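
One way to picture the collapsing rule described above is the following Python sketch. It is an illustration under stated assumptions, not the logic that ended up in the R script: it assumes fusion names have the form `GeneA--GeneB`, and the function names, the `Multi_Hit_Fusion` label, the sample IDs, and the `PARTNER1`/`PARTNER2` genes are hypothetical.

```python
from collections import defaultdict


def canonical_pair(fusion_name):
    """Return an order-independent key for a fusion name.

    Assumes the name has exactly two genes joined by "--", so that
    KIAA1549--BRAF and BRAF--KIAA1549 map to the same key.
    """
    gene_a, gene_b = fusion_name.split("--")
    return tuple(sorted((gene_a, gene_b)))


def classify_fusions(calls):
    """Label each (sample, gene) as "Fusion" or "Multi_Hit_Fusion".

    `calls` is an iterable of (sample_id, fusion_name) tuples. Reciprocal
    fusions collapse to one event, so a gene is only labelled multi-hit
    when it takes part in more than one distinct gene pair in a sample.
    """
    pairs_by_sample_gene = defaultdict(set)
    for sample_id, fusion_name in calls:
        pair = canonical_pair(fusion_name)
        for gene in set(pair):
            pairs_by_sample_gene[(sample_id, gene)].add(pair)
    return {
        key: "Multi_Hit_Fusion" if len(pairs) > 1 else "Fusion"
        for key, pairs in pairs_by_sample_gene.items()
    }


# Made-up calls: one reciprocal pair, and one gene with two partners.
example_calls = [
    ("sample_1", "KIAA1549--BRAF"),
    ("sample_1", "BRAF--KIAA1549"),
    ("sample_2", "MET--PARTNER1"),
    ("sample_2", "PARTNER2--MET"),
]
labels = classify_fusions(example_calls)
```

In this sketch the reciprocal KIAA1549/BRAF pair counts once, so both genes in `sample_1` are labelled Fusion, while MET in `sample_2` has two distinct partners and is labelled multi-hit. Because labels are computed per sample and gene, several multi-hit genes in the same sample are each labelled.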

@jharenza jharenza left a comment

As noted in the comment, if we can update the reciprocal fusions to not be multi-hit, I think that would take care of the rest of this PR.

@jharenza jharenza left a comment

Hi @cbethell! Thanks for updating this!

I think the output looks exactly like we expect now and I did run the code and do some spot checking, and all looks good with your logic. I just made a small suggestion about shortening one of the dataframes.

I also noticed that all_participants_primary_only_oncoprint.png and all_participants_primary-plus_oncoprint.png are still in the code/plot output. Either we should keep them and update this new fusion code to be rendered in those plots or remove that piece of the code. I think that removing them should be fine.

Otherwise, I think this is almost set!

remove `all_participants` plots
@cbethell
Contributor Author

cbethell commented Apr 26, 2021

> Hi @cbethell! Thanks for updating this!
>
> I think the output looks exactly like we expect now and I did run the code and do some spot checking, and all looks good with your logic. I just made a small suggestion about shortening one of the dataframes.
>
> I also noticed that all_participants_primary_only_oncoprint.png and all_participants_primary-plus_oncoprint.png are still in the code/plot output. Either we should keep them and update this new fusion code to be rendered in those plots or remove that piece of the code. I think that removing them should be fine.
>
> Otherwise, I think this is almost set!

@jharenza, thanks for the re-review! The updated code was not quite ready, though: I noticed instances where a sample had more than one multi-hit fusion but only one would be labeled. It appears that I fixed this issue in the most recent commit, and everything should now look as we expect!

I also could not locate where all_participants_primary_only_oncoprint.png and all_participants_primary-plus_oncoprint.png are still in the code, but I removed the plots from this module's plots directory as you mentioned and renamed the histology plots (in the shell script) to exclude the all_participants naming.

@cbethell cbethell requested a review from jharenza April 26, 2021 22:28
@jharenza
Collaborator

> @jharenza, thanks for the re-review! The updated code was not quite ready, though: I noticed instances where a sample had more than one multi-hit fusion but only one would be labeled. It appears that I fixed this issue in the most recent commit, and everything should now look as we expect!

I think something happened to the reciprocal fusions in this latest commit: we appear to be missing a lot of them. I have been using LGG as a guide (e.g., we should see upwards of 50% of these samples harboring KIAA1549--BRAF fusions). See this oncoplot from the same data in pedcbio, also pasted below.

[Screenshot: oncoplot from pedcbio, captured 2021-04-26]

> I also could not locate where all_participants_primary_only_oncoprint.png and all_participants_primary-plus_oncoprint.png are still in the code, but I removed the plots from this module's plots directory as you mentioned and renamed the histology plots (in the shell script) to exclude the all_participants naming.

I see they are now removed - I think the code was still in the bash script, but don't see it now. Thanks!

@jharenza jharenza left a comment

See the comment about the reciprocal fusions that went missing after the last update.

@cbethell
Contributor Author

> I think something happened to the reciprocal fusions in this latest commit: we appear to be missing a lot of them. I have been using LGG as a guide (e.g., we should see upwards of 50% of these samples harboring KIAA1549--BRAF fusions). See this oncoplot from the same data in pedcbio, also pasted below.

Ahh yes, great catch @jharenza! After combining code from the previous commits, I believe the logic now produces what we expect: as of 8cec80d, the reciprocal fusions appear to be back while the true multi-hit fusions are retained (at least, this seems to be the case after using LGG as a guide and some further spot checking). Let me know if you agree or if it appears that we may be missing some data. I've also refactored a bit in the last set of updates, so please feel free to comment on the structure there!

@cbethell cbethell requested a review from jharenza April 28, 2021 14:05

@jharenza jharenza left a comment

Thanks for updating this @cbethell. I don't know why I didn't think of this before, but @kgaonkar6 had created a function in fusion_filtering to identify reciprocal fusions and in fact, that populates the reciprocal_exists column in the putative oncogenic file. So, if you want, you can leverage that column to possibly shorten that code. Up to you!

Otherwise, the outputs look good and I spot checked the few MET multi-hits from HGG and they look as expected.

@cbethell
Contributor Author

> Thanks for updating this @cbethell. I don't know why I didn't think of this before, but @kgaonkar6 had created a function in fusion_filtering to identify reciprocal fusions and in fact, that populates the reciprocal_exists column in the putative oncogenic file. So, if you want, you can leverage that column to possibly shorten that code. Up to you!
>
> Otherwise, the outputs look good and I spot checked the few MET multi-hits from HGG and they look as expected.

I'll take a look at the code you linked, thanks @jharenza!
Glad to hear that the output looks as expected otherwise :) 👍

@cbethell
Contributor Author

> I don't know why I didn't think of this before, but @kgaonkar6 had created a function in fusion_filtering to identify reciprocal fusions and in fact, that populates the reciprocal_exists column in the putative oncogenic file. So, if you want, you can leverage that column to possibly shorten that code. Up to you!

Unfortunately, using the reciprocal_exists column did not end up shortening the code (perhaps it would have if I had gone further down the rabbit hole). In the meantime, I refactored and shortened the code a bit in other ways in bbfef36, and the output still looks as expected.

@jaclyn-taroni jaclyn-taroni left a comment

#1063 includes my comments about how the fusions are being handled. I'm returning a comment about recoding the CNAs, though, because it might disappear if #1063 gets merged!

@cbethell
Contributor Author

> #1063 includes my comments about how the fusions are being handled. I'm returning a comment about recoding the CNAs, though, because it might disappear if #1063 gets merged!

@jaclyn-taroni, I've addressed your comment about recoding the CNAs in 54e5b57 and re-ran this PR with that change and the merged changes from #1063. I believe the plots now look as we expect, so this is ready for a re-review.

@cbethell cbethell requested a review from jaclyn-taroni May 11, 2021 14:16

@jaclyn-taroni jaclyn-taroni left a comment

I took a look at some of the plots that were discussed earlier in the thread and I get the sense that they look as expected 👍🏻

@jaclyn-taroni
Member

I'm going to merge this now, since there are a number of outstanding oncoprint PRs that will need to be updated!

@jaclyn-taroni jaclyn-taroni merged commit c6144f7 into master May 11, 2021
@jaclyn-taroni jaclyn-taroni deleted the cbethell/recode-oncoprints branch May 11, 2021 19:29
kgaonkar6 added 4 commits to kgaonkar6/OpenPBTA-analysis that referenced this pull request May 11, 2021
@kgaonkar6 kgaonkar6 mentioned this pull request May 26, 2021
Development

Successfully merging this pull request may close these issues.

Updated analysis: update Multi-Hit definition