Update oncoprints to use histology-specific goi lists #1046

cbethell · 2021-04-29T13:09:32Z

Purpose/implementation Section

What scientific question is your analysis addressing?

This PR incorporates histology-specific genes of interest lists for the appropriate histology oncoprints.

What was your approach?

Using the oncoprint-goi-lists-OpenPBTA.csv file, linked in #969 (comment), I prepared a script (saved in util) that writes each column to its own TSV file, named for the associated histology and stored in oncoprint-landscape/data for use with 01-plot-oncoprint.R.

I then adjusted the file paths to the goi lists in the run-oncoprint.sh script.
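
For readers who want a picture of the column-to-TSV step, here is a minimal sketch in base R. It is not the script from this PR: the toy table, the `write_goi_lists()` helper, and the `<histology>_goi_list.tsv` naming are all assumptions made for illustration.

```r
# Hypothetical stand-in for oncoprint-goi-lists-OpenPBTA.csv: one column per
# histology, padded with empty strings where a list is shorter.
goi_table <- data.frame(
  LGAT = c("BRAF", "NF1", "FGFR1"),
  HGAT = c("H3F3A", "TP53", ""),
  stringsAsFactors = FALSE
)

# Write each column to its own single-column TSV and return the file paths.
# The file naming scheme here is an assumption, not the one used in the PR.
write_goi_lists <- function(goi_table, output_dir) {
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- character(0)
  for (histology in colnames(goi_table)) {
    genes <- goi_table[[histology]]
    # Drop padding values and duplicated gene symbols
    genes <- unique(genes[!is.na(genes) & genes != ""])
    out_file <- file.path(output_dir, paste0(histology, "_goi_list.tsv"))
    write.table(data.frame(gene = genes), out_file,
                sep = "\t", quote = FALSE, row.names = FALSE)
    paths[histology] <- out_file
  }
  paths
}

goi_files <- write_goi_lists(goi_table, file.path(tempdir(), "goi-lists"))
```

Each resulting file can then be passed to the plotting script as its histology-specific list.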

What GitHub issue does your pull request address?

This PR closes #1004

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The goi plots have been updated so they should receive a close look to ensure that they were updated as expected.

Is there anything that you want to discuss further?

The file path to the main goi file (in the run-oncoprint.sh script) will likely need to be updated.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes, although it will need to be re-run once v19 is merged, and the file path to the main goi file will likely need to be updated if that file is to be included in the data release.

Results

What types of results are included (e.g., table, figure)?

Updated goi oncoprints

What is your summary of the results?

The plots seem to be updated as expected.

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza

Hi @cbethell! I think this looks good and oncoprints and lists match as expected, but the top = 25 parameter in the oncoplot function is not working as expected (I am seeing >25 genes in some of the oncoprints).

We should get #1009 merged before this, but after the v19 release.

analyses/oncoprint-landscape/util/prepare-goi-lists.R

Co-authored-by: Jo Lynne Rokita <jolynnerokita@d3b.center>

cbethell · 2021-05-03T12:25:13Z

> Hi @cbethell! I think this looks good and oncoprints and lists match as expected, but the top = 25 parameter in the oncoplot function is not working as expected (I am seeing >25 genes in some of the oncoprints).
>
> We should get #1009 merged before this, but after the v19 release.

Hi @jharenza, is it the goi oncoprints that appear not to adhere to the expected behavior? If so, as noted in #975 (comment),

> it appears that a default behavior of the oncoplot() function is to display the top 20 genes with mutations, a functionality that can be over-ridden if supplied a genes of interest list with more than 20 genes.

In other words, the top = 25 parameter is over-ridden in cases where a goi list has been supplied. That said, if a goi list contains >25 genes, >25 genes will be displayed on the oncoprint.

jharenza · 2021-05-03T19:23:53Z

> In other words, the top = 25 parameter is over-ridden in cases where a goi list has been supplied. That said, if a goi list contains >25 genes, >25 genes will be displayed on the oncoprint.

Ah, yes. I remember us discussing this, but could not find it at the time. Looking back at the PPTC paper code, it looks like what I had done to create the oncoprints as a max of N genes while using genelists was to get the summary of mutations per gene first, then read just that gene list into the oncoprint function:
https://github.com/marislab/create-pptc-pdx-oncoprints/blob/13e2fd942b33ae7f4ba3f0cec844c0631283b8f5/R/create-complexheat-oncoprint-revision.R#L20-L59

What do you think about implementing this?

cbethell · 2021-05-03T21:01:12Z

> > In other words, the top = 25 parameter is over-ridden in cases where a goi list has been supplied. That said, if a goi list contains >25 genes, >25 genes will be displayed on the oncoprint.
>
> Ah, yes. I remember us discussing this, but could not find it at the time. Looking back at the PPTC paper code, it looks like what I had done to create the oncoprints as a max of N genes while using genelists was to get the summary of mutations per gene first, then read just that gene list into the oncoprint function:
> https://github.com/marislab/create-pptc-pdx-oncoprints/blob/13e2fd942b33ae7f4ba3f0cec844c0631283b8f5/R/create-complexheat-oncoprint-revision.R#L20-L59
>
> What do you think about implementing this?

Ah yes @jharenza, I can refactor the module to take the GOI list and top N as arguments. Where both are specified, it would count the mutations in the GOI-list genes from the MAF file and then subset the GOI list to the top N genes by number of mutations, which is what seems to be implemented in the code you linked above!
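
A minimal sketch of that idea in base R, assuming a MAF-like data frame with `Hugo_Symbol` and `Tumor_Sample_Barcode` columns. The `select_top_goi()` helper and the toy data are hypothetical, and the sketch ranks genes by the number of distinct samples carrying a mutation rather than calling maftools.

```r
# Toy MAF-like table; only the two columns used here are included.
maf_df <- data.frame(
  Hugo_Symbol = c("BRAF", "BRAF", "BRAF", "NF1", "NF1", "TP53", "TP53", "KIAA1549"),
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S1", "S1", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)
goi_list <- c("BRAF", "NF1", "TP53", "ATRX")

# Return at most `top_n` genes from `goi_list`, ranked by the number of
# distinct samples in which each gene is mutated.
select_top_goi <- function(maf_df, goi_list, top_n) {
  goi_maf <- maf_df[maf_df$Hugo_Symbol %in% goi_list, ]
  # Count each gene once per sample
  pairs <- unique(goi_maf[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])
  counts <- sort(table(pairs$Hugo_Symbol), decreasing = TRUE)
  # head() copes with lists shorter than top_n, so no explicit bound check
  head(names(counts), top_n)
}

top_two <- select_top_goi(maf_df, goi_list, top_n = 2)
```

Genes on the list with no mutations (ATRX in the toy data) never enter the ranking, so they cannot be selected.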


jaclyn-taroni · 2021-05-04T20:03:32Z

analyses/oncoprint-landscape/01-plot-oncoprint.R

+# Subset `maf_object` for histology-specific goi list
+if (!is.null(opt$goi_list)) {
+  maf_object = subsetMaf(
+    maf = maf_object,
+    tsb = metadata$Tumor_Sample_Barcode,
+    genes = goi_list,
+    mafObj = TRUE
+  )
+
+  # Get top mutated genes per this subset object
+  gene_sum <- mafSummary(maf_object)$gene.summary
+
+  # Sort to get top altered genes rather than mutated only genes
+  goi_ordered <-
+    gene_sum[order(gene_sum$AlteredSamples, decreasing = T),]
+
+  if (!is.null(opt$top_n)) {
+
+    # Select top `n` genes if the argument is provided
+    top_n <- ifelse(nrow(gene_sum) < opt$top_n, nrow(gene_sum), opt$top_n)
+
+    goi_list <- goi_ordered[1:top_n,]
+
+  }
+
+}


The actual code changes I'm suggesting are untested, but I believe you will only have to take these steps if someone specifies a GOI list and a top n, so you can simplify this a bit:

Suggested change

-# Subset `maf_object` for histology-specific goi list
-if (!is.null(opt$goi_list)) {
-  maf_object = subsetMaf(
-    maf = maf_object,
-    tsb = metadata$Tumor_Sample_Barcode,
-    genes = goi_list,
-    mafObj = TRUE
-  )
-
-  # Get top mutated genes per this subset object
-  gene_sum <- mafSummary(maf_object)$gene.summary
-
-  # Sort to get top altered genes rather than mutated only genes
-  goi_ordered <-
-    gene_sum[order(gene_sum$AlteredSamples, decreasing = T),]
-
-  if (!is.null(opt$top_n)) {
-
-    # Select top `n` genes if the argument is provided
-    top_n <- ifelse(nrow(gene_sum) < opt$top_n, nrow(gene_sum), opt$top_n)
-
-    goi_list <- goi_ordered[1:top_n,]
-
-  }
-
-}
+# Subset `maf_object` for histology-specific goi list
+if (!is.null(opt$goi_list) & !is.null(opt$top_n)) {
+  maf_object <- subsetMaf(
+    maf = maf_object,
+    tsb = metadata$Tumor_Sample_Barcode,
+    genes = goi_list,
+    mafObj = TRUE
+  )
+
+  # Get top mutated genes per this subset object
+  gene_sum <- mafSummary(maf_object)$gene.summary
+
+  # Sort to get top altered genes rather than mutated only genes
+  goi_ordered <-
+    gene_sum[order(gene_sum$AlteredSamples, decreasing = T),]
+
+  # Select top `n` genes if the argument is provided
+  top_n <- ifelse(nrow(gene_sum) < opt$top_n, nrow(gene_sum), opt$top_n)
+
+  goi_list <- goi_ordered[1:top_n,]
+}

cbethell · 2021-05-04

Upon testing, this method does not appear to obey the top_n argument.

jaclyn-taroni · 2021-05-04

Can you expand on that a bit?

cbethell · 2021-05-04

It does not limit the number of genes being displayed to top_n; it instead shows all of the genes listed in the goi list (the behavior it previously exhibited but we did not want).

jaclyn-taroni · 2021-05-04

But the two-if() way worked?

cbethell · 2021-05-04

That's correct.

jaclyn-taroni · 2021-05-04

That suggests that the logic isn't quite right because it's using the original list (goi_list) - the conditions for the if() are not being met. You don't want to subset the MAF and do the mafSummary() steps unless you have to, and you only have to if you have a top_n argument, so I would see if you can get to the bottom of it.

analyses/oncoprint-landscape/01-plot-oncoprint.R


cbethell · 2021-05-06T21:19:41Z

As mentioned in #1053 (comment)

It appears that in sample 7316-2285, when looking at the lgat plots, the display_group value is Neuronal and mixed neuronal-glial tumor while the broad_histology value is Low-grade astrocytic tumor.

See the display_group representation in the example of primary-plus_lgat_goi_oncoprint.png below:

I guess the question now is should we leave this info as is or should we recode the display_group? (A similar situation is occurring in the Other CNS plots where Chordoma is the display_group value associated with the broad_histology Choroid plexus tumor -- the value we do want). Any thoughts here @jharenza?

Otherwise, this PR now appears to do what we expect re incorporating goi_list and top_n.

jharenza · 2021-05-06T21:28:24Z

hi @cbethell! For 7316-2285, in v19, this has broad_histology == Neuronal and mixed neuronal-glial tumor, so this should be fixed now. For Chordoma samples, they now have broad_histology == Chordoma. The choroid plexus tumor was a mismatch because the WHO 2016 didn't have a spot for chordomas and we made it its own broad histology, as it is different. So, I think this should help.

cbethell · 2021-05-06T21:30:02Z

> hi @cbethell! For 7316-2285, in v19, this has broad_histology == Neuronal and mixed neuronal-glial tumor, so this should be fixed now. For Chordoma samples, they now have broad_histology == Chordoma. The choroid plexus tumor was a mismatch because the WHO 2016 didn't have a spot for chordomas and we made it its own broad histology, as it is different. So, I think this should help.

Great, thanks @jharenza! I'll be sure to re-run with v19 files.


jaclyn-taroni · 2021-05-07T12:32:41Z

@cbethell can you run this with v19 again now that #1054 is in please? I'm going to start merging the subtyping PRs that should probably get rerun with v19!

jharenza

hi @cbethell and @jaclyn-taroni. I think this generally looks good as far as goi output. There is one thing I noticed - not sure if we want to fix here or elsewhere, and that is the display of genes when there are no mutations - these we should remove from the plots. I wasn't expecting this, but also did not add that to the ticket specifically. I will approve and let you both decide whether we need a new ticket for that.

cbethell · 2021-05-10T12:44:01Z

> hi @cbethell and @jaclyn-taroni. I think this generally looks good as far as goi output. There is one thing I noticed - not sure if we want to fix here or elsewhere, and that is the display of genes when there are no mutations - these we should remove from the plots. I wasn't expecting this, but also did not add that to the ticket specifically. I will approve and let you both decide whether we need a new ticket for that.

After some further investigation, it seems that the underlying issue here was the lack of coding for Intron, 5'Flank, and 3'Flank. These accounted for some number of mutations in the AlteredSamples, and therefore showed up "blank" with lack of coding.

On my local machine, I have updated the coding in oncoprint_color_palette.tsv and oncoplot_functions.R to account for the values named above. This seems to have fixed the display of genes as mentioned above, but I am not sure if this is the ideal fix as the Intron values seem to crowd the non-goi plots. See the lgat goi vs non-goi plots below and feel free to leave any thoughts here @jaclyn-taroni and @jharenza! (Note that I have committed these changes in 04874a4 for your reference but am ready to revert these changes if this is not what we want).


cbethell · 2021-05-10T13:05:24Z

Given the findings in #1046 (comment), do we want to instead filter the Intron, 5'Flank, and 3'Flank cases out of the MAF file before subsetting and plotting @jharenza ?

jharenza · 2021-05-10T18:02:52Z

> Given the findings in #1046 (comment), do we want to instead filter the Intron, 5'Flank, and 3'Flank cases out of the MAF file before subsetting and plotting @jharenza ?

Oh, nice find! We definitely do not want to include intronic variants here, so I would suggest we can either remove them from the MAF to lighten that load or remove them from the altered samples table. We should keep the 3' and 5' flank - TERT promoter mutations from #819 will be included as 5' Flank. Based on the hotspot maf, we have Variant_Classification ==

Missense_Mutation 
Splice_Site    
Nonsense_Mutation
Frame_Shift_Del   
In_Frame_Ins   
In_Frame_Del      
Frame_Shift_Ins
5'Flank

so, everything else is currently captured


cbethell · 2021-05-10T20:10:26Z

> > Given the findings in #1046 (comment), do we want to instead filter the Intron, 5'Flank, and 3'Flank cases out of the MAF file before subsetting and plotting @jharenza ?
>
> Oh, nice find! We definitely do not want to include intronic variants here, so I would suggest we can either remove them from the MAF to lighten that load or remove them from the altered samples table. We should keep the 3' and 5' flank - TERT promoter mutations from #819 will be included as 5' Flank. Based on the hotspot maf, we have Variant_Classification ==
>
> Missense_Mutation
> Splice_Site
> Nonsense_Mutation
> Frame_Shift_Del
> In_Frame_Ins
> In_Frame_Del
> Frame_Shift_Ins
> 5'Flank
>
> so, everything else is currently captured.

Thanks for the feedback @jharenza!

In the most recent commit 52ac162, you'll find that @jaclyn-taroni and I made the decision to filter to genes with AlteredSamples > 1, so that genes with 0 (or close to 0) mutations are no longer displayed on the oncoprints.

We also added a flag called include_introns, which by default is equal to FALSE -- meaning that we have removed intronic variants from the oncoprints by default while the option to include them is also there if we need to at some point in the future.
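
The two filters described above can be sketched roughly as follows in base R. This is an illustration only: the toy table and the `filter_introns()` and `summarize_altered()` helpers are hypothetical, and only the `include_introns` flag and the `AlteredSamples > 1` threshold come from this thread.

```r
# Toy MAF-like table (hypothetical values)
maf_df <- data.frame(
  Hugo_Symbol = c("BRAF", "BRAF", "TERT", "NF1", "NF1"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S2", "S3"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation",
                             "5'Flank", "Intron", "Intron"),
  stringsAsFactors = FALSE
)

# Drop intronic variants unless the caller asks to keep them.
filter_introns <- function(maf_df, include_introns = FALSE) {
  if (include_introns) {
    return(maf_df)
  }
  maf_df[maf_df$Variant_Classification != "Intron", , drop = FALSE]
}

# Count the distinct samples in which each gene is altered.
summarize_altered <- function(maf_df) {
  pairs <- unique(maf_df[, c("Hugo_Symbol", "Tumor_Sample_Barcode")])
  counts <- table(pairs$Hugo_Symbol)
  data.frame(
    Hugo_Symbol = names(counts),
    AlteredSamples = as.vector(counts),
    stringsAsFactors = FALSE
  )
}

gene_sum <- summarize_altered(filter_introns(maf_df))
# Keep only genes altered in more than one sample
genes_kept <- gene_sum$Hugo_Symbol[gene_sum$AlteredSamples > 1]
```

In the toy data NF1 is altered only by intronic variants, so it drops out by default and returns when `include_introns = TRUE`. Note that the threshold also removes genes altered in exactly one sample (TERT here).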

jharenza · 2021-05-12T23:09:59Z

> In the most recent commit 52ac162, you'll find that @jaclyn-taroni and I made the decision to filter to genes with AlteredSamples > 1, as to handle any cases with 0 (or near to) mutations being displayed on the oncoprints.

Thanks! This looks like it does what it should - filters out introns and removes non-mutated genes, except now I am seeing that the TMB portion of the plot seems to be missing bars. This can be seen vividly in the LGAT goi and CNS other goi plots.

Scrolling through above, I see this happened a few other times, and perhaps warrants its own issue.

The other thing I noticed is that these oncoprints are missing the reciprocal fusion logic in that commit, but I see it in your latest commit ea6e207. However, in this latest commit, I am seeing the odd behavior of goi and non-goi oncoplot names being reversed, but not even reversed because the goi lists aren't being populated properly. For instance:

primary-plus-low-grade-astrocytic-tumor_goi_oncoprint.png:

This is not the goi list, but the file refers to it as such

primary-plus-low-grade-astrocytic-tumor_oncoprint.png:

There are genes in this that are not in the goi list, so I am not sure where exactly this list comes from?

The TMB issue seems to be [mostly] resolved in commit ea6e207. I say mostly because there are just a few samples with no bars, but they seem to be very few compared to before and maybe those TMBs are below the visible log10 threshold which would be fine.

jaclyn-taroni · 2021-05-13T00:07:17Z

The fact that there are no PNG files in files changed when we are changing how we set file names in the shell script here is certainly suspicious.

jharenza · 2021-05-13T00:11:44Z

> The fact that there are no PNG files in files changed when we are changing how we set file names in the shell script here is certainly suspicious.

yes, I checked multiple times because I thought I was going crazy...

jaclyn-taroni · 2021-05-13T00:12:49Z

Rerunning locally now 🎲

jaclyn-taroni · 2021-05-13T00:25:19Z

New plots added in 3ad02de @jharenza. I haven't looked at them in detail yet (that is a tomorrow problem!) but seems like the TMB issue might still exist in the GOI plots?

jharenza · 2021-05-13T00:47:08Z

> New plots added in 3ad02de @jharenza. I haven't looked at them in detail yet (that is a tomorrow problem!) but seems like the TMB issue might still exist in the GOI plots?

Ok great - the oncoprint content looks correct now, but yes, the TMB issue still exists. It looks like the goi subset is resulting in a counting of only those gene alterations for the TMB calculation and in the non-goi plots, you can see that the TMB plots contain all alterations. I may have seen this before when subsetting the maf. I think the only way to perhaps get around this would be to refactor the code in this order:

1. subset maf to goi list
2. gather top N goi from summary
3. plot full maf (not subsetted maf) but only plot genes from 2 (instead of top N from 1).

If we want to test this in a new PR, I can submit an issue.
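
To illustrate why that ordering matters, here is a small base R sketch. It assumes the TMB bar for a sample reflects the number of variants that sample has in whichever table is handed to the plotting function; the toy data and variable names are hypothetical, and no maftools calls are used.

```r
# Toy MAF-like table (hypothetical values)
maf_df <- data.frame(
  Hugo_Symbol = c("BRAF", "BRAF", "TP53", "NF1", "KIAA1549"),
  Tumor_Sample_Barcode = c("S1", "S2", "S2", "S3", "S3"),
  stringsAsFactors = FALSE
)
goi_list <- c("BRAF", "TP53")
top_n <- 1

# Step 1: subset to the genes of interest
goi_maf <- maf_df[maf_df$Hugo_Symbol %in% goi_list, ]

# Step 2: pick the top N genes from the subset summary
gene_counts <- sort(table(goi_maf$Hugo_Symbol), decreasing = TRUE)
genes_to_plot <- head(names(gene_counts), top_n)

# Step 3: per-sample variant counts from the full table versus the subset
samples <- sort(unique(maf_df$Tumor_Sample_Barcode))
burden_full <- table(factor(maf_df$Tumor_Sample_Barcode, levels = samples))
burden_subset <- table(factor(goi_maf$Tumor_Sample_Barcode, levels = samples))
```

Sample S3 has two variants in the full table but none in the GOI subset, so a bar computed from the subset would be empty for it. Plotting the full table while restricting the displayed genes to `genes_to_plot` avoids that.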

cbethell · 2021-05-13T13:34:52Z

> > New plots added in 3ad02de @jharenza. I haven't looked at them in detail yet (that is a tomorrow problem!) but seems like the TMB issue might still exist in the GOI plots?
>
> Ok great - the oncoprint content looks correct now, but yes, the TMB issue still exists. It looks like the goi subset is resulting in a counting of only those gene alterations for the TMB calculation and in the non-goi plots, you can see that the TMB plots contain all alterations. I may have seen this before when subsetting the maf. I think the only way to perhaps get around this would be to refactor the code in this order:
>
> 1. subset maf to goi list
> 2. gather top N goi from summary
> 3. plot full maf (not subsetted maf) but only plot genes from 2 (instead of top N from 1).
>
> If we want to test this in a new PR, I can submit an issue.

With the changes committed in d71f76b, it looks like the bars that were missing from the TMB portion of the goi plots have been restored, would you agree @jharenza?

jharenza · 2021-05-13T18:16:00Z

> With the changes committed in d71f76b, it looks like the bars that were missing from the TMB portion of the goi plots have been restored, would you agree @jharenza?

Yes, this looks great and I think good to go!

jharenza

LGTM now!

cbethell and others added 2 commits April 29, 2021 08:57

Incorporate histology specific goi lists

5d0bcf0

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

a5e5af1

cbethell changed the title ~~WIP: Update oncoprints to use histology-specific goi lists~~ Update oncoprints to use histology-specific goi lists Apr 29, 2021

cbethell marked this pull request as ready for review April 29, 2021 13:17

jharenza self-requested a review April 30, 2021 22:23

jharenza suggested changes Apr 30, 2021

View reviewed changes

analyses/oncoprint-landscape/util/prepare-goi-lists.R Outdated Show resolved Hide resolved

jharenza added the merge after release label Apr 30, 2021

cbethell and others added 2 commits May 3, 2021 08:19

Update analyses/oncoprint-landscape/util/prepare-goi-lists.R

da63721

Co-authored-by: Jo Lynne Rokita <jolynnerokita@d3b.center>

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

7e28741

cbethell and others added 3 commits May 4, 2021 13:03

Merge branch 'master' of https://github.com/AlexsLemonade/OpenPBTA-an…

bdc3319

…alysis into cbethell/prep-for-histology-goi-lists

add top_n argument for goi plots and re-run

9cbd240

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

d4c2e51

jaclyn-taroni reviewed May 4, 2021

View reviewed changes

cbethell mentioned this pull request May 4, 2021

Combine broad histology oncoprints into a single multi-panel plot #1051

Merged

5 tasks

jaclyn-taroni reviewed May 4, 2021

View reviewed changes

analyses/oncoprint-landscape/01-plot-oncoprint.R Outdated Show resolved Hide resolved

jaclyn-taroni mentioned this pull request May 5, 2021

Oncoprint GOI list revisions #1053

Merged

cbethell and others added 3 commits May 6, 2021 16:18

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

c9a1f8a

Oncoprint GOI list revisions (#1053)

c51132c

* Move goi prep out of util and renumber, etc. * Add a distinct() step * New naming scheme * Use more arrays in the bash script, renumbering * Update oncoprint PNGs

remove tolower() and gsub()

b39e417

- re-ran to ensure everything is running as expected

jharenza removed the merge after release label May 6, 2021

jaclyn-taroni mentioned this pull request May 6, 2021

Update top_n logic #1054

Merged

cbethell and others added 2 commits May 6, 2021 18:23

re-run module shell script with v19 data

e64290f

Update top_n logic (#1054)

567effe

* Update top_n logic * top_n can not be NULL + rerun

jaclyn-taroni requested a review from jharenza May 7, 2021 13:02

jharenza approved these changes May 7, 2021

View reviewed changes

add coding for intron, 5'flank, and 3'flank

04874a4

- re-run

add flag to handle the inclusion of introns

52ac162

- filter to genes with `AlteredSamples >1` and re-run

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

ede7838

cbethell requested a review from jharenza May 11, 2021 14:18

fix merge conflicts and re-run

ea6e207

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

44cf639

jaclyn-taroni added 2 commits May 12, 2021 20:13

Try deleting all PNGs currently in the plots directory

f889ad5

Rerun locally and add plots

3ad02de

adjust logic around goi subsetting and re-run

d71f76b

jharenza approved these changes May 13, 2021

View reviewed changes

Merge branch 'master' into cbethell/prep-for-histology-goi-lists

34c3b16

jaclyn-taroni added the merge next label May 13, 2021

jaclyn-taroni merged commit 4139943 into master May 13, 2021

jaclyn-taroni removed the merge next label May 13, 2021

jharenza mentioned this pull request Jul 1, 2021

Proposed Analysis: Create broad histology-specific gene lists for oncoprint #969

Closed

jaclyn-taroni deleted the cbethell/prep-for-histology-goi-lists branch November 17, 2021 14:13
