Update oncoprints to use histology-specific goi lists #1046
Co-authored-by: Jo Lynne Rokita <jolynnerokita@d3b.center>
Hi @jharenza, is it the goi oncoprints that appear not to be adhering to the expected behavior? If so, as noted in #975 (comment),
In other words, the |
Ah, yes. I remember us discussing this, but could not find it at the time. Looking back at the PPTC paper code, it looks like what I had done to create the oncoprints as a max of N genes while using genelists was to get the summary of mutations per gene first, then read just that gene list into the oncoprint function: What do you think about implementing this? |
Ah yes @jharenza, I can refactor the module to take in GOI lists and top |
# Subset `maf_object` for histology-specific goi list
if (!is.null(opt$goi_list)) {
  maf_object = subsetMaf(
    maf = maf_object,
    tsb = metadata$Tumor_Sample_Barcode,
    genes = goi_list,
    mafObj = TRUE
  )

  # Get top mutated genes per this subset object
  gene_sum <- mafSummary(maf_object)$gene.summary

  # Sort to get top altered genes rather than mutated only genes
  goi_ordered <-
    gene_sum[order(gene_sum$AlteredSamples, decreasing = T),]

  if (!is.null(opt$top_n)) {

    # Select top `n` genes if the argument is provided
    top_n <- ifelse(nrow(gene_sum) < opt$top_n, nrow(gene_sum), opt$top_n)

    goi_list <- goi_ordered[1:top_n,]

  }

}
The actual code changes I'm suggesting are untested, but I believe you will only have to take these steps if someone specifies a GOI list and a top n, so you can simplify this a bit:
# Subset `maf_object` for histology-specific goi list
if (!is.null(opt$goi_list) & !is.null(opt$top_n)) {
  maf_object <- subsetMaf(
    maf = maf_object,
    tsb = metadata$Tumor_Sample_Barcode,
    genes = goi_list,
    mafObj = TRUE
  )
  # Get top mutated genes per this subset object
  gene_sum <- mafSummary(maf_object)$gene.summary
  # Sort to get top altered genes rather than mutated only genes
  goi_ordered <-
    gene_sum[order(gene_sum$AlteredSamples, decreasing = T),]
  # Select top `n` genes if the argument is provided
  top_n <- ifelse(nrow(gene_sum) < opt$top_n, nrow(gene_sum), opt$top_n)
  goi_list <- goi_ordered[1:top_n,]
}
Upon testing, this method does not appear to obey the `top_n` argument.
Can you expand on that a bit?
It does not limit the number of genes being displayed to `top_n`; instead, it shows all of the genes listed in the goi list (the behavior it previously exhibited, which we did not want).
But the two-`if` approach worked?
That's correct.
That suggests that the logic isn't quite right because it's using the original list (`goi_list`) - the conditions for the `if()` are not being met. Since you don't want to subset the MAF and do the `mafSummary()` steps unless you have to, and you only have to if you have a `top_n` argument, I would see if you can get to the bottom of it.
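For reference, here is a minimal sketch of the selection behavior being discussed, written in base R against a plain data frame rather than the real `mafSummary()` output. The data values and the `select_top_genes()` helper are hypothetical and not part of the module; the `Hugo_Symbol` column name is assumed to match the gene summary table.

```r
# Hypothetical stand-in for mafSummary(maf_object)$gene.summary
gene_sum <- data.frame(
  Hugo_Symbol = c("BRAF", "TP53", "NF1", "ATRX"),
  AlteredSamples = c(12, 30, 7, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return gene symbols ordered by AlteredSamples,
# limited to at most `top_n` genes when `top_n` is provided
select_top_genes <- function(gene_sum, top_n = NULL) {
  ordered <- gene_sum[order(gene_sum$AlteredSamples, decreasing = TRUE), ]
  if (!is.null(top_n)) {
    # head() copes with top_n being larger than the number of rows
    ordered <- head(ordered, top_n)
  }
  ordered$Hugo_Symbol
}

top_genes <- select_top_genes(gene_sum, top_n = 2)
all_genes <- select_top_genes(gene_sum)
```

In this sketch the helper returns gene symbols rather than summary rows, so the result could be passed anywhere a character vector of genes is expected. Whether that matches what the plotting step needs is an assumption to check against the module.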
* Move goi prep out of util and renumber, etc.
* Add a distinct() step
* New naming scheme
* Use more arrays in the bash script, renumbering
* Update oncoprint PNGs
- re-ran to ensure everything is running as expected
As mentioned in #1053 (comment) It appears that in sample See the I guess the question now is should we leave this info as is or should we recode the Otherwise, this PR now appears to do what we expect re incorporating |
hi @cbethell! For |
Great, thanks @jharenza! I'll be sure to re-run with v19 files. |
* Update top_n logic
* top_n can not be NULL + rerun
hi @cbethell and @jaclyn-taroni. I think this generally looks good as far as goi output. There is one thing I noticed - not sure if we want to fix here or elsewhere, and that is the display of genes when there are no mutations - these we should remove from the plots. I wasn't expecting this, but also did not add that to the ticket specifically. I will approve and let you both decide whether we need a new ticket for that.
After some further investigation, it seems that the underlying issue here was the lack of coding for On my local machine, I have updated the coding in |
Given the findings in #1046 (comment), do we want to instead filter the |
Oh, nice find! We definitely do not want to include intronic variants here, so I would suggest we can either remove them from the MAF to lighten that load or remove them from the altered samples table. We should keep the 3' and 5' flank - TERT promoter mutations from #819 will be included as
so, everything else is currently captured |
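As a rough illustration of the first option (dropping intronic variants from the MAF while keeping the flanking classifications), here is a base R sketch on a small, made-up MAF-like table. The rows are hypothetical; `Variant_Classification` with values such as `Intron`, `5'Flank`, and `3'Flank` follows the standard MAF vocabulary, and the actual module may filter differently.

```r
# Hypothetical MAF-like table; real MAF files carry many more columns
maf_df <- data.frame(
  Hugo_Symbol = c("TERT", "TP53", "BRAF", "NF1"),
  Variant_Classification = c("5'Flank", "Missense_Mutation", "Intron", "3'Flank"),
  stringsAsFactors = FALSE
)

# Drop intronic variants only; 5' and 3' flank variants are kept
maf_filtered <- maf_df[maf_df$Variant_Classification != "Intron", ]
```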
- filter to genes with `AlteredSamples >1` and re-run
Thanks for the feedback @jharenza! In the most recent commit 52ac162, you'll find that @jaclyn-taroni and I made the decision to filter to genes with We also added a flag called |
Thanks! This looks like it does what it should - filters out introns and removes non-mutated genes, except now I am seeing that the TMB portion of the plot seems to be missing bars.
The other thing I noticed is that these oncoprints are missing the reciprocal fusion logic in that commit, but I see it in your latest commit ea6e207. However, in this latest commit, I am seeing the odd behavior of goi and non-goi oncoplot names being reversed, but not even reversed because the goi lists aren't being populated properly. For instance:
primary-plus-low-grade-astrocytic-tumor_goi_oncoprint.png:
primary-plus-low-grade-astrocytic-tumor_oncoprint.png:
The TMB issue seems to be [mostly] resolved in commit ea6e207. I say mostly because there are just a few samples with no bars, but they seem to be very few compared to before and maybe those TMBs are below the visible log10 threshold which would be fine.
The fact that there are no PNG files in files changed when we are changing how we set file names in the shell script here is certainly suspicious.
yes, I checked multiple times because I thought I was going crazy...
Rerunning locally now 🎲
Ok great - the oncoprint content looks correct now, but yes, the TMB issue still exists. It looks like the goi subset is resulting in a counting of only those gene alterations for the TMB calculation and in the non-goi plots, you can see that the TMB plots contain all alterations. I may have seen this before when subsetting the maf. I think the only way to perhaps get around this would be to refactor the code in this order:
|
With the changes made and committed in d71f76b, it looks like the missing bars associated with the TMB portion of the goi plots have been retained, would you agree @jharenza? |
LGTM now!
Purpose/implementation Section
What scientific question is your analysis addressing?
This PR incorporates histology-specific genes of interest lists for the appropriate histology oncoprints.
What was your approach?
Using the oncoprint-goi-lists-OpenPBTA.csv file, linked in #969 (comment), I prepared a script (saved in `util`) to make each column into its own TSV file, named appropriately for the associated histology and stored in `oncoprint-landscape/data` for use with `01-plot-oncoprint.R`.
I then adjusted the file paths to the goi lists in the `run-oncoprint.sh` script.
What GitHub issue does your pull request address?
This PR closes #1004
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
The goi plots have been updated so they should receive a close look to ensure that they were updated as expected.
Is there anything that you want to discuss further?
The file path to the main goi file (in the `run-oncoprint.sh` script) will likely need to be updated.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes, although it will need to be re-run once v19 is merged, and the file path to the main goi file will likely need to be updated if it is planned to be included in the data release.
Results
What types of results are included (e.g., table, figure)?
Updated goi oncoprints
What is your summary of the results?
The plots seem to be updated as expected.
Reproducibility Checklist
Documentation Checklist
`README` and it is up to date.
`analyses/README.md` and the entry is up to date.