Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? #740
…on' into cansavvy/var_class_investigation
make_option(
  opt_str = "--nonsynfilter_focr", action = "store_true",
  default = FALSE, help = "If TRUE, filter out synonymous mutations, keep
  non-synonymous mutations, according to Friends of Cancer Research definition.",
  metavar = "character"
I like this addition!
Now that I have looked at this notebook, I will confirm what I think I said in my review of #741: I think these two notebooks can be combined, as this is the crux of things at this point.
If we start with the MAF file, we should be getting the same mutation counts by either script: any discrepancy (even off by 1!) indicates a difference in methodology and/or an error in one script or the other. At this point, we know that the differences are small in counts, and the question is whether the FOCR vs. maftools method is accounting for that difference. I suspect it is, though there may be some remaining edge cases that could result in some remaining differences.
The second part (covered more in #741) is a difference in the definitions of the reference regions. I just looked back at that data and have another comment there...
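A minimal sketch of the count check described above, in base R. The toy table and the two definition vectors are placeholders for illustration only; they are not the actual FoCR or maftools lists, and `count_by_sample()` is a hypothetical helper, not a function from either script. Only the column names `Tumor_Sample_Barcode` and `Variant_Classification` come from the MAF format.

```r
# Toy MAF-like table (made-up rows, for illustration only)
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c(
    "Missense_Mutation", "Silent", "Splice_Site",
    "Nonsense_Mutation", "Intron", "Silent"
  ),
  stringsAsFactors = FALSE
)

# Two placeholder nonsynonymous definitions (assumptions, NOT the real lists)
def_a <- c("Missense_Mutation", "Nonsense_Mutation", "Splice_Site")
def_b <- c("Missense_Mutation", "Nonsense_Mutation")

# Hypothetical helper: mutations per sample that pass a given definition.
# Samples with zero passing mutations are kept with a count of 0.
count_by_sample <- function(maf, keep) {
  samples <- sort(unique(maf$Tumor_Sample_Barcode))
  kept <- maf$Tumor_Sample_Barcode[maf$Variant_Classification %in% keep]
  counts <- table(factor(kept, levels = samples))
  data.frame(sample = samples, n = as.integer(counts), stringsAsFactors = FALSE)
}

counts_a <- count_by_sample(maf, def_a)
counts_b <- count_by_sample(maf, def_b)

# Join the two sets of counts and keep any sample where they differ,
# even by a single mutation
comparison <- merge(counts_a, counts_b, by = "sample", suffixes = c("_a", "_b"))
comparison$diff <- comparison$n_a - comparison$n_b
discrepant <- comparison[comparison$diff != 0, ]
print(discrepant)
```

Starting both counts from the same table means any row left in `discrepant` can only come from the definitions themselves, which is the point of the check.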
analyses/snv-callers/explore_variant_classifications/explore_var_class_discrepancies.Rmd
"..",
"results",
"consensus",
"pbta-snv-consensus-mutation.maf.tsv.gz"
Why are we not starting with the full MAF file in the data directory here? Using these sub files seems like it could be a source of inconsistency.
I switched this for PBTA, but realized the TCGA TMB data is not in the data release, so I'll file an issue about that.
I'm not exactly sure if this addresses all your suggestions, but I think it does? I like keeping the notebook here and the one in #741 separate, since they are slightly different questions (even though they are related).
After our discussion, the comparison of focr and maftools results will go in #741, so this should be fine as is!
Purpose/implementation Section
What scientific question is your analysis addressing?
This addresses part 2 of #729.
Not very many mutations are affected, and given the results of part 1 (#739), I do not consider this to be a big concern moving forward.
What was your approach?
Pretty straightforward:
Variant_Classification variable.
maftools nonsynonymous definition.
I also added an option to the calculate_tmb.R script that allows us to choose which nonsynonymous mutation definition we want to use for filtering. So now there's --nonsynfilter_maf or --nonsynfilter_focr.
What GitHub issue does your pull request address?
#729
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
Here's the rendered notebook:
explore_var_class_discrepancies.nb.html.zip
Reproducibility Checklist
Documentation Checklist
I haven't added a README since this is a side analysis. The main information is in the Rmd. It's not a standalone module, so I also didn't add it to the table. But if we would like these items to be added, I can.
README and it is up to date.
analyses/README.md and the entry is up to date.