Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? #740

cansavvy · 2020-08-20T14:24:22Z

Purpose/implementation Section

⚠️ This PR includes changes filed in #739; that should be reviewed first

What scientific question is your analysis addressing?

This addresses part 2 of #729.

How often do these Variant_Classification values come up: Translation_Start_Site, Nonstop_Mutation, or Splice_Site? (These are not included in the FoCR definition, but are included in the maftools definition.)

There are not very many mutations that are affected, and given the results of part 1 (#739), I do not consider this to be something that should be a big concern moving forward.

What was your approach?

Pretty straightforward:

Import consensus mutation files for TCGA and PBTA.
Make barplots of the Variant_Classification variable.
Label which are included in the FoCR nonsynonymous definition and which are only noted by maftools nonsynonymous definition.
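The labeling step above can be sketched in base R as follows. This is a minimal illustration, not the notebook's code: `toy_maf`, `maf_only`, and `label_definition()` are hypothetical names, the maftools list is what I believe to be the default `vc_nonSyn` set of `maftools::read.maf()`, and the FoCR list is assumed to be that set minus the three classes named in this PR.

```r
# Nonsynonymous classes under the maftools definition (assumed to match the
# default `vc_nonSyn` argument of maftools::read.maf; verify before relying on it).
maf_nonsynonymous <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# The three classes this PR describes as counted by maftools but not by FoCR.
maf_only <- c("Translation_Start_Site", "Nonstop_Mutation", "Splice_Site")

# Assumption: the FoCR definition is the maftools list minus those three classes.
focr_nonsynonymous <- setdiff(maf_nonsynonymous, maf_only)

# Toy stand-in for the Variant_Classification column of a consensus MAF file.
toy_maf <- data.frame(
  Variant_Classification = c(
    "Missense_Mutation", "Missense_Mutation", "Silent", "Splice_Site",
    "Nonstop_Mutation", "Intron", "Frame_Shift_Del", "Missense_Mutation"
  ),
  stringsAsFactors = FALSE
)

# Label each class by which nonsynonymous definition(s) include it.
label_definition <- function(variant_class) {
  ifelse(
    variant_class %in% focr_nonsynonymous, "FoCR and maftools",
    ifelse(variant_class %in% maf_only, "maftools only", "neither")
  )
}

# Count each class, then attach the definition label.
class_counts <- as.data.frame(
  table(Variant_Classification = toy_maf$Variant_Classification),
  stringsAsFactors = FALSE
)
class_counts$definition <- label_definition(class_counts$Variant_Classification)
print(class_counts)
```

The `Freq` column of `class_counts` is what a barplot of Variant_Classification would show, and the `definition` column could be used to color the bars.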

I also added an option to the calculate_tmb.R script that lets us choose which nonsynonymous mutation definition to use for filtering: --nonsynfilter_maf or --nonsynfilter_focr.
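To illustrate how two such flags might select a filter, here is a rough base R sketch. It does not reproduce the script: `filter_nonsynonymous()` and its logical arguments are hypothetical stand-ins for the parsed command-line options, the class lists are assumptions (maftools defaults, and FoCR as that list minus the three classes discussed above), and stopping when both flags are set is my own choice rather than documented behavior.

```r
# Assumed class lists; see the hedges in the paragraph above.
maf_nonsynonymous <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)
focr_nonsynonymous <- setdiff(
  maf_nonsynonymous,
  c("Translation_Start_Site", "Nonstop_Mutation", "Splice_Site")
)

# Hypothetical helper: the two logical arguments mirror the
# --nonsynfilter_maf and --nonsynfilter_focr flags.
filter_nonsynonymous <- function(maf_df,
                                 nonsynfilter_maf = FALSE,
                                 nonsynfilter_focr = FALSE) {
  if (nonsynfilter_maf && nonsynfilter_focr) {
    # Assumption: the two definitions are treated as mutually exclusive.
    stop("Choose only one of nonsynfilter_maf or nonsynfilter_focr.")
  }
  if (nonsynfilter_maf) {
    keep <- maf_nonsynonymous
  } else if (nonsynfilter_focr) {
    keep <- focr_nonsynonymous
  } else {
    # No filter requested: return every mutation unchanged.
    return(maf_df)
  }
  maf_df[maf_df$Variant_Classification %in% keep, , drop = FALSE]
}

# Toy data to show the effect of each choice.
toy_maf <- data.frame(
  Variant_Classification = c(
    "Missense_Mutation", "Silent", "Splice_Site",
    "Nonstop_Mutation", "Frame_Shift_Ins"
  ),
  stringsAsFactors = FALSE
)

n_unfiltered <- nrow(filter_nonsynonymous(toy_maf))
n_maf <- nrow(filter_nonsynonymous(toy_maf, nonsynfilter_maf = TRUE))
n_focr <- nrow(filter_nonsynonymous(toy_maf, nonsynfilter_focr = TRUE))
```

On the toy data, only the rows classed as Splice_Site and Nonstop_Mutation differ between the two filtered results.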

What GitHub issue does your pull request address?

#729

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Note that this uses the consensus file, which is based on strelka, mutect, and Lancet and does NOT have coding region filtering done. So it doesn't answer the TMB question directly, but it is related.
Does this address the scientific questions at hand?
Are there further analyses we'd like to see here?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

Here's the rendered notebook:
explore_var_class_discrepancies.nb.html.zip

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

I haven't added a README since this is a side analysis. The main information is in the Rmd. It's not a standalone module, so I also didn't add it to the table. But if we would like these items to be added, I can add them.

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jashapiro · 2020-09-03T00:45:39Z

analyses/snv-callers/scripts/03-calculate_tmb.R

+  make_option(
+    opt_str = "--nonsynfilter_focr", action = "store_true",
+    default = FALSE, help = "If TRUE, filter out synonymous mutations, keep
+    non-synonymous mutations, according to Friends of Cancer Research definition.",
    metavar = "character"


I like this addition!

jashapiro

Now that I have looked at this NB, I will confirm what I think I said in my review of #741, namely that I think these two notebooks can be combined, as this is kind of the crux of things at this point.

If we start with the MAF file, we should be getting the same mutation counts from either script: any discrepancy (even off by 1!) indicates a difference in methodology and/or an error in one script or the other. At this point, we know that the differences in counts are small, and the question is whether the FoCR vs. maftools definition accounts for that difference. I suspect it does, though there may be some edge cases that could still result in differences.

The second part (covered more in #741) is a difference in the definitions of the reference regions. I just looked back at that data and have another comment there...
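One way to check whether the definition difference accounts for a count discrepancy is to compare per-sample counts under both definitions and see whether the gap equals the number of maftools-only mutations. The sketch below uses toy data and hypothetical names (`toy_maf`, `count_per_sample()`, `comparison`); the class lists are assumptions (maftools defaults, and FoCR as that list minus the three classes named in this PR), and `Tumor_Sample_Barcode` is the standard MAF sample column.

```r
# Assumed class lists; not copied from either script.
maf_only <- c("Translation_Start_Site", "Nonstop_Mutation", "Splice_Site")
focr_nonsynonymous <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Nonsense_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)
maf_nonsynonymous <- c(focr_nonsynonymous, maf_only)

# Toy MAF-like table with three samples.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S1", "S2", "S2", "S3"),
  Variant_Classification = c(
    "Missense_Mutation", "Splice_Site", "Silent",
    "Nonsense_Mutation", "Nonstop_Mutation", "Missense_Mutation"
  ),
  stringsAsFactors = FALSE
)

# Count mutations per sample, keeping only the given classes.
# Samples with zero kept mutations still appear, with a count of 0.
count_per_sample <- function(maf_df, classes) {
  samples <- sort(unique(maf_df$Tumor_Sample_Barcode))
  kept <- maf_df$Tumor_Sample_Barcode[maf_df$Variant_Classification %in% classes]
  counts <- table(factor(kept, levels = samples))
  setNames(as.integer(counts), samples)
}

maf_counts <- count_per_sample(toy_maf, maf_nonsynonymous)
focr_counts <- count_per_sample(toy_maf, focr_nonsynonymous)
maf_only_counts <- count_per_sample(toy_maf, maf_only)

comparison <- data.frame(
  sample = names(maf_counts),
  maftools = maf_counts,
  focr = focr_counts,
  maf_only = maf_only_counts
)

# Any nonzero value here is a difference the definitions do not explain.
comparison$unexplained <- comparison$maftools - comparison$focr - comparison$maf_only
print(comparison)
```

In this toy case the `unexplained` column is zero by construction; with counts produced by two separate scripts, a nonzero value would point to one of the remaining edge cases.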

jashapiro · 2020-09-03T00:55:02Z

analyses/snv-callers/explore_variant_classifications/explore_var_class_discrepancies.Rmd

+  "..",
+  "results",
+  "consensus",
+  "pbta-snv-consensus-mutation.maf.tsv.gz"


Why are we not starting with the full MAF file in the data directory here? Using these sub-files seems like it could be a source of inconsistency.

I switched this for PBTA, but realized the TCGA TMB data is not in the data release, so I'll file an issue about that.

#732 (comment)

cansavvy · 2020-09-03T15:58:38Z

I'm not exactly sure if this addresses all your suggestions, but I think it does? I like keeping the notebook here and the one in #741 separate, since they are slightly different questions (even though they are related).

jashapiro

After our discussion, the comparison of FoCR and maftools results will go in #741, so this should be fine as is!

cansavvy added 15 commits August 13, 2020 13:22

Add script and Rmd basics

c261bb2

Add no nonsyn filter to steps

464d4d2

Got some basic analyses here.

d9630f9

Edits to script. More polishing.

afaceb5

Add some wording

6a085cc

Add focr or maf option

f42e8b2

Add the tmb plot comparison; rename

970bee5

Merge branch 'master' into cansavvy/var_class_investigation

721f1bb

Fix render

08e7224

Merge remote-tracking branch 'cansavvy/cansavvy/var_class_investigation' into cansavvy/var_class_investigation

9702928

Merge branch 'cansavvy/var_class_investigation' into cansavvy/filter_defs

3ed76d8

This notebook makes some basic plots

aef2da1

Add ToC

cf12487

Fix file path

c7f0497

Merge branch 'master' into cansavvy/filter_defs

77a6cf6

cansavvy added the "work in progress" label Aug 20, 2020

cansavvy changed the title ~~Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?~~ WIP: Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? Aug 20, 2020

cansavvy added 4 commits August 21, 2020 09:32

Fix typo

abd7131

Update README to discuss nonsynonymous filter

d061da9

Add a few more notes and comments

56ed73a

Merge branch 'master' into cansavvy/filter_defs

4d19c10

cansavvy changed the title ~~WIP: Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?~~ Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? Aug 24, 2020

cansavvy removed the "work in progress" label Aug 24, 2020

cansavvy added 2 commits September 1, 2020 08:42

Merge branch 'master' into cansavvy/filter_defs

9dc7916

Merge remote-tracking branch 'origin/master' into cansavvy/filter_defs

bc79efe

cansavvy marked this pull request as ready for review September 2, 2020 14:04

cansavvy requested a review from jashapiro September 2, 2020 14:04

jashapiro mentioned this pull request Sep 2, 2020

Compare TMB calculation sets #741

Merged

jashapiro reviewed Sep 3, 2020

cansavvy added 2 commits September 3, 2020 09:42

Incorporating Josh review

4e9321f

Merge branch 'master' into cansavvy/filter_defs

c55e5ae

cansavvy requested a review from jashapiro September 3, 2020 15:58

jashapiro approved these changes Sep 3, 2020

jaclyn-taroni merged commit b17171a into AlexsLemonade:master Sep 8, 2020

cansavvy deleted the cansavvy/filter_defs branch September 8, 2020 18:24
