This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up? #740

Merged Sep 8, 2020 (23 commits)

@cansavvy cansavvy commented Aug 20, 2020

Purpose/implementation Section

⚠️ This PR includes changes filed in #739; that should be reviewed first

What scientific question is your analysis addressing?

This addresses part 2 of #729:

  1. How often do Translation_Start_Site, Nonstop_Mutation, or Splice_Site come up? (These are not included in the FoCR definition but are included in the maftools definition.)

Not many mutations are affected, and given the results of part 1 (#739), I do not consider this a big concern moving forward.

What was your approach?

Pretty straightforward:

  • Import consensus mutation files for TCGA and PBTA.
  • Make barplots of the Variant_Classification variable.
  • Label which are included in the FoCR nonsynonymous definition and which are only noted by maftools nonsynonymous definition.
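The labeling step above can be sketched roughly as follows. This is a minimal base R illustration, not the notebook's actual code: the toy data frame is made up, the maftools list is its documented default set of nonsynonymous classes, and the FoCR list is assumed to be that set minus the three classes named in this PR.

```r
# maftools' default nonsynonymous Variant_Classification values.
maf_nonsynonymous <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)

# Assumption: the FoCR definition is the maftools list minus the three
# classes this PR describes as maftools-only.
maftools_only <- c("Translation_Start_Site", "Nonstop_Mutation", "Splice_Site")
focr_nonsynonymous <- setdiff(maf_nonsynonymous, maftools_only)

# Toy stand-in for the Variant_Classification column of a MAF file
# (made-up values, for illustration only).
toy_maf <- data.frame(
  Variant_Classification = c(
    "Missense_Mutation", "Missense_Mutation", "Silent", "Splice_Site",
    "Nonstop_Mutation", "Frame_Shift_Del", "Intron"
  ),
  stringsAsFactors = FALSE
)

# Count how often each classification occurs.
class_counts <- as.data.frame(
  table(Variant_Classification = toy_maf$Variant_Classification),
  stringsAsFactors = FALSE
)

# Label each classification by which definition(s) would keep it.
class_counts$definition <- ifelse(
  class_counts$Variant_Classification %in% focr_nonsynonymous,
  "FoCR and maftools",
  ifelse(
    class_counts$Variant_Classification %in% maftools_only,
    "maftools only",
    "neither"
  )
)

# Number of mutations whose treatment differs between the two definitions.
n_maftools_only <- sum(class_counts$Freq[class_counts$definition == "maftools only"])

print(class_counts)
# A barplot could then be drawn with, for example:
# barplot(class_counts$Freq, names.arg = class_counts$Variant_Classification, las = 2)
```

The `n_maftools_only` total is the quantity of interest here: it counts the mutations that one definition keeps and the other drops.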

I also added an option to the calculate_tmb.R script that lets us choose which nonsynonymous mutation definition to use for filtering. So now there are --nonsynfilter_maf and --nonsynfilter_focr.
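As a rough sketch of how two such flags could drive the filter (this is not the code in calculate_tmb.R): the helper names `select_nonsyn_definition()` and `filter_nonsynonymous()` are hypothetical, the error raised when both flags are set is an assumption, and the FoCR list is assumed to be the maftools default list minus the three classes discussed in this PR.

```r
# maftools' default nonsynonymous classes, and the assumed FoCR subset.
maf_nonsynonymous <- c(
  "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
  "Translation_Start_Site", "Nonsense_Mutation", "Nonstop_Mutation",
  "In_Frame_Del", "In_Frame_Ins", "Missense_Mutation"
)
focr_nonsynonymous <- setdiff(
  maf_nonsynonymous,
  c("Translation_Start_Site", "Nonstop_Mutation", "Splice_Site")
)

# Hypothetical helper: turn the two logical flags into the set of
# classifications to keep. Returns NULL when neither flag is set.
select_nonsyn_definition <- function(nonsynfilter_maf = FALSE,
                                     nonsynfilter_focr = FALSE) {
  if (nonsynfilter_maf && nonsynfilter_focr) {
    # Assumption: the two definitions are treated as mutually exclusive.
    stop("Choose only one of --nonsynfilter_maf or --nonsynfilter_focr.")
  }
  if (nonsynfilter_focr) {
    return(focr_nonsynonymous)
  }
  if (nonsynfilter_maf) {
    return(maf_nonsynonymous)
  }
  NULL
}

# Hypothetical helper: keep only rows whose Variant_Classification is in
# the chosen definition; with no definition chosen, return the data as-is.
filter_nonsynonymous <- function(maf_df, keep_classes) {
  if (is.null(keep_classes)) {
    return(maf_df)
  }
  maf_df[maf_df$Variant_Classification %in% keep_classes, , drop = FALSE]
}

# Made-up example data.
toy_maf <- data.frame(
  Variant_Classification = c(
    "Missense_Mutation", "Silent", "Splice_Site",
    "Nonsense_Mutation", "Translation_Start_Site"
  ),
  stringsAsFactors = FALSE
)

maf_filtered <- filter_nonsynonymous(
  toy_maf, select_nonsyn_definition(nonsynfilter_maf = TRUE)
)
focr_filtered <- filter_nonsynonymous(
  toy_maf, select_nonsyn_definition(nonsynfilter_focr = TRUE)
)
```

With the toy data, the maftools definition keeps every row except the Silent one, while the FoCR definition additionally drops the Splice_Site and Translation_Start_Site rows.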

What GitHub issue does your pull request address?

#729

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

  • Note that this uses the consensus file, which is based on Strelka, Mutect, and Lancet and does NOT have coding region filtering applied. So it doesn't answer the TMB question directly, but it is related.
  • Does this address the scientific questions at hand?
  • Are there further analyses we'd like to see here?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

Here's the rendered notebook:
explore_var_class_discrepancies.nb.html.zip

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

I haven't added a README since this is a side analysis. The main information is in the Rmd. It's not a standalone module, so I also didn't add it to the table. But if we would like these items to be added, I can add them.

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@cansavvy cansavvy added the "work in progress" label Aug 20, 2020
@cansavvy cansavvy changed the title from "Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?" to "WIP: Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?" Aug 20, 2020
@cansavvy cansavvy changed the title from "WIP: Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?" to "Explore Variant_Classification's Impact on TMB: Part 2: Definition discrepancies - how often do they come up?" Aug 24, 2020
@cansavvy cansavvy removed the "work in progress" label Aug 24, 2020
@cansavvy cansavvy marked this pull request as ready for review September 2, 2020 14:04
@cansavvy cansavvy requested a review from jashapiro September 2, 2020 14:04
@jashapiro jashapiro mentioned this pull request Sep 2, 2020
5 tasks
Comment on lines +80 to 84
make_option(
opt_str = "--nonsynfilter_focr", action = "store_true",
default = FALSE, help = "If TRUE, filter out synonymous mutations, keep
non-synonymous mutations, according to Friends of Cancer Research definition.",
metavar = "character"

I like this addition!

@jashapiro jashapiro left a comment

Now that I have looked at this NB, I will confirm what I think I said in my review of #741, namely that I think these two notebooks can be combined, as this is kind of the crux of things at this point.

If we start with the MAF file, we should be getting the same mutation counts from either script: any discrepancy (even off by 1!) indicates a difference in methodology and/or an error in one script or the other. At this point, we know that the differences in counts are small, and the question is whether the FoCR vs. maftools definition accounts for that difference. I suspect it does, though some edge cases may still produce residual differences.
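One way to run the "even off by 1" check described above is to join the per-sample counts from the two scripts and list every sample where they disagree. This is only a sketch: the `compare_counts()` helper, the count column name `n`, and the toy numbers are all hypothetical, and it assumes each script's output can be reduced to one row per Tumor_Sample_Barcode.

```r
# Hypothetical helper: compare per-sample mutation counts from two methods.
# Each input is a data frame with columns Tumor_Sample_Barcode and n.
compare_counts <- function(counts_a, counts_b) {
  merged <- merge(
    counts_a, counts_b,
    by = "Tumor_Sample_Barcode",
    all = TRUE,                 # keep samples present in only one input
    suffixes = c("_a", "_b")
  )
  # Assumption: a sample missing from one input is treated as a count of 0,
  # so it still shows up as a discrepancy.
  merged$n_a[is.na(merged$n_a)] <- 0
  merged$n_b[is.na(merged$n_b)] <- 0
  merged$difference <- merged$n_a - merged$n_b
  # Return only the samples where the two methods disagree.
  merged[merged$difference != 0, , drop = FALSE]
}

# Made-up counts for illustration.
script_a_counts <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  n = c(10, 5, 7),
  stringsAsFactors = FALSE
)
script_b_counts <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3"),
  n = c(10, 4, 7),
  stringsAsFactors = FALSE
)

discrepancies <- compare_counts(script_a_counts, script_b_counts)
print(discrepancies)
```

An empty result would mean the two methods agree on every sample; any remaining rows are the cases worth tracing back to specific Variant_Classification values.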

The second part (covered more in #741) is a difference in the definitions of the reference regions. I just looked back at that data and have another comment there...

"..",
"results",
"consensus",
"pbta-snv-consensus-mutation.maf.tsv.gz"

Why are we not starting with the full maf file in the data directory here? Using these sub files seems like it could be a source of inconsistency.

I switched this for PBTA, but realized the TCGA tmb data is not in the data release, so I'll file an issue about that.

cansavvy commented Sep 3, 2020

I'm not exactly sure if this addresses all your suggestions, but I think it does? I like keeping the notebook here and the one in #741 separate, since they are slightly different questions (even though they are related).

@cansavvy cansavvy requested a review from jashapiro September 3, 2020 15:58

@jashapiro jashapiro left a comment

After our discussion, the comparison of focr and maftools results will go in #741, so this should be fine as is!

@jaclyn-taroni jaclyn-taroni merged commit b17171a into AlexsLemonade:master Sep 8, 2020
@cansavvy cansavvy deleted the cansavvy/filter_defs branch September 8, 2020 18:24
3 participants