This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

TCGA Consensus and Comparison Revised (1 of 2) #562

Merged
merged 40 commits into from
Feb 28, 2020

cansavvy
Collaborator

@cansavvy cansavvy commented Feb 26, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

If we run the TCGA data through the same variant callers and consensus methods we used for the PBTA data, do we get a comparison that better matches our expectation (namely, that adult tumors have a higher tumor mutation burden)?

The next PR in this series will carry over the new dataset to the tmb-compare-tcga analysis.

What was your approach?

  • The TMB calculations for both PBTA and TCGA have been switched to use only Mutect2 and Strelka2. I had to re-run these on AWS.
  • I ran the TCGA files from v14 through the same pipeline we used for the PBTA data. A few metadata cleaning steps needed to be adjusted for TCGA, so the 03-calculate-tmb.R script has a new option (--tcga) that must be used for TCGA data.
  • You may note that I removed the word "Consensus" from the results files. This is because these TMB results are no longer using the official consensus maf file. Is there another file notation that would make this more clear?
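As a rough illustration of the calculation described in the list above, here is a minimal Python sketch (the module itself is written in R). It assumes that a mutation is counted only when both Mutect2 and Strelka2 report it, and that TMB is the number of such mutations inside the BED regions divided by the total region size in megabases. The function names and the tuple layout `(chromosome, position, ref, alt)` are hypothetical and are not taken from `03-calculate-tmb.R`.

```python
def region_size_mb(bed_intervals):
    """Total size in megabases of BED intervals (chrom, start, end).

    BED coordinates are 0-based and half-open; intervals are assumed
    not to overlap one another.
    """
    return sum(end - start for _chrom, start, end in bed_intervals) / 1_000_000


def in_regions(chrom, pos, bed_intervals):
    """True if a 1-based position falls inside any BED interval."""
    return any(
        chrom == bed_chrom and start < pos <= end
        for bed_chrom, start, end in bed_intervals
    )


def tmb(mutect2_calls, strelka2_calls, bed_intervals):
    """Mutations per megabase among calls reported by both callers (assumption)."""
    shared = set(mutect2_calls) & set(strelka2_calls)
    counted = [c for c in shared if in_regions(c[0], c[1], bed_intervals)]
    return len(counted) / region_size_mb(bed_intervals)


# Toy data: two 1 Mb regions, so the denominator is 2 Mb.
bed = [("chr1", 0, 1_000_000), ("chr2", 0, 1_000_000)]
mutect2 = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr3", 5, "A", "G")]
strelka2 = mutect2 + [("chr2", 50, "T", "A")]

print(tmb(mutect2, strelka2, bed))  # 1.0: two shared in-region calls over 2 Mb
```

The denominator depends entirely on the BED file, so any correction to how the BED files are built changes every TMB value even if the calls themselves stay the same.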

What GitHub issue does your pull request address?

#257

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

  • The BED files for PBTA do need to be fixed (see Updated analysis: bedtools creation of bed files for TMB #564), but that will be in a different PR.
  • All mentions of WGS related files in the TCGA are placeholders because there is not actually any WGS data in TCGA. I haven't made adjustments to make it so WGS options can be dropped completely. Perhaps this is a thing for the next PR?
  • Is there anything where the TCGA data is not being handled properly by the snv-caller pipeline?

Results

This analysis needs to be run on AWS, and because of the PBTA BED file issues noted in #564, I will end up re-running it there. So do not be too concerned with the exact results; what we need to know at this stage is whether the changes to the TMB calculations otherwise seem right.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@@ -1,3 +1,4 @@
 # ignore folders with big results files
 results
-ref_files
+ref_files/*
+!ref_files/gencode.v19.basic.exome.hg38liftover.bed
Collaborator Author

For now, this needs to be added in somewhere, but I think it will be in a future data release.

@cansavvy cansavvy marked this pull request as ready for review February 26, 2020 21:57
@cansavvy cansavvy changed the title from "TCGA Consensus and Comparison Revised" to "TCGA Consensus and Comparison Revised (1 of 2)" Feb 27, 2020
Member

@jashapiro jashapiro left a comment

Looks good, but I have a couple minor questions and comments. I didn't really look at results, as we know that the bed file creation may require updating before those are final.

Member

@jashapiro jashapiro left a comment

I just made a few changes to the split_mnv function that probably should have been in it from the start, but they affect code here. A few other little questions, one of which (probably not directly stated) is whether the union() (or union_all()) call should also join by end position, in addition to the other join_cols elements.
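To make the end-position question concrete, here is a minimal Python sketch (not the module's R code) of combining call sets on a set of key columns. The column names and the `union_calls` helper are hypothetical; the actual `join_cols` used in the module may differ. It shows only the general effect: when the end position is left out of the key, two deletions that start at the same position but have different lengths collapse into one record.

```python
def union_calls(*call_sets, join_cols):
    """Combine call sets, keeping one record per distinct key.

    Each call is a dict; the key is built from the columns in join_cols.
    When several records share a key, the first one seen is kept.
    """
    seen = {}
    for calls in call_sets:
        for call in calls:
            key = tuple(call[col] for col in join_cols)
            seen.setdefault(key, call)
    return list(seen.values())


# Two deletions starting at the same base but differing in length.
caller_a = [{"chromosome": "chr1", "start": 100, "end": 100, "alt": "-"}]
caller_b = [{"chromosome": "chr1", "start": 100, "end": 101, "alt": "-"}]

without_end = union_calls(caller_a, caller_b,
                          join_cols=("chromosome", "start", "alt"))
with_end = union_calls(caller_a, caller_b,
                       join_cols=("chromosome", "start", "end", "alt"))

print(len(without_end))  # 1: the two deletions are treated as the same call
print(len(with_end))     # 2: the end position keeps them distinct
```

Whether this matters in practice depends on which other columns are in the key; if the reference allele is already part of it, the end position is largely implied.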

Member

@jashapiro jashapiro left a comment

Looks good, but this time I did look at the plots, and I'm not sure what is going on with analyses/snv-callers/plots/tcga-comparison/tcga-upset_median_vaf_plot.png.

It also looks like analyses/snv-callers/plots/tcga-comparison/tcga-upset_plot.png, analyses/snv-callers/plots/tcga-comparison/tcga-vaf_correlations_plot.png, analyses/snv-callers/plots/tcga-comparison/tcga-variant_classification_plot.png, and analyses/snv-callers/plots/tcga-comparison/tcga-vaf_correlations_plot.png have some errors now, in particular that the latter three are plotting the same set twice, with no lancet data.

Those changes don't seem to be related to the rest of the changes in this PR, so perhaps those files should be reverted? Or will they be updated later?

@cansavvy
Collaborator Author

Those changes don't seem to be related to the rest of the changes in this PR, so perhaps those files should be reverted? Or will they be updated later?

Yeah, that will be in the next PR, after I have the real data run from AWS. I didn't realize they were changed; I might have run the data through as a check at some point. I'll remove the TCGA notebook for now and have a next PR with the "real" one.

@jashapiro
Member

I'll remove the TCGA notebook for now and have a next PR with the "real" one.

While you are at it, probably worth removing the .pngs as well.

Member

@jashapiro jashapiro left a comment

LGTM

@jaclyn-taroni jaclyn-taroni merged commit 58f6edd into AlexsLemonade:master Feb 28, 2020
@cansavvy cansavvy deleted the tcga-consensus branch February 28, 2020 14:59
@cansavvy cansavvy mentioned this pull request Feb 28, 2020
5 tasks