-
Notifications
You must be signed in to change notification settings - Fork 83
TCGA Consensus and Comparison Revised (1 of 2) #562
Conversation
@@ -1,3 +1,4 @@ | |||
# ignore folders with big results files | |||
results | |||
ref_files | |||
ref_files/* | |||
!ref_files/gencode.v19.basic.exome.hg38liftover.bed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, this needs to be added in somewhere, but I think it will be in a future data release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I have a couple minor questions and comments. I didn't really look at results, as we know that the bed file creation may require updating before those are final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made a few changes to the split_mnv function that probably should have been in it from the start, but they affect code here. A few other little questions, one of which (probably not directly stated, is whether the union() (or union_all) call should also join by end position, as well as the other join_cols
elements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but this time I did look at the plots, and I', not sure what is going on with analyses/snv-callers/plots/tcga-comparison/tcga-upset_median_vaf_plot.png
.
It also looks like analyses/snv-callers/plots/tcga-comparison/tcga-upset_plot.png
, analyses/snv-callers/plots/tcga-comparison/tcga-vaf_correlations_plot.png
, analyses/snv-callers/plots/tcga-comparison/tcga-variant_classification_plot.png
, and analyses/snv-callers/plots/tcga-comparison/tcga-vaf_correlations_plot.png
have some errors now, in particular that the latter three are plotting the same set twice, with no lancet data.
Those changes don't seem to be related to the rest of the changes in this PR, so perhaps those files should be reverted? Or will they be updated later?
Yeah, that will be in a next PR after I have the real data run from AWS. I didn't realize they were changed, I might have ran the data through as a check at some point. I'll revert remove the TCGA notebook for now and have a next PR with the "real" one. |
While you are at it, probably worth removing the .pngs as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Purpose/implementation Section
What scientific question is your analysis addressing?
If we run the TCGA data through the same variant callers and consensus methods we used for PBTA data, do we get a comparison that more accurately meets our expectations? (aka that adults have a higher tumor mutation burden).
The next PR in this series will carry over the new dataset to the
tmb-compare-tcga
analysis.What was your approach?
--tcga
) that needs to be used for the 03-calculate-tmb.R script.What GitHub issue does your pull request address?
#257
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Results
This analysis needs to be run in AWS, and because of issues noted for the bed files for PBTA #564 I will end up re-running this in AWS. So do not be too concerned with the exact results, but more what we need to know at this stage is if the changes in the tmb calculations otherwise seem right.
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.