This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Lancet exploration notebook series: Part 1 TCGA-PBTA comparison variations #557

Merged (30 commits) on Mar 6, 2020

Collaborator

@cansavvy commented Feb 25, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

This first notebook of three explores how different callers lead to different PBTA-TCGA comparisons.

There are three questions, each with an accompanying plot:

  1. Is the read depth different between PBTA and TCGA?
  2. Do the TMB comparison results change if we calculate TMB with each caller by itself?
  3. How much do the TCGA and PBTA WXS target regions overlap?

What was your approach?

Respectively:

  1. Plot the read densities for TCGA and PBTA
  2. Recreate the TMB comparison CDF plot for each caller
  3. Create Venn diagrams of the overlaps of the target regions
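
As a rough illustration of step 2, here is a minimal sketch in base R of building one empirical CDF per dataset and caller. The data frame, its column names, the caller labels, and the TMB values are all made up for illustration; this is not the notebook's actual code or data.

```r
# Hypothetical toy data: one row per sample per caller (values are invented)
tmb_df <- data.frame(
  dataset = rep(c("PBTA", "TCGA"), each = 4),
  caller  = rep(c("lancet", "other_caller"), times = 4),
  tmb     = c(0.5, 0.7, 1.2, 1.0, 2.1, 2.5, 3.0, 3.4)
)

# Split TMB values by dataset and caller, then build an empirical CDF for each
# group; list names take the form "<dataset>.<caller>"
tmb_cdfs <- lapply(
  split(tmb_df$tmb, list(tmb_df$dataset, tmb_df$caller)),
  ecdf
)

# Fraction of (toy) PBTA samples whose lancet TMB is at most 1
pbta_lancet_at_1 <- tmb_cdfs[["PBTA.lancet"]](1)
```

Each element of `tmb_cdfs` is a step function, so it can be evaluated at any TMB value or passed to `plot()` to draw the CDF for that dataset and caller.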

What GitHub issue does your pull request address?

#548

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Two headlining notes before you get into the nitty gritty:

  1. This PR is huge. Let me know if you want me to split it up.
  2. Note that this analysis is gonna be retired almost as soon as we get it in (and it's already outdated by the BED file changes that are still in progress: "Updated analysis: bedtools creation of bed files for TMB" #564 and "Bed files intersection fixes" #566).

Which areas should receive a particularly close look?

There are some parts of this analysis that could be DRY'ed up, but it also will be retired as soon as we get it confirmed. Let me know what parts of this code clearly need work and which we will leave as is.

Results

What is your summary of the results?

Here's the rendered notebook:
https://cansavvy.github.io/openpbta-notebook-concept/snv-callers/explore-tcga-pbta.nb.html

The results of this notebook were added to this PDF report:

TCGAvsPBTAconsensus.pdf

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

This analysis is not permanent and has not been added to the main READMEs.

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@cansavvy added the "work in progress" label Feb 25, 2020
@cansavvy removed the "work in progress" label Mar 3, 2020
Member

@jashapiro left a comment

This looks pretty good, and I only have a few places where I think some DRYing would be useful.

My biggest comment is that the graphs here don't look bad? The TCGA data does seem to have higher TMB? Or am I looking at something wrong? If not, what is the difference here from previous analyses?

I am specifically looking at https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/3b3fd1c3b715da590179aa40a8c13c7316c0db18/analyses/snv-callers/lancet-wxs-tests/plots/tcga-vs-pbta-plots/tmb-cdf-Consensus.png, which looks as I would have expected, vs plots that were produced previously.

Comment on lines 105 to 109
if (is_tcga) {
  df <- df %>%
    # Shorten the Tumor_Sample_Barcode so it matches
    dplyr::mutate(Tumor_Sample_Barcode = substr(Tumor_Sample_Barcode, 0, 12))
}
Member

I think you can just do this always, and save an argument. Taking the substr of a 12-character barcode will just return the barcode.
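
A quick way to check this point, as a minimal sketch in base R. The barcode strings below are made-up examples, not values from the PBTA or TCGA data. Note that R's `substr()` treats a start position of 0 the same as 1, so `substr(x, 0, 12)` returns the first 12 characters.

```r
# Hypothetical barcodes for illustration only
long_barcode  <- "TCGA-AB-1234-01A-11D"  # longer than 12 characters
short_barcode <- "TCGA-AB-1234"          # exactly 12 characters

# Apply the same shortening to both, with no is_tcga check
shortened_long  <- substr(long_barcode, 0, 12)
shortened_short <- substr(short_barcode, 0, 12)

# The 12-character barcode comes back unchanged, and both now match
barcodes_match <- identical(shortened_long, shortened_short)
```

Because the shortening is a no-op on barcodes that are already 12 characters, applying it unconditionally gives the same result as guarding it with a flag.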

Member

@jashapiro left a comment

You have taken suggestions to my satisfaction, especially given the fact that this analysis is out of date and has been replaced by more current analyses. Given all that, I am happy to approve this PR in its current form. 🤓

@jaclyn-taroni merged commit 787309a into AlexsLemonade:master Mar 6, 2020
@cansavvy deleted the lancet-tests-1 branch March 25, 2020 20:00
3 participants