This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

TCGA Lancet data has only "calls" with t_alt_count = 0 or NA #512

Closed
cansavvy opened this issue Feb 4, 2020 · 5 comments

@cansavvy
Collaborator

cansavvy commented Feb 4, 2020

What data file(s) does this issue pertain to?

The TCGA Lancet data : pbta-tcga-snv-lancet.vep.maf.gz of version 14

What release are you using?

v14

Put your question or report your issue here.

I was confused because Lancet's calls for TCGA did not agree at all with Mutect or Strelka.
I looked into the VAF distributions and saw that Lancet's VAFs were all zero, because t_alt_count contains only zeroes and NAs. A variant with a t_alt_count of 0 shouldn't be a call. Did something happen in a filtering step? It looks like the data may have been filtered by n_alt_count > 0?

lancet <- data.table::fread("../../data/pbta-tcga-snv-lancet.vep.maf.gz", data.table = FALSE)
summary(lancet$t_alt_count)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       0       0       0       0     622 

As a positive control, Mutect and Strelka show no zeroes for t_alt_count. For example, Mutect:

mutect <- data.table::fread("../../data/pbta-tcga-snv-mutect2.vep.maf.gz", data.table = FALSE)
summary(mutect$t_alt_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00    4.00    8.00   19.09   21.00 3528.00 
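
For readers who want to repeat the VAF check described above, here is a minimal sketch. It assumes VAF is computed as t_alt_count / (t_ref_count + t_alt_count) from the standard MAF count columns; the toy table and its values are made up for illustration and are not taken from the release files.

```r
# Toy MAF-like table; the values are made up for illustration only.
maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2"),
  t_ref_count = c(30, 12, 45, NA),
  t_alt_count = c(10, 0, 5, NA)
)

# Assumed VAF definition: alt reads / (ref reads + alt reads).
# NA counts propagate to an NA VAF.
maf$vaf <- maf$t_alt_count / (maf$t_ref_count + maf$t_alt_count)

# Rows that should not be calls: no tumor read supports the alternate allele.
suspect <- is.na(maf$t_alt_count) | maf$t_alt_count == 0

print(summary(maf$vaf))
print(table(suspect))
```

On a real MAF, a `suspect` vector that is TRUE for every row would reproduce the symptom reported here.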
cansavvy added the data label Feb 4, 2020
@jharenza
Collaborator

jharenza commented Feb 4, 2020

Hi @cansavvy - thanks for noticing this - was this also an issue in V13? I am assuming so, since this file should not have changed. I am cc-ing @tkoganti, @migbro, and @yuankunzhu for this.

@jaclyn-taroni
Member

I can confirm that it was also an issue for v13:

> lancet <- data.table::fread("data/release-v13-20200116/pbta-tcga-snv-lancet.vep.maf.gz", data.table = FALSE)
> summary(lancet$t_alt_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
      0       0       0       0       0       0     622 

@jharenza
Collaborator

jharenza commented Feb 5, 2020

A quick spot check of the individual MAFs shows they are all like this, so it is not a merge/filter issue (no filtering was applied to these). I asked those cc'd above to check the run, as it appears something went wrong there.
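
A spot check like this could be scripted roughly as follows. This is only a sketch: the helper `all_alt_zero_or_na()`, the temp directory, and the two toy per-sample files are hypothetical stand-ins, since the actual per-sample MAF locations are not given in this thread.

```r
# Hypothetical helper: TRUE if every t_alt_count in one MAF is 0 or NA.
# comment.char = "#" skips a leading "#version" line if one is present.
all_alt_zero_or_na <- function(maf_path) {
  maf <- read.delim(maf_path, comment.char = "#", stringsAsFactors = FALSE)
  all(is.na(maf$t_alt_count) | maf$t_alt_count == 0)
}

# Two toy per-sample MAFs written to a temp directory (made-up values).
maf_dir <- tempfile("mafs")
dir.create(maf_dir)
write.table(
  data.frame(Hugo_Symbol = c("TP53", "BRAF"), t_alt_count = c(0, NA)),
  file.path(maf_dir, "sample_a.maf"),
  sep = "\t", quote = FALSE, row.names = FALSE
)
write.table(
  data.frame(Hugo_Symbol = c("TP53", "BRAF"), t_alt_count = c(7, 0)),
  file.path(maf_dir, "sample_b.maf"),
  sep = "\t", quote = FALSE, row.names = FALSE
)

# Check each file separately, before any merge or filter step.
maf_files <- list.files(maf_dir, pattern = "\\.maf$", full.names = TRUE)
result <- vapply(maf_files, all_alt_zero_or_na, logical(1))
names(result) <- basename(maf_files)
print(result)
```

If every per-sample file comes back TRUE, the problem is upstream of merging, which matches the conclusion above.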

@jharenza
Collaborator

jharenza commented Feb 5, 2020

@cansavvy and @jaclyn-taroni it looks like, for some reason, all of the lancet tasks were run with the tumor/normal inputs swapped, while the correct IDs were used in VEP/VCF2MAF, which explains the strange output. @migbro is queueing these up to rerun tonight and I will ask @tkoganti to pick up VCF2MAF/merge in the morning. The strelka2/mutect2 MAF runs were run correctly. Thank you again for finding this!
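
One way such a swap might show up in a MAF is alt-supporting reads sitting in n_alt_count while t_alt_count is 0 or NA. The sketch below illustrates that heuristic on a made-up table. It is an assumption about how the symptom could be detected, not the check that was actually run, and `looks_swapped` is a hypothetical name.

```r
# Toy MAF-like table (made-up values) mimicking a tumor/normal swap:
# the alternate allele is supported only in the "normal" columns.
maf <- data.frame(
  t_alt_count = c(0, 0, NA, 0),
  n_alt_count = c(12, 8, 5, 20)
)

# A call "looks swapped" when the normal has alt-supporting reads
# but the tumor has none (0 or NA).
looks_swapped <- (is.na(maf$t_alt_count) | maf$t_alt_count == 0) &
  !is.na(maf$n_alt_count) & maf$n_alt_count > 0

# Fraction of calls showing the swapped pattern.
frac_swapped <- mean(looks_swapped)
print(frac_swapped)
```

A fraction near 1 across a whole file would point to a systematic input swap rather than a handful of odd calls.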

jharenza pushed a commit that referenced this issue Feb 5, 2020
readme update for upcoming lancet MAF per issue [here](#512)
jaclyn-taroni added a commit that referenced this issue Feb 7, 2020
* add v14 release docs

-update `release-notes.md`
-update `data-files-description.md`
-update `data-formats.md`

* Update download-data.sh

add new folder for V14 to download script

* remove intersect_cds_WXS.bed

per @cansavvy [comment](#432 (comment))

* add intersect_cds_lancet.bed

and description from @cansavvy [comments](#507 (comment))

* Update release-notes.md

- add removal of polyA+stranded samples that were still in file in v13

* Update data-formats.md

add more information on gistic output files, to replace PR [#456](#456)

* Reorganize derived CN section and make formatting consistent

* Add links to relevant subtyping modules

* Update release-notes.md

readme update for upcoming lancet MAF per issue [here](#512)

* Update doc/release-notes.md

yup, nice catch

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update release-notes.md

fix embryonal broad histology

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
@jharenza
Collaborator

jharenza commented Feb 8, 2020

closed via #507

jharenza closed this as completed Feb 8, 2020