Proposed Analysis: PCAWG WGS Brain samples to run through SNV caller pipeline #551

cansavvy · 2020-02-20T21:31:57Z

What are the scientific goals of the analysis?

Following Grobner et al., 2019, we want to compare tumor mutation burden in our pediatric cohort with that of adult brain tumors.

This continues the goals of #257 and #481, which were originally planned around TCGA data. However, after running the TCGA data through the pipelines, we encountered problems that we believe may be due to its dated WXS target regions, short reads, or shallower read depth. These problems are documented in two draft PRs: #548 and #521.

Here's a summary report:
TCGAvsPBTAconsensus.pdf

What methods do you plan to use to accomplish the scientific goals?

After our video chat meeting, we discussed switching the comparison adult brain tumor data to the recently published PCAWG data.
This data has WGS samples, and is much more recent, which we hope will minimize the liftover and target region comparison issues we've been having between PBTA and TCGA data.

What input data are required for this analysis?

I'm posting this TSV file with the list of files that I believe we will want for this analysis:
pcawg_brain_wgs_samples.tsv.zip

I believe we would want the BAM files listed in this file to be run through Lancet, Strelka2, and Mutect2 in the same manner as the PBTA data.

How I obtained this file list:

This data is on ICGC's repositories
I searched for all WGS, PCAWG study, brain samples that have BAM files for both blood and solid primary tumor

ICGC portal query to get this:

select(*),in(file.experimentalStrategy,'WGS'),in(file.fileFormat,'BAM'),in(file.primarySite,'Brain'),in(file.specimenType,'Primary tumour - solid tissue','Normal - blood derived'),in(file.donorStudy,'PCAWG'),in(file.id,'ES:fd7d16a5-c002-4b68-985c-b44b548a732e'),sort(-ssmAffectedGenes)

This link will also get you to this list: https://icgc.org/4ov
I exported this table as TSV and then removed the mini bam files.
These mini files appear to be file copies of the regular size bams.
I filtered those out with:

grep -vwE "mini" repository_1582233032.tsv > pcawg_brain_wgs_samples.tsv
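
For anyone who would rather do this step outside the shell, here is a minimal Python sketch of the same whole-word filter. It assumes the export is a plain text TSV; the function names and the example file names in the usage note are made up for illustration and are not from the actual export.

```python
import re

# Match "mini" only as a whole word, similar to grep -w.
MINI_WORD = re.compile(r"\bmini\b")


def drop_mini_rows(lines):
    """Return the lines that do not contain 'mini' as a whole word."""
    return [line for line in lines if not MINI_WORD.search(line)]


def filter_manifest(in_path, out_path):
    """Copy in_path to out_path, skipping rows that mention a mini BAM."""
    with open(in_path, encoding="utf-8") as src:
        kept = drop_mini_rows(src.readlines())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)
    return len(kept)
```

A call such as `filter_manifest("repository_1582233032.tsv", "pcawg_brain_wgs_samples.tsv")` would mirror the grep command above. As with `grep -w`, a hypothetical name like `abc.mini.bam` is dropped, while a name such as `minimal.bam` is kept.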

How long do you expect is needed to complete the analysis? Will it be a multi-step analysis?

Whoever is going to be running the samples through the caller should probably answer this question.

Who will complete the analysis (please add a GitHub handle here if relevant)?

??

What relevant scientific literature relates to this analysis?

Grobner et al, 2019
PCAWG 2020 paper


cansavvy · 2020-02-20T21:52:12Z

@migbro @jharenza @tkoganti @yuankunzhu

jharenza · 2020-02-20T23:08:15Z

@yuankunzhu

yuankunzhu · 2020-02-23T23:21:50Z

@cansavvy @jharenza, the requested query contains data hosted on both EGA and PDC. For now, we only have access to PDC, which holds 110 BAMs from 60 donors (do we know why the total is not 120, btw?). We can start by downloading and looking at those first, and we will probably need someone to submit the EGA request. Do we know those subjects' ages at diagnosis/sequencing? We might want to exclude any pediatric samples from the adult TMB calculation.
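
One way to look into the 110 vs. 120 question is to list the donors that lack either a tumor or a normal BAM among the accessible files. If every donor has at most one BAM per specimen type, 110 files from 60 donors would mean ten donors are missing one half of the pair. The sketch below is only an illustration: the column names `ICGC Donor` and `Specimen Type` are assumptions about the exported TSV header and may need adjusting, while the two specimen type strings are taken from the query above.

```python
import csv
import io
from collections import defaultdict

# Specimen type labels as they appear in the portal query.
TUMOR = "Primary tumour - solid tissue"
NORMAL = "Normal - blood derived"


def unpaired_donors(tsv_text, donor_col="ICGC Donor", specimen_col="Specimen Type"):
    """Map each donor lacking a tumor or a normal BAM to the specimen types it has.

    donor_col and specimen_col are assumed header names, not confirmed
    against the real export.
    """
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        seen[row[donor_col]].add(row[specimen_col])
    return {
        donor: sorted(types)
        for donor, types in seen.items()
        if not {TUMOR, NORMAL} <= types
    }
```

Passing in the contents of the filtered manifest would return a dictionary keyed by donor, and an empty dictionary would mean every donor has a complete tumor/normal pair.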

On the other hand, we had previously downloaded and processed ~~100 WGS BAMs (50 T/N pairs)~~ 84 PDC-hosted WGS BAMs (42 T/N pairs) from ICGC-PCAWG using the query https://icgc.org/ZFF. Three subjects overlap between the requested list and ours. @tkoganti is looking at the VAF and SNV classes in those data.

jharenza · 2020-02-24T15:28:14Z

@yuankunzhu do you have the breakdown of cancer types for the 110 BAMs from 60 donors?

@cansavvy @jaclyn-taroni @cgreene - I can make a request for this data, but I am currently waiting on our contracts office to approve an ICGC DACO request for another project and don't have a clear idea of how long this will take. It looks like no one at CHOP has ICGC access, and the office told me they want to make some agreement modifications. In the meantime, should we just plan to use Mutect2/Strelka2 for these comparisons with the TCGA data we have access to, and/or add more TCGA samples if we do not have a good cohort of brain samples from PCAWG?

yuankunzhu · 2020-02-24T20:09:17Z

@jharenza I can't find the detailed cancer types for those samples. The only thing I can find from the query is their projects of origin. It looks like they include TCGA-LGG and GBM?

jharenza · 2020-03-04T01:12:18Z

As an update on this, I am still working with CHOP legal to get this access request documentation approved before I can go back to ICGC to submit the final application. I should know more Thursday.

@yuankunzhu - did you mention that we lost data access to these files?

yuankunzhu · 2020-03-04T20:55:47Z

@jharenza, we still have those data in the bucket. We just need the DevOps team to renew our S3 access credentials so that we can access them on Cavatica.

jvlilly · 2020-03-05T21:29:33Z

@stefankies can you work with allison on this^^

yuankunzhu · 2020-03-06T14:39:08Z

@jharenza @cansavvy @jvlilly @tkoganti, quick update on this: we got the data bucket access renewed and mounted the bucket on Cavatica, so it is ready.

jharenza · 2020-06-12T15:26:40Z

@yuankunzhu - were you able to process any of this data? In the meantime, CHOP legal was working on this agreement as of 5/19. I just sent a follow-up.

jharenza · 2020-07-07T22:10:28Z

As an update, CHOP has approved this agreement and it was sent to ICGC on July 3 for final approval. They will respond within 15 business days.

jharenza · 2021-04-01T18:34:47Z

closing, as we still have not gotten access to these data

cansavvy added the "proposed analysis" and "snv" (Related to or requires SNV data) labels Feb 20, 2020

jashapiro mentioned this issue Feb 21, 2020

TCGA vs PBTA exploratory analysis #548

Closed

5 tasks

jharenza mentioned this issue Feb 25, 2020

Updated analysis: PBTA vs TCGA TMB analysis #556

Closed

jaclyn-taroni mentioned this issue Mar 18, 2020

Mutational Signatures #636

Closed

jharenza closed this as completed Apr 1, 2021
