This document describes all data files associated with this project. Each file has the following associated information:
- File type will be one of:
  - Reference file: Obtained from an external source/database. When known, the date obtained and a link to the external source are included.
  - Modified reference file: Obtained from an external source/database but modified for OpenPBTA use.
  - Processed data file: Data that are processed upstream of the analysis project, e.g., the output of a somatic single nucleotide variant method. Links to the relevant D3B Center or Kids First workflow (and version where applicable) are included in Origin.
  - Analysis file: Any file created by a script in `analyses/*`.
- Origin
  - For Processed data files, a link to the relevant D3B Center or Kids First workflow (and version where applicable).
  - When applicable, a link to the specific script that produced (or modified, for Modified reference file types) the data.
- File description
  - A brief one-sentence description of what the file contains (e.g., bed files contain coordinates for features XYZ).

File name | File Type | Origin | File Description |
---|---|---|---|
histologies-base.tsv | Data file | Cohort-specific data files and databases | Clinical and sequencing metadata for each biospecimen |
histologies.tsv | Modified data file | molecular-subtyping-integrate | histologies-base.tsv plus molecular_subtype, cancer_group, integrated_diagnosis, and harmonized_diagnosis |
intersect_cds_lancet_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v27.primary_assembly.annotation.gtf.gz CDS with Lancet, Strelka2, and Mutect2 regions |
intersect_strelka_mutect_WGS.bed | Analysis file | snv-callers | Intersection of gencode.v27.primary_assembly.annotation.gtf.gz CDS with Strelka2 and Mutect2 regions |
efo-mondo-map.tsv | Reference mapping file | Manual collation | Mapping of EFO and MONDO codes to cancer groups |
efo-mondo-map-prefill.tsv | Modified reference mapping file | Analysis file generated in molecular-subtyping-integrate | Mapping of EFO and MONDO codes to cancer groups |
ensg-hugo-pmtl-mapping.tsv | Reference mapping file | Manual curation of PMTLv1.1 by FNL; RNA-Seq pipeline GTF mapping | File which maps Hugo Symbols to ENSEMBL gene IDs and each ENSG to the RMTL curated by FNL |
*.bed | Reference file | Manual collation | BED files used for variant calling and for TMB calculation |
uberon-map-gtex-group.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx broad groups |
uberon-map-gtex-subgroup.tsv | Reference mapping file | Manual collation | Mapping of UBERON codes to tissue types in GTEx subgroups |
methyl-beta-values.rds | Processed data file | methylation beta values | Methylation beta values |
methyl-m-values.rds | Processed data file | methylation m values | Methylation m values |
rna-isoform-expression-rsem-tpm.rds | Processed data file | RNA isoform TPM files | RNA isoform TPM files |
snv-dgd.maf.tsv.gz | Processed data file | DGD merged SNV MAF results | DGD merged SNV MAF results |
fusion-dgd.tsv | Processed data file | DGD merged fusion results | DGD merged fusion results |
fusion-arriba.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - Arriba TSV, annotated with FusionAnnotator |
fusion-starfusion.tsv.gz | Processed data file | Gene fusion detection; Workflow | Fusion - STARFusion TSV |
fusion_summary_embryonal_foi.tsv | Analysis file | fusion-summary | Summary file for presence of embryonal tumor fusions of interest |
fusion_summary_ependymoma_foi.tsv | Analysis file | fusion-summary | Summary file for presence of ependymal tumor fusions of interest |
fusion_summary_ewings_foi.tsv | Analysis file | fusion-summary | Summary file for presence of Ewing's sarcoma fusions of interest |
fusion_summary_ewings_lgat.tsv | Analysis file | fusion-summary | Summary file for presence of LGAT fusions of interest |
fusion-putative-oncogenic.tsv | Analysis file | fusion_filtering | Filtered and prioritized fusions |
gene-counts-rsem-expected_count-collapsed.rds | Analysis file | PBTA+GMKF+TARGET+GTEx collapse-rnaseq; GTEx v8 release | Gene expression - RSEM expected_count for each sample collapsed to gene symbol (gene-level) |
gene-expression-rsem-tpm-collapsed.rds | Analysis file | PBTA+GMKF+TARGET+GTEx collapse-rnaseq; GTEx v8 release | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
tcga-gene-counts-rsem-expected_count-collapsed.rds | Modified reference file | TCGA samples - manually curated to include 10414 TCGA RNA samples that are in diseaseXpress and have GDC clinical information | Gene expression - RSEM expected_count for each sample collapsed to gene symbol (gene-level) |
tcga-gene-expression-rsem-tpm-collapsed.rds | Modified reference file | TCGA samples - manually curated to include 10414 TCGA RNA samples that are in diseaseXpress and have GDC clinical information | Gene expression - RSEM TPM for each sample collapsed to gene symbol (gene-level) |
WGS.hg38.lancet.300bp_padded.bed | Reference Target/Baits File | SNV and INDEL calling | WGS.hg38.lancet.unpadded.bed file with each region padded by 300 bp |
WGS.hg38.lancet.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 WGS regions created using UTR, exome, and start/stop codon features of the GENCODE 31 reference, augmented with PASS variant calls from Strelka2 and Mutect2 |
WGS.hg38.mutect2.vardict.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 Broad Institute interval calling list (restricted to Chr1-22,X,Y,M and non-N regions) used for the Mutect2 and VarDict variant callers |
WGS.hg38.strelka2.unpadded.bed | Reference Regions File | SNV and INDEL calling | hg38 Broad Institute interval calling list (restricted to Chr1-22,X,Y,M) used for the Strelka2 variant caller |
WGS.hg38.vardict.100bp_padded.bed | Reference Regions File | SNV and INDEL calling | WGS.hg38.mutect2.vardict.unpadded.bed with each region padded by 100 bp, used for the VarDict variant caller |
snv-consensus-plus-hotspots.maf.tsv.gz | Processed data file | copy_number_consensus_call | Consensus (2 of 4) MAF for PBTA + GMKF + TARGET |
cnv-cnvkit.seg.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - CNVkit SEG file |
cnv-consensus.seg.gz | Analysis file | [copy_number_consensus_call](https://github.com/PediatricOpenTargets/OpenPedCan-analysis/tree/dev/analyses/copy_number_consensus_call) | Somatic Copy Number Variant - WGS samples only |
cnvkit_with_status.tsv, consensus_seg_with_status.tsv | Analysis files | copy_number_consensus_call | CNVkit calls for WXS, or CNV consensus calls for WGS, with gain/loss status |
cnv-consensus-gistic.gz | Analysis file | run-gistic | GISTIC results - WGS samples only |
cnv-controlfreec.tsv.gz | Processed data file | Copy number variant calling; Workflow | Somatic Copy Number Variant - TSV file that is a merge of ControlFreeC *_CNVs files |
consensus_wgs_plus_cnvkit_wxs_autosomes.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; autosomes only |
consensus_wgs_plus_cnvkit_wxs_x_and_y.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; sex chromosomes only |
consensus_wgs_plus_cnvkit_wxs.tsv.gz | Analysis file | focal-cn-file-preparation | TSV file containing genes with copy number changes per biospecimen; both autosomes and sex chromosomes |
snv-mutation-tmb-all.tsv | Analysis file | tmb-calculation | TSV file with sample names and their tumor mutation burden, counting all variants |
snv-mutation-tmb-coding.tsv | Analysis file | tmb-calculation | TSV file with sample names and their tumor mutation burden, counting variants in coding regions only |
sv-manta.tsv.gz | Processed data file | Structural variant calling; Workflow | Somatic Structural Variant - Manta output, annotated with AnnotSV (WGS samples only) |
independent-specimens.methyl.primary-plus.tsv | | | |
independent-specimens.methyl.primary.tsv | | | |
independent-specimens.methyl.relapse.tsv | | | |
independent-specimens.rnaseq.primary.eachcohort.tsv | | | |
independent-specimens.rnaseq.primary.tsv | | | |
independent-specimens.rnaseq.relapse-pre-release.tsv | | | |
independent-specimens.rnaseq.relapse.eachcohort.tsv | | | |
independent-specimens.rnaseq.relapse.tsv | | | |
independent-specimens.rnaseqpanel.primary-plus.eachcohort.tsv | | | |
independent-specimens.rnaseqpanel.primary-plus.pre-release.tsv | | | |
independent-specimens.rnaseqpanel.primary-plus.tsv | | | |
independent-specimens.rnaseqpanel.primary.eachcohort.tsv | | | |
independent-specimens.rnaseqpanel.primary.tsv | | | |
independent-specimens.rnaseqpanel.relapse.eachcohort.tsv | | | |
independent-specimens.rnaseqpanel.relapse.tsv | | | |
independent-specimens.wgs.primary-plus.eachcohort.tsv | | | |
independent-specimens.wgs.primary-plus.tsv | | | |
independent-specimens.wgs.primary.eachcohort.tsv | | | |
independent-specimens.wgs.primary.tsv | | | |
independent-specimens.wgs.relapse.eachcohort.tsv | | | |
independent-specimens.wgs.relapse.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary-plus.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.eachcohort.tsv | | | |
independent-specimens.wgswxspanel.primary.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.primary.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.primary.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.relapse.eachcohort.tsv | | | |
independent-specimens.wgswxspanel.relapse.prefer.wgs.tsv | | | |
independent-specimens.wgswxspanel.relapse.prefer.wxs.tsv | | | |
independent-specimens.wgswxspanel.relapse.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of DNA, RNA, or methylation samples of all sequencing methods, from primary, primary-plus, or relapse tumors within each or across all cohorts |
independent-specimens.rnaseq.primary-plus-pre-release.tsv | | | |
independent-specimens.rnaseq.primary-pre-release.tsv | | | |
independent-specimens.rnaseqpanel.primary.pre-release.tsv | | | |
independent-specimens.rnaseqpanel.relapse.pre-release.tsv | Analysis files | independent-samples | Independent (non-redundant) sample list of RNA samples of all sequencing methods, from primary, primary-plus, or relapse tumors across all cohorts for the purposes of running fusion_filtering pre-release |
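
Several of the BED files in the table are described as another BED file "with each region padded" by a fixed number of base pairs (300 bp for Lancet, 100 bp for VarDict). The sketch below illustrates what that kind of padding could look like. It is not the script used to produce these files: the function name `pad_intervals` is hypothetical, and clamping starts at 0, ignoring chromosome lengths, and merging regions that overlap after padding are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# A BED-style interval: (chromosome, start, end), 0-based and half-open.
Interval = Tuple[str, int, int]


def pad_intervals(intervals: Iterable[Interval], pad: int) -> List[Interval]:
    """Pad each interval by `pad` bp on both sides, then merge overlaps.

    Assumptions (not taken from the source workflow):
    - starts are clamped at 0; ends are NOT clamped to chromosome length
    - intervals that overlap or touch after padding are merged
    """
    padded = sorted(
        (chrom, max(0, start - pad), end + pad)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [("chr1", 100, 200), ("chr1", 600, 700), ("chr2", 50, 150)]
print(pad_intervals(regions, 300))
# [('chr1', 0, 1000), ('chr2', 0, 450)]
```

Note that the sort here is lexicographic on chromosome name, which differs from the karyotypic order often used in BED files; a real pipeline would more likely rely on a dedicated tool that also knows the chromosome sizes.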