Planned Data Release: V16 #601
Comments
Because we made changes to the |
For the v16 release, @baileyckelly and @chris-s-friedman are moving the histology file generation over to a new database workflow to make the manual changes more trackable. We are replicating v15 first, which we expect to finish by the end of this week, and will then start on the new issues noted in the ticket early next week. Will update if this changes. |
We'll also do #624. |
@tkoganti gave me a heads up that the updated TCGA files are available pre-v16 if folks would like to take a look. Here's the link to the s3 bucket and folder. And the file list:
The README includes an explanation of what each of these files is. |
[note-to-self]
1. sync data from previous release
last_release='release-v15-20200228'
new_release='release-v16-20200320'
bucket='s3://kf-openaccess-us-east-1-prd-pbta/data'
aws s3 sync $bucket/$last_release/ $bucket/$new_release/
2. new data
2.1 TCGA MAF
get overlapped file names between v15 and
## get all files from v15 and TCGA_mar-12-2020/
all_files=`aws s3 ls --recursive $bucket | awk '{print $4}' | egrep "$new_release|TCGA_mar-12-2020/"`
echo $all_files | xargs -i basename {} | sort | uniq -dc
it returns no output, so no overlaps?
check all TCGA MAF files:
echo $all_files | grep -i tcga | grep -i maf
returns below
MAF files named differently in
for caller in 'lancet' 'mutect2' 'strelka2'
do
aws s3 cp $bucket/TCGA_mar-12-2020/pbta-tcga-snv-$caller.maf.gz $bucket/$new_release/pbta-tcga-snv-$caller.vep.maf.gz
done
2.2 TCGA BED
echo $all_files | grep -i bed | xargs -i basename {} | sort | uniq -dc
## no overlaps, copy all bed to release folder
aws s3 sync $bucket/TCGA_mar-12-2020/ $bucket/$new_release --exclude "*" --include "*.bed"
## remove old tcga BED
aws s3 rm $bucket/$new_release/gencode.v19.basic.exome.hg38liftover.100bp_padded.bed
2.3 TCGA Manifest check
looks like it's missing BED information, add it up, modified add that as upload this to release folder and overwrite old
aws s3 cp pbta-tcga-manifest.txt $bucket/$new_release/pbta-tcga-manifest.tsv
new fusion results
fusions_path='analyses/fusion_filtering/results'
aws s3 cp $fusions_path/pbta-fusion-putative-oncogenic.tsv $bucket/$new_release/
aws s3 cp $fusions_path/pbta-fusion-recurrently-fused-genes-byhistology.tsv $bucket/$new_release/
aws s3 cp $fusions_path/pbta-fusion-recurrently-fused-genes-bysample.tsv $bucket/$new_release/
## 3. update release doc
- [x] md5sum
- [x] release note
- [ ] download script |
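A side note on the overlap check in the note above, offered as a sketch rather than as what was actually run: in bash, an unquoted `echo $all_files` joins all keys onto one line, and GNU `xargs -i` treats each input line as a single item, so `basename` would only ever see one name. That could explain the empty result. The helper below takes one key per line on stdin and prints any basename that appears more than once. The function name and the sample keys are hypothetical; only the two folder names come from the note.

```shell
#!/usr/bin/env bash
# Print basenames that occur more than once in a list of S3 keys
# (one key per line on stdin). Hypothetical helper, not part of the release scripts.
find_duplicate_basenames() {
  # -F/ splits each key on "/"; $NF is the last path component.
  # The NF guard skips empty lines.
  awk -F/ 'NF { print $NF }' | sort | uniq -d
}

# Hypothetical sample keys. In practice the list would come from something like:
#   aws s3 ls --recursive "$bucket" | awk '{print $4}'
sample_keys='release-v16-20200320/example-a.tsv
release-v16-20200320/example-b.bed
TCGA_mar-12-2020/example-b.bed
TCGA_mar-12-2020/example-c.maf.gz'

# Quote the variable so each key stays on its own line.
printf '%s\n' "$sample_keys" | find_duplicate_basenames
```

With these sample keys the only line printed is `example-b.bed`. Piping the real key list through the same function, with the variable quoted, would give the overlap check a chance to see every file name.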
@tkoganti for the new TCGA BED files you put on |
Just uploaded This file from March 12 is in the s3 bucket - |
It looks like it was just uploaded? But anyway, thanks for checking. I think we have all the BED files in place now |
Hmm, it said March 12 when I first looked, and I uploaded it again. Not sure if it was overwritten. This is the capture kit info used for the new samples, but it might be easier to generate it again with all 319 samples. Let me know if you want me to create that - |
@yuankunzhu @jaclyn-taroni Adding files updated from fusion-filtering analysis https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion_filtering/results Changes #621 :
|
Hi there - We will be adding in the following ticket(s), though for the histologies file: |
Yes, sounds good. I will edit the original post to match this. |
The D3b data team is still working on updating the clinical data; we are suggesting moving all changes related
cc @jaclyn-taroni and @baileyckelly |
Closed via #657 |
What data file(s) does this issue pertain to?
pbta-histologies.tsv
pbta-tcga-manifest.tsv
pbta-tcga-snv-lancet.vep.maf.gz
pbta-tcga-snv-mutect2.vep.maf.gz
pbta-tcga-snv-strelka2.vep.maf.gz
What release are you using?
V15
Put your question or report your issue here.
gencode.v19.basic.exome.hg38liftover.100bp_padded.bed
potentially Updated analysis: molecular subtyping for cell lines #509 and Updated analysis: chordoma subtyping #608
Proposed Analysis: Molecularly subtype ependymoma tumors #245 (@jaclyn-taroni edit per Planned Data Release: V16 #601 (comment))
Placeholder for V16 release to include
D3b team: @baileyckelly @chris-s-friedman @yuankunzhu @allisonheath