This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Planned data release: V14 #432

Closed
2 of 3 tasks
jharenza opened this issue Jan 13, 2020 · 32 comments

Comments

@jharenza
Collaborator

jharenza commented Jan 13, 2020

What data file(s) does this issue pertain to?

additional files to be generated

What release are you using?

v13

Put a link to the relevant analysis here.

#431

Put your question or report your issue here.

@jaclyn-taroni
Member

Consensus SEG file generated as part of #441 - analyses/copy_number_consensus_call/results/pbta-cnv-consensus.seg.

@cansavvy
Collaborator

The updated Lancet WXS BED file downstream changes have been propagated and now there are updated files on S3 that should be included in the v14 release:

@jaclyn-taroni
Member

New fusion summary files as of #478: https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/fusion-summary/results

@jharenza
Collaborator Author

> The updated Lancet WXS BED file downstream changes have been propagated and now there are updated files on S3 that should be included in the v14 release:

@cansavvy I can download the zip file, but not the first two - getting access denied.

@jharenza
Collaborator Author

jharenza commented Jan 28, 2020

Note to self on which files have been uploaded to CAVATICA:

@cansavvy
Collaborator

Sorry about that @jharenza, apparently after I updated them I didn't realize they wouldn't be public access unless I set it that way. I've set them both to public. You should be good.

@jharenza
Collaborator Author

all good, thanks @cansavvy !

@jashapiro
Member

I'm going to suggest that cnv-consensus.tsv be removed from the data download. It is present in copy_number_consensus_call/results and I don't expect much use of it, now that pbta-cnv-consensus.seg.gz is available. (I also suggest using the .gz file for the download as it is much smaller!)
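
As a rough illustration of the `.gz` suggestion above, a compressed copy of a SEG file could be produced along these lines. This is only a sketch using Python's standard library; the helper name `gzip_file` is hypothetical and is not part of the OpenPBTA code.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of `src` next to it and return the new path.

    Hypothetical helper for illustration; the original file is left in place.
    """
    dest = src.with_name(src.name + ".gz")
    # Stream the bytes through gzip so large files are not loaded into memory.
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Calling `gzip_file(Path("pbta-cnv-consensus.seg"))` would write `pbta-cnv-consensus.seg.gz` alongside the original.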

@jharenza
Collaborator Author

Not going to include germline data in this release. See #451 #431 .

@jharenza
Collaborator Author

@jashapiro - are we using cnv-consensus.tsv for any current analyses? E.g., gain/loss info, or is that all being derived from GISTIC at this point?

@jaclyn-taroni
Member

@jharenza there's a module for mapping status to gene symbols, cytobands, etc.: focal-cn-file-preparation. It is being updated to use the consensus SEG file in #479.

@jashapiro
Member

@jharenza before finalizing gistic-results-consensus.zip can you check that the base folder enclosed has the same name? The current pbta-cnv-cnvkit-gistic.zip has a base folder named 2019-12-10-gistic-results-cnvkit which is confusing and seems like a recipe for broken code on updates.

@jharenza
Collaborator Author

@jashapiro yep! i attached the files here with dates to keep track of a version since we were running them a few times, but was planning to name them similar to the current naming scheme:
pbta-cnv-cnvkit-gistic.zip
pbta-cnv-consensus-gistic.zip
Does that work?

@jashapiro
Member

Yes, that is fine. It was just surprising to me that, when unzipped, the folder name was not the same as the zip file name.

@jharenza
Collaborator Author

Ahh, I see what you're saying - yep, will fix both of those to be generic without dates.
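
The naming check discussed in this exchange (the base folder inside a zip should match the zip file's own name) could be automated roughly as follows. This is a minimal sketch with Python's standard `zipfile` module; the function names are hypothetical and the check is an assumption about what "same name" means here, namely a single top-level folder equal to the zip's file stem.

```python
import zipfile
from pathlib import Path, PurePosixPath


def top_level_entries(zip_path: Path) -> set[str]:
    """Return the distinct first path components of all members in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return {PurePosixPath(name).parts[0] for name in zf.namelist() if name}


def base_folder_matches(zip_path: Path) -> bool:
    """True if the archive holds exactly one top-level folder named like the zip.

    For example, pbta-cnv-consensus-gistic.zip should unpack to
    pbta-cnv-consensus-gistic/ (assumed convention, based on the thread above).
    """
    return top_level_entries(zip_path) == {zip_path.stem}
```

Under this assumption, a `pbta-cnv-cnvkit-gistic.zip` whose only top-level folder is `2019-12-10-gistic-results-cnvkit` would fail the check, which is the situation described above.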

@jharenza
Collaborator Author

jharenza commented Feb 3, 2020

Hi @cansavvy and @jaclyn-taroni - these files have changed names since the last release - do you want me to change to what is current or keep named how we had them?

- New file and name for intersect_exon_lancet_strelka_mutect_WGS.bed -> intersect_cds_lancet_strelka_mutect_WGS.bed
- No new file, keep old name? intersect_exon_WXS.bed
- No new file, keep old name? intersect_strelka_mutect_WGS.bed
- New file: intersect_cds_lancet_WXS.bed
- New file and name for pbta-snv-consensus-mutation-tmb-coding.tsv -> pbta-snv-consensus_snv_tmb_coding_only.tsv

Is this correct?

@cansavvy
Collaborator

cansavvy commented Feb 3, 2020

Yes, this is correct. I will change the code to get rid of the underscores for future releases. Although intersect_exon_WXS.bed didn't change, I would still change its filename to better reflect its contents. (I mistakenly called it exon in its filename originally, but it really is based on coding sequences.)

@jharenza
Collaborator Author

jharenza commented Feb 3, 2020

So for that one, should it be intersect_exon_WXS.bed -> intersect_cds_WXS.bed?

@jaclyn-taroni
Member

Can we change the new file pbta-snv-consensus_snv_tmb_coding_only.tsv to use underscores pbta-snv-consensus-mutation-tmb-coding.tsv so we don't have breaking changes please?

@jharenza
Collaborator Author

jharenza commented Feb 3, 2020

@jaclyn-taroni do you mean hyphens or use all underscores? Current filename is the mixed one. Old was only hyphens. Please send the exact name you want :)

@jaclyn-taroni
Member

Sorry that should have been to not use underscores. The file name should be pbta-snv-consensus-mutation-tmb-coding.tsv.

@jharenza
Collaborator Author

jharenza commented Feb 3, 2020

OK, sure, and get rid of the "only" to keep it as it was?

@jaclyn-taroni
Member

Correct, I want to keep it as it was to minimize breaking changes.
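
One way to keep released file names stable while the generating code uses a different name is a small override table applied when files are staged for release. The sketch below assumes hypothetical source and destination directories and hypothetical function names; the single override entry is the TMB file name agreed on above.

```python
import shutil
from pathlib import Path

# Generated file name -> name kept in the release to avoid breaking changes.
# Only the mapping settled in this thread is listed.
RELEASE_NAME_OVERRIDES = {
    "pbta-snv-consensus_snv_tmb_coding_only.tsv": "pbta-snv-consensus-mutation-tmb-coding.tsv",
}


def release_name(generated_name: str) -> str:
    """Return the name a file should carry in the release."""
    return RELEASE_NAME_OVERRIDES.get(generated_name, generated_name)


def stage_for_release(src_dir: Path, dest_dir: Path) -> list[str]:
    """Copy every file in src_dir to dest_dir under its release name.

    Both directories are hypothetical; returns the staged names in sorted
    source order.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in sorted(src_dir.iterdir()):
        if src.is_file():
            target = dest_dir / release_name(src.name)
            shutil.copy2(src, target)
            staged.append(target.name)
    return staged
```

Files without an entry in the table, such as intersect_cds_lancet_WXS.bed, pass through under their generated names.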

jharenza mentioned this issue Feb 3, 2020
@jharenza
Collaborator Author

jharenza commented Feb 3, 2020

Also, updating pbta-histologies.tsv to include new molecular_subtype for #249 and #251 tumors.

@cansavvy
Collaborator

cansavvy commented Feb 4, 2020

@jharenza

> So for that one, should it be intersect_exon_WXS.bed -> intersect_cds_WXS.bed?

Sorry, I misunderstood/lost track of this. This file is no longer necessary; the only WXS BED file should be intersect_cds_lancet_WXS.bed.

jharenza pushed a commit that referenced this issue Feb 4, 2020
@jharenza
Collaborator Author

jharenza commented Feb 4, 2020

got it, thanks!

@jharenza
Collaborator Author

jharenza commented Feb 5, 2020

TCGA lancet file, release notes, and md5 are updated on s3 now.

@jharenza
Collaborator Author

jharenza commented Feb 6, 2020

I also had to update pbta-histologies.tsv - found an inconsistency when I created one sample's new broad histology. Bo will update in the AM.

@jharenza
Collaborator Author

jharenza commented Feb 6, 2020

@cansavvy and @jaclyn-taroni - this is all ready now and downloads pass. sorry about that!

@jaclyn-taroni
Member

@jharenza can you post the expected checksum so I can make sure I have what we want to include in testing please?

@jharenza
Collaborator Author

jharenza commented Feb 6, 2020

md5sum.txt
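
For checking downloaded files against a posted checksum file, something like the following could be used. It is a sketch only: the contents of the attached md5sum.txt are not shown in this thread, so the code assumes the usual `md5sum` output format (one `<hex digest>  <file name>` pair per line), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check each file listed in an md5sum-style manifest.

    Files are looked up relative to the manifest's directory (an assumption).
    Returns {file name: True if present and the digest matches}.
    """
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum prefixes '*' in binary mode
        target = manifest.parent / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

Any entry mapped to `False` is either missing or differs from the expected checksum.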

jaclyn-taroni added a commit that referenced this issue Feb 7, 2020
* add v14 release docs

-update `release-notes.md`
-update `data-files-description.md`
-update `data-formats.md`

* Update download-data.sh

add new folder for V14 to download script

* remove intersect_cds_WXS.bed

per @cansavvy [comment](#432 (comment))

* add intersect_cds_lancet.bed

and description from @cansavvy [comments](#507 (comment))

* Update release-notes.md

- add removal of polyA+stranded samples that were still in file in v13

* Update data-formats.md

add more information on gistic output files, to replace PR [#456](#456)

* Reorganize derived CN section and make formatting consistent

* Add links to relevant subtyping modules

* Update release-notes.md

readme update for upcoming lancet MAF per issue [here](#512)

* Update doc/release-notes.md

yup, nice catch

Co-Authored-By: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update release-notes.md

fix embryonal broad histology

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
@jharenza
Collaborator Author

jharenza commented Feb 8, 2020

closed via #507

jharenza closed this as completed Feb 8, 2020

4 participants