-
Notifications
You must be signed in to change notification settings - Fork 83
Update focal-cn-preparation to use consensus SEG file in data download #1130
Update focal-cn-preparation to use consensus SEG file in data download #1130
Conversation
# TODO: the consensus SEG file is not currently in the data download -- when it | ||
# gets included we will have to change the file path here | ||
consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results", | ||
consensus_seg_file <- file.path("..", "..", "data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jharenza and @kgaonkar6 - is there any reason (that you are aware of) that prohibits me from making this change at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think so - the latest file is in the download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to wait until all the v20 CNV changes go through though? Or would just getting #1123 through -> updating the release (if not done yet) be sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated v20 release data with the latest consensus seg with #1123 b9284650be04df3538e6c6dba29b8eb0 pbta-cnv-consensus.seg.gz
I added the relative path so that we can run the code while running all the preprocessing steps of subtyping, so maybe we can add a logic to change to relative path or not within the if (params$base_run ==0)
OpenPBTA-analysis/analyses/focal-cn-file-preparation/02-add-ploidy-consensus.Rmd
Lines 48 to 58 in a6281c2
```{r} | |
# TODO: the consensus SEG file is not currently in the data download -- when it | |
# gets included we will have to change the file path here | |
consensus_seg_file <- file.path("..", "copy_number_consensus_call", "results", | |
"pbta-cnv-consensus.seg.gz") | |
if ( params$base_run ==0 ){ | |
histologies_file <- file.path("..", "..", "data", "pbta-histologies.tsv") | |
} else { | |
histologies_file <- file.path("..", "..", "data", "pbta-histologies-base.tsv") | |
} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which scenario would you use the file in analyses/copy_number_consensus_call/results
? When you use pbta-histologies-base.tsv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we will rerun the consensus seg file module first so the latest consensus seg file can be used as input for the focal-cn module
I'm not going to update the HTML for this notebook, because I expect that will happen as part of getting #1124 through anyway. |
It is curious, though, that the consensus CN steps happens right before this one and I would expect then that the file in |
Closing in favor of #1124 |
The
focal-cn-preparation
module was taking over an hour to run in CI. This is because it was using the consensus SEG file that is committed to the repository, which is not for a subset of samples but instead contains all samples. With the release of v20, there's no reason to continue to use the file inanalyses/copy_number_consensus_call/results
so here I am changing the path in the second notebook offocal-cn-preparation
to use the file in the data download. In the context of CI, this will be the subset file.Closes #1129.