-
Notifications
You must be signed in to change notification settings - Fork 83
Make sure HGG subtyping uses data download #1414
Make sure HGG subtyping uses data download #1414
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple places to check:
-
(Note also Update molecular-subtyping-HGG module README #1429) The README contains this likely-now-deprecated bold alert:
Note: The files in the hgg-subset directory were generated via 02-HGG-molecular-subtyping-subset-files.R using the the files in the version 17 data release. When re-running this module, you may want to regenerate the HGG subset files using the most recent data release.
-
In this script, the comment on line 71 is probably old at this point:
OpenPBTA-analysis/analyses/molecular-subtyping-HGG/02-HGG-molecular-subtyping-subset-files.R
Line 71 in e72c11d
## TODO: If annotated files get included in data download -
In this notebook, these lines may need to be updated to read from
data/
:
OpenPBTA-analysis/analyses/molecular-subtyping-HGG/07-HGG-molecular-subtyping-combine-table.Rmd
Lines 166 to 174 in e72c11d
```{r} # read in full file putative_oncogenic_df <- read_tsv(file.path(root_dir, "analyses", "fusion_filtering", "results", "pbta-fusion-putative-oncogenic.tsv")) ```
#### HGAT with `BRAF V600E` mutations clustering ------------------------------ | ||
if [ "$SUBSETTING_ONLY" -eq "0" ]; then | ||
# 1p/19q co-deleted oligodendrogliomas notebook | ||
Rscript -e "rmarkdown::render('08-1p19q-codeleted-oligodendrogliomas.Rmd', clean = TRUE)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it seems to me that this module could in theory be used for subtyping? It's hard-coded to old releases with a big TODO note to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlikely that it will get used for subtyping in this project – we didn't find any tumors that met the criteria (and we wouldn't necessarily expect to).
|
||
# Add TP53 annotation | ||
Rscript -e "rmarkdown::render('10-HGG-TP53-annotation.Rmd',clean=TRUE)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to note that this isn't in the module README, opened an issue: #1429.
PT_30H1KV15 7316-1059 NA BS_C41DJZ1F HGG, To be classified NF-1 0.44278315076999947 0 0 0 0 NA NA NA NA 0 0 Other | ||
PT_37B5JRP1 7316-2217 BS_EJV0N3BX BS_M0QYNVK8 HGG, H3 wildtype, TP53 loss None documented 0.8835514587511873 1 1 2 0 p.H193Y 1 INV, DEL NA 1 0 loss | ||
PT_30H1KV15 7316-1059 NA BS_C41DJZ1F HGG, To be classified NF-1 0.44278315076999947 0 0 0 0 NA NA NA NA 0 0 other | ||
PT_37B5JRP1 7316-2217 BS_EJV0N3BX BS_M0QYNVK8 HGG, H3 wildtype, TP53 loss None documented 0.8835514587511873 1 0 0 0 p.H193Y NA NA NA 1 0 loss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting this line is the only real result diff in this file. All other diffs are either tolerance (the 10th digit after the decimal or so, which is fine) and "Other"-->"other". I'm assuming this has to with a v21 change, since it was last run with v20.
# for subtyping are run. By default (when this is set to 1), these notebooks | ||
# will not be run, i.e., subtyping notebooks only will be run. | ||
# It is not intended to be used in CI. | ||
SUBSETTING_ONLY=${SUBSETTING_ONLY:-1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this setting likely to be used? For example I would have expected to see it in here.
Edit: but the scope of that particular PR may not capture updating the subtyping steps, so much as what needs to be ready for the subtyping steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setting would be used if someone was interested in running all notebooks in the module, and I expect it be used less often than when folks just run this for subtyping. So by default, there are two notebooks we're skipping when this is run for subtyping; you would not expect to see it in this script because we're using the default settings.
Splitting up changes described in #1399 (comment).
Here, I'm changing the HGG subtyping module to make sure it uses analysis files
data/
(i.e., in the download), rather than files from results. This also includes skipping steps that were used to make earlier determinations re: subtyping; these steps should not be rerun every time we want to make sure all subtyping is up-to-date.Reviewers: Please review the
molecular-subtyping-HGG
module inmaster
to ensure I didn't miss anything.