Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Make sure HGG subtyping uses data download #1414

Conversation

jaclyn-taroni
Copy link
Member

Splitting up changes described in #1399 (comment).

Here, I'm changing the HGG subtyping module to make sure it uses analysis files data/ (i.e., in the download), rather than files from results. This also includes skipping steps that were used to make earlier determinations re: subtyping; these steps should not be rerun every time we want to make sure all subtyping is up-to-date.

Reviewers: Please review the molecular-subtyping-HGG module in master to ensure I didn't miss anything.

Copy link
Member

@sjspielman sjspielman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple places to check:

  1. (Note also Update molecular-subtyping-HGG module README #1429) The README contains this likely-now-deprecated bold alert:

    Note: The files in the hgg-subset directory were generated via 02-HGG-molecular-subtyping-subset-files.R using the the files in the version 17 data release. When re-running this module, you may want to regenerate the HGG subset files using the most recent data release.

  2. In this script, the comment on line 71 is probably old at this point:

    ## TODO: If annotated files get included in data download

  3. In this notebook, these lines may need to be updated to read from data/:

    ```{r}
    # read in full file
    putative_oncogenic_df <-
    read_tsv(file.path(root_dir,
    "analyses",
    "fusion_filtering",
    "results",
    "pbta-fusion-putative-oncogenic.tsv"))
    ```

#### HGAT with `BRAF V600E` mutations clustering ------------------------------
if [ "$SUBSETTING_ONLY" -eq "0" ]; then
# 1p/19q co-deleted oligodendrogliomas notebook
Rscript -e "rmarkdown::render('08-1p19q-codeleted-oligodendrogliomas.Rmd', clean = TRUE)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it seems to me that this module could in theory be used for subtyping? It's hard-coded to old releases with a big TODO note to update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlikely that it will get used for subtyping in this project – we didn't find any tumors that met the criteria (and we wouldn't necessarily expect to).


# Add TP53 annotation
Rscript -e "rmarkdown::render('10-HGG-TP53-annotation.Rmd',clean=TRUE)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to note that this isn't in the module README, opened an issue: #1429.

PT_30H1KV15 7316-1059 NA BS_C41DJZ1F HGG, To be classified NF-1 0.44278315076999947 0 0 0 0 NA NA NA NA 0 0 Other
PT_37B5JRP1 7316-2217 BS_EJV0N3BX BS_M0QYNVK8 HGG, H3 wildtype, TP53 loss None documented 0.8835514587511873 1 1 2 0 p.H193Y 1 INV, DEL NA 1 0 loss
PT_30H1KV15 7316-1059 NA BS_C41DJZ1F HGG, To be classified NF-1 0.44278315076999947 0 0 0 0 NA NA NA NA 0 0 other
PT_37B5JRP1 7316-2217 BS_EJV0N3BX BS_M0QYNVK8 HGG, H3 wildtype, TP53 loss None documented 0.8835514587511873 1 0 0 0 p.H193Y NA NA NA 1 0 loss
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting this line is the only real result diff in this file. All other diffs are either tolerance (the 10th digit after the decimal or so, which is fine) and "Other"-->"other". I'm assuming this has to with a v21 change, since it was last run with v20.

# for subtyping are run. By default (when this is set to 1), these notebooks
# will not be run, i.e., subtyping notebooks only will be run.
# It is not intended to be used in CI.
SUBSETTING_ONLY=${SUBSETTING_ONLY:-1}
Copy link
Member

@sjspielman sjspielman May 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this setting likely to be used? For example I would have expected to see it in here.

Edit: but the scope of that particular PR may not capture updating the subtyping steps, so much as what needs to be ready for the subtyping steps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting would be used if someone was interested in running all notebooks in the module, and I expect it be used less often than when folks just run this for subtyping. So by default, there are two notebooks we're skipping when this is run for subtyping; you would not expect to see it in this script because we're using the default settings.

@sjspielman sjspielman self-requested a review May 23, 2022 13:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants