Make sure HGG subtyping uses data download #1414

jaclyn-taroni · 2022-05-19T14:48:33Z

Splitting up changes described in #1399 (comment).

Here, I'm changing the HGG subtyping module to make sure it uses analysis files in data/ (i.e., in the download), rather than files from results. This also includes skipping steps that were used to make earlier determinations re: subtyping; these steps should not be rerun every time we want to make sure all subtyping is up-to-date.

Reviewers: Please review the molecular-subtyping-HGG module in master to ensure I didn't miss anything.

Rerun

sjspielman

A couple places to check:

(See also: Update molecular-subtyping-HGG module README #1429.) The README contains this likely-now-deprecated bold alert:

Note: The files in the hgg-subset directory were generated via 02-HGG-molecular-subtyping-subset-files.R using the files in the version 17 data release. When re-running this module, you may want to regenerate the HGG subset files using the most recent data release.

In this script, the comment on line 71 is probably old at this point:

OpenPBTA-analysis/analyses/molecular-subtyping-HGG/02-HGG-molecular-subtyping-subset-files.R

Line 71 in e72c11d

## TODO: If annotated files get included in data download

In this notebook, these lines may need to be updated to read from data/:

OpenPBTA-analysis/analyses/molecular-subtyping-HGG/07-HGG-molecular-subtyping-combine-table.Rmd

Lines 166 to 174 in e72c11d

    
    ```{r}
    # read in full file
    putative_oncogenic_df <-
      read_tsv(file.path(root_dir,
                         "analyses",
                         "fusion_filtering",
                         "results",
                         "pbta-fusion-putative-oncogenic.tsv"))
    ```
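
A quick way to look for other reads like this one is to search the module for paths that go through a `results` directory. This is a rough sketch, not something from the PR: the helper name `find_results_reads` is hypothetical, `--include` assumes GNU or BSD grep, and the search will also match the module's own output paths, so every hit still needs a manual look.

```shell
#!/bin/bash
# Hypothetical helper: list lines in R scripts and notebooks under a module
# directory that contain a quoted "results" path component.
find_results_reads() {
  module_dir="$1"
  # -r: recurse into subdirectories, -n: print line numbers
  # --include: only search .R and .Rmd files (GNU/BSD grep extension)
  grep -rn --include='*.R' --include='*.Rmd' '"results"' "$module_dir"
}

# Example, run from the repository root:
# find_results_reads analyses/molecular-subtyping-HGG
```

Each hit is printed as `file:line:text`, which makes it easy to check whether the path points at another module's results or at a file that now ships in data/.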

sjspielman · 2022-05-20T18:17:53Z

analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh

-#### HGAT with `BRAF V600E` mutations clustering ------------------------------
+if [ "$SUBSETTING_ONLY" -eq "0" ]; then
+  # 1p/19q co-deleted oligodendrogliomas notebook
+  Rscript -e "rmarkdown::render('08-1p19q-codeleted-oligodendrogliomas.Rmd', clean = TRUE)"


I guess it seems to me that this module could in theory be used for subtyping? It's hard-coded to old releases with a big TODO note to update.

Unlikely that it will get used for subtyping in this project – we didn't find any tumors that met the criteria (and we wouldn't necessarily expect to).

sjspielman · 2022-05-20T18:20:15Z

analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh


-# Add TP53 annotation
 Rscript -e "rmarkdown::render('10-HGG-TP53-annotation.Rmd',clean=TRUE)"


Just want to note that this isn't in the module README; I opened an issue: #1429.

sjspielman · 2022-05-20T18:24:28Z

analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv

-PT_30H1KV15	7316-1059	NA	BS_C41DJZ1F	HGG, To be classified	NF-1	0.44278315076999947	0	0	0	0	NA	NA	NA	NA	0	0	Other
-PT_37B5JRP1	7316-2217	BS_EJV0N3BX	BS_M0QYNVK8	HGG, H3 wildtype, TP53 loss	None documented	0.8835514587511873	1	1	2	0	p.H193Y	1	INV, DEL	NA	1	0	loss
+PT_30H1KV15	7316-1059	NA	BS_C41DJZ1F	HGG, To be classified	NF-1	0.44278315076999947	0	0	0	0	NA	NA	NA	NA	0	0	other
+PT_37B5JRP1	7316-2217	BS_EJV0N3BX	BS_M0QYNVK8	HGG, H3 wildtype, TP53 loss	None documented	0.8835514587511873	1	0	0	0	p.H193Y	NA	NA	NA	1	0	loss


Noting that this line (PT_37B5JRP1) is the only real result diff in this file. All other diffs are either tolerance (the 10th digit after the decimal or so, which is fine) or "Other"-->"other". I'm assuming this has to do with a v21 change, since it was last run with v20.
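
For anyone repeating this kind of check, one possible way to filter out the tolerance-level and case-only diffs is sketched below. It is not how the comparison above was done: `real_diffs` is a hypothetical helper, the default tolerance of 1e-8 is an arbitrary choice, and it assumes both TSVs have the same row order and column count.

```shell
#!/bin/bash
# Hypothetical helper: print rows of the new TSV that differ from the old one
# by more than a numeric tolerance or by more than letter case.
real_diffs() {
  old="$1"
  new="$2"
  tol="${3:-1e-8}"
  awk -F '\t' -v tol="$tol" '
    BEGIN { num = "^-?[0-9]+(\\.[0-9]+)?$" }
    # First file: remember each row by its line number
    NR == FNR { old_row[FNR] = $0; next }
    # Second file: compare field by field against the remembered row
    {
      n = split(old_row[FNR], a, "\t")
      differs = (n != NF)
      for (i = 1; i <= NF && !differs; i++) {
        if (a[i] ~ num && $i ~ num) {
          # Numeric fields: only flag differences larger than the tolerance
          d = a[i] - $i
          if (d < 0) d = -d
          if (d > tol) differs = 1
        } else if (tolower(a[i]) != tolower($i)) {
          # Text fields: ignore case-only changes such as Other vs. other
          differs = 1
        }
      }
      if (differs) print FNR ": " $0
    }
  ' "$old" "$new"
}

# Example with hypothetical file names:
# real_diffs HGG_molecular_subtype.old.tsv HGG_molecular_subtype.tsv
```

Rows that survive the filter are printed with their line number, so only diffs that need a closer look are left.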

sjspielman · 2022-05-20T18:42:25Z

analyses/molecular-subtyping-HGG/run-molecular-subtyping-HGG.sh

+# for subtyping are run. By default (when this is set to 1), these notebooks
+# will not be run, i.e., subtyping notebooks only will be run.
+# It is not intended to be used in CI.
+SUBSETTING_ONLY=${SUBSETTING_ONLY:-1}


Where is this setting likely to be used? For example, I would have expected to see it in here.

Edit: but the scope of that particular PR may not capture updating the subtyping steps, so much as what needs to be ready for the subtyping steps.

This setting would be used if someone was interested in running all notebooks in the module, and I expect it to be used less often than when folks just run this for subtyping. So by default, there are two notebooks we're skipping when this is run for subtyping; you would not expect to see it in this script because we're using the default settings.
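
To make the default concrete, here is a minimal sketch of the pattern. The variable name and the `-eq "0"` check mirror the script in this PR; `describe_run` is a hypothetical helper used only for illustration.

```shell
#!/bin/bash
# Sketch only: the variable name and the `-eq "0"` check mirror
# run-molecular-subtyping-HGG.sh; `describe_run` is a hypothetical helper.
describe_run() {
  # ${VAR:-1} falls back to 1 when SUBSETTING_ONLY is unset or empty
  if [ "${SUBSETTING_ONLY:-1}" -eq "0" ]; then
    echo "run all notebooks"
  else
    echo "run subtyping notebooks only"
  fi
}

# Default behavior (nothing exported): the gated notebooks are skipped
describe_run

# Opting in for a single invocation of the real script would look like:
# SUBSETTING_ONLY=0 bash run-molecular-subtyping-HGG.sh
```

Because the fallback is applied inside the script, callers that never set the variable get the subtyping-only behavior without any changes on their side.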

jaclyn-taroni added 8 commits March 24, 2022 12:19

Merge remote-tracking branch 'upstream/master'

3b532e3

Merge remote-tracking branch 'upstream/master'

348b277

Merge remote-tracking branch 'upstream/master'

3decf7c

Merge remote-tracking branch 'upstream/master'

86aabd0

Merge remote-tracking branch 'origin/master'

0c89713

Merge remote-tracking branch 'upstream/master'

bd56016

Merge remote-tracking branch 'upstream/master'

14a329f

Make sure HGG uses data/

fca33e2

Rerun

jaclyn-taroni requested a review from sjspielman May 19, 2022 14:48

This was referenced May 19, 2022

Update logic in modules where analysis files included in the data releases are generated #1419

Merged

WIP: Splitting analysis file generation and subtyping up #1405

Closed

jaclyn-taroni added review before release blocking release labels May 19, 2022

jaclyn-taroni mentioned this pull request May 19, 2022

Plan: Road to v22 #1426

Closed

12 tasks

sjspielman reviewed May 20, 2022

jaclyn-taroni added 2 commits May 21, 2022 15:29

Respond to comments

f4c3231

Rerun HGG module with latest changes

8212476

sjspielman self-requested a review May 23, 2022 13:31

sjspielman approved these changes May 23, 2022

jaclyn-taroni removed the review before release label May 24, 2022

Merge branch 'master' into jaclyn-taroni/1399-hgg-uses-data

a228184

jaclyn-taroni added the merge next label May 24, 2022

jaclyn-taroni merged commit 7e71153 into AlexsLemonade:master May 24, 2022

jaclyn-taroni removed the merge next label May 24, 2022

jharenza mentioned this pull request Jun 17, 2022

Update molecular-subtyping-HGG module README #1429

Closed
