PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

kgaonkar6 · 2020-12-11T15:26:18Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Base histology needs to be updated with results from molecular-subtyping-pathology. In this module I'm adding molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes.

What was your approach?

Read pbta-histologies-base.tsv and add molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes. Check for changes, if there are updates we need to re-run summary modules/plots which use this information. Check for duplicates if any, and fix.

What GitHub issue does your pull request address?

#858

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table

What is your summary of the results?

Final histology file for v18 release

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

jharenza

@kgaonkar6 - This looks good and I fixed a few minor typos.

I want to get a weigh-in from @jaclyn-taroni on finalizing the pbta-histologies.tsv file. Up until now, we left anything not subtyped as NA for integrated_diagnosis because technically, the integrated diagnosis is an integration of the pathology report with molecular data and when clinicians use this data, they will think that the integrated diagnosis considered molecular data. I had some discussions with Cassie Kline (neuro-oncology physician at CHOP) and we had two thoughts/options on this.

For those not subtyped, we can carry over diagnoses information from pathology_diagnosis +/- pathology_free_text_diagnosis into the integrated_diagnosis column so there is complete data for the tumors.
We add to the integrated_diagnosis column only diagnoses derived from molecular data and create another column, perhaps harmonized_diagnosis, which contains either an integrated_diagnosis or a pathology_diagnosis.

I think I favor 2 because it is more "correct".

This being said, one subtyping modules does not use molecular data: neurocytoma.

Thoughts?

analyses/molecular-subtyping-integrate/README.md

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

update text comments

update text

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

…we've established this elsewhere

cansavvy

I think this all seems reasonable. I still don't understand a lot of the background behind the histology labels but it seems like this PR is doing what it set out to do. Some of the wording things were a tad unclear to me so I just tried to rephrase (if I interpreted the explanations incorrectly, let me know).

One important question: I think we'd still want this to be added to CI testing?

cansavvy · 2021-01-07T22:19:55Z

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

+
+### Read molecular-subtyping-pathology results
+
+Reading molecular_subtype,	integrated_diagnosis, short_histology,	broad_histology and	Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` 


This spacing is odd. Is there a reason behind this odd spacing?

If you want to do a quick thing to try to get rid of some of the weird spacing in this notebook, you can run styler::style_file("analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd") and see if styler can fix some of these things for you. Beyond that, I don't know that we need to fuss with it.
...
I just checked, and unfortunately we do not have the styler package in the docker container. So don't worry about this, but it could be handy for you elsewhere.

hmm not sure where these spaces came from , checking

the spaces looked fine on localhost, I edited it and looks ok here as well. Let me know if I missed any weird spaces

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

cansavvy · 2021-01-07T22:44:06Z

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

+The purpose of this notebook is to integrate molecular subtyping results from 
+[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`.
+
+Here we are using v18 `pbta-histologies-base.tsv`


Just to clarify, (and sorry if this is written elsewhere, but it may be good to add it here too) is the purpose of pbta-histologies-base.tsv that you write the subtyping results there and that in the next version of the release, the results in pbta-histologies-base.tsv are added to pbta-histologies.tsv?

Aka it seems like pbta-histologies-base.tsv is like the working draft?

Right, we can call pbta-histologies-base.tsv a working draft.
The PBTA Histologies 1-7 PR uses the base histology file which have all integrated_diagnosis,Notes and molecular_subtype as NA through all the molecular-subtyping-XYZ . Through this process we gradually create the pbta-histologies.tsv for the same release.

Thus in each release there will be a pbta-histologies-base.tsv and eventually apbta-histologies.tsv.

You might have noticed that the modules (other than subtyping) were all using RNA data.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/760f7a42d253edcaf3341ac0873634e55590e59e/scripts/run-for-subtyping.sh#L13-L32

We only ran these modules because along with the pbta-histologies-base.tsv we also needed to update these files with the new RNA samples to use in our molecular subtyping modules.

Can you put some form of your description here in the beginning of this notebook?

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

Co-authored-by: Candace Savonen <cansav09@gmail.com>

…penPBTA-analysis into v18_int_dx_histology

cansavvy

I think this is very close. 👍 I just one description add request (see comment) and my outstanding question about this being added to circle CI testing.

cansavvy · 2021-01-08T12:10:51Z

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

+The purpose of this notebook is to integrate molecular subtyping results from 
+[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`.
+
+Here we are using v18 `pbta-histologies-base.tsv`


Can you put some form of your description here in the beginning of this notebook?

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

kgaonkar6 · 2021-01-08T15:50:27Z

Thanks @cansavvy! I added the module to CI and also added code to read pbta-histologies-base.tsv from data directly when running on testing.

jashapiro · 2021-01-08T16:05:31Z

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

+if (running_in_ci) {
+  data_dir <- "../../data/"
+} else {
+data_dir <- "../../data/release-v18-20201123/"
+}


This conditional code should not be needed. The way the system is set up, all files should be linked to data/ with no subfolder. In CI, this will be the testing files, but in regular use (after running download-data.sh) it will be the release version. Unless we want to make this file only work in v18 (I don't think we do) we should not be referring to the subfolder.

Suggested change

if (running_in_ci) {

data_dir <- "../../data/"

} else {

data_dir <- "../../data/release-v18-20201123/"

}

data_dir <- "../../data/"

hmm - @jashapiro I think we did this because it was being run before CI files were generated. However, also thinking about the process, it is a bit circular. This needs to be run for pbta-histologies.tsv to be generated and added to the new release, before the CI files are generated. Thoughts on that?

The biggest concern I have is that somebody might rerun this script when generating v19+ and forget to update this line. I don't know if this is pervasive through the subtyping files.

(Of course, with the linking approach it is possible to have the wrong set of files linked, and get a hidden error that way, so neither one is great.)

@kgaonkar6 @jashapiro what if we add this as a requirement in the bash script in which the user has to put a release version?

I just looked at some of the other subtyping analyses, and we don't do that for those. I think for consistency with the other analyses, it should simply use data/ and expect that the correct files have been placed or linked there.

ok sounds good, I'll update the code to always use "../../data" folder

cansavvy

I think this looks fine to me now. Just noticed some typos so if we get those fixed I think this can be merged.

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

cansavvy · 2021-01-08T18:27:59Z

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

+
+we gathere and update molecular-subtype AND integrated_diagnosis AND broad_histology AND short_histology for these histologies.
+
+In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release. 


Suggested change

In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release.

In this notebook, we will add the molecular subtyping updated in the `pbta-histologies-base.tsv` file to update the `pbta-histologies.tsv` file.

subtyping updated in the pbta-histologies-base.tsv

This is not entirely true, pbta-histologies-base.tsv file is used but is not updated through the subtyping modules. The subtyping file is saved in
compiled_subtyping<-read_tsv(file.path("..", "molecular-subtyping-pathology", "results", "compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv")) as part of the molecular-subtyping-pathology module.

We directly use compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv to update pbta-histologies.tsv file

Ah. Okay. Can you describe that here then?

oh I forgot to commit the description 😅

Let me know if it makes sense, should I add in my workflow diagram if it's useful in the README or code or this looks ok ?

Sure! Why not add the diagram! Would probably be good to have that in this description.

Co-authored-by: Candace Savonen <cansav09@gmail.com>

kgaonkar6 and others added 15 commits December 7, 2020 15:12

update to use base histologies file

bd3a94d

update to use base histology file

bea7b1d

update to use base histology v18

cef95e9

uupdated to filtered fusion v18

ada16ca

rerun tsne for v18 subtyping

cfd1a81

adding all subtyping and gsea re-run

86f39a4

Update README.md

2cb5991

Update README.md

509cecc

Update README.md

7209be1

run-for-subtyping update

7d28e9d

run-for-subtyping update

d926f9b

run-for-subtyping update

3ee5bc7

run-for-subtyping update

8288ddf

run-for-subtyping update

3dc81ed

integrated_dx, mol_sub added to base histology

0850c7d

kgaonkar6 requested review from jaclyn-taroni and jharenza December 11, 2020 15:26

update README

5d97b93

jharenza reviewed Dec 12, 2020

View reviewed changes

analyses/molecular-subtyping-integrate/README.md Outdated Show resolved Hide resolved

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd Outdated Show resolved Hide resolved

analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd Outdated Show resolved Hide resolved

jaclyn-taroni added don't merge review before release labels Dec 13, 2020

jaclyn-taroni removed their request for review December 13, 2020 20:47

jharenza and others added 8 commits December 14, 2020 15:35

Update README.md

d70bef4

update text comments

Update README.md

f2f6576

update text comments

Update README.md

e1052bd

update text comments

Update README.md

33991b0

update text comments

Update README.md

7ff3290

update text

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

adf4dae

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

7f53d8f

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

Update analyses/molecular-subtyping-integrate/README.md

e90dd42

Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>

cansavvy added 2 commits January 7, 2021 17:36

Revert two transcriptomic-dim-red files to what is in master because …

e0b7185

…we've established this elsewhere

Get rid of that sneaky RUN_FOR_SUBTYPING line in dim-red-plots.sh

b6588bf

cansavvy reviewed Jan 7, 2021

View reviewed changes

kgaonkar6 and others added 8 commits January 7, 2021 18:33

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

482b723

Co-authored-by: Candace Savonen <cansav09@gmail.com>

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

97963af

Co-authored-by: Candace Savonen <cansav09@gmail.com>

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

4897260

Co-authored-by: Candace Savonen <cansav09@gmail.com>

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

760f7a4

Co-authored-by: Candace Savonen <cansav09@gmail.com>

weird spaces removed

989b53f

Merge branch 'v18_int_dx_histology' of https://github.com/kgaonkar6/O…

2e96e40

…penPBTA-analysis into v18_int_dx_histology

weird spaces removed

914b80c

weird spaces removed

ac20b04

cansavvy reviewed Jan 8, 2021

View reviewed changes

kgaonkar6 added 2 commits January 8, 2021 09:32

added desc and CI run

2f5ec46

adding to CI

ce32ad9

run CI with OPENPBTA_TESTING=1

ce55a27

jashapiro reviewed Jan 8, 2021

View reviewed changes

remove OPENPBTA_TESTING=1 specific data folder conditions

d40412f

cansavvy approved these changes Jan 8, 2021

View reviewed changes

kgaonkar6 and others added 4 commits January 8, 2021 13:49

Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd

8310388

Co-authored-by: Candace Savonen <cansav09@gmail.com>

adding more description

3a3c185

typos

681d254

added diagram

337e66e

jaclyn-taroni merged commit 6fefa87 into AlexsLemonade:master Jan 9, 2021

jaclyn-taroni removed the don't merge label Jan 9, 2021

kgaonkar6 deleted the v18_int_dx_histology branch January 22, 2021 21:18

jharenza mentioned this pull request Feb 12, 2021

Proposed Analysis: overhaul data release process #858

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

kgaonkar6 commented Dec 11, 2020 •

edited

Loading

jharenza left a comment

cansavvy left a comment

cansavvy Jan 7, 2021

cansavvy Jan 7, 2021

kgaonkar6 Jan 8, 2021

kgaonkar6 Jan 8, 2021

cansavvy Jan 7, 2021

cansavvy Jan 7, 2021

kgaonkar6 Jan 8, 2021 •

edited

Loading

cansavvy Jan 8, 2021

cansavvy left a comment

cansavvy Jan 8, 2021

kgaonkar6 commented Jan 8, 2021

jashapiro Jan 8, 2021

jharenza Jan 8, 2021

jashapiro Jan 8, 2021

jharenza Jan 8, 2021

jashapiro Jan 8, 2021

kgaonkar6 Jan 8, 2021 •

edited

Loading

cansavvy left a comment

cansavvy Jan 8, 2021

kgaonkar6 Jan 8, 2021 •

edited

Loading

cansavvy Jan 8, 2021

kgaonkar6 Jan 8, 2021

cansavvy Jan 8, 2021

kgaonkar6 Jan 8, 2021


		### Read molecular-subtyping-pathology results

		Reading molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`


		we gathere and update molecular-subtype AND integrated_diagnosis AND broad_histology AND short_histology for these histologies.

		In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release.

	In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release.
	In this notebook, we will add the molecular subtyping updated in the `pbta-histologies-base.tsv` file to update the `pbta-histologies.tsv` file.

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

Conversation

kgaonkar6 commented Dec 11, 2020 • edited Loading

Purpose/implementation Section

What scientific question is your analysis addressing?

What was your approach?

What GitHub issue does your pull request address?

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

jharenza left a comment

Choose a reason for hiding this comment

cansavvy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgaonkar6 Jan 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cansavvy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgaonkar6 commented Jan 8, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgaonkar6 Jan 8, 2021 • edited Loading

Choose a reason for hiding this comment

cansavvy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgaonkar6 Jan 8, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kgaonkar6 commented Dec 11, 2020 •

edited

Loading

kgaonkar6 Jan 8, 2021 •

edited

Loading

kgaonkar6 Jan 8, 2021 •

edited

Loading

kgaonkar6 Jan 8, 2021 •

edited

Loading