-
Notifications
You must be signed in to change notification settings - Fork 83
PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870
PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgaonkar6 - This looks good and I fixed a few minor typos.
I want to get a weigh-in from @jaclyn-taroni on finalizing the pbta-histologies.tsv
file. Up until now, we left anything not subtyped as NA for integrated_diagnosis
because technically, the integrated diagnosis is an integration of the pathology report with molecular data and when clinicians use this data, they will think that the integrated diagnosis considered molecular data. I had some discussions with Cassie Kline (neuro-oncology physician at CHOP) and we had two thoughts/options on this.
- For those not subtyped, we can carry over diagnoses information from
pathology_diagnosis
+/-pathology_free_text_diagnosis
into theintegrated_diagnosis
column so there is complete data for the tumors. - We add to the
integrated_diagnosis
column only diagnoses derived from molecular data and create another column, perhapsharmonized_diagnosis
, which contains either anintegrated_diagnosis
or apathology_diagnosis
.
I think I favor 2 because it is more "correct".
This being said, one subtyping modules does not use molecular data: neurocytoma.
Thoughts?
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
update text comments
update text comments
update text comments
update text comments
update text
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this all seems reasonable. I still don't understand a lot of the background behind the histology labels but it seems like this PR is doing what it set out to do. Some of the wording things were a tad unclear to me so I just tried to rephrase (if I interpreted the explanations incorrectly, let me know).
One important question: I think we'd still want this to be added to CI testing?
|
||
### Read molecular-subtyping-pathology results | ||
|
||
Reading molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This spacing is odd. Is there a reason behind this odd spacing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to do a quick thing to try to get rid of some of the weird spacing in this notebook, you can run styler::style_file("analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd")
and see if styler can fix some of these things for you. Beyond that, I don't know that we need to fuss with it.
...
I just checked, and unfortunately we do not have the styler
package in the docker container. So don't worry about this, but it could be handy for you elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm not sure where these spaces came from , checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the spaces looked fine on localhost, I edited it and looks ok here as well. Let me know if I missed any weird spaces
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
The purpose of this notebook is to integrate molecular subtyping results from | ||
[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`. | ||
|
||
Here we are using v18 `pbta-histologies-base.tsv` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, (and sorry if this is written elsewhere, but it may be good to add it here too) is the purpose of pbta-histologies-base.tsv
that you write the subtyping results there and that in the next version of the release, the results in pbta-histologies-base.tsv
are added to pbta-histologies.tsv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aka it seems like pbta-histologies-base.tsv
is like the working draft?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we can call pbta-histologies-base.tsv
a working draft.
The PBTA Histologies 1-7 PR uses the base histology file which have all integrated_diagnosis,Notes and molecular_subtype as NA through all the molecular-subtyping-XYZ . Through this process we gradually create the pbta-histologies.tsv
for the same release.
Thus in each release there will be a pbta-histologies-base.tsv
and eventually apbta-histologies.tsv
.
You might have noticed that the modules (other than subtyping) were all using RNA data.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/760f7a42d253edcaf3341ac0873634e55590e59e/scripts/run-for-subtyping.sh#L13-L32
We only ran these modules because along with the pbta-histologies-base.tsv
we also needed to update these files with the new RNA samples to use in our molecular subtyping modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put some form of your description here in the beginning of this notebook?
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>
Co-authored-by: Candace Savonen <cansav09@gmail.com>
…penPBTA-analysis into v18_int_dx_histology
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is very close. 👍 I just one description add request (see comment) and my outstanding question about this being added to circle CI testing.
The purpose of this notebook is to integrate molecular subtyping results from | ||
[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`. | ||
|
||
Here we are using v18 `pbta-histologies-base.tsv` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you put some form of your description here in the beginning of this notebook?
Thanks @cansavvy! I added the module to CI and also added code to read pbta-histologies-base.tsv from data directly when running on testing. |
if (running_in_ci) { | ||
data_dir <- "../../data/" | ||
} else { | ||
data_dir <- "../../data/release-v18-20201123/" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conditional code should not be needed. The way the system is set up, all files should be linked to data/
with no subfolder. In CI, this will be the testing files, but in regular use (after running download-data.sh
) it will be the release version. Unless we want to make this file only work in v18 (I don't think we do) we should not be referring to the subfolder.
if (running_in_ci) { | |
data_dir <- "../../data/" | |
} else { | |
data_dir <- "../../data/release-v18-20201123/" | |
} | |
data_dir <- "../../data/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm - @jashapiro I think we did this because it was being run before CI files were generated. However, also thinking about the process, it is a bit circular. This needs to be run for pbta-histologies.tsv
to be generated and added to the new release, before the CI files are generated. Thoughts on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The biggest concern I have is that somebody might rerun this script when generating v19+ and forget to update this line. I don't know if this is pervasive through the subtyping files.
(Of course, with the linking approach it is possible to have the wrong set of files linked, and get a hidden error that way, so neither one is great.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgaonkar6 @jashapiro what if we add this as a requirement in the bash script in which the user has to put a release version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at some of the other subtyping analyses, and we don't do that for those. I think for consistency with the other analyses, it should simply use data/
and expect that the correct files have been placed or linked there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok sounds good, I'll update the code to always use "../../data" folder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks fine to me now. Just noticed some typos so if we get those fixed I think this can be merged.
analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
Outdated
Show resolved
Hide resolved
|
||
we gathere and update molecular-subtype AND integrated_diagnosis AND broad_histology AND short_histology for these histologies. | ||
|
||
In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release. | |
In this notebook, we will add the molecular subtyping updated in the `pbta-histologies-base.tsv` file to update the `pbta-histologies.tsv` file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subtyping updated in the
pbta-histologies-base.tsv
This is not entirely true, pbta-histologies-base.tsv
file is used but is not updated through the subtyping modules. The subtyping file is saved in
compiled_subtyping<-read_tsv(file.path("..", "molecular-subtyping-pathology", "results", "compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv")) as part of the molecular-subtyping-pathology module.
We directly use compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv to update pbta-histologies.tsv
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. Okay. Can you describe that here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I forgot to commit the description 😅
Let me know if it makes sense, should I add in my workflow diagram if it's useful in the README or code or this looks ok ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Why not add the diagram! Would probably be good to have that in this description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Co-authored-by: Candace Savonen <cansav09@gmail.com>
Purpose/implementation Section
What scientific question is your analysis addressing?
Base histology needs to be updated with results from molecular-subtyping-pathology. In this module I'm adding molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes.
What was your approach?
Read pbta-histologies-base.tsv and add molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes. Check for changes, if there are updates we need to re-run summary modules/plots which use this information. Check for duplicates if any, and fix.
What GitHub issue does your pull request address?
#858
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
table
What is your summary of the results?
Final histology file for v18 release
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.