Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

Merged
merged 68 commits into from
Jan 9, 2021

Conversation

kgaonkar6
Copy link
Collaborator

@kgaonkar6 kgaonkar6 commented Dec 11, 2020

Purpose/implementation Section

What scientific question is your analysis addressing?

Base histology needs to be updated with results from molecular-subtyping-pathology. In this module I'm adding molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes.

What was your approach?

Read pbta-histologies-base.tsv and add molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes. Check for changes, if there are updates we need to re-run summary modules/plots which use this information. Check for duplicates if any, and fix.

What GitHub issue does your pull request address?

#858

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

table

What is your summary of the results?

Final histology file for v18 release

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

Copy link
Collaborator

@jharenza jharenza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgaonkar6 - This looks good and I fixed a few minor typos.

I want to get a weigh-in from @jaclyn-taroni on finalizing the pbta-histologies.tsv file. Up until now, we left anything not subtyped as NA for integrated_diagnosis because technically, the integrated diagnosis is an integration of the pathology report with molecular data and when clinicians use this data, they will think that the integrated diagnosis considered molecular data. I had some discussions with Cassie Kline (neuro-oncology physician at CHOP) and we had two thoughts/options on this.

  1. For those not subtyped, we can carry over diagnoses information from pathology_diagnosis +/- pathology_free_text_diagnosis into the integrated_diagnosis column so there is complete data for the tumors.
  2. We add to the integrated_diagnosis column only diagnoses derived from molecular data and create another column, perhaps harmonized_diagnosis, which contains either an integrated_diagnosis or a pathology_diagnosis.

I think I favor 2 because it is more "correct".

This being said, one subtyping modules does not use molecular data: neurocytoma.

Thoughts?

jharenza and others added 8 commits December 14, 2020 15:35
update text comments
update text comments
update text comments
update text comments
update text
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Co-authored-by: Jo Lynne Rokita <jharenza@gmail.com>
Copy link
Collaborator

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this all seems reasonable. I still don't understand a lot of the background behind the histology labels but it seems like this PR is doing what it set out to do. Some of the wording things were a tad unclear to me so I just tried to rephrase (if I interpreted the explanations incorrectly, let me know).

One important question: I think we'd still want this to be added to CI testing?


### Read molecular-subtyping-pathology results

Reading molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This spacing is odd. Is there a reason behind this odd spacing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to do a quick thing to try to get rid of some of the weird spacing in this notebook, you can run styler::style_file("analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd") and see if styler can fix some of these things for you. Beyond that, I don't know that we need to fuss with it.
...
I just checked, and unfortunately we do not have the styler package in the docker container. So don't worry about this, but it could be handy for you elsewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm not sure where these spaces came from , checking

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the spaces looked fine on localhost, I edited it and looks ok here as well. Let me know if I missed any weird spaces

The purpose of this notebook is to integrate molecular subtyping results from
[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`.

Here we are using v18 `pbta-histologies-base.tsv`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, (and sorry if this is written elsewhere, but it may be good to add it here too) is the purpose of pbta-histologies-base.tsv that you write the subtyping results there and that in the next version of the release, the results in pbta-histologies-base.tsv are added to pbta-histologies.tsv?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aka it seems like pbta-histologies-base.tsv is like the working draft?

Copy link
Collaborator Author

@kgaonkar6 kgaonkar6 Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, we can call pbta-histologies-base.tsv a working draft.
The PBTA Histologies 1-7 PR uses the base histology file which have all integrated_diagnosis,Notes and molecular_subtype as NA through all the molecular-subtyping-XYZ . Through this process we gradually create the pbta-histologies.tsv for the same release.

Thus in each release there will be a pbta-histologies-base.tsv and eventually apbta-histologies.tsv.

You might have noticed that the modules (other than subtyping) were all using RNA data.
https://github.com/kgaonkar6/OpenPBTA-analysis/blob/760f7a42d253edcaf3341ac0873634e55590e59e/scripts/run-for-subtyping.sh#L13-L32

We only ran these modules because along with the pbta-histologies-base.tsv we also needed to update these files with the new RNA samples to use in our molecular subtyping modules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put some form of your description here in the beginning of this notebook?

Copy link
Collaborator

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very close. 👍 I just one description add request (see comment) and my outstanding question about this being added to circle CI testing.

The purpose of this notebook is to integrate molecular subtyping results from
[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`.

Here we are using v18 `pbta-histologies-base.tsv`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put some form of your description here in the beginning of this notebook?

@kgaonkar6
Copy link
Collaborator Author

Thanks @cansavvy! I added the module to CI and also added code to read pbta-histologies-base.tsv from data directly when running on testing.

Comment on lines 52 to 56
if (running_in_ci) {
data_dir <- "../../data/"
} else {
data_dir <- "../../data/release-v18-20201123/"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional code should not be needed. The way the system is set up, all files should be linked to data/ with no subfolder. In CI, this will be the testing files, but in regular use (after running download-data.sh) it will be the release version. Unless we want to make this file only work in v18 (I don't think we do) we should not be referring to the subfolder.

Suggested change
if (running_in_ci) {
data_dir <- "../../data/"
} else {
data_dir <- "../../data/release-v18-20201123/"
}
data_dir <- "../../data/"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm - @jashapiro I think we did this because it was being run before CI files were generated. However, also thinking about the process, it is a bit circular. This needs to be run for pbta-histologies.tsv to be generated and added to the new release, before the CI files are generated. Thoughts on that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest concern I have is that somebody might rerun this script when generating v19+ and forget to update this line. I don't know if this is pervasive through the subtyping files.

(Of course, with the linking approach it is possible to have the wrong set of files linked, and get a hidden error that way, so neither one is great.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kgaonkar6 @jashapiro what if we add this as a requirement in the bash script in which the user has to put a release version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at some of the other subtyping analyses, and we don't do that for those. I think for consistency with the other analyses, it should simply use data/ and expect that the correct files have been placed or linked there.

Copy link
Collaborator Author

@kgaonkar6 kgaonkar6 Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sounds good, I'll update the code to always use "../../data" folder

Copy link
Collaborator

@cansavvy cansavvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks fine to me now. Just noticed some typos so if we get those fixed I think this can be merged.


we gathere and update molecular-subtype AND integrated_diagnosis AND broad_histology AND short_histology for these histologies.

In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In this notebook we will add the molecular subtyping done as part as OpenPBTA to the base histology file to create the `pbta-histologies.tsv` for the same release.
In this notebook, we will add the molecular subtyping updated in the `pbta-histologies-base.tsv` file to update the `pbta-histologies.tsv` file.

Copy link
Collaborator Author

@kgaonkar6 kgaonkar6 Jan 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtyping updated in the pbta-histologies-base.tsv

This is not entirely true, pbta-histologies-base.tsv file is used but is not updated through the subtyping modules. The subtyping file is saved in
compiled_subtyping<-read_tsv(file.path("..", "molecular-subtyping-pathology", "results", "compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv")) as part of the molecular-subtyping-pathology module.

We directly use compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv to update pbta-histologies.tsv file

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. Okay. Can you describe that here then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I forgot to commit the description 😅

Let me know if it makes sense, should I add in my workflow diagram if it's useful in the README or code or this looks ok ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Why not add the diagram! Would probably be good to have that in this description.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants