This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

molecular subtyping EPN update #785

Merged


jashapiro
Member

Purpose/implementation Section

What scientific question is your analysis addressing?

With the updates to pbta-histologies.tsv, we needed to redo the molecular subtyping analysis so that it uses the renamed columns and accounts for any data changes.

What was your approach?

I substituted pathology_diagnosis for integrated_diagnosis wherever it occurred in the scripts, then reran the analysis.
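A minimal sketch of what the column swap amounts to, using only the Python standard library. The toy TSV, the `Ependymoma` label, and the `select_epn_samples` helper are hypothetical illustrations; the module's real filtering logic and label values may differ. Raising an error when the requested column is absent is one way to catch any lingering references to the old column name.

```python
import csv
import io

# Hypothetical excerpt of pbta-histologies.tsv (illustrative values only).
HISTOLOGIES_TSV = (
    "Kids_First_Biospecimen_ID\tKids_First_Participant_ID\tpathology_diagnosis\n"
    "BS_A\tPT_A\tEpendymoma\n"
    "BS_B\tPT_B\tMedulloblastoma\n"
    "BS_C\tPT_C\tEpendymoma\n"
)

DIAGNOSIS_COLUMN = "pathology_diagnosis"  # previously "integrated_diagnosis"
EPN_LABEL = "Ependymoma"                  # assumed label for EPN samples


def select_epn_samples(tsv_text, column=DIAGNOSIS_COLUMN, label=EPN_LABEL):
    """Return biospecimen IDs whose diagnosis column equals the EPN label."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if column not in reader.fieldnames:
        # Fail loudly instead of silently selecting nothing.
        raise KeyError(f"column {column!r} not found in histologies file")
    return [row["Kids_First_Biospecimen_ID"] for row in reader
            if row[column] == label]


print(select_epn_samples(HISTOLOGIES_TSV))  # ['BS_A', 'BS_C']
```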

What GitHub issue does your pull request address?

Closes #755

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The code changes are minimal, but some of the results have changed more than I expected. There are two additional participants included: PT_7E3V3JFX and PT_80NVYCBS

PT_7E3V3JFX was previously classified in integrated_diagnosis as Diffuse midline glioma
PT_80NVYCBS was previously classified in integrated_diagnosis as CNS Embryonal Tumor
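Participants who enter or leave the cohort between two runs can be found with a set difference on participant IDs. This sketch uses hypothetical IDs in place of the real subtyping outputs; the `previous_label` lookup mirrors the kind of explanation given above but is illustrative only.

```python
def compare_participants(before, after):
    """Return (added, dropped) participant IDs between two runs, sorted."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


# Hypothetical participant lists from the old and new runs.
old_run = ["PT_A", "PT_C"]
new_run = ["PT_A", "PT_C", "PT_D", "PT_E"]

# Hypothetical record of each participant's diagnosis in the old column,
# used to explain why it was not selected before.
previous_label = {"PT_D": "Diffuse midline glioma", "PT_E": "CNS Embryonal Tumor"}

added, dropped = compare_participants(old_run, new_run)
for pt in added:
    print(pt, "previously:", previous_label.get(pt, "unknown"))
print("dropped:", dropped)  # dropped: []
```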

Do these changes make sense? Do we expect these samples to be reclassified in integrated diagnosis for other reasons? If so, in what order should these reclassifications be performed? In other words, should these samples be excluded by some other criteria before proceeding with the subtyping here?

I do not think any final subtyping results here changed, but one could imagine that including or excluding samples could have such an effect, as the RNA expression levels used here are based on within-group z-score normalization. (Should the z-scores here actually be calculated across all samples, not just EPN samples?)
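To make the concern concrete, here is a minimal sketch of within-group z-scoring for a single gene, with made-up values. It assumes a plain (x - mean) / sample standard deviation score; the module's actual normalization may differ. It shows that adding one sample to the reference group shifts every other sample's score, and that changing the reference set to all samples can flip the sign.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list: (x - mean) / sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(x - mu) / sd for x in values]


def zscore_table(expr):
    """Map sample ID -> z-score, using only the samples passed in as reference."""
    ids = list(expr)
    return dict(zip(ids, zscores([expr[i] for i in ids])))


# Hypothetical expression values for one gene (not real data).
epn = {"BS_1": 5.0, "BS_2": 7.0, "BS_3": 9.0}
non_epn = {"BS_4": 1.0, "BS_5": 2.0}

within = zscore_table(epn)  # reference set: EPN samples only
print(within)  # {'BS_1': -1.0, 'BS_2': 0.0, 'BS_3': 1.0}

# One extra EPN sample changes BS_2's score although its value is unchanged.
with_extra = zscore_table({**epn, "BS_6": 13.0})
print(round(with_extra["BS_2"], 2))  # -0.44

# All samples as reference: low non-EPN values pull the mean down.
across = zscore_table({**epn, **non_epn})
print(across["BS_2"] > 0)  # True
```

The magnitude of this shift shrinks as the reference group grows, which is one reason small changes in group membership tend to produce many small score changes rather than different subtype calls.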

Is there anything that you want to discuss further?

I did not use pathology_free_text_diagnosis because that field did not seem to contain any additional information relevant to EPN diagnosis: all EPN samples seemed to be captured by pathology_diagnosis. Is it safe to assume that this will remain the case in the future?
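Rather than relying on that assumption across data releases, one option is a small guard that flags samples whose free text mentions ependymoma but whose pathology_diagnosis does not carry the EPN label. This is a hypothetical sketch: the toy rows, the `Ependymoma` label, and the keyword are assumptions, not project code.

```python
import csv
import io

# Hypothetical histologies excerpt (illustrative values only).
TSV = (
    "Kids_First_Biospecimen_ID\tpathology_diagnosis\tpathology_free_text_diagnosis\n"
    "BS_A\tEpendymoma\tanaplastic ependymoma, WHO grade III\n"
    "BS_B\tOther\tependymoma with unusual features\n"
    "BS_C\tMedulloblastoma\tclassic medulloblastoma\n"
)


def uncaptured_epn(tsv_text, label="Ependymoma", keyword="ependymoma"):
    """Return biospecimens whose free text mentions the keyword but whose
    pathology_diagnosis is not the EPN label."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["Kids_First_Biospecimen_ID"] for row in reader
            if keyword in row["pathology_free_text_diagnosis"].lower()
            and row["pathology_diagnosis"] != label]


print(uncaptured_epn(TSV))  # ['BS_B']
```

Because this is a plain substring match, it would also flag free text such as "subependymoma", so any hits would still need manual review.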

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

See above

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza
Collaborator

jharenza commented Sep 18, 2020

Hi @jashapiro! Thanks for working on this.

PT_7E3V3JFX was previously classified in integrated_diagnosis as Diffuse midline glioma
PT_80NVYCBS was previously classified in integrated_diagnosis as CNS Embryonal Tumor

Do these changes make sense?

For PT_7E3V3JFX, this makes sense, and it will stay as EPN. Early on, we found the H3 K28M histone mutation, but after pathology review, the tumor was deemed a rare EPN with H3 K28M. This is reflected in molecular-subtyping-pathology. Since that module runs last, I think this subtype would be updated to include that mutation at that stage.

PT_80NVYCBS was reclassified during embryonal subtyping due to the presence of MN1 fusions, subsequently verified as embryonal during pathology review, and integrated into molecular-subtyping-pathology. The same reasoning applies: since pathology subtyping occurs last, this sample would be safely updated later.

Do we expect these samples be reclassified in integrated diagnosis for other reasons? If so, what is the correct order in which these reclassifications should be performed? In other words, should these samples be excluded by some other criteria before proceeding with the subtyping here?

I briefly discussed the order of subtyping operations with @jaclyn-taroni offline, so maybe we should think about this. Currently, whenever an actual diagnosis changed due to subtyping, I took those samples back to pathology for review, and they were then integrated into the pathology subtyping module. Perhaps that is how we can keep track - another file for any changes to diagnosis that will accumulate as we do the modules?
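One way to sketch the tracking file suggested here: a log of diagnosis changes, each tagged with the module that produced it, resolved by applying modules in a fixed order so that the pathology module, run last, has the final word. The `DiagnosisChange` record, the `resolve` helper, the module order, and the diagnosis strings are hypothetical illustrations, not existing project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisChange:
    """One row of a hypothetical diagnosis-change log."""
    participant: str
    module: str
    new_diagnosis: str


# Assumed run order; later modules override earlier ones.
MODULE_ORDER = ["molecular-subtyping-embryonal", "molecular-subtyping-EPN",
                "molecular-subtyping-pathology"]


def resolve(base, changes, order=MODULE_ORDER):
    """Apply logged changes in module order (last write wins).

    Raises KeyError for a module missing from `order`, so an unplanned
    module cannot silently slot in at an arbitrary position.
    """
    rank = {module: i for i, module in enumerate(order)}
    final = dict(base)  # leave the input mapping untouched
    # sorted() is stable, so changes from the same module keep log order.
    for change in sorted(changes, key=lambda c: rank[c.module]):
        final[change.participant] = change.new_diagnosis
    return final


base = {"PT_X": "Ependymoma", "PT_Y": "Ependymoma"}
log = [
    DiagnosisChange("PT_Y", "molecular-subtyping-pathology", "CNS Embryonal Tumor"),
    DiagnosisChange("PT_Y", "molecular-subtyping-embryonal", "Embryonal, MN1 fusion"),
]
print(resolve(base, log))
# {'PT_X': 'Ependymoma', 'PT_Y': 'CNS Embryonal Tumor'}
```

Ordering by module rather than by log position means the result does not depend on when each change happened to be recorded.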

@jaclyn-taroni
Member

@jharenza I think what you're describing is at least partly covered by the first point I raised in #784, but if you'd like to check and also expand on

Perhaps that is how we can keep track - another file for any changes to diagnosis that will accumulate as we do the modules?

That issue is probably a better place to keep track of those ideas. @jashapiro your feedback on #784 is welcome, too, of course 😁

@jaclyn-taroni
Member

I do not think any final subtyping results here changed, but one could imagine that including or excluding samples could have such an effect, as the RNA expression levels used here are based on within-group z score normalization. (Should the z scores here actually be calculated for all samples, not just EPN samples?)

The original issue (#245) describes certain genes being overexpressed relative to other EPN tumors, which is why within-group makes sense to me. There are about 90 of these samples and I expect only a handful (low single digits) would ever be removed in the context of this project so I'm inclined for the process to remain that we conduct the EPN subtyping upstream of where we compile all the other results - we can discuss that point more on #784! We also need to add EPN in, in the first place (#667).

@jashapiro
Copy link
Member Author

The original issue (#245) describes certain genes being overexpressed relative to other EPN tumors, which is why within-group makes sense to me. There are about 90 of these samples and I expect only a handful (low single digits) would ever be removed in the context of this project so I'm inclined for the process to remain that we conduct the EPN subtyping upstream of where we compile all the other results - we can discuss that point more on #784! We also need to add EPN in, in the first place (#667).

Sounds good to me. I was mostly just surprised by how widespread the changes were (though each was small in magnitude), so I dug in a bit to see what had happened. As long as the results still make sense (and, as I said, I don't think any subtypes changed), we should be fine.

Member

jaclyn-taroni left a comment


👍 looks good to me!

jashapiro merged commit 201f116 into AlexsLemonade:master on Sep 18, 2020
jashapiro deleted the jashapiro/755-EPN-pathology-update branch on September 18, 2020 at 22:21
Development

Successfully merging this pull request may close these issues.

Updated analysis: EPN subtyping to use pathology diagnosis
3 participants