This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

release V22 #1365

Closed
wants to merge 12 commits into from

Conversation

jharenza
Collaborator

@jharenza jharenza commented May 4, 2022

Purpose/implementation Section

What scientific question is your analysis addressing?

This updates the histologies file with MB WGS samples as "To be classified", which were previously missed

What was your approach?

  1. Updated https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/molecular-subtyping-MB/04-no-RNA-samples.R for subtypes to say MB, To be classified instead of To be classified
  2. Updated https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/46a9d5c0656742b79aa472eadbd78f8bdd720fe4/analyses/molecular-subtyping-pathology/pathology_free_text-subtyping-lgat.Rmd to recode LGG, subtype --> SEGA, subtype
  3. Created base-histologies.tsv from v21 and reran molecular-subtype-integrate to get pbta-histologies.tsv.
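
For reference, a minimal sketch of step 3, assuming the base file is simply the v21 table minus the `harmonized_diagnosis` and `cancer_group` columns (as described in the notes below). The toy data frame, the `BS_A`/`BS_B` identifiers, the `Kids_First_Biospecimen_ID` column name, and the temporary output path are illustrative assumptions, not the code in this PR.

```r
# Toy stand-in for the v21 pbta-histologies.tsv (hypothetical rows;
# the real file has many more columns and samples).
v21 <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B"),
  short_histology = c("LGAT", "Medulloblastoma"),
  molecular_subtype = c("LGG, To be classified", "To be classified"),
  harmonized_diagnosis = c("Low-grade glioma/astrocytoma", "Medulloblastoma"),
  cancer_group = c("Low-grade glioma astrocytoma", "Medulloblastoma"),
  stringsAsFactors = FALSE
)

# Drop the two derived columns so the integration step can regenerate them.
derived_cols <- c("harmonized_diagnosis", "cancer_group")
base <- v21[, !(names(v21) %in% derived_cols), drop = FALSE]

# Write the base table as TSV (temporary path used here for illustration)
# and read it back to confirm the derived columns are gone.
out_file <- file.path(tempdir(), "base-histologies.tsv")
write.table(base, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
base_check <- read.delim(out_file, stringsAsFactors = FALSE)
```

Starting from the released v21 table rather than the daily-changing base file keeps every other column fixed, so any diff in the regenerated histologies file should come from the subtyping updates alone.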

What GitHub issue does your pull request address?

#1207

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Notes:

  • pbta-histology-base.tsv changes daily, and when rerunning using what I thought might be the base @kgaonkar6 used, I realized this module was recently rerun in this PR by @runjin326, perhaps using a more recent version of the file. Therefore, it looks like there are many more diffs than there really should be here.
  • To make sure none of the data aside from the few samples needing subtype updates changed, I simply created a new base file using v21 pbta-histologies.tsv minus the columns for harmonized_diagnosis and cancer_group. I put this bit of code at the very top of the script, but I think we may want to comment it out before merge? Similarly, I also added some QC into this, but maybe it is fine because this is the last(?) version of the histologies file for OpenPBTA?
  • After checking the diffs in the histologies file within this PR on GitHub, I realized some code which was implemented a while back for cancer groups never made it into the histologies file. This only pertains to the LGAT samples, but that means that it affects multiple figures.
  • Below are the diffs in cancer groups for LGAT broad_histology with the code we had in place (also in 01-integrate-subtyping.nb.html):
# v21
v21 %>%
  filter(short_histology == "LGAT") %>%
  select(cancer_group, experimental_strategy) %>%
  table()
                                     experimental_strategy
cancer_group                          RNA-Seq WGS
  Diffuse fibrillary astrocytoma            0   1
  Low-grade glioma astrocytoma            244 234
  Pilocytic astrocytoma                     1   2
  Pleomorphic xanthoastrocytoma             2   1
  Subependymal Giant Cell Astrocytoma       4   3

# v22
histology %>%
  filter(short_histology == "LGAT") %>%
  select(cancer_group, experimental_strategy) %>%
  table()
                                     experimental_strategy
cancer_group                          RNA-Seq WGS
  Diffuse fibrillary astrocytoma            6   6
  Gliomatosis cerebri                       1   1
  Low-grade glioma astrocytoma             94  89
  Oligodendroglioma                         1   1
  Pilocytic astrocytoma                   126 121
  Pleomorphic xanthoastrocytoma            11  11
  Subependymal Giant Cell Astrocytoma      12  12

The idea behind this earlier separation into cancer groups was to visualize the smaller groups within the oncoprint. The main takeaway, though, is that a handful of pilocytic and pleomorphic (PXA) samples were not in the Low-grade glioma astrocytoma cancer_group, and some SEGA samples were in it. As a result, the analyses are not performed on the exact cohort of interest, so this cannot be fixed by simply recoding the v22 cancer_group back to v21. I also realized that Ganglioglioma is already its own cancer group with a high enough N, so it is in many plots already, but it was missed in the survival LGG_group.
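
To see exactly which samples moved between cancer groups, a per-sample comparison may be more informative than the two marginal tables above. This is a minimal sketch on toy data; the `BS_A`/`BS_B`/`BS_C` identifiers and the `Kids_First_Biospecimen_ID` join column are assumptions for illustration.

```r
# Toy v21 and v22 tables (hypothetical samples).
v21 <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C"),
  cancer_group = c("Low-grade glioma astrocytoma",
                   "Low-grade glioma astrocytoma",
                   "Pilocytic astrocytoma"),
  stringsAsFactors = FALSE
)
v22 <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C"),
  cancer_group = c("Pilocytic astrocytoma",
                   "Low-grade glioma astrocytoma",
                   "Pilocytic astrocytoma"),
  stringsAsFactors = FALSE
)

# Join the two versions on the sample identifier.
both <- merge(v21, v22, by = "Kids_First_Biospecimen_ID",
              suffixes = c("_v21", "_v22"))

# Samples whose cancer_group differs between versions.
# which() drops comparisons involving NA, so rows with a missing
# cancer_group in either version are not flagged here.
changed <- both[which(both$cancer_group_v21 != both$cancer_group_v22), ]

# Transition table: rows are v21 groups, columns are v22 groups.
transition <- table(v21 = both$cancer_group_v21, v22 = both$cancer_group_v22)
print(transition)
```

Off-diagonal cells of the transition table count the samples that changed group, which would show directly how many samples each downstream figure gains or loses.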

My thought from all of this is that if we have to remake figures anyway, it probably makes sense to keep the cancer group code as @kgaonkar6 added it. We may have to add a few more colors to the palette and update the survival analysis to use the relevant cancer groups within LGG. 😭

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

no, but we need to discuss next steps

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jharenza jharenza marked this pull request as draft May 4, 2022 21:01
jharenza added 6 commits May 4, 2022 18:37
- update mb subtyping output of "To be classified" --> "MB, To be classified"
- rerun molecular-subtyping-pathology to compile
- rerun molecular-subtyping-integrate
Member

@jaclyn-taroni jaclyn-taroni left a comment

I think there are a couple changes for LGAT samples that we need to look into.

Comment on lines 188 to 189
# recode lgat samples back to lgg cancer group for plots/analyses
#short_histology == "LGAT" & ~ "Low-grade glioma astrocytoma",
Member

Take this out since it is commented out?

@@ -161,15 +161,15 @@ PT_EZEMVRVT 7316-3154 BS_32HAVF6S LGG, KIAA1549-BRAF Low-grade glioma/astrocytom
PT_PFA762TK 7316-462 BS_32JF8TPP LGG, KIAA1549-BRAF Low-grade glioma/astrocytoma, KIAA1549-BRAF LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping Recurrence NA
PT_V0X73HEQ 7316-2182 BS_34201DKZ EPN, To be classified NA Ependymoma Ependymal tumor NA Initial CNS Tumor NA
PT_0F3RHT8J 7316-2673 BS_354DTKHE NA NA Meningioma Meningioma Updated via OpenPBTA subtyping from pathology_free_text_diagnosis Initial CNS Tumor Clear cell meningioma
PT_T5CR5HQZ 7316-33 BS_36HCZ3KW LGG, To be classified NA LGAT Low-grade astrocytic tumor Updated via pathology_free_text_diagnosis Initial CNS Tumor Diffuse fibrillary astrocytoma
PT_T5CR5HQZ 7316-33 BS_36HCZ3KW LGG, To be classified NA LGAT Low-grade astrocytic tumor NA Initial CNS Tumor NA
Member

These NAs seem wrong to me – this sample_id has diffuse fibrillary astrocytoma in the pathology_free_text_diagnosis column.

PT_6E8JYRXM 7316-2729 BS_3R0KN6Z4 LGG, KIAA1549-BRAF Low-grade glioma/astrocytoma, KIAA1549-BRAF LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_BYJ428GA 7316-290 BS_3TQSE8HF LGG, To be classified NA LGAT Low-grade astrocytic tumor Updated via pathology_free_text_diagnosis Initial CNS Tumor Subependymal Giant Cell Astrocytoma
PT_BYJ428GA 7316-290 BS_3TQSE8HF LGG, To be classified NA LGAT Low-grade astrocytic tumor NA Initial CNS Tumor NA
Member

This also seems wrong: this sample has subependymal giant cell astrocytoma in pathology_free_text_diagnosis.

PT_W6AWJJK7 7316-230 BS_9HEMB7RK GNG, other MAPK, IDH Ganglioglioma, other MAPK, IDH Ganglioglioma Low-grade astrocytic tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_P75ZXB1S 7316-692 BS_9HSGV9TJ LGG, To be classified NA LGAT Low-grade astrocytic tumor Updated via pathology_free_text_diagnosis Initial CNS Tumor Pleomorphic xanthoastrocytoma
PT_P75ZXB1S 7316-692 BS_9HSGV9TJ LGG, To be classified NA LGAT Low-grade astrocytic tumor NA Initial CNS Tumor NA
Member

Flagging this one, too

PT_2VSDT9CK 7316-3575 BS_CQ8C6X4X LGG, wildtype Low-grade glioma/astrocytoma, wildtype LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping Progressive NA
PT_GQDHZFJP 7316-265 BS_CS170P7T LGG, To be classified NA LGAT Low-grade astrocytic tumor Updated via pathology_free_text_diagnosis Progressive Pilocytic astrocytoma
PT_2VSDT9CK 7316-3575 BS_CQ8C6X4X LGG, wildtype Pilocytic astrocytoma, wildtype LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping and pathology_free_text_diagnosis Progressive Pilocytic astrocytoma, wildtype
PT_GQDHZFJP 7316-265 BS_CS170P7T LGG, To be classified NA LGAT Low-grade astrocytic tumor NA Progressive NA
Member

Flagging this one

PT_6Q0NPVP3 7316-2255 BS_E5H6CFYT GNG, H3 Ganglioglioma, H3 Ganglioglioma Low-grade astrocytic tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_Z4PJA6KT 7316-1763 BS_E60JZ9Z3 DMG, H3 K28, TP53 loss Diffuse midline glioma, H3 K28-mutant HGAT Diffuse astrocytic and oligodendroglial tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_KMHGNCNR 7316-350 BS_E7HEQZ1K LGG, BRAF V600E Low-grade glioma/astrocytoma, BRAF V600E LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_J4MWGQFQ 7316-2071 BS_E8SJ36A6 NA NA Meningioma Meningioma Updated via OpenPBTA subtyping from pathology_free_text_diagnosis Initial CNS Tumor Meningothelial meningioma
PT_20ZM6THA 7316-3629 BS_E94QRJDW GNT, other MAPK Glial-neuronal tumor NOS, other MAPK GNT Neuronal and mixed neuronal-glial tumor Updated via OpenPBTA subtyping and pathology_free_text_diagnosis Recurrence Glial-neuronal tumor NOS
PT_R1DHB0ZT 7316-2642 BS_E9ATGWME LGG, To be classified NA LGAT Low-grade astrocytic tumor Updated via pathology_free_text_diagnosis Initial CNS Tumor Pilocytic astrocytoma
PT_H45M7M2T 7316-3074 BS_E9M7TDB6 LGG, other MAPK Low-grade glioma/astrocytoma, other MAPK LGAT Low-grade astrocytic tumor Updated via OpenPBTA subtyping Initial CNS Tumor NA
PT_R1DHB0ZT 7316-2642 BS_E9ATGWME LGG, To be classified NA LGAT Low-grade astrocytic tumor NA Initial CNS Tumor NA
Member

And this one

@jharenza
Collaborator Author

jharenza commented May 6, 2022

Ok, I perhaps need to rerun the LGAT subtyping module as well.

- remove commented out code from integrate molecular subtyping
- re-order and select snv samples to subset maf to avoid crash
@jharenza
Collaborator Author

jharenza commented May 6, 2022

@jaclyn-taroni I need some help. I tried rerunning molecular subtyping for LGAT, but I am running into errors. First, in ce8dbd2, I updated the 01 script. It was getting killed at the rbind step for the consensus and hotspot mafs, so I reordered the code to pull LGAT samples out of these files upon reading so that they aren't so big. That worked, but 03 is now giving an error at chunk 7, when making the TxDb from the GTF for FGFR1. I saw some possibly related tickets suggesting this may be due to unstable RefSeq files? I am not sure what to do here.
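
The reordering in the 01 script can be sketched roughly as follows: filter each MAF to the samples of interest first, then bind the much smaller tables. This is an illustration on toy data, not the actual change in ce8dbd2; the sample identifiers, the `subset_to_samples()` helper, and the use of the standard MAF column `Tumor_Sample_Barcode` as the sample key are assumptions.

```r
# Hypothetical LGAT sample identifiers.
lgat_ids <- c("BS_A", "BS_C")

# Toy stand-ins for the consensus and hotspot MAF tables.
consensus_maf <- data.frame(
  Tumor_Sample_Barcode = c("BS_A", "BS_B", "BS_C"),
  Hugo_Symbol = c("BRAF", "TP53", "FGFR1"),
  stringsAsFactors = FALSE
)
hotspot_maf <- data.frame(
  Tumor_Sample_Barcode = c("BS_B", "BS_C"),
  Hugo_Symbol = c("H3F3A", "BRAF"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows for the requested samples.
subset_to_samples <- function(maf, ids) {
  maf[maf$Tumor_Sample_Barcode %in% ids, , drop = FALSE]
}

# Subset each table before binding, so rbind never sees the full files.
lgat_maf <- rbind(
  subset_to_samples(consensus_maf, lgat_ids),
  subset_to_samples(hotspot_maf, lgat_ids)
)
```

Filtering before binding keeps peak memory proportional to the LGAT subset rather than to the combined full MAF files, which is likely why the original rbind step was being killed.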

@sjspielman sjspielman mentioned this pull request May 6, 2022
@jharenza
Collaborator Author

jharenza commented May 7, 2022

closing this and will start fresh once some of the code updates are merged.

@jharenza jharenza closed this May 7, 2022
@jharenza jharenza mentioned this pull request May 26, 2022
5 tasks