This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

update molecular subtyping pathology (#854)
* update molecular subtyping pathology

* update using latest master

-update MB, EPN, HGAT

* Update analyses/molecular-subtyping-pathology/01-compile-subtyping-results.Rmd

change ifelse() to case_when()

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>

* update Notes

update Notes to be in sync with new subtypes

* update comment

update comment about adding int dx/broad/short hist here

* update subtypes due to pathology feedback

update and remove duplicates from final file

* Update analyses/molecular-subtyping-pathology/03-incorporate-pathology-feedback.Rmd

add updates from code review

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>

* Update analyses/molecular-subtyping-pathology/03-incorporate-pathology-feedback.Rmd

fix typo

* rerun subtyping

 close chunk and rerun subtyping

Co-authored-by: Krutika Gaonkar <34580719+kgaonkar6@users.noreply.github.com>
Jo Lynne Rokita and kgaonkar6 authored Dec 1, 2020
1 parent 0a6e543 commit 83d3e1c
Showing 8 changed files with 13,359 additions and 3,086 deletions.
179 changes: 160 additions & 19 deletions analyses/molecular-subtyping-pathology/01-compile-subtyping-results.Rmd
@@ -4,18 +4,23 @@ output:
html_notebook:
toc: true
toc_float: true
author: Jaclyn Taroni for CCDL
author: Jaclyn Taroni for CCDL, Jo Lynne Rokita for D3b
date: 2020
params:
is_ci: FALSE
---

The purpose of this notebook is to aggregate molecular subtyping results from the following mature analysis modules:

* [`molecular-subtyping-EWS`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-EWS)
* [`molecular-subtyping-HGG`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-HGG)
* [`molecular-subtyping-LGAT`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-LGAT)
* [`molecular-subtyping-embryonal`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-embryonal)
* [`molecular-subtyping-EWS`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EWS)
* [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG)
* [`molecular-subtyping-LGAT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-LGAT)
* [`molecular-subtyping-embryonal`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-embryonal)
* [`molecular-subtyping-CRANIO`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-CRANIO)
* [`molecular-subtyping-EPN`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EPN)
* [`molecular-subtyping-MB`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-MB)
* [`molecular-subtyping-neurocytoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-neurocytoma)


## Set up

@@ -74,10 +79,14 @@ data_dir <- file.path("..", "..", "data")
analyses_dir <- ".."
# directories for upstream subtyping modules
cranio_dir <- file.path(analyses_dir, "molecular-subtyping-CRANIO")
ews_dir <- file.path(analyses_dir, "molecular-subtyping-EWS")
epn_dir <- file.path(analyses_dir, "molecular-subtyping-EPN")
hgg_dir <- file.path(analyses_dir, "molecular-subtyping-HGG")
lgat_dir <- file.path(analyses_dir, "molecular-subtyping-LGAT")
mb_dir <- file.path(analyses_dir, "molecular-subtyping-MB")
embryonal_dir <- file.path(analyses_dir, "molecular-subtyping-embryonal")
neurocytoma_dir <- file.path(analyses_dir, "molecular-subtyping-neurocytoma")
# the folder that contains the tabular results is standardized across modules
results_dir <- "results"
@@ -95,19 +104,23 @@ When we run this locally, we want to tie it to a specific version of the histolo
if (running_in_ci) {
histologies_file <- file.path(data_dir, "pbta-histologies.tsv")
} else {
histologies_file <- file.path(data_dir, "release-v15-20200228",
histologies_file <- file.path(data_dir, "release-v17-20200908",
"pbta-histologies.tsv")
}
```
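
The CI/local switch above can be read as a small function. This is a minimal sketch only: `pick_histologies_file()` is a hypothetical helper that does not exist in the notebook, and how `running_in_ci` is derived from the `is_ci` parameter is not shown in this diff.

```r
# Hypothetical helper, for illustration only: choose the symlinked
# histologies file in CI and a pinned release directory otherwise.
pick_histologies_file <- function(running_in_ci, data_dir, release_dir) {
  if (running_in_ci) {
    # CI uses whatever file sits at the top of the data directory
    file.path(data_dir, "pbta-histologies.tsv")
  } else {
    # Local runs are tied to one specific data release
    file.path(data_dir, release_dir, "pbta-histologies.tsv")
  }
}

ci_path <- pick_histologies_file(TRUE, "data", "release-v17-20200908")
local_path <- pick_histologies_file(FALSE, "data", "release-v17-20200908")
```

Pinning the release directory for local runs keeps the compiled table reproducible even after a newer release is downloaded.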

Results files from individual modules.

```{r}
cranio_results_file <- file.path(cranio_dir, results_dir, "CRANIO_molecular_subtype.tsv")
ews_results_file <- file.path(ews_dir, results_dir, "EWS_results.tsv")
epn_results_file <- file.path(epn_dir, results_dir, "EPN_all_data_withsubgroup.tsv")
hgg_results_file <- file.path(hgg_dir, results_dir, "HGG_molecular_subtype.tsv")
lgat_results_file <- file.path(lgat_dir, results_dir, "lgat_subtyping.tsv")
mb_results_file <- file.path(mb_dir, results_dir, "MB_molecular_subtype.tsv")
embryonal_results_file <- file.path(embryonal_dir, results_dir,
"embryonal_tumor_molecular_subtypes.tsv")
neurocytoma_results_file <- file.path(neurocytoma_dir, results_dir, "neurocytoma_subtyping.tsv")
```

#### Output file
@@ -119,18 +132,23 @@ output_file <- file.path(results_dir, "compiled_molecular_subtypes.tsv")
## Read in data

```{r message=FALSE}
# split
histologies_df <- read_tsv(histologies_file, guess_max = 10000)
cranio_results_df <- read_tsv(cranio_results_file)
ews_results_df <- read_tsv(ews_results_file)
epn_results_df <- read_tsv(epn_results_file)
hgg_results_df <- read_tsv(hgg_results_file)
lgat_results_df <- read_tsv(lgat_results_file)
mb_results_df <- read_tsv(mb_results_file)
neurocytoma_results_df <- read_tsv(neurocytoma_results_file)
embryonal_results_df <- read_tsv(embryonal_results_file)
```

## Compile the subtyping results

### Handling non-ATRT/non-MB embryonal tumors

The molecular subtyping information from these tumors went into the v15 release, so we can use the `integrated_diagnosis`, `short_histology`, `broad_histology`, and `Notes` columns from the histologies file from that release.
The molecular subtyping information from these tumors will go into the v18 release. We update the `integrated_diagnosis`, `short_histology`, `broad_histology`, and `Notes` columns here for now, until the SQL rules PR [#748](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/748) goes in.

```{r}
embryonal_results_df <- bind_exp_strategies(embryonal_results_df) %>%
@@ -140,23 +158,44 @@ embryonal_results_df <- bind_exp_strategies(embryonal_results_df) %>%
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "CNS Embryonal, NOS"~ "CNS Embryonal Tumor, NOS",
molecular_subtype == "CNS HGNET-MN1"~ "CNS Embryonal Tumor, HGNET-MN1",
molecular_subtype == "CNS NB-FOXR2" ~ "CNS neuroblastoma",
molecular_subtype == "ETMR, C19MC-altered"~ "Embryonal tumor with multilayer rosettes, C19MC-altered",
molecular_subtype == "ETMR, NOS"~"Embryonal tumor with multilayer rosettes, NOS",
TRUE ~ NA_character_),
short_histology =
if_else(molecular_subtype %in% c("ETMR, C19MC-altered", "ETMR, NOS"),
"ETMR", "Embryonal Tumor"),
broad_histology = "Embryonal Tumor",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```
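
The recoding in this chunk follows a pattern that the later chunks repeat: map each `molecular_subtype` label to a longer `integrated_diagnosis` string with `case_when()`, then overwrite `Notes` only where a mapping was found. A minimal sketch on a toy tibble, assuming made-up biospecimen IDs and a made-up unmatched label `"Other"`:

```r
library(dplyr)

# Toy data; the IDs and the "Other" label are hypothetical
toy_df <- tibble::tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3"),
  molecular_subtype = c("ETMR, NOS", "CNS NB-FOXR2", "Other"),
  Notes = c(NA_character_, "Prior note", "Prior note")
)

recoded_df <- toy_df %>%
  mutate(
    # Labels without a rule fall through to a typed NA
    integrated_diagnosis = case_when(
      molecular_subtype == "ETMR, NOS" ~ "Embryonal tumor with multilayer rosettes, NOS",
      molecular_subtype == "CNS NB-FOXR2" ~ "CNS neuroblastoma",
      TRUE ~ NA_character_
    ),
    # Keep the existing note when no integrated diagnosis was assigned
    Notes = if_else(!is.na(integrated_diagnosis),
                    "Updated via OpenPBTA subtyping",
                    Notes)
  )
```

Compared with nested `ifelse()` calls, `case_when()` keeps one rule per line, which makes it easier to review a new subtype label when it is added.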

### Handling EWS

The EWS results were post-v15 and come with their own `Notes` column.
The EWS results were updated in v18. For now, we add the integrated diagnosis, broad histology, and short histology here.

```{r}
# Add EWS integrated diagnosis, broad histology, short histology
ews_results_df <- bind_exp_strategies(ews_results_df) %>%
rename(integrated_diagnosis = integrated_diagnosis_reclassified,
short_histology = short_histology_reclassified,
broad_histology = broad_histology_reclassified)
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = "Ewing sarcoma",
broad_histology = "Mesenchymal non-meningothelial tumor",
short_histology = "EWS",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```

### Handling HGG

Like the non-ATRT/non-MB embryonal tumors, HGG subtyping was performed prior to v15.
HGG subtyping was updated with V18.

```{r}
hgg_results_df <- bind_exp_strategies(hgg_results_df) %>%
@@ -166,28 +205,120 @@ hgg_results_df <- bind_exp_strategies(hgg_results_df) %>%
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "DMG, H3 K28"~ "Diffuse midline glioma, H3 K28-mutant",
molecular_subtype == "DMG, H3 K28, BRAF V600E"~ "Diffuse midline glioma, H3 K28-mutant, BRAF V600E",
molecular_subtype == "BRAF V600E"~ "High-grade glioma/astrocytoma, BRAF V600E",
molecular_subtype == "HGG, H3 G35" ~ "High-grade glioma/astrocytoma, H3 G35-mutant",
molecular_subtype == "HGG, H3 wildtype" ~ "High-grade glioma/astrocytoma, H3 wildtype",
molecular_subtype == "HGG, IDH"~ "High-grade glioma/astrocytoma, IDH-mutant",
TRUE~ NA_character_),
broad_histology = "Diffuse astrocytic and oligodendroglial tumor",
short_histology = "HGAT",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```

### Handling LGAT

No columns that are disease labels have been changed yet.

```{r}
lgat_results_df <- bind_exp_strategies(lgat_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
pathology_diagnosis,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "LGG, BRAF fusion"~ "Low-grade glioma/astrocytoma, BRAF fusion",
molecular_subtype == "LGG, BRAF V600E" ~ "Low-grade glioma/astrocytoma, BRAF V600E",
molecular_subtype == "LGG, BRAF wildtype"~ "Low-grade glioma/astrocytoma, BRAF wildtype",
TRUE ~NA_character_),
broad_histology = "Low-grade astrocytic tumor",
short_histology = if_else(pathology_diagnosis == "Ganglioglioma", "Ganglioglioma", "LGAT"),
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```

### Handling EPN

```{r}
epn_results_df <- bind_exp_strategies(epn_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(molecular_subtype = subgroup) %>%
mutate(molecular_subtype = if_else(is.na(molecular_subtype), "EPN, To be classified", molecular_subtype),
integrated_diagnosis = case_when(molecular_subtype == "PT_EPN_A" ~ "Posterior Fossa Ependymoma, Type A",
molecular_subtype == "ST_EPN_RELA" ~ "Supratentorial Ependymoma, RELA fusion positive",
molecular_subtype == "ST_EPN_YAP1" ~ "Supratentorial Ependymoma, YAP1 fusion positive",
TRUE~ NA_character_),
broad_histology = "Ependymal Tumor",
short_histology = "Ependymoma",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```
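
The EPN chunk copies `subgroup` into `molecular_subtype` and gives samples without a subgroup the placeholder `"EPN, To be classified"`. A minimal sketch of that fallback on toy data (the IDs are made up), with `coalesce()` shown as an equivalent alternative rather than as what the notebook uses:

```r
library(dplyr)

# Toy data; the biospecimen IDs are hypothetical
epn_toy <- tibble::tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2"),
  subgroup = c("ST_EPN_RELA", NA_character_)
)

epn_labeled <- epn_toy %>%
  mutate(
    # Same logic as the chunk above: fill missing subgroups with a placeholder
    molecular_subtype = if_else(is.na(subgroup),
                                "EPN, To be classified",
                                subgroup),
    # Equivalent one-liner: take the first non-missing value
    molecular_subtype_alt = coalesce(subgroup, "EPN, To be classified")
  )
```

Because the placeholder has no matching `case_when()` rule, those samples keep `NA` as their `integrated_diagnosis` and their original `Notes`.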
# TODO

### Handling MB

```{r}
mb_results_df <- bind_exp_strategies(mb_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "MB, SHH"~"Medulloblastoma, SHH-activated",
molecular_subtype == "MB, WNT"~"Medulloblastoma, WNT-activated",
molecular_subtype == "MB, Group3"~"Medulloblastoma, group 3",
molecular_subtype == "MB, Group4"~ "Medulloblastoma, group 4",
TRUE ~NA_character_),
broad_histology = "Embryonal Tumor",
short_histology = "Medulloblastoma",
Notes = if_else(!is.na(integrated_diagnosis), "Subtype based on prediction;Updated via OpenPBTA subtyping", Notes))
```


### Handling CRANIO

```{r}
cranio_results_df <- bind_exp_strategies(cranio_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "CRANIO, ADAM" ~"Adamantinomatous craniopharyngioma",
molecular_subtype == "CRANIO, PAP" ~"Papillary craniopharyngioma",
TRUE ~ NA_character_),
broad_histology = "Tumors of sellar region",
short_histology = "Craniopharyngioma",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```

### Handling Neurocytoma
```{r}
neurocytoma_results_df <- neurocytoma_results_df %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = case_when(molecular_subtype == "CNC"~ "Central Neurocytoma",
molecular_subtype == "EVN" ~"Extraventricular Neurocytoma",
TRUE ~ NA_character_),
broad_histology = "Neuronal and mixed neuronal-glial tumor",
short_histology = "Neurocytoma",
Notes = if_else(!is.na(integrated_diagnosis), "Updated via OpenPBTA subtyping", Notes))
```

### All results
@@ -198,7 +329,18 @@ Compile results, sort, and write to file
all_results_df <- bind_rows(embryonal_results_df,
ews_results_df,
hgg_results_df,
lgat_results_df) %>%
lgat_results_df,
epn_results_df,
cranio_results_df,
mb_results_df,
neurocytoma_results_df) %>%
select(Kids_First_Participant_ID, sample_id, Kids_First_Biospecimen_ID, molecular_subtype,
integrated_diagnosis, short_histology, broad_histology, Notes) %>%
# Clean up a few Notes values that have changed since the last run
# Remove these because they were taken out of EWS
mutate(Notes = case_when(Notes == "Reclassified due to presence of hallmark EWS fusions"~ NA_character_,
Notes == "Subtype based on prediction"~ NA_character_,
TRUE ~ Notes)) %>%
arrange(Kids_First_Participant_ID, sample_id) %>%
write_tsv(output_file)
```
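
The compile step relies on `bind_rows()` accepting data frames with different column sets (missing columns are filled with `NA`), followed by `select()` to keep only the shared output columns. A minimal sketch with two toy module results; the IDs and the `extra_hgg_column` name are made up for illustration:

```r
library(dplyr)

# Toy module outputs; all values and the extra column are hypothetical
hgg_toy <- tibble::tibble(
  Kids_First_Participant_ID = "PT_2",
  sample_id = "S2",
  Kids_First_Biospecimen_ID = "BS_2",
  molecular_subtype = "HGG, H3 wildtype",
  Notes = "Subtype based on prediction",
  extra_hgg_column = "only in this module"
)
ews_toy <- tibble::tibble(
  Kids_First_Participant_ID = "PT_1",
  sample_id = "S1",
  Kids_First_Biospecimen_ID = "BS_1",
  molecular_subtype = "EWS",
  Notes = "Updated via OpenPBTA subtyping"
)

compiled_toy <- bind_rows(hgg_toy, ews_toy) %>%
  # Drop module-specific columns by selecting the shared ones
  select(Kids_First_Participant_ID, sample_id,
         Kids_First_Biospecimen_ID, molecular_subtype, Notes) %>%
  # Blank out a note that is no longer current, keep all others
  mutate(Notes = case_when(Notes == "Subtype based on prediction" ~ NA_character_,
                           TRUE ~ Notes)) %>%
  arrange(Kids_First_Participant_ID, sample_id)
```

Selecting the columns explicitly also fixes their order in the written TSV, regardless of the order in which the modules are bound.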
Expand All @@ -208,4 +350,3 @@ all_results_df <- bind_rows(embryonal_results_df,
```{r}
sessionInfo()
```

2,940 changes: 2,834 additions & 106 deletions analyses/molecular-subtyping-pathology/01-compile-subtyping-results.nb.html

Large diffs are not rendered by default.

2,677 changes: 2,625 additions & 52 deletions analyses/molecular-subtyping-pathology/02-incorporate-clinical-feedback.nb.html

Large diffs are not rendered by default.
