This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

update molecular subtyping pathology #854

Merged 18 commits on Dec 1, 2020.
Changes from 3 commits
170 changes: 151 additions & 19 deletions analyses/molecular-subtyping-pathology/01-compile-subtyping-results.Rmd
@@ -4,18 +4,23 @@ output:
html_notebook:
toc: true
toc_float: true
author: Jaclyn Taroni for CCDL
author: Jaclyn Taroni for CCDL, Jo Lynne Rokita for D3b
date: 2020
params:
is_ci: FALSE
---

The purpose of this notebook is to aggregate molecular subtyping results from the following mature analysis modules:

* [`molecular-subtyping-EWS`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-EWS)
* [`molecular-subtyping-HGG`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-HGG)
* [`molecular-subtyping-LGAT`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-LGAT)
* [`molecular-subtyping-embryonal`](https://github.com/jaclyn-taroni/OpenPBTA-analysis/tree/645-pathology-feedback/analyses/molecular-subtyping-embryonal)
* [`molecular-subtyping-EWS`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EWS)
* [`molecular-subtyping-HGG`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-HGG)
* [`molecular-subtyping-LGAT`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-LGAT)
* [`molecular-subtyping-embryonal`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-embryonal)
* [`molecular-subtyping-CRANIO`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-CRANIO)
* [`molecular-subtyping-EPN`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-EPN)
* [`molecular-subtyping-MB`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-MB)
* [`molecular-subtyping-neurocytoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-neurocytoma)


## Set up

@@ -74,10 +79,14 @@ data_dir <- file.path("..", "..", "data")
analyses_dir <- ".."

# directories for upstream subtyping modules
cranio_dir <- file.path(analyses_dir, "molecular-subtyping-CRANIO")
ews_dir <- file.path(analyses_dir, "molecular-subtyping-EWS")
epn_dir <- file.path(analyses_dir, "molecular-subtyping-EPN")
hgg_dir <- file.path(analyses_dir, "molecular-subtyping-HGG")
lgat_dir <- file.path(analyses_dir, "molecular-subtyping-LGAT")
mb_dir <- file.path(analyses_dir, "molecular-subtyping-MB")
embryonal_dir <- file.path(analyses_dir, "molecular-subtyping-embryonal")
neurocytoma_dir <- file.path(analyses_dir, "molecular-subtyping-neurocytoma")

# the folder that contains the tabular results is standardized across modules
results_dir <- "results"
@@ -95,19 +104,23 @@ When we run this locally, we want to tie it to a specific version of the histolo
if (running_in_ci) {
histologies_file <- file.path(data_dir, "pbta-histologies.tsv")
} else {
histologies_file <- file.path(data_dir, "release-v15-20200228",
histologies_file <- file.path(data_dir, "release-v17-20200908",
"pbta-histologies.tsv")
}
```
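As a minimal sketch of the CI/local switch above, the logic can be wrapped in a small function so the pinned release folder becomes an explicit argument. The function name `choose_histologies_file()` and its `release` default are illustrative assumptions, not part of the module.

```r
# Hypothetical helper (not part of the module): choose the histologies file
# depending on whether the notebook is running in CI or locally.
choose_histologies_file <- function(running_in_ci, data_dir,
                                    release = "release-v17-20200908") {
  if (running_in_ci) {
    # CI reads the file at the top level of the data directory
    file.path(data_dir, "pbta-histologies.tsv")
  } else {
    # local runs are pinned to a specific release folder
    file.path(data_dir, release, "pbta-histologies.tsv")
  }
}

choose_histologies_file(TRUE, file.path("..", "..", "data"))
# "../../data/pbta-histologies.tsv"
choose_histologies_file(FALSE, file.path("..", "..", "data"))
# "../../data/release-v17-20200908/pbta-histologies.tsv"
```

Bumping the pinned release then only requires changing the `release` argument in one place.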

Results files from individual modules.

```{r}
cranio_results_file <- file.path(cranio_dir, results_dir, "CRANIO_molecular_subtype.tsv")
ews_results_file <- file.path(ews_dir, results_dir, "EWS_results.tsv")
epn_results_file <- file.path(epn_dir, results_dir, "EPN_all_data_withsubgroup.tsv")
hgg_results_file <- file.path(hgg_dir, results_dir, "HGG_molecular_subtype.tsv")
lgat_results_file <- file.path(lgat_dir, results_dir, "lgat_subtyping.tsv")
mb_results_file <- file.path(mb_dir, results_dir, "MB_molecular_subtype.tsv")
embryonal_results_file <- file.path(embryonal_dir, results_dir,
"embryonal_tumor_molecular_subtypes.tsv")
neurocytoma_results_file <- file.path(neurocytoma_dir, results_dir, "neurocytoma_subtyping.tsv")
```

#### Output file
@@ -119,18 +132,23 @@ output_file <- file.path(results_dir, "compiled_molecular_subtypes.tsv")
## Read in data

```{r message=FALSE}
# split
histologies_df <- read_tsv(histologies_file, guess_max = 10000)
cranio_results_df <- read_tsv(cranio_results_file)
ews_results_df <- read_tsv(ews_results_file)
epn_results_df <- read_tsv(epn_results_file)
hgg_results_df <- read_tsv(hgg_results_file)
lgat_results_df <- read_tsv(lgat_results_file)
mb_results_df <- read_tsv(mb_results_file)
neurocytoma_results_df <- read_tsv(neurocytoma_results_file)
embryonal_results_df <- read_tsv(embryonal_results_file)
```

## Compile the subtyping results

### Handling non-ATRT/non-MB embryonal tumors

The molecular subtyping information from these tumors went into the v15 release, so we can use the `integrated_diagnosis`, `short_histology`, `broad_histology`, and `Notes` columns from the histologies file from that release.
The molecular subtyping information from these tumors will go into the v18 release, but until the SQL rules PR is merged, we will use the `integrated_diagnosis`, `short_histology`, `broad_histology`, and `Notes` columns from the v17 histologies file.

```{r}
embryonal_results_df <- bind_exp_strategies(embryonal_results_df) %>%
@@ -140,23 +158,42 @@ embryonal_results_df <- bind_exp_strategies(embryonal_results_df) %>%
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis =
ifelse(molecular_subtype == "CNS Embryonal, NOS", "CNS Embryonal Tumor, NOS",
ifelse(molecular_subtype == "CNS HGNET-MN1", "CNS Embryonal Tumor, HGNET-MN1",
ifelse(molecular_subtype == "CNS NB-FOXR2", "CNS neuroblastoma",
ifelse(molecular_subtype == "ETMR, C19MC-altered", "Embryonal tumor with multilayer rosettes, C19MC-altered",
ifelse(molecular_subtype == "ETMR, NOS",
"Embryonal tumor with multilayer rosettes, NOS", NA))))),
short_histology =
ifelse(molecular_subtype %in% c("ETMR, C19MC-altered", "ETMR, NOS"),
"ETMR", "Embryonal Tumor"),
broad_histology = "Embryonal Tumor")
```
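The nested `ifelse()` calls above map each `molecular_subtype` label to an `integrated_diagnosis` and fall back to `NA`. One possible alternative, sketched here under the assumption that only these five labels need mapping, is a named-vector lookup, where adding a subtype becomes a one-line change. The names `embryonal_dx` and `map_subtype()` are hypothetical.

```r
# Hypothetical lookup: molecular_subtype label -> integrated_diagnosis.
# Entries are copied from the embryonal chunk above.
embryonal_dx <- c(
  "CNS Embryonal, NOS"  = "CNS Embryonal Tumor, NOS",
  "CNS HGNET-MN1"       = "CNS Embryonal Tumor, HGNET-MN1",
  "CNS NB-FOXR2"        = "CNS neuroblastoma",
  "ETMR, C19MC-altered" = "Embryonal tumor with multilayer rosettes, C19MC-altered",
  "ETMR, NOS"           = "Embryonal tumor with multilayer rosettes, NOS"
)

map_subtype <- function(subtype, lookup) {
  # indexing a named vector by a name it lacks (or by NA) returns NA,
  # which mirrors the final NA fallback of the nested ifelse() calls
  unname(lookup[subtype])
}

subtypes <- c("CNS NB-FOXR2", "ETMR, NOS", "unlisted subtype")
map_subtype(subtypes, embryonal_dx)
# "CNS neuroblastoma" "Embryonal tumor with multilayer rosettes, NOS" NA
```

Inside `mutate()`, `dplyr::case_when()` would be another readable option.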

### Handling EWS

The EWS results were post-v15 and come with their own `Notes` column.
The EWS results were updated in v18. For now, we add the `integrated_diagnosis`, `broad_histology`, and `short_histology` labels here.

```{r}
# Add EWS integrated diagnosis, broad histology, short histology
ews_results_df <- bind_exp_strategies(ews_results_df) %>%
rename(integrated_diagnosis = integrated_diagnosis_reclassified,
short_histology = short_histology_reclassified,
broad_histology = broad_histology_reclassified)
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = "Ewing sarcoma",
broad_histology = "Mesenchymal non-meningothelial tumor",
short_histology = "EWS")
```
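Each module's results are attached to the histologies file with `inner_join()`, so a biospecimen that has a subtype call but no row in the histologies file drops out silently. The base-R sketch below uses made-up IDs and the hypothetical names `toy_results` and `toy_histologies` to show the same inner-join behavior with `merge()`, plus one way to list the dropped IDs.

```r
# Made-up biospecimen IDs for illustration only
toy_results <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_B", "BS_C"),
  molecular_subtype = "EWS",
  stringsAsFactors = FALSE
)
toy_histologies <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_A", "BS_C", "BS_Z"),
  Notes = c(NA, "reviewed", NA),
  stringsAsFactors = FALSE
)

# merge() with its default all = FALSE keeps only keys found in both tables,
# the same rows dplyr::inner_join() would keep
joined <- merge(toy_results, toy_histologies, by = "Kids_First_Biospecimen_ID")

# IDs with a subtype call but no histologies row, i.e. the silent drops
dropped <- setdiff(toy_results$Kids_First_Biospecimen_ID,
                   toy_histologies$Kids_First_Biospecimen_ID)
dropped  # "BS_B"
```

A check like `dropped` can help explain cases such as a sample that is expected in a module's table but does not appear in the compiled output.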

### Handling HGG

Like the non-ATRT/non-MB embryonal tumors, HGG subtyping was performed prior to v15.
HGG subtyping was updated in v18.

```{r}
hgg_results_df <- bind_exp_strategies(hgg_results_df) %>%
@@ -166,30 +203,119 @@ hgg_results_df <- bind_exp_strategies(hgg_results_df) %>%
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis =
ifelse(molecular_subtype == "DMG, H3 K28", "Diffuse midline glioma, H3 K28-mutant",
ifelse(molecular_subtype == "DMG, H3 K28, BRAF V600E", "Diffuse midline glioma, H3 K28-mutant, BRAF V600E",
ifelse(molecular_subtype == "HGG, BRAF V600E", "High-grade glioma/astrocytoma, BRAF V600E",
ifelse(molecular_subtype == "HGG, H3 G35", "High-grade glioma/astrocytoma, H3 G35-mutant",
ifelse(molecular_subtype == "HGG, H3 wildtype", "High-grade glioma/astrocytoma, H3 wildtype",
ifelse(molecular_subtype == "HGG, IDH", "High-grade glioma/astrocytoma, IDH-mutant", NA)))))),
broad_histology = "Diffuse astrocytic and oligodendroglial tumor",
short_histology = "HGAT")
```

### Handling LGAT

No columns that are disease labels have been changed yet.

```{r}
lgat_results_df <- bind_exp_strategies(lgat_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
pathology_diagnosis,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID")
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis =
ifelse(molecular_subtype == "LGG, BRAF fusion", "Low-grade glioma/astrocytoma, BRAF fusion",
ifelse(molecular_subtype == "LGG, BRAF V600E", "Low-grade glioma/astrocytoma, BRAF V600E",
ifelse(molecular_subtype == "LGG, BRAF wildtype", "Low-grade glioma/astrocytoma, BRAF wildtype", NA))),
broad_histology = "Low-grade astrocytic tumor",
short_histology = ifelse(pathology_diagnosis == "Ganglioglioma", "Ganglioglioma", "LGAT"))
```

### Handling EPN

```{r}
epn_results_df <- bind_exp_strategies(epn_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(molecular_subtype = subgroup) %>%
mutate(molecular_subtype = ifelse(is.na(molecular_subtype), "EPN, To be classified", molecular_subtype),
integrated_diagnosis = ifelse(molecular_subtype == "PT_EPN_A",
"Posterior Fossa Ependymoma, Type A",
ifelse(molecular_subtype == "ST_EPN_RELA",
"Supratentorial Ependymoma, RELA fusion positive",
ifelse(molecular_subtype == "ST_EPN_YAP1", "Supratentorial Ependymoma, YAP1 fusion positive", NA))),
broad_histology = "Ependymal Tumor",
short_histology = "Ependymoma")
```
# TODO

### Handling MB

```{r}
mb_results_df <- bind_exp_strategies(mb_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = ifelse(molecular_subtype == "MB, SHH",
"Medulloblastoma, SHH-activated",
ifelse(molecular_subtype == "MB, WNT","Medulloblastoma, WNT-activated",
ifelse(molecular_subtype == "MB, Group3",
"Medulloblastoma, group 3",
ifelse(molecular_subtype == "MB, Group4",
"Medulloblastoma, group 4", NA)))),
broad_histology = "Embryonal Tumor",
short_histology = "Medulloblastoma")
```


### Handling CRANIO

```{r}
cranio_results_df <- bind_exp_strategies(cranio_results_df) %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = ifelse(molecular_subtype == "CRANIO, ADAM",
"Adamantinomatous craniopharyngioma",
ifelse(molecular_subtype == "CRANIO, PAP","Papillary craniopharyngioma", NA)),
broad_histology = "Tumors of sellar region",
short_histology = "Craniopharyngioma")
```

### Handling Neurocytoma
```{r}
neurocytoma_results_df <- neurocytoma_results_df %>%
inner_join(select(histologies_df,
Kids_First_Biospecimen_ID,
integrated_diagnosis,
short_histology,
broad_histology,
Notes),
by = "Kids_First_Biospecimen_ID") %>%
mutate(integrated_diagnosis = ifelse(molecular_subtype == "CNC",
"Central Neurocytoma",
ifelse(molecular_subtype == "EVN","Extraventricular Neurocytoma", NA)),
broad_histology = "Neuronal and mixed neuronal-glial tumor",
short_histology = "Neurocytoma")
```


### All results

Compile results, sort, and write to file
@@ -198,7 +324,13 @@ Compile results, sort, and write to file
all_results_df <- bind_rows(embryonal_results_df,
ews_results_df,
hgg_results_df,
lgat_results_df) %>%
lgat_results_df,
epn_results_df,
cranio_results_df,
mb_results_df,
neurocytoma_results_df) %>%
select(Kids_First_Participant_ID, sample_id, Kids_First_Biospecimen_ID, molecular_subtype,
integrated_diagnosis, short_histology, broad_histology, Notes) %>%
arrange(Kids_First_Participant_ID, sample_id) %>%
write_tsv(output_file)
```
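`bind_rows()` fills any column a module does not provide with `NA`, and the `select()` call then fixes the output columns before sorting. As a rough base-R equivalent, assuming toy inputs (the helper `compile_results()` and the tables `ews` and `mb` are hypothetical), the step looks like this:

```r
# Hypothetical helper: stack per-module tables, keep a fixed column set
# (columns a module lacks are filled with NA), then sort by participant
# and sample, roughly like bind_rows() %>% select() %>% arrange().
compile_results <- function(dfs, cols) {
  aligned <- lapply(dfs, function(df) {
    for (col in setdiff(cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, cols, drop = FALSE]
  })
  out <- do.call(rbind, aligned)
  out <- out[order(out$Kids_First_Participant_ID, out$sample_id), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy tables; the MB table has no Notes column
ews <- data.frame(
  Kids_First_Participant_ID = c("PT_B", "PT_A"),
  sample_id = c("s2", "s1"),
  molecular_subtype = "EWS",
  Notes = c("note", NA),
  stringsAsFactors = FALSE
)
mb <- data.frame(
  Kids_First_Participant_ID = "PT_A",
  sample_id = "s0",
  molecular_subtype = "MB, SHH",
  stringsAsFactors = FALSE
)
cols <- c("Kids_First_Participant_ID", "sample_id", "molecular_subtype", "Notes")
compiled <- compile_results(list(ews, mb), cols)
compiled$sample_id  # "s0" "s1" "s2"
```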
2,929 changes: 2,825 additions & 104 deletions analyses/molecular-subtyping-pathology/01-compile-subtyping-results.nb.html

Large diffs are not rendered by default.

2,677 changes: 2,625 additions & 52 deletions analyses/molecular-subtyping-pathology/02-incorporate-clinical-feedback.nb.html

Large diffs are not rendered by default.

@@ -4,7 +4,7 @@ output:
html_notebook:
toc: TRUE
toc_float: TRUE
author: Jaclyn Taroni for ALSF CCDL
author: Jaclyn Taroni for ALSF CCDL, Jo Lynne Rokita for D3b
date: 2020
params:
is_ci: FALSE
@@ -51,7 +51,7 @@ And the notes:
> #### Few notes:
> 1. `PT_7E3V3JFX` specimens were consistent with the original EPN dx, so pathology would call this a rare EPN, H3 K28 mutated tumor, rather than DMG.
> 2. `PT_AQWDQW27` specimen was consistent with meningioma, even though it has a hallmark EPN fusion, so pathology would also call this a rare meningioma with a _YAP1_ fusion.
> 3. Because 1 is a rare tumor (maybe first seen), the logic of searching for all H3 K28 mutations in [HGG subtyping](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) would convert this sample by default - how to handle this?
> 3. Because 1 is a rare tumor (maybe first seen), the logic of searching for all H3 K28 mutations in [HGG subtyping](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/249) would convert this sample by default.
> 4. Pathology confirmed this HGG BRAF V600E mutant tumor, [`BS_H1XPVS9A`](https://cbethell.github.io/open-pbta-output/09-HGG-with-braf-clustering.nb.html#identify_sample_that_clusters_with_lgat), to be a LGAT (PXA). I updated `molecular_subtype` here based on what it would look like, but this should come through via the LGAT [subtyping](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/631) ticket. How should we add this info?

## Set up
@@ -92,7 +92,7 @@ When we run this locally, we want to tie it to a specific version of the histolo
if (running_in_ci) {
histologies_file <- file.path(data_dir, "pbta-histologies.tsv")
} else {
histologies_file <- file.path(data_dir, "release-v15-20200228",
histologies_file <- file.path(data_dir, "release-v17-20200908",
"pbta-histologies.tsv")
}
```
@@ -217,7 +217,9 @@ compiled_df <- compiled_df %>%
)
```

### HGG BRAF V600E
### HGG BRAF V600E

We will be removing this from subtyping, so it can be left as-is for now.

The following point comes from another issue, [#627 (comment)](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/627#issuecomment-598789232):

@@ -244,9 +246,28 @@ compiled_df <- compiled_df %>%

The `molecular-subtyping-EPN` module has not been completed yet, but its logic may mean that we need to revise the labels for `PT_AQWDQW27`.

```{r}
compiled_df %>%
filter(Kids_First_Participant_ID == "PT_AQWDQW27")
# This sample is missing from the EPN table, but it should be there - will have to investigate and update this later.
```

### `PT_6Q0NPVP3`

The specimens for this patient, BS_5JM573JC and BS_E5H6CFYT, were classified as HGAT because of a histone mutation. Now that LGAT samples are removed from the HGG subtyping module, these specimens will no longer show up in two modules.
```{r}
compiled_df %>%
filter(Kids_First_Participant_ID == "PT_6Q0NPVP3")
```
# TODO: do we need to update PT_AQWDQW27 once molecular-subtyping-EPN is
# complete?

### Are there any other duplicate subtypes?
```{r}
unique_subtypes <- compiled_df %>%
select(Kids_First_Participant_ID, sample_id, molecular_subtype) %>%
distinct()

unique_subtypes[duplicated(unique_subtypes$sample_id),]
#PT_KTRJ8TFY (fixed in clinical feedback) and PT_6Q0NPVP3 (fixed in HGG module removing LGAT)
```
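Note that `duplicated()` flags only the second and later occurrences of a `sample_id`, so the check above prints one row per conflicting sample rather than every conflicting label. A small sketch on hypothetical data (the table `toy_subtypes` is made up) that keeps all rows for the affected samples:

```r
# Hypothetical data: sample s1 carries two different subtype labels
toy_subtypes <- data.frame(
  Kids_First_Participant_ID = c("PT_1", "PT_1", "PT_2", "PT_3"),
  sample_id = c("s1", "s1", "s2", "s3"),
  molecular_subtype = c("HGG, H3 wildtype", "LGG, BRAF V600E", "MB, SHH", "EWS"),
  stringsAsFactors = FALSE
)

# duplicated() marks only the second and later occurrences of an ID,
# so this returns a single row (the LGG label)
first_check <- toy_subtypes[duplicated(toy_subtypes$sample_id), ]

# keep every row whose sample_id occurs more than once
dup_ids <- unique(toy_subtypes$sample_id[duplicated(toy_subtypes$sample_id)])
conflicts <- toy_subtypes[toy_subtypes$sample_id %in% dup_ids, ]
conflicts  # both s1 rows
```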

### Write revised table to file
2,788 changes: 2,714 additions & 74 deletions analyses/molecular-subtyping-pathology/03-incorporate-pathology-feedback.nb.html

Large diffs are not rendered by default.
