This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Recode adamantinomatous craniopharyngiomas per #994 #1016

Merged
@@ -0,0 +1,139 @@
---
title: "Recoding adamantinomatous craniopharyngiomas"
output:
  html_notebook:
    toc: true
    toc_float: true
author: JN Taroni for ALSF CCDL (code)
date: 2021
---

_Background adapted from [#994](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/994)_

Some craniopharyngioma samples may have been annotated as "To be classified" in `molecular-subtyping-CRANIO` because they lack canonical mutations.
However, for adamantinomatous craniopharyngioma, the β-catenin (_CTNNB1_) SNV is not present in all samples ([ref](https://doi.org/10.1093/jnen/nlw116)):

> In our cohort of [adamantinomatous craniopharyngiomas] specimens from 117 patients we found _CTNNB1_ mutations in 89 cases (76.1%).

Some samples are described as `Adamantinomatous` in their pathology reports, so we can update their `harmonized_diagnosis` and `molecular_subtype` information accordingly.

## Set up

### Libraries

```{r}
library(tidyverse)
```

### Input

```{r}
data_dir <- file.path("..", "..", "data")
results_dir <- "results"
histologies_file <- file.path(data_dir, "pbta-histologies-base.tsv")
compiled_subtypes_file <- file.path(results_dir, "compiled_molecular_subtypes.tsv")
```

### Output

```{r}
output_file <- file.path(results_dir, "cranio_adam_subtypes.tsv")
```

## Read in data

```{r}
histologies_df <- read_tsv(histologies_file)
```

```{r}
subtypes_df <- read_tsv(compiled_subtypes_file,
                        guess_max = 10000)
```

### Samples to be reclassified

The instructions on [#994](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/994) are to reclassify specific sample identifiers, because the reclassification is based on a review of pathology reports.

```{r}
joined_df <- histologies_df %>%
  select(Kids_First_Biospecimen_ID,
         Kids_First_Participant_ID,
         sample_id,
         sample_type,  # required to filter out normal samples
         pathology_diagnosis,
         pathology_free_text_diagnosis,
         tumor_descriptor) %>%
  inner_join(subtypes_df,
             by = c("Kids_First_Biospecimen_ID",
                    "Kids_First_Participant_ID",
                    "sample_id"))
```

Here are the same filtering steps performed on the issue itself; we'll save the result in a new data frame.

```{r}
acp_df <- joined_df %>%
  # Same logic as on the issue!
  filter(pathology_diagnosis == "Craniopharyngioma",
         molecular_subtype == "CRANIO, To be classified",
         str_detect(str_to_lower(pathology_free_text_diagnosis),
                    "adamantinomatous")) %>%
  select(sample_id,
         pathology_free_text_diagnosis,
         molecular_subtype) %>%
  distinct()

acp_df
```
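
As a minimal sketch of how this filter behaves, the chunk below applies the same three conditions to a toy table. All `toy_` names and values are hypothetical and are not taken from the OpenPBTA data. It is meant to illustrate two details: lowercasing the free text makes the match case-insensitive, and rows with a missing free-text diagnosis are dropped because `filter()` discards rows where the condition evaluates to `NA`.

```{r}
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical toy records (made-up identifiers and free text)
toy_df <- tibble(
  sample_id = c("toy-1", "toy-2", "toy-3", "toy-4"),
  pathology_diagnosis = rep("Craniopharyngioma", 4),
  molecular_subtype = c("CRANIO, To be classified",
                        "CRANIO, To be classified",
                        "CRANIO, To be classified",
                        "CRANIO, ADAM"),
  pathology_free_text_diagnosis = c("Adamantinomatous craniopharyngioma",
                                    "craniopharyngioma",
                                    NA,
                                    "ADAMANTINOMATOUS")
)

toy_acp_df <- toy_df %>%
  filter(pathology_diagnosis == "Craniopharyngioma",
         molecular_subtype == "CRANIO, To be classified",
         # Lowercase first so "Adamantinomatous" and "ADAMANTINOMATOUS" both match
         str_detect(str_to_lower(pathology_free_text_diagnosis),
                    "adamantinomatous"))

# Only toy-1 meets all three conditions: toy-2 lacks the keyword,
# toy-3 has missing free text, and toy-4 already has a subtype
toy_acp_df$sample_id
```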

We can pull the `sample_id` values out and use them in our next steps.

```{r}
sample_ids_reclassification <- acp_df %>%
  pull(sample_id)
```

## Recode adamantinomatous craniopharyngiomas

We can use the following table from [#994](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/994) to guide how we recode the labels for these samples.

| broad_histology | short_histology | harmonized_diagnosis | molecular_subtype |
|-------------------------|-------------------|------------------------------------|-------------------|
| Tumors of sellar region | Craniopharyngioma | Adamantinomatous craniopharyngioma | CRANIO, ADAM |

```{r}
cranio_adam_df <- joined_df %>%
  filter(sample_id %in% sample_ids_reclassification,
         # Exclude normal samples
         sample_type != "Normal") %>%
  # Filter to relevant ID and disease type label columns
  select(Kids_First_Biospecimen_ID,
         Kids_First_Participant_ID,
         sample_id,
         broad_histology,
         short_histology,
         molecular_subtype,
         tumor_descriptor) %>%
  # Code the values that are in the table above
  mutate(
    broad_histology = "Tumors of sellar region",
    short_histology = "Craniopharyngioma",
    harmonized_diagnosis = "Adamantinomatous craniopharyngioma",
    molecular_subtype = "CRANIO, ADAM",
    Notes = "Updated via OpenPBTA subtyping from pathology_free_text_diagnosis"
  )
```
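
Because the recoding is keyed on `sample_id`, every tumor biospecimen that shares a selected `sample_id` is relabeled, while normal biospecimens are excluded. The sketch below shows this on a toy table. It assumes a single `sample_id` can map to several biospecimens, and all `toy_` names and `BS_TOY` identifiers are hypothetical.

```{r}
library(dplyr)
library(tibble)

# Hypothetical biospecimens: toy-1 has two tumor rows and one normal row
toy_joined_df <- tibble(
  Kids_First_Biospecimen_ID = c("BS_TOY1", "BS_TOY2", "BS_TOY3", "BS_TOY4"),
  sample_id = c("toy-1", "toy-1", "toy-1", "toy-2"),
  sample_type = c("Tumor", "Tumor", "Normal", "Tumor"),
  molecular_subtype = "CRANIO, To be classified"
)

# Hypothetical stand-in for the sample identifiers selected for reclassification
toy_ids <- c("toy-1")

toy_recoded_df <- toy_joined_df %>%
  # Keep tumor biospecimens for the selected sample identifiers only
  filter(sample_id %in% toy_ids,
         sample_type != "Normal") %>%
  select(-sample_type) %>%
  # Overwrite the subtype label for every remaining row
  mutate(molecular_subtype = "CRANIO, ADAM")

# Both tumor biospecimens of toy-1 are recoded; the normal and toy-2 are absent
toy_recoded_df
```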

Write to file!

```{r}
write_tsv(cranio_adam_df, output_file)
```

## Session Info

```{r}
sessionInfo()
```

@@ -0,0 +1,4 @@
Kids_First_Biospecimen_ID	Kids_First_Participant_ID	sample_id	broad_histology	short_histology	molecular_subtype	tumor_descriptor	harmonized_diagnosis	Notes
BS_QSDKJM8T	PT_WPYCXMDA	7316-3891	Tumors of sellar region	Craniopharyngioma	CRANIO, ADAM	Progressive	Adamantinomatous craniopharyngioma	Updated via OpenPBTA subtyping from pathology_free_text_diagnosis
BS_T8C13KNH	PT_WE7HQN0C	7316-1795	Tumors of sellar region	Craniopharyngioma	CRANIO, ADAM	Initial CNS Tumor	Adamantinomatous craniopharyngioma	Updated via OpenPBTA subtyping from pathology_free_text_diagnosis
BS_VPMKHW9X	PT_WPYCXMDA	7316-3891	Tumors of sellar region	Craniopharyngioma	CRANIO, ADAM	Progressive	Adamantinomatous craniopharyngioma	Updated via OpenPBTA subtyping from pathology_free_text_diagnosis
@@ -19,6 +19,9 @@ cd "$(dirname "${BASH_SOURCE[0]}")"
# a single table
Rscript -e "rmarkdown::render('01-compile-subtyping-results.Rmd', params=list(is_ci = ${IS_CI}), clean = TRUE)"

# Recoding ACP samples
Rscript -e "rmarkdown::render('pathology-subtyping-craniopharyngioma.Rmd', clean = TRUE)"

# Run the second notebook to incorporate clinical review to the compiled subtyping
Rscript -e "rmarkdown::render('02-incorporate-clinical-feedback.Rmd', clean = TRUE)"
