This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

PBTA Histologies: Integrated molecular subtyping to base histology (7 of N) #870

Merged: 68 commits, Jan 9, 2021
bd3a94d
update to use base histologies file
kgaonkar6 Dec 7, 2020
bea7b1d
update to use base histology file
kgaonkar6 Dec 7, 2020
cef95e9
update to use base histology v18
kgaonkar6 Dec 7, 2020
ada16ca
uupdated to filtered fusion v18
kgaonkar6 Dec 7, 2020
cfd1a81
rerun tsne for v18 subtyping
kgaonkar6 Dec 7, 2020
86f39a4
adding all subtyping and gsea re-run
kgaonkar6 Dec 7, 2020
2cb5991
Update README.md
kgaonkar6 Dec 9, 2020
509cecc
Update README.md
kgaonkar6 Dec 10, 2020
7209be1
Update README.md
kgaonkar6 Dec 10, 2020
7d28e9d
run-for-subtyping update
kgaonkar6 Dec 10, 2020
d926f9b
run-for-subtyping update
kgaonkar6 Dec 10, 2020
3ee5bc7
run-for-subtyping update
kgaonkar6 Dec 10, 2020
8288ddf
run-for-subtyping update
kgaonkar6 Dec 10, 2020
3dc81ed
run-for-subtyping update
kgaonkar6 Dec 10, 2020
0850c7d
integrated_dx, mol_sub added to base histology
kgaonkar6 Dec 11, 2020
5d97b93
update README
kgaonkar6 Dec 11, 2020
d70bef4
Update README.md
jharenza Dec 14, 2020
f2f6576
Update README.md
jharenza Dec 14, 2020
e1052bd
Update README.md
jharenza Dec 14, 2020
33991b0
Update README.md
jharenza Dec 14, 2020
7ff3290
Update README.md
jharenza Dec 14, 2020
adf4dae
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Dec 15, 2020
7f53d8f
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Dec 15, 2020
e90dd42
Update analyses/molecular-subtyping-integrate/README.md
kgaonkar6 Dec 15, 2020
0eab36a
re-run with udpated pbta-histologies-base.tsv
kgaonkar6 Dec 16, 2020
5802095
Merge branch 'all_subtyping_gsea_rerun' of https://github.com/kgaonka…
kgaonkar6 Dec 16, 2020
ae749e2
Merge branch 'all_subtyping_gsea_rerun' of https://github.com/kgaonka…
kgaonkar6 Dec 16, 2020
94c99b2
re-run with udpated pbta-histologies-base.tsv
kgaonkar6 Dec 16, 2020
26b9520
Merge branch 'v18_int_dx_histology' of https://github.com/kgaonkar6/O…
kgaonkar6 Dec 16, 2020
7267aa5
re-run with pull changes
kgaonkar6 Dec 16, 2020
45d5f41
int-dx had .x and .y removing
kgaonkar6 Dec 16, 2020
95e5873
Update README.md
kgaonkar6 Dec 17, 2020
a9224aa
Update config.yml
kgaonkar6 Dec 17, 2020
f1e69f0
Update run-gsea.sh
kgaonkar6 Dec 17, 2020
acc1dc6
Update run-for-subtyping.sh
kgaonkar6 Dec 17, 2020
300019b
Update run-gsea.sh
kgaonkar6 Dec 17, 2020
83da77b
add histology <- base_histology %>%
kgaonkar6 Dec 17, 2020
efb6b6b
add harmonized_diagnosis
kgaonkar6 Dec 17, 2020
018db25
update, checks for final file before saving
kgaonkar6 Dec 22, 2020
aa65fcb
remove duplicates
kgaonkar6 Dec 22, 2020
f6506c9
Merge branch 'all_subtyping_gsea_rerun' of https://github.com/kgaonka…
kgaonkar6 Dec 22, 2020
04a0252
Merge branch 'all_subtyping_gsea_rerun' of https://github.com/kgaonka…
kgaonkar6 Dec 22, 2020
2279738
remove duplicate from final
kgaonkar6 Dec 22, 2020
d305a8a
Other pathology_diagnosis samples broad/short and harm_dx update
kgaonkar6 Dec 24, 2020
82fe828
re-run with updated CI and molecular-subtyping-pathology
kgaonkar6 Jan 5, 2021
3e89789
Merge branch 'all_subtyping_gsea_rerun' into v18_int_dx_histology
kgaonkar6 Jan 5, 2021
dcb5679
rerun with updated CI and molecular-subtyping-pathology
kgaonkar6 Jan 5, 2021
9069b5e
Merge remote-tracking branch 'origin/master' into v18_int_dx_histology
cansavvy Jan 6, 2021
42839cb
Revert fusion_filtering to what is in master
cansavvy Jan 6, 2021
660dcc6
Merge remote-tracking branch 'origin/master' into v18_int_dx_histology
cansavvy Jan 7, 2021
e0b7185
Revert two transcriptomic-dim-red files to what is in master because …
cansavvy Jan 7, 2021
b6588bf
Get rid of that sneaky RUN_FOR_SUBTYPING line in dim-red-plots.sh
cansavvy Jan 7, 2021
482b723
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Jan 7, 2021
97963af
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Jan 7, 2021
4897260
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Jan 7, 2021
760f7a4
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Jan 7, 2021
989b53f
weird spaces removed
kgaonkar6 Jan 8, 2021
2e96e40
Merge branch 'v18_int_dx_histology' of https://github.com/kgaonkar6/O…
kgaonkar6 Jan 8, 2021
914b80c
weird spaces removed
kgaonkar6 Jan 8, 2021
ac20b04
weird spaces removed
kgaonkar6 Jan 8, 2021
2f5ec46
added desc and CI run
kgaonkar6 Jan 8, 2021
ce32ad9
adding to CI
kgaonkar6 Jan 8, 2021
ce55a27
run CI with OPENPBTA_TESTING=1
kgaonkar6 Jan 8, 2021
d40412f
remove OPENPBTA_TESTING=1 specific data folder conditions
kgaonkar6 Jan 8, 2021
8310388
Update analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
kgaonkar6 Jan 8, 2021
3a3c185
adding more description
kgaonkar6 Jan 8, 2021
681d254
typos
kgaonkar6 Jan 8, 2021
337e66e
added diagram
kgaonkar6 Jan 8, 2021
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -81,6 +81,10 @@ jobs:
          name: Molecular Subtyping - CRANIO
          command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-CRANIO/run-molecular-subtyping-cranio.sh

      - run:
          name: Molecular Subtyping - INTEGRATE to BASE histology
          command: ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-integrate/run-subtyping-integrate.sh

      # Deprecated - these results do not include germline calls and therefore are insufficient by subtyping
      # - run:
      #     name: SHH TP53 Molecular Subtyping
3 changes: 2 additions & 1 deletion analyses/README.md
@@ -36,8 +36,9 @@ Note that _nearly all_ modules use the harmonized clinical data file (`pbta-hist
| [`molecular-subtyping-SHH-tp53`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-SHH-tp53) | `pbta-histologies` <br> `pbta-snv-consensus-mutation.maf.tsv.gz` | *Deprecated*; Identify the SHH-classified medulloblastoma samples that have TP53 mutations [#247](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/247) | N/A
| [`molecular-subtyping-chordoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-chordoma) | `analyses/focal-cn-file-preparation/results/consensus_seg_annotated_cn_autosomes.tsv.gz` <br> `pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` | *In progress*; identifying poorly-differentiated chordoma samples per [#250](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/250) | N/A
| [`molecular-subtyping-embryonal`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-embryonal) | `pbta-histologies-base.tsv` <br> `analyses/fusion-summary/fusion_summary_embryonal_foi.tsv` <br> `pbta-sv-manta.tsv.gz` <br> `analyses/focal-cn-file-preparation/consensus_seg_annotated_cn_x_and_y.tsv.gz` <br> `analyses/focal-cn-file-preparation/cnvkit_annotated_cn_x_and_y.tsv.gz` <br> `analyses/focal-cn-file-preparation/controlfreec_annotated_cn_x_and_y.tsv.gz` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds` <br> `analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.polya.rds` | Molecular subtyping of non-medulloblastoma, non-ATRT embryonal tumors [#251](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/251) | `results/embryonal_tumor_molecular_subtypes.tsv`
| [`molecular-subtyping-integrate`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-integrate) | `pbta-histologies-base.tsv` <br> `results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` | Add molecular subtype information to base histology | `results/pbta-histologies.tsv`
| [`molecular-subtyping-neurocytoma`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-neurocytoma) | `pbta-histologies-base.tsv` | Molecular subtyping of Neurocytoma samples [#805](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/805) | `results/neurocytoma_subtyping.tsv`
| [`molecular-subtyping-pathology`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) | `analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-EPN/results/CRANIO_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv` <br> `analyses/molecular-subtyping-EWS/results/EWS_samples.tsv` <br> `analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv` <br> `analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv` | Compile output from other molecular subtyping modules and incorporate pathology feedback [#645](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/645) | `results/compiled_molecular_subtyping_with_pathology_feedback.tsv`
| [`molecular-subtyping-pathology`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) | `analyses/molecular-subtyping-CRANIO/results/CRANIO_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-EPN/results/CRANIO_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-MB/results/MB_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-neurocytoma/results/neurocytoma_subtyping.tsv` <br> `analyses/molecular-subtyping-EWS/results/EWS_samples.tsv` <br> `analyses/molecular-subtyping-HGG/results/HGG_molecular_subtype.tsv` <br> `analyses/molecular-subtyping-LGAT/results/lgat_subtyping.tsv` <br> `analyses/molecular-subtyping-embryonal/results/embryonal_tumor_molecular_subtypes.tsv` | Compile output from other molecular subtyping modules and incorporate pathology feedback [#645](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/645) | `results/compiled_molecular_subtyping_with_clinical_feedback.tsv` <br> `results/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`
| [`mutational-signatures`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutational-signatures) | `pbta-snv-consensus-mutation.maf.tsv.gz` | Performs COSMIC and Alexandrov et al. mutational signature analysis using the consensus SNV data | N/A
| [`mutect2-vs-strelka2`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/mutect2-vs-strelka2) | `pbta-snv-mutect2.vep.maf.gz` <br> `pbta-snv-strelka2.vep.maf.gz` | *Deprecated*; comparison of only two SNV callers, subsumed by `snv-callers` | N/A
| [`oncoprint-landscape`](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/oncoprint-landscape) | `pbta-snv-consensus-mutation.maf.tsv.gz` <br> `pbta-fusion-putative-oncogenic.tsv` <br> `analyses/focal-cn-file-preparation/results/controlfreec_annotated_cn_autosomes.tsv.gz` <br> `independent-specimens.*` | Combines mutation, copy number, and fusion data into an OncoPrint plot ([#6](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/6)); will need to be updated as all data types are refined | N/A
213 changes: 213 additions & 0 deletions analyses/molecular-subtyping-integrate/01-integrate-subtyping.Rmd
@@ -0,0 +1,213 @@
---
title: "Integrate molecular subtyping results"
output:
  html_notebook:
    toc: true
    toc_float: true
author: Krutika Gaonkar for D3b
date: 2020
---

The purpose of this notebook is to integrate molecular subtyping results from
[molecular-subtyping-pathology](https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/master/analyses/molecular-subtyping-pathology) with `pbta-histologies-base.tsv`.

Here we will use `pbta-histologies-base.tsv`, in which integrated_diagnosis, Notes and molecular_subtype are all NA. Through the following molecular subtyping modules, we gathered and updated molecular_subtype, integrated_diagnosis, broad_histology and short_histology for the corresponding histologies:

- molecular-subtyping-MB
- molecular-subtyping-CRANIO
- molecular-subtyping-EPN
- molecular-subtyping-embryonal
- molecular-subtyping-EWS
- molecular-subtyping-neurocytoma
- molecular-subtyping-HGG
- molecular-subtyping-LGAT
- molecular-subtyping-pathology

In this notebook we will add the molecular subtyping information compiled and updated by pathology review in `molecular-subtyping-pathology/compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` to create the `pbta-histologies.tsv` for the same release. If samples are not processed by a molecular-subtyping-* module then the br

In addition, for samples where pathology_diagnosis is "Other", we also update broad_histology and short_histology from a manual review of WHO terms.

![](https://user-images.githubusercontent.com/34580719/103105428-c63e1f80-45fb-11eb-8548-28bcba0b2dba.png)

## Set up

```{r}
library(tidyverse)
data_dir <- "../../data/"

# Read the base histology file; the listed columns are always read as character
base_histology <- read_tsv(file.path(data_dir, "pbta-histologies-base.tsv"),
                           col_types = readr::cols(
                             molecular_subtype = readr::col_character(),
                             short_histology = readr::col_character(),
                             broad_histology = readr::col_character(),
                             Notes = readr::col_character()
                           )) %>%
  unique()
```

### Read molecular-subtyping-pathology results

Reading molecular_subtype, integrated_diagnosis, short_histology, broad_histology and Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`

```{r}

compiled_subtyping <- read_tsv(file.path("..", "molecular-subtyping-pathology", "results",
                                         "compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv"))

```

For samples where pathology_diagnosis is "Other", update broad_histology, short_histology and harmonized_diagnosis from manual review, then add these samples to `compiled_subtyping`.

```{r}

Other_subtypes <- read_tsv(file.path("input", "pathology_dx and pathology_free_text_diagnosis to broad_histology for subtyping module - rules_without_subtype_WIP.tsv"))

compiled_subtyping_other <- base_histology %>%
  select(
    Kids_First_Participant_ID,
    sample_id,
    Kids_First_Biospecimen_ID,
    molecular_subtype,
    integrated_diagnosis,
    pathology_diagnosis,
    pathology_free_text_diagnosis,
    tumor_descriptor
  ) %>%
  # remove biospecimens already in compiled_subtyping (those subtyped as "ETMR/Embryonal")
  filter(!Kids_First_Biospecimen_ID %in% compiled_subtyping$Kids_First_Biospecimen_ID) %>%
  # keep only samples where pathology_diagnosis == "Other"
  filter(pathology_diagnosis == "Other") %>%
  # map the free-text diagnosis to the manually reviewed histology terms
  left_join(Other_subtypes, by = "pathology_free_text_diagnosis") %>%
  mutate(Notes = "Updated by manual review of WHO diagnosis") %>%
  select(
    # gather only columns needed to format as `compiled_subtyping`
    Kids_First_Participant_ID,
    sample_id,
    Kids_First_Biospecimen_ID,
    molecular_subtype,
    integrated_diagnosis,
    tumor_descriptor,
    broad_histology,
    short_histology,
    Notes,
    # adding harmonized_diagnosis from manual review
    # for pathology_diagnosis == "Other"
    harmonized_diagnosis
  ) %>%
  unique()


# combine OpenPBTA subtypes and manual "Other" subtypes
compiled_subtyping <- compiled_subtyping_other %>%
  bind_rows(compiled_subtyping)

```
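
The filter-then-join pattern in the chunk above can be sketched on toy data. This is a minimal illustration, not the module's code: the table contents, the `rules_toy` lookup and the `already_subtyped` vector are all hypothetical stand-ins. It shows that a free-text diagnosis with no matching rule survives the left join with `NA` histology, which is what the later NA check is meant to catch.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the base histology table (all values are made up).
base_toy <- tribble(
  ~Kids_First_Biospecimen_ID, ~pathology_diagnosis, ~pathology_free_text_diagnosis,
  "BS_1", "Other",       "free text A",
  "BS_2", "Other",       "free text B",
  "BS_3", "Diagnosis X", NA_character_,
  "BS_4", "Other",       "free text A"
)

# Hypothetical: biospecimens that already have a compiled subtype.
already_subtyped <- c("BS_4")

# Hypothetical lookup from free-text diagnosis to histology terms.
rules_toy <- tribble(
  ~pathology_free_text_diagnosis, ~broad_histology, ~short_histology,
  "free text A", "Broad histology A", "Short histology A"
)

other_toy <- base_toy %>%
  # drop biospecimens that were already subtyped
  filter(!Kids_First_Biospecimen_ID %in% already_subtyped) %>%
  # keep only the "Other" diagnoses
  filter(pathology_diagnosis == "Other") %>%
  # attach the manually reviewed terms by free-text diagnosis
  left_join(rules_toy, by = "pathology_free_text_diagnosis")

# Rows whose free text had no rule keep NA in the histology columns.
unmapped <- other_toy %>%
  filter(is.na(broad_histology))
```

Inspecting `unmapped` before binding rows would be one way to spot free-text diagnoses that still need a manual rule.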



### Add molecular-subtyping-pathology results

We will add molecular_subtype, integrated_diagnosis and Notes from `compiled_subtyping`.

short_histology and broad_histology are taken from the base histology file for samples that were not subtyped as part of `molecular-subtyping-pathology`.

```{r}

histology <- base_histology %>%
  select(-Notes, -molecular_subtype, -integrated_diagnosis) %>%
  left_join(compiled_subtyping,
            by = c("Kids_First_Biospecimen_ID", "sample_id",
                   "Kids_First_Participant_ID", "tumor_descriptor"),
            suffix = c(".base", ".subtyped")) %>%
  unique() %>%
  mutate(
    # prefer the subtyped value; fall back to the base histology value
    broad_histology = if_else(!is.na(broad_histology.subtyped),
                              broad_histology.subtyped,
                              broad_histology.base),
    short_histology = if_else(!is.na(short_histology.subtyped),
                              short_histology.subtyped,
                              short_histology.base),
    # precedence: integrated_diagnosis, then harmonized_diagnosis,
    # then pathology_diagnosis
    harmonized_diagnosis =
      case_when(!is.na(integrated_diagnosis) ~ integrated_diagnosis,
                is.na(integrated_diagnosis) &
                  !is.na(harmonized_diagnosis) ~ harmonized_diagnosis,
                is.na(integrated_diagnosis) &
                  is.na(harmonized_diagnosis) &
                  !is.na(pathology_diagnosis) ~ pathology_diagnosis
      ))
```
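
The fallback logic in the chunk above amounts to "take the first non-missing value in a fixed order". As a minimal sketch on toy data (the `toy` table and its values are hypothetical), `dplyr::coalesce()` expresses the same precedence and should give the same result as the `if_else()` and `case_when()` calls for character columns like these. It is shown as an alternative formulation, not as the code used in the module.

```r
library(dplyr)
library(tibble)

# Toy rows covering the three precedence cases (all values are made up).
toy <- tribble(
  ~id,  ~broad_histology.base, ~broad_histology.subtyped, ~integrated_diagnosis, ~harmonized_diagnosis, ~pathology_diagnosis,
  "S1", "base A",              "subtyped A",              "int dx 1",            NA_character_,         "path dx 1",
  "S2", "base B",              NA_character_,             NA_character_,         "harm dx 2",           "Other",
  "S3", "base C",              NA_character_,             NA_character_,         NA_character_,         "path dx 3"
)

resolved <- toy %>%
  mutate(
    # subtyped value wins; otherwise keep the base value
    broad_histology = coalesce(broad_histology.subtyped, broad_histology.base),
    # integrated_diagnosis, then harmonized_diagnosis, then pathology_diagnosis
    harmonized_diagnosis = coalesce(integrated_diagnosis,
                                    harmonized_diagnosis,
                                    pathology_diagnosis)
  )
```

With either formulation, a row is left as `NA` only when every candidate column is `NA`.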



### Check if any duplicates

```{r}
dup_ids <- histology$Kids_First_Biospecimen_ID[duplicated(histology$Kids_First_Biospecimen_ID)]

histology[which(histology$Kids_First_Biospecimen_ID %in% dup_ids), ]
```

No duplicates
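
The check above prints any duplicated rows for visual inspection. A minimal sketch of making the same check fail loudly is given here on toy data; the `toy` table is hypothetical, and treating duplicates as a hard error is an assumption about what the module wants, not something the notebook does.

```r
library(dplyr)
library(tibble)

# Toy table with one exactly duplicated row (values are made up).
toy <- tibble(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_2"),
  molecular_subtype = c("subtype A", "subtype B", "subtype B")
)

# All rows that share a biospecimen ID with another row.
dup_rows <- toy %>%
  group_by(Kids_First_Biospecimen_ID) %>%
  filter(n() > 1) %>%
  ungroup()

# distinct() removes rows that are identical across every column.
deduplicated <- distinct(toy)

# Stop with an error if any biospecimen ID is still repeated.
stopifnot(anyDuplicated(deduplicated$Kids_First_Biospecimen_ID) == 0)
```

Rows that share an ID but differ in another column would survive `distinct()` and trigger the error, which is the case that needs manual review.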

### Check for NAs in broad_histology, short_histology or harmonized_diagnosis

Are there NAs in broad_histology, short_histology or harmonized_diagnosis?

```{r}
histology %>%
  filter(sample_type == "Tumor",
         (is.na(broad_histology) | is.na(short_histology) | is.na(harmonized_diagnosis))) %>%
  tally()

```

No NAs in broad_histology, short_histology or harmonized_diagnosis

Just a note, integrated_diagnosis is expected to be `NA` for samples where subtyping is not performed or if molecular_subtype is "XYZ,To be classified".
This means no evidence was provided/available for these samples so we are not able to add integrated_diagnosis.

#### Check differences in broad_histology
Checking for differences in broad_histology to look for changes in molecular_subtype


```{r}
diff_broad_histology <- histology %>%
  filter(toupper(broad_histology.base) != toupper(broad_histology.subtyped)) %>%
  select(Kids_First_Biospecimen_ID, starts_with("broad_histology"), starts_with("short_histology")) %>%
  unique()

diff_broad_histology
```
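
One property of the comparison above is worth spelling out: `dplyr::filter()` keeps only rows where the condition is `TRUE`, so samples with no subtyped value (where the comparison evaluates to `NA`) are silently left out of the difference table. A minimal sketch on hypothetical toy data:

```r
library(dplyr)
library(tibble)

# Toy rows: same value in different case, a real change, and no subtyped value.
toy <- tibble(
  id = c("S1", "S2", "S3"),
  broad_histology.base = c("Histology A", "Histology B", "Histology C"),
  broad_histology.subtyped = c("HISTOLOGY A", "Histology Z", NA)
)

changed <- toy %>%
  # case-insensitive comparison; an NA on either side yields NA, not TRUE
  filter(toupper(broad_histology.base) != toupper(broad_histology.subtyped))

# Samples without a subtyped value can be counted separately if needed.
n_not_subtyped <- sum(is.na(toy$broad_histology.subtyped))
```

Here only `S2` is reported as changed: `S1` differs only in case and `S3` has nothing to compare against.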

#### Check differences in short_histology
Here we check for short_histology changes that were not already reported in the `Check differences in broad_histology` chunk.
This helps us see where the path_dx to short_histology string mapping has changed in `molecular-subtyping-pathology`.

```{r}

histology %>%
  filter(!Kids_First_Biospecimen_ID %in% diff_broad_histology$Kids_First_Biospecimen_ID) %>%
  filter(toupper(short_histology.base) != toupper(short_histology.subtyped)) %>%
  select(Kids_First_Biospecimen_ID, starts_with("broad_histology"), starts_with("short_histology")) %>%
  unique()

```
The 172 changes above result from differences in string value assignment.
In `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`, a sample whose broad_histology is `Ependymal tumor` has short_histology `EPN`, whereas in `pbta-histologies-base.tsv` it is `Ependymoma`.
For samples where broad_histology is `Embryonal tumor`, short_histology is also `Embryonal tumor`, whereas in the base histology it was `ETMR`.

For 45 Benign, Non-(CNS) tumor and other samples where pathology_diagnosis == "Other", short_histology was updated from manual review of WHO diagnosis terms.

### Save
Let's save the final file.

But first we need to remove broad_histology.base, broad_histology.subtyped, short_histology.base and short_histology.subtyped.

```{r}
histology %>%
  select(-broad_histology.base,
         -broad_histology.subtyped,
         -short_histology.base,
         -short_histology.subtyped) %>%
  write_tsv("results/pbta-histologies.tsv")
```
3,197 changes: 3,197 additions & 0 deletions analyses/molecular-subtyping-integrate/01-integrate-subtyping.nb.html

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions analyses/molecular-subtyping-integrate/README.md
@@ -0,0 +1,14 @@
## Integrate molecular subtyping output from pathology feedback

**Author of code and documentation:** [@kgaonkar6](https://github.com/kgaonkar6)

In this module, we add molecular_subtype from all subtyping modules, along with integrated_diagnosis, short_histology, broad_histology, and Notes from `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv`, to the base histology file.

### Usage
```sh
bash run-subtyping-integrate.sh
```

### Module contents

`01-integrate-subtyping.Rmd` integrates the compiled results in `compiled_molecular_subtypes_with_clinical_pathology_feedback.tsv` into `pbta-histologies-base.tsv`.