diff --git a/.circleci/config.yml b/.circleci/config.yml index 7d3faf9859..0a8ba1231d 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -20,10 +20,6 @@ jobs: name: Check python packages command: ./scripts/run_in_ci.sh bash scripts/check-python.sh - - run: - name: High level histology grouping for plot labels - command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)" - - run: name: Sample Distribution Analyses command: ./scripts/run_in_ci.sh bash "analyses/sample-distribution-analysis/run-sample-distribution.sh" diff --git a/figures/mapping-histology-labels.Rmd b/figures/mapping-histology-labels.Rmd index ac24d02a36..2faa73c6b8 100644 --- a/figures/mapping-histology-labels.Rmd +++ b/figures/mapping-histology-labels.Rmd @@ -1,52 +1,42 @@ --- -title: "Mapping histology labels for plots" +title: "Create a minimal palette for displaying multiple disease labels" output: html_notebook: toc: true toc_float: true -author: Candace Savonen for ALSF - CCDL +author: Candace Savonen, Krutika Gaonkar, and Jaclyn Taroni date: 2021 --- -# Purpose: +## Purpose -The histology label variables included in `pbta-histologies.tsv` from data releases are not always useful for visualizing the full set of biospecimens due to the large number of different values. -Having too many different possible values makes the colors harder to distinguish. -In addition, there are some groups that are represented by only a very few samples; giving such groups a distinct color may be counterproductive. +There are multiple "disease labels" in the `pbta-histologies.tsv` file, including (from most broad to most narrow) `broad_histology`, `cancer_group`, and `harmonized_diagnosis`. 
+For context, it is helpful to note that an individual `cancer_group` will be nested under a single `broad_histology` and that `cancer_group` is a shorter form of `harmonized_diagnosis` with the following edits: -The goal of this notebook is to use the currently existing `broad_histology` groups from `pbta-histologies.tsv`, to form 10-15 "high level histology" group labels that can used for plotting purposes. - -## The output table - -The output of this notebook is a TSV file: `palettes/histology_label_color_table.tsv` that contains the following fields: - -**Copied from `pbta-histologies.tsv`**: -- `Kids_First_Biospecimen_ID` (from `pbta-histologies.tsv`) -- All the original histology label variables (`broad_histology`, `short_histology`, etc.) - -**Created in this notebook**: -- `display_group` - the high-level histology labels that should be used for plotting -- `hex_codes` the direct colors that correspond to display_groups -- `cancer_group_hex_codes` the direct colors that correspond to cancer_groups - -With this info, `histology_label_color_table.tsv` can be used by all plots and figures that summarize high level data while displaying histology information. +- Other, Benign tumor and Dysplasia/Gliosis, Dysplasia/Gliosis-Glial-neuronal tumor NOS removed from `cancer_group` +- Neurofibroma/Plexiform;Other updated to Neurofibroma/Plexiform +- Non-germinomatous germ cell tumor;Teratoma updated to Teratoma +- Anaplastic (malignant) meningioma, Meningothelial meningioma and Clear cell meningioma updated to Meningioma +- Embryonal Tumor with Multilayered Rosettes updated to Embryonal tumor with multilayer rosettes -# How `display_group` is made: +Color is often a useful way to indicate disease labels in plots that show multiple groups, particularly when we cannot rely heavily on text labels (e.g., scatter plots). +Unfortunately, there are too many potential labels for us to generate an effective color palette (i.e., one made up of sufficiently distinct colors).
+In addition, some groupings will contain very few samples. -Here's how `broad-histology` groups are [combined into the higher-level groupings of `display_group`](#declare-new-equivalent-groups). +The purpose of this notebook is to create color palettes for the following: -1) "Lymphoma", "Melanocytic tumor", "Other tumor", "Metastatic tumors", "Non-CNS tumor" are combined into a `Other tumor` in `display_group`. +* `broad_histology` values, where a `broad_histology` contains at least one `cancer_group` with n >= 10 +* `cancer_group` values with n >= 10 -2) `Benign tumor` and `Non-tumor` biospecimens are combined into a `Benign` group. +**Note: This is tied to `release-v21-20210820`.** -3) `Other astrocytic tumor` biospecimens are combined into the existing `Low-grade astrocytic tumor`. These biospecimens in `other astrocytic tumors` were low-grade SEGA tumors. +### Background -4) Anything not in the above categories gets its `broad_histology` label carried over. +You may find [#1174](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174) to be helpful context. -# Usage +## Usage -This notebook can be run via the command line from the top directory of the -repository as follows: +This notebook can be run via the command line from the top directory of the repository as follows: ``` Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', @@ -56,284 +46,346 @@ Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', ## Set Up ```{r} -# Magrittr pipe -`%>%` <- dplyr::`%>%` +library(tidyverse) +library(RColorBrewer) ``` ### Directories and Files ```{r} # Path to input directory -input_dir <- file.path("..", "data") +input_dir <- file.path("..", "data", "release-v21-20210820") output_dir <- "palettes" ``` -# Read in metadata - -Which variables are we keeping for this table? 
- ```{r} -histology_variables <- - c("integrated_diagnosis", - "Notes", - "harmonized_diagnosis", - "broad_histology", - "short_histology", - "cancer_group") -``` +## Read in metadata -Let's read in the current release's `pbta-histologies.tsv` file. +Let's read in the `pbta-histologies.tsv` file from `release-v21-20210820`. ```{r} -metadata <- +histologies_df <- readr::read_tsv(file.path(input_dir, "pbta-histologies.tsv"), guess_max = 10000) ``` -Now we'll select histology variables we mentioned above and so capitalization differences don't get in the way with this process, we will change everything to lower case for now. - -```{r} -working_metadata <- metadata %>% - dplyr::select(Kids_First_Biospecimen_ID, sample_type, histology_variables) %>% - dplyr::mutate(broad_histology_lower = tolower(broad_histology)) -``` - -# Take a look at how many biospecimens per `broad_histology` group +## Identify values to include in palettes -Let's summarize `broad_histology`. -Because the `Normal` samples don't have histologies, we'll look at just the `Tumor` samples at for this summary. +We will use `cancer_group` with n >= 10 to guide what values to include in both our `cancer_group` and `broad_histology` palettes. ```{r} -broad_summary <- working_metadata %>% - dplyr::filter(sample_type == "Tumor") %>% - dplyr::count(broad_histology_lower) %>% - dplyr::arrange(n) +included_labels_df <- histologies_df %>% + # Exclude normal samples + filter(sample_type == "Tumor") %>% + # Filter to unique sample--"disease label" pairs + select(sample_id, + broad_histology, + cancer_group) %>% + distinct() %>% + # Count samples (i.e., unique sample_id values) + group_by(broad_histology, cancer_group) %>% + tally() %>% + # Keep labels with n >= 10 samples and drop NA cancer groups + filter(n >= 10, + !is.na(cancer_group)) + +included_labels_df ```
+So the unique values for `broad_histology` and `cancer_group` above are what we need to take into account for our palette. -```{r} -broad_summary %>% - knitr::kable() -``` - -There's handful of very small groups (many are n = 2). +## Create palettes -## Declare new equivalent groups +Outside of this notebook, we've done quite a bit of work to identify suitable palettes using http://phrogz.net/css/distinct-colors.html as a reference/starting point. +Check out the discussion on [#1174](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174)! -These groups we'll combine into a non-CNS/other tumor group. +### `broad_histology` ```{r} -other_tumor <- c("lymphoma", "melanocytic tumor", "other tumor", "metastatic tumors", "non-cns tumor") -``` - -These groups we'll combine as a benign. +broad_histology_df <- data.frame( + broad_histology = c("Benign tumor", + "Diffuse astrocytic and oligodendroglial tumor", + "Embryonal tumor", + "Ependymal tumor", + "Germ cell tumor", + "Low-grade astrocytic tumor", + "Meningioma", + "Mesenchymal non-meningothelial tumor", + "Neuronal and mixed neuronal-glial tumor", + "Tumor of cranial and paraspinal nerves", + "Tumors of sellar region"), + broad_histology_hex = c("#590024", + "#ff80e5", + "#220040", + "#2200ff", + "#0074d9", + "#8f8fbf", + "#2db398", + "#7fbf00", + "#685815", + "#ffaa00", + "#b2502d"), + stringsAsFactors = FALSE +) -```{r} -benign <- c("benign tumor", "non-tumor") +# value for "other" histologies +broad_histology_other_hex <- "#808080" ``` -Add in the `Other astrocytic tumor` in with the LGAT group. 
+Now to create a legend with `legend()` (h/t [this StackOverflow answer](https://stackoverflow.com/questions/48966645/how-can-i-create-a-legend-without-a-plot-in-r/48966924)) ```{r} -lgat <- c("other astrocytic tumor", "low-grade astrocytic tumor") +plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", + xlim = 0:1, ylim = 0:1) +legend("topleft", + legend = c(broad_histology_df$broad_histology, "Other"), + col = c(broad_histology_df$broad_histology_hex, + broad_histology_other_hex), + pch = 15, pt.cex = 2, cex = 1, bty = "n") +mtext("Broad Histology", at = 0.135, cex = 1.5) ``` -# Make new `display_group` +### `cancer_group` -```{r} -histology_table <- working_metadata %>% - dplyr::mutate( - # NAs are really Normals - display_group = tidyr::replace_na(broad_histology_lower, "normal"), - # Now do the group combining - display_group = forcats::fct_collapse(display_group, - "low-grade astrocytic tumor" = lgat, - "other tumor" = other_tumor, - "benign" = benign - ), - # Put this as a character for later handling - display_group = as.character(display_group) - ) -``` +There are 17 `cancer_group` values that we need to account for. +These are best used in conjunction with _labels_ in figures, but are intended to allow readers to "track" labels _across figures_. -Print out the number of `display_group` (including `normal`)! +Where there's a 1:1 mapping between `broad_histology` and `cancer_group`, the hex codes will be the same. 
```{r} -display_group_df <- histology_table %>% - dplyr::count(display_group) %>% - dplyr::arrange(n) - -knitr::kable(display_group_df) +cancer_group_df <- data.frame( + cancer_group = c("Choroid plexus papilloma", + "Diffuse intrinsic pontine glioma", + "Diffuse midline glioma", + "High-grade glioma astrocytoma", + "Atypical Teratoid Rhabdoid Tumor", + "CNS Embryonal tumor", + "Medulloblastoma", + "Ependymoma", + "Teratoma", + "Ganglioglioma", + "Low-grade glioma astrocytoma", + "Meningioma", + "Ewing sarcoma", + "Dysembryoplastic neuroepithelial tumor", + "Neurofibroma Plexiform", + "Schwannoma", + "Craniopharyngioma"), + cancer_group_hex = c("#4d2635", + "#bf0099", + "#ff40d9", + "#ffccf5", + "#4d0d85", + "#b08ccf", + "#a340ff", + "#2200ff", + "#058aff", + "#8c8cff", + "#000080", + "#2db398", + "#9fbf60", + "#614e01", + "#e6ac39", + "#ab7200", + "#b33000"), + stringsAsFactors = FALSE +) +# Value for "other" groups +cancer_group_other_hex <- "#b5b5b5" ``` -Make this notebook stop if there are more than 16 histology groups + `Normal`. +And again, we'll create a legend with `legend()` ```{r} -if (nrow(display_group_df) > 18) { - stop("There are more than 18 categories in `display_group`. We may want to re-evaluate the high-level histology groupings") -} +plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", + xlim = 0:1, ylim = 0:1) +legend("topleft", + legend = c(cancer_group_df$cancer_group, "Other"), + col = c(cancer_group_df$cancer_group_hex, cancer_group_other_hex), + pch = 15, pt.cex = 1.5, cex = 0.75, bty = "n") +mtext("Cancer Group", at = 0.0625, cex = 1) ``` -# Make `display_order` +### Output -Get ranks in order of big to small and make them into a new column in `display_group_df`. -We will always want the "normal", "benign", "other_tumor" groups to come last so we will push then to the end of the factor order. 
+We can create a data frame that contains both palettes with a series of left joins, where we will then fill the NA values with a single (gray) hex code per column (`r broad_histology_other_hex` for `broad_histology`, `r cancer_group_other_hex` for `cancer_group`.) ```{r} -display_order_df <- display_group_df %>% - dplyr::mutate(display_group = forcats::fct_reorder(display_group, n, .desc = TRUE) %>% - forcats::fct_relevel("benign", "other tumor", "normal", after = Inf), - display_order = as.numeric(display_group)) # save the factor order for text table export +palette_df <- histologies_df %>% + # Exclude normal samples + filter(sample_type == "Tumor") %>% + # Filter to unique broad histology--cancer group pairs + select(broad_histology, + cancer_group) %>% + distinct() %>% + # Add broad histology palette + left_join(broad_histology_df, by = "broad_histology") %>% + # Add cancer group palette + left_join(cancer_group_df, by = "cancer_group") %>% + # Fill all other values with gray colors + replace_na(list(broad_histology_hex = broad_histology_other_hex, + cancer_group_hex = cancer_group_other_hex)) %>% + # The exception being - if cancer_group == NA, so should cancer_group_hex! + mutate(cancer_group_hex = if_else(is.na(cancer_group), + NA_character_, + cancer_group_hex)) %>% + # Sort by broad_histology for easy browsing + arrange(broad_histology) ``` -# Make `cancer_group_order` +And now let's take a look! 
-`cancer_group` is a shorter form of `harmonized_diagnosis` with the following edits: -- Removed Other, Benign tumor and Dysplasia/Gliosis,Dysplasia/Gliosis-Glial-neuronal tumor NOS removed from `cancer_group` -- Neurofibroma/Plexiform;Other updated to Neurofibroma/Plexiform -- Non-germinomatous germ cell tumor;Teratoma updated to Teratoma -- Anaplastic (malignant) meningioma, Meningothelial meningioma and Clear cell meningioma updated to Meningioma -- Embryonal Tumor with Multilayered Rosettes updated to Embryonal tumor with multilayer rosettes - -Get ranks in order of big to small and make them into a new dataframe `cancer_group_order_df`. - ```{r} -cancer_group_order_df <- histology_table %>% - dplyr::count(cancer_group,name = "cancer_group_n") %>% - dplyr::mutate( - cancer_group = forcats::fct_reorder(cancer_group, cancer_group_n, .desc = TRUE), - cancer_group_order = as.numeric(cancer_group)) # save the factor order for text table export +palette_df ``` +### Add display names for convenience -Add on the `display_order` column using `inner_join` +When multiple values are using the same color, it can be helpful to have a separate value for the legend, e.g., for all `#808080` broad histologies, we may want to display `Other`. +We'll add a couple columns for legend-making convenience. 
```{r} -histology_table <- histology_table %>% - # Join on the display orders - dplyr::inner_join(display_order_df, by = "display_group") %>% - # Join on the cancer_group orders - dplyr::inner_join(cancer_group_order_df, by = "cancer_group") +palette_df <- palette_df %>% + mutate(broad_histology_display = if_else(broad_histology_hex == broad_histology_other_hex, + "Other", + broad_histology), + cancer_group_display = if_else(cancer_group_hex == cancer_group_other_hex, + "Other", + cancer_group)) ``` -# Add hex codes for display_group and cancer_group +### Add `broad_histology_order` -These hex codes were retrieved from http://phrogz.net/css/distinct-colors.html with the settings on default for 18 colors. +Previously, we had a concept known as `display_order` where we ordered categories based on their number of samples (from large to small). +Now that we've dropped `display_group`, let's apply this same concept to `broad_histology`. ```{r} -color_palette_display <- - c("#ff0000", "#cc0000", "#995200", "#bfb300", "#fffbbf", - "#2e7300", "#00e65c", "#00ffee", "#103d40", "#0085a6", - "#003380", "#4073ff", "#737899", "#70008c", "#f2b6ee", - "#ff40bf", "#8c0038", "#330d12" -) - -color_palette_cancer_group <- - c("#ff0000", "#f20000", "#997373", "#403030", "#330700", - "#ff9180", "#591800", "#b2502d", "#cca799", "#ff6600", - "#ffb380", "#7f5940", "#cc6d00", "#331b00", "#ccb499", - "#ffaa00", "#996600", "#594316", "#ffd580", "#ffee00", - "#998f00", "#999673", "#303300", "#fbffbf", "#ccff00", - "#494d39", "#b5d96c", "#6a8040", "#66ff00", "#42a600", - "#bfffbf", "#003307", "#00661b", "#00ff88", "#86b39e", - "#00b377", "#006652", "#00ffee", "#00a7b3", "#bffbff", - "#567173", "#00ccff", "#003d4d", "#00aaff", "#267399", - "#0088ff", "#0042a6", "#001a40", "#bfd9ff", "#0044ff", - "#394973", "#000e66", "#bfbfff", "#9180ff", "#5800a6", - "#754d99", "#aa00ff", "#3a3040", "#aa86b3", "#530059", - "#ff00ee", "#a60085", "#330022", "#ff80d5", "#ff0088", - "#804062", "#a60042", "#590024", 
"#ffbfd9", "#ff0044", - "#990014", "#ff8091" -) - +broad_histology_order_df <- histologies_df %>% + # Exclude normal samples + filter(sample_type == "Tumor", + # Only count histologies that we'll have a hex code for + broad_histology %in% included_labels_df$broad_histology) %>% + # Filter to unique sample--broad_histology pairs + select(sample_id, + broad_histology) %>% + distinct() %>% + # Count samples within a broad histology + count(broad_histology) %>% + # Add Other placeholder + bind_rows(data.frame(broad_histology = "Other", + n = 0, + stringsAsFactors = FALSE)) %>% + # Reorder based on sample size except Benign tumor and Other should come last + # And then add numeric column with the order + mutate(broad_histology = forcats::fct_reorder(broad_histology, + n, + .desc = TRUE) %>% + forcats::fct_relevel("Benign tumor", + "Other", + after = Inf), + broad_histology_order = as.numeric(broad_histology)) %>% + # No longer require the sample size + select(-n) + +broad_histology_order_df ``` -Declare how many colors we need. +And now we're ready to add this to the palette data frame. ```{r} -n_colors_display <- nrow(display_group_df) -n_colors_cancer_group <- nrow(cancer_group_order_df) +palette_df <- palette_df %>% + left_join(broad_histology_order_df, + by = c("broad_histology_display" = "broad_histology")) ``` -Make a named list color key where histologies are the names. +### Add `oncoprint_group` and `oncoprint_hex` -```{r} -# Set seed so the colors are consistent upon re-run -set.seed(2021) +For most plots that make use of the `cancer_group` palette, such as a box or violin plot, we will rely heavily on labels and therefore using the gray hex code for multiple groups will not be a problem. 
-# Sample from the 18 colors for display_group -subset_colors_display <- sample(color_palette_display, n_colors_display) -names(subset_colors_display) <- display_order_df$display_group +We will have four panels of individual oncoprints, where many `broad_histology` values will get grouped together into the `Other CNS` panel which you can see [here](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/d31c927a27813ec0b8032fbe768002f31723636f/analyses/oncoprint-landscape/02-plot-oncoprint.R#L181). +We can move this information into our palette data frame. -# Sample from the 62 colors for cancer_group -subset_colors_cancer_group <- sample(color_palette_cancer_group, n_colors_cancer_group) -names(subset_colors_cancer_group) <- cancer_group_order_df$cancer_group -``` - -Remove from subset_colors_cancer_group ```{r} -# We will assign a gray color for NA below -subset_colors_cancer_group <- subset_colors_cancer_group[!is.na(names(subset_colors_cancer_group))] -``` - -We want `Other tumor` and the `Benign` in display_group to both always be gray. 
+# Taken from the current plot oncoprint script as of the writing of this +# See permalink above +other_cns_broad_histologies <- c( + "Ependymal tumor", + "Tumors of sellar region", + "Neuronal and mixed neuronal-glial tumor", + "Tumor of cranial and paraspinal nerves", + "Meningioma", + "Mesenchymal non-meningothelial tumor", + "Germ cell tumor", + "Choroid plexus tumor", + "Histiocytic tumor", + "Tumor of pineal region", + "Metastatic tumors", + "Other astrocytic tumor", + "Lymphoma", + "Melanocytic tumor", + "Other tumor" +) -```{r} -subset_colors_display[names(subset_colors_display) == 'other tumor'] <- "#808080" -subset_colors_display[names(subset_colors_display) == 'benign'] <- "#D3D3D3" +palette_df <- palette_df %>% + mutate(oncoprint_group = case_when( + broad_histology %in% other_cns_broad_histologies ~ "Other CNS", + broad_histology %in% c( + "Low-grade astrocytic tumor", + "Embryonal tumor", + "Diffuse astrocytic and oligodendroglial tumor" + ) ~ broad_histology, + TRUE ~ NA_character_ + )) ``` -Normal biospecimens should not get plotted in display_group, so we will put their hex code as black. +For cancer groups that do not get their own hex code for display (i.e., due to small sample sizes), we'll use a selection of grey colors as a palette and rely heavily on the ordering of the OncoPrint legend. +Unfortunately there are over 20 "Other CNS" cancer groups that meet this criterion, so it is not feasible to have a color for each of them and they will not be included in the "Other CNS" OncoPrint. ```{r} -subset_colors_display[names(subset_colors_display) == 'normal'] <- "#000000" -``` +greys_df <- palette_df %>% + filter(cancer_group_display == "Other", + !is.na(oncoprint_group), + oncoprint_group != "Other CNS") -Use `pie` function to preview what display_group these look like. 
- ```{r} -pie(rep(1, n_colors_display), - col = subset_colors_display, - labels = names(subset_colors_display)) +# Sample the greys sequential palette from color brewer +set.seed(2021) +greys_df <- greys_df %>% + mutate(oncoprint_hex = sample(brewer.pal(nrow(greys_df), "Greys"))) ``` -Use `pie` function to preview what cancer_group these look like. - -```{r} -pie(rep(1, n_colors_cancer_group), - col = subset_colors_cancer_group, - labels = names(subset_colors_cancer_group)) -``` +Add `oncoprint_hex` for all other cancer groups, along with `oncoprint_include`, which will be `FALSE` when `oncoprint_hex` is `NA`. -Add the hex codes for display_group and cancer_group to the `histology_table`. -Add gray color if cancer_group==NA ```{r} -histology_table <- histology_table %>% - # We don't need this anymore - dplyr::select(-broad_histology_lower) %>% - # Add the hex_codes - dplyr::mutate(hex_codes = dplyr::recode(display_group, !!!subset_colors_display), - cancer_group_hex_codes = dplyr::recode(cancer_group, !!!subset_colors_cancer_group), - # if cancer_group is NA for tumor sample add gray color - cancer_group_hex_codes = dplyr::if_else(is.na(cancer_group_hex_codes) & sample_type=="Tumor" - ,"#808080", - cancer_group_hex_codes) - ) %>% - # Restore capitalization so its pretty for labeling - dplyr::mutate(display_group = stringr::str_to_sentence(display_group), - # Deal with CNS exception - display_group = stringr::str_replace(display_group, "cns", "CNS") - ) +palette_df <- palette_df %>% + left_join(greys_df) %>% + # When there's an oncoprint group and a specific cancer group display color, + # use the cancer group display color as the oncoprint color + mutate(oncoprint_hex = if_else( + cancer_group_hex != cancer_group_other_hex & !is.na(oncoprint_group), + cancer_group_hex, + oncoprint_hex + ), + # Only when there's a specific oncoprint color -- even if that is a grey + # selected only
for the oncoprint -- will a cancer group be included in the + # oncoprint + oncoprint_include = if_else( + is.na(oncoprint_hex), + FALSE, + TRUE + )) ``` -## Save to TSV +### Save to TSV ```{r} -readr::write_tsv(histology_table, file.path(output_dir, "histology_label_color_table.tsv")) +readr::write_tsv(palette_df, + file.path(output_dir, + "broad_histology_cancer_group_palette.tsv")) ``` -# Session Info +## Session Info ```{r} sessionInfo() diff --git a/figures/mapping-histology-labels.nb.html b/figures/mapping-histology-labels.nb.html index d8d06fdd0d..6a04b14e0a 100644 --- a/figures/mapping-histology-labels.nb.html +++ b/figures/mapping-histology-labels.nb.html @@ -9,11 +9,11 @@ - + -Mapping histology labels for plots +Create a minimal palette for displaying multiple disease labels + + +

So the unique values for broad_histology and cancer_group above are what we need to take into account for our palette.
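The counting step behind `included_labels_df` can be sketched on a small, made-up data frame. The rows below are hypothetical, and the threshold is lowered from 10 to 2 so the toy data has something to keep. `distinct()` matters because the same `sample_id` can appear on more than one row (S1 here), and each sample should count once per label pair.

```r
library(dplyr)

# Hypothetical stand-in for pbta-histologies.tsv; S1 appears on two rows
toy_histologies <- data.frame(
  sample_id       = c("S1", "S1", "S2", "S3", "S4", "S5", "S6", "S7"),
  sample_type     = c("Tumor", "Tumor", "Tumor", "Tumor", "Normal",
                      "Tumor", "Tumor", "Tumor"),
  broad_histology = c("Embryonal tumor", "Embryonal tumor", "Embryonal tumor",
                      "Ependymal tumor", NA, "Meningioma",
                      "Other tumor", "Other tumor"),
  cancer_group    = c("Medulloblastoma", "Medulloblastoma", "Medulloblastoma",
                      "Ependymoma", NA, "Meningioma", NA, NA),
  stringsAsFactors = FALSE
)

# The notebook uses n >= 10; a threshold of 2 keeps the toy example small
min_n <- 2

toy_included <- toy_histologies %>%
  # Exclude normal samples
  filter(sample_type == "Tumor") %>%
  # One row per sample--label pair, so S1 is counted once
  distinct(sample_id, broad_histology, cancer_group) %>%
  # Count samples per label pair (same counts as group_by() + tally())
  count(broad_histology, cancer_group) %>%
  # Keep well-represented pairs and drop missing cancer groups
  filter(n >= min_n, !is.na(cancer_group))

toy_included
```

Without `distinct()`, Medulloblastoma would be counted three times instead of twice. The `Other tumor` pair also reaches n = 2 but is dropped because its `cancer_group` is `NA`.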

-
-

Take a look at how many biospecimens per broad_histology group

-

Let’s summarize broad_histology. Because the Normal samples don’t have histologies, we’ll look at just the Tumor samples at for this summary.

+
+

Create palettes

+

Outside of this notebook, we’ve done quite a bit of work to identify suitable palettes using http://phrogz.net/css/distinct-colors.html as a reference/starting point. Check out the discussion on #1174!

+
+

broad_histology

-
-broad_summary <- working_metadata %>% 
-  dplyr::filter(sample_type == "Tumor") %>%
-  dplyr::count(broad_histology_lower) %>% 
-  dplyr::arrange(n) 
+
+broad_histology_df <- data.frame(
+  broad_histology = c("Benign tumor",
+                      "Diffuse astrocytic and oligodendroglial tumor",
+                      "Embryonal tumor",
+                      "Ependymal tumor",
+                      "Germ cell tumor",
+                      "Low-grade astrocytic tumor",
+                      "Meningioma",
+                      "Mesenchymal non-meningothelial tumor",
+                      "Neuronal and mixed neuronal-glial tumor",
+                      "Tumor of cranial and paraspinal nerves",
+                      "Tumors of sellar region"), 
+  broad_histology_hex = c("#590024",
+                          "#ff80e5",
+                          "#220040",
+                          "#2200ff",
+                          "#0074d9",
+                          "#8f8fbf",
+                          "#2db398",
+                          "#7fbf00",
+                          "#685815",
+                          "#ffaa00",
+                          "#b2502d"),
+  stringsAsFactors = FALSE
+)
+
+# value for "other" histologies
+broad_histology_other_hex <- "#808080"
-

Let’s print out the summary.

+

Now to create a legend with legend() (h/t this StackOverflow answer)

-
-broad_summary %>% 
-  knitr::kable()
+
+plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", 
+     xlim = 0:1, ylim = 0:1)
+legend("topleft", 
+       legend = c(broad_histology_df$broad_histology, "Other"),
+       col = c(broad_histology_df$broad_histology_hex, 
+               broad_histology_other_hex),
+       pch = 15, pt.cex = 2, cex = 1, bty = "n")
-| broad_histology_lower | n |
-|:--|--:|
-| lymphoma | 2 |
-| melanocytic tumor | 2 |
-| non-cns tumor | 2 |
-| other tumor | 2 |
-| non-tumor | 6 |
-| choroid plexus tumor | 9 |
-| tumor of pineal region | 10 |
-| chordoma | 12 |
-| metastatic tumors | 13 |
-| histiocytic tumor | 14 |
-| germ cell tumor | 27 |
-| pre-cancerous lesion | 27 |
-| mesenchymal non-meningothelial tumor | 49 |
-| meningioma | 59 |
-| benign tumor | 70 |
-| tumors of sellar region | 73 |
-| neuronal and mixed neuronal-glial tumor | 83 |
-| tumor of cranial and paraspinal nerves | 83 |
-| ependymal tumor | 174 |
-| embryonal tumor | 333 |
-| diffuse astrocytic and oligodendroglial tumor | 370 |
-| low-grade astrocytic tumor | 587 |

There’s handful of very small groups (many are n = 2).

-
-

Declare new equivalent groups

-

These groups we’ll combine into a non-CNS/other tumor group.

-
-
-
-other_tumor <- c("lymphoma", "melanocytic tumor", "other tumor", "metastatic tumors", "non-cns tumor")
+
+mtext("Broad Histology", at = 0.135, cex = 1.5)
+ +

+ -

These groups we’ll combine as a benign.

+
+
+

cancer_group

+

There are 17 cancer_group values that we need to account for. These are best used in conjunction with labels in figures, but are intended to allow readers to “track” labels across figures.

+

Where there’s a 1:1 mapping between broad_histology and cancer_group, the hex codes will be the same.
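To see which colors are reused between the two palettes, the palette data frames can be joined on their hex columns. This is a minimal base R sketch; the palette values are copied from the `broad_histology` and `cancer_group` chunks, and `shared_hex_df` is a name introduced here for illustration.

```r
# Palette values copied from the broad_histology and cancer_group chunks
broad_histology_df <- data.frame(
  broad_histology = c("Benign tumor", "Diffuse astrocytic and oligodendroglial tumor",
                      "Embryonal tumor", "Ependymal tumor", "Germ cell tumor",
                      "Low-grade astrocytic tumor", "Meningioma",
                      "Mesenchymal non-meningothelial tumor",
                      "Neuronal and mixed neuronal-glial tumor",
                      "Tumor of cranial and paraspinal nerves",
                      "Tumors of sellar region"),
  broad_histology_hex = c("#590024", "#ff80e5", "#220040", "#2200ff", "#0074d9",
                          "#8f8fbf", "#2db398", "#7fbf00", "#685815", "#ffaa00",
                          "#b2502d"),
  stringsAsFactors = FALSE
)

cancer_group_df <- data.frame(
  cancer_group = c("Choroid plexus papilloma", "Diffuse intrinsic pontine glioma",
                   "Diffuse midline glioma", "High-grade glioma astrocytoma",
                   "Atypical Teratoid Rhabdoid Tumor", "CNS Embryonal tumor",
                   "Medulloblastoma", "Ependymoma", "Teratoma", "Ganglioglioma",
                   "Low-grade glioma astrocytoma", "Meningioma", "Ewing sarcoma",
                   "Dysembryoplastic neuroepithelial tumor",
                   "Neurofibroma Plexiform", "Schwannoma", "Craniopharyngioma"),
  cancer_group_hex = c("#4d2635", "#bf0099", "#ff40d9", "#ffccf5", "#4d0d85",
                       "#b08ccf", "#a340ff", "#2200ff", "#058aff", "#8c8cff",
                       "#000080", "#2db398", "#9fbf60", "#614e01", "#e6ac39",
                       "#ab7200", "#b33000"),
  stringsAsFactors = FALSE
)

# Inner join on the hex code: a row appears only if a color is used in both palettes
shared_hex_df <- merge(broad_histology_df, cancer_group_df,
                       by.x = "broad_histology_hex", by.y = "cancer_group_hex")
shared_hex_df
```

With the values as written, two pairs share a hex code: Ependymal tumor/Ependymoma (`#2200ff`) and Meningioma/Meningioma (`#2db398`).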

-
-benign <- c("benign tumor", "non-tumor")
+
+cancer_group_df <- data.frame(
+  cancer_group = c("Choroid plexus papilloma",
+                   "Diffuse intrinsic pontine glioma",
+                   "Diffuse midline glioma",
+                   "High-grade glioma astrocytoma",
+                   "Atypical Teratoid Rhabdoid Tumor",
+                   "CNS Embryonal tumor",
+                   "Medulloblastoma",
+                   "Ependymoma",
+                   "Teratoma",
+                   "Ganglioglioma",
+                   "Low-grade glioma astrocytoma",
+                   "Meningioma",
+                   "Ewing sarcoma",
+                   "Dysembryoplastic neuroepithelial tumor",
+                   "Neurofibroma Plexiform",
+                   "Schwannoma",
+                   "Craniopharyngioma"),
+  cancer_group_hex = c("#4d2635",
+                       "#bf0099",
+                       "#ff40d9",
+                       "#ffccf5",
+                       "#4d0d85",
+                       "#b08ccf",
+                       "#a340ff",
+                       "#2200ff",
+                       "#058aff",
+                       "#8c8cff",
+                       "#000080",
+                       "#2db398",
+                       "#9fbf60",
+                       "#614e01",
+                       "#e6ac39",
+                       "#ab7200",
+                       "#b33000"),
+  stringsAsFactors = FALSE
+)
+# Value for "other" groups
+cancer_group_other_hex <- "#b5b5b5"
-

Add in the Other astrocytic tumor in with the LGAT group.

+

And again, we’ll create a legend with legend()

-
-lgat <- c("other astrocytic tumor", "low-grade astrocytic tumor")
+
+plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", 
+     xlim = 0:1, ylim = 0:1)
+legend("topleft", 
+       legend = c(cancer_group_df$cancer_group, "Other"),
+       col = c(cancer_group_df$cancer_group_hex, cancer_group_other_hex),
+       pch = 15, pt.cex = 1.5, cex = 0.75, bty = "n")
- - -
-
-
-

Make new display_group

-
-
-
-histology_table <- working_metadata %>% 
-  dplyr::mutate(
-    # NAs are really Normals
-    display_group = tidyr::replace_na(broad_histology_lower, "normal"),
-    # Now do the group combining
-    display_group = forcats::fct_collapse(display_group,
-      "low-grade astrocytic tumor" = lgat,
-      "other tumor" = other_tumor,
-      "benign" = benign
-    ),
-    # Put this as a character for later handling
-    display_group = as.character(display_group)
-    )
+
+mtext("Cancer Group", at = 0.0625, cex = 1)
-
-Warning: Unknown levels in `f`: other astrocytic tumor
- + +

+ -

Print out the number of display_group (including normal)!

+
+
+

Output

+

We can create a data frame that contains both palettes with a series of left joins, where we will then fill the NA values with a single (gray) hex code per column (#808080 for broad_histology, #b5b5b5 for cancer_group.)
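The join-then-fill pattern can be seen on a toy example. The data frames below are hypothetical, cut-down stand-ins for the real palettes: `replace_na()` supplies one gray per column, and the final `if_else()` restores `NA` wherever `cancer_group` itself is missing.

```r
library(dplyr)
library(tidyr)

# Hypothetical broad histology--cancer group pairs
pairs <- data.frame(
  broad_histology = c("Embryonal tumor", "Embryonal tumor", "Chordoma", "Other tumor"),
  cancer_group = c("Medulloblastoma", "Embryonal tumor with multilayer rosettes",
                   "Chordoma", NA),
  stringsAsFactors = FALSE
)

# Hypothetical one-entry palettes
toy_broad <- data.frame(broad_histology = "Embryonal tumor",
                        broad_histology_hex = "#220040",
                        stringsAsFactors = FALSE)
toy_cancer <- data.frame(cancer_group = "Medulloblastoma",
                         cancer_group_hex = "#a340ff",
                         stringsAsFactors = FALSE)

toy_palette <- pairs %>%
  left_join(toy_broad, by = "broad_histology") %>%
  left_join(toy_cancer, by = "cancer_group") %>%
  # Labels without a palette entry fall back to one gray per column
  replace_na(list(broad_histology_hex = "#808080",
                  cancer_group_hex = "#b5b5b5")) %>%
  # A missing cancer_group keeps a missing hex code
  mutate(cancer_group_hex = if_else(is.na(cancer_group),
                                    NA_character_,
                                    cancer_group_hex))

toy_palette
```

Because `left_join()` keeps every row of `pairs`, labels with no palette entry survive the joins as `NA` and are then filled in, rather than being silently dropped.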

-
-display_group_df <- histology_table %>% 
-  dplyr::count(display_group) %>% 
-  dplyr::arrange(n)
-
-knitr::kable(display_group_df)
+
+palette_df <- histologies_df %>%
+  # Exclude normal samples
+  filter(sample_type == "Tumor") %>%
+  # Filter to unique broad histology--cancer group pairs
+  select(broad_histology, 
+         cancer_group) %>% 
+  distinct() %>%
+  # Add broad histology palette
+  left_join(broad_histology_df, by = "broad_histology") %>%
+  # Add cancer group palette
+  left_join(cancer_group_df, by = "cancer_group") %>%
+  # Fill all other values with gray colors
+  replace_na(list(broad_histology_hex = broad_histology_other_hex,
+                  cancer_group_hex = cancer_group_other_hex)) %>%
+  # The exception being - if cancer_group == NA, so should cancer_group_hex!
+  mutate(cancer_group_hex = if_else(is.na(cancer_group), 
+                                    NA_character_, 
+                                    cancer_group_hex)) %>%
+  # Sort by broad_histology for easy browsing
+  arrange(broad_histology)

| display_group | n |
|:--|--:|
| choroid plexus tumor | 9 |
| tumor of pineal region | 10 |
| chordoma | 12 |
| histiocytic tumor | 14 |
| other tumor | 21 |
| germ cell tumor | 27 |
| pre-cancerous lesion | 27 |
| mesenchymal non-meningothelial tumor | 49 |
| meningioma | 59 |
| tumors of sellar region | 73 |
| benign | 76 |
| neuronal and mixed neuronal-glial tumor | 83 |
| tumor of cranial and paraspinal nerves | 83 |
| ependymal tumor | 174 |
| embryonal tumor | 333 |
| diffuse astrocytic and oligodendroglial tumor | 370 |
| low-grade astrocytic tumor | 587 |
| normal | 833 |

Make this notebook stop if there are more than 18 categories in display_group (including normal).


And now let’s take a look!

-if (nrow(display_group_df) > 18) {
-  stop("There are more than 18 categories in `display_group`. We may want to re-evaluate the high-level histology groupings")
-}
+palette_df

Make display_order


Get ranks in order of big to small and make them into a new column in display_group_df. We will always want the “normal”, “benign”, and “other tumor” groups to come last, so we will push them to the end of the factor order.


Add display names for convenience


When multiple values are using the same color, it can be helpful to have a separate value for the legend, e.g., for all #808080 broad histologies, we may want to display Other. We’ll add a couple columns for legend-making convenience.
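
As a sketch of the comparison behind these legend columns, base R's `ifelse()` substitutes `Other` wherever a group uses the shared gray. The row values below are hypothetical; only the gray `#b5b5b5` comes from this notebook.

```r
# Hypothetical palette rows; "#b5b5b5" matches cancer_group_other_hex
cancer_group_other_hex <- "#b5b5b5"
cancer_group <- c("Medulloblastoma", "Pineoblastoma", "Teratoma")
cancer_group_hex <- c("#a340ff", "#b5b5b5", "#058aff")

# Any group drawn in the shared gray gets the legend label "Other"
cancer_group_display <- ifelse(cancer_group_hex == cancer_group_other_hex,
                               "Other",
                               cancer_group)
print(cancer_group_display)
```

As with `dplyr::if_else()`, an `NA` hex code produces an `NA` label rather than `Other`.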

-display_order_df <- display_group_df %>% 
-  dplyr::mutate(display_group = forcats::fct_reorder(display_group, n, .desc = TRUE) %>%
-                  forcats::fct_relevel("benign", "other tumor", "normal", after = Inf),
-                display_order = as.numeric(display_group)) # save the factor order for text table export
+palette_df <- palette_df %>%
+  mutate(broad_histology_display = if_else(broad_histology_hex == broad_histology_other_hex,
+                                           "Other",
+                                           broad_histology),
+         cancer_group_display = if_else(cancer_group_hex == cancer_group_other_hex,
+                                        "Other",
+                                        cancer_group))

Make cancer_group_order


cancer_group is a shorter form of harmonized_diagnosis with the following edits:

- Other, Benign tumor and Dysplasia/Gliosis, Dysplasia/Gliosis-Glial-neuronal tumor NOS removed from cancer_group
- Neurofibroma/Plexiform;Other updated to Neurofibroma/Plexiform
- Non-germinomatous germ cell tumor;Teratoma updated to Teratoma
- Anaplastic (malignant) meningioma, Meningothelial meningioma and Clear cell meningioma updated to Meningioma
- Embryonal Tumor with Multilayered Rosettes updated to Embryonal tumor with multilayer rosettes


Get ranks in order of big to small and make them into a new dataframe cancer_group_order_df.


Add broad_histology_order


Previously, we had a concept known as display_order where we ordered categories based on their number of samples (from large to small). Now that we’ve dropped display_group, let’s apply this same concept to broad_histology.

-cancer_group_order_df <- histology_table %>% 
-  dplyr::count(cancer_group,name = "cancer_group_n") %>% 
-  dplyr::mutate(
-    cancer_group = forcats::fct_reorder(cancer_group, cancer_group_n, .desc = TRUE),
-                cancer_group_order = as.numeric(cancer_group)) # save the factor order for text table export
+broad_histology_order_df <- histologies_df %>%  
+  # Exclude normal samples
+  filter(sample_type == "Tumor",
+         # Only count histologies that we'll have a hex code for
+         broad_histology %in% included_labels_df$broad_histology) %>%
+  # Filter to unique sample--broad_histology pairs
+  select(sample_id, 
+         broad_histology) %>%
+  distinct() %>%
+  # Count samples within a broad histology
+  count(broad_histology) %>%
+  # Add Other placeholder
+  bind_rows(data.frame(broad_histology = "Other",
+                       n = 0, 
+                       stringsAsFactors = FALSE)) %>%
+  # Reorder based on sample size except Benign tumor and Other should come last
+  # And then add numeric column with the order
+  mutate(broad_histology = forcats::fct_reorder(broad_histology,
+                                                n,
+                                                .desc = TRUE) %>%
+           forcats::fct_relevel("Benign tumor",
+                                "Other",
+                                after = Inf),
+         broad_histology_order = as.numeric(broad_histology)) %>%
+  # No longer require the sample size
+  select(-n)
+
+broad_histology_order_df

Add on the display_order and cancer_group_order columns using inner_join


And now we’re ready to add this to the palette data frame.

-histology_table <- histology_table %>%
-  # Join on the display orders
-  dplyr::inner_join(display_order_df, by = "display_group") %>%
-  # Join on the cancer_group orders
-  dplyr::inner_join(cancer_group_order_df, by = "cancer_group") 
+palette_df <- palette_df %>%
+  left_join(broad_histology_order_df, 
+            by = c("broad_histology_display" = "broad_histology"))
-Warning: Column `display_group` joining character vector and factor,
-coercing into character vector
-Warning: Column `cancer_group` joining character vector and factor,
-coercing into character vector
+Column `broad_histology_display`/`broad_histology` joining character vector and factor, coercing into character vector

Add hex codes for display_group and cancer_group


These hex codes were retrieved from http://phrogz.net/css/distinct-colors.html. The 18-color display_group palette uses the default settings; the cancer_group palette contains 72 colors.


Add oncoprint_group and oncoprint_hex


For most plots that make use of the cancer_group palette, such as a box or violin plot, we will rely heavily on labels and therefore using the gray hex code for multiple groups will not be a problem.


We will have four panels of individual oncoprints, where many broad_histology values will get grouped together into the Other CNS panel which you can see here. We can move this information into our palette data frame.

-color_palette_display <- 
-  c("#ff0000", "#cc0000", "#995200", "#bfb300", "#fffbbf", 
-    "#2e7300", "#00e65c", "#00ffee", "#103d40", "#0085a6", 
-    "#003380", "#4073ff", "#737899", "#70008c", "#f2b6ee", 
-    "#ff40bf", "#8c0038", "#330d12"
+# Taken from the current plot oncoprint script as of the writing of this
+# See permalink above
+other_cns_broad_histologies <- c(
+  "Ependymal tumor",
+  "Tumors of sellar region",
+  "Neuronal and mixed neuronal-glial tumor",
+  "Tumor of cranial and paraspinal nerves",
+  "Meningioma",
+  "Mesenchymal non-meningothelial tumor",
+  "Germ cell tumor",
+  "Choroid plexus tumor",
+  "Histiocytic tumor",
+  "Tumor of pineal region",
+  "Metastatic tumors",
+  "Other astrocytic tumor",
+  "Lymphoma",
+  "Melanocytic tumor",
+  "Other tumor"
 )
 
-color_palette_cancer_group <-
-  c("#ff0000", "#f20000", "#997373", "#403030", "#330700",
-    "#ff9180", "#591800", "#b2502d", "#cca799", "#ff6600",
-    "#ffb380", "#7f5940", "#cc6d00", "#331b00", "#ccb499",
-    "#ffaa00", "#996600", "#594316", "#ffd580", "#ffee00",
-    "#998f00", "#999673", "#303300", "#fbffbf", "#ccff00",
-    "#494d39", "#b5d96c", "#6a8040", "#66ff00", "#42a600",
-    "#bfffbf", "#003307", "#00661b", "#00ff88", "#86b39e",
-    "#00b377", "#006652", "#00ffee", "#00a7b3", "#bffbff",
-    "#567173", "#00ccff", "#003d4d", "#00aaff", "#267399",
-    "#0088ff", "#0042a6", "#001a40", "#bfd9ff", "#0044ff",
-    "#394973", "#000e66", "#bfbfff", "#9180ff", "#5800a6",
-    "#754d99", "#aa00ff", "#3a3040", "#aa86b3", "#530059",
-    "#ff00ee", "#a60085", "#330022", "#ff80d5", "#ff0088",
-    "#804062", "#a60042", "#590024", "#ffbfd9", "#ff0044",
-    "#990014", "#ff8091"
-)
+palette_df <- palette_df %>%
+  mutate(oncoprint_group = case_when(
+    broad_histology %in% other_cns_broad_histologies ~ "Other CNS",
+    broad_histology %in% c(
+      "Low-grade astrocytic tumor",
+      "Embryonal tumor",
+      "Diffuse astrocytic and oligodendroglial tumor"
+    ) ~ broad_histology,
+    TRUE ~ NA_character_
+  ))
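
The `case_when()` above sends three large broad histologies to their own OncoPrint panels, the `other_cns_broad_histologies` values to `Other CNS`, and everything else to `NA`. Below is a base-R sketch of the same rule. The helper `to_oncoprint_group()` is hypothetical, and the Other CNS list is shortened for illustration.

```r
# Hypothetical helper mirroring the case_when() rule above
to_oncoprint_group <- function(broad_histology, other_cns) {
  own_panel <- c("Low-grade astrocytic tumor", "Embryonal tumor",
                 "Diffuse astrocytic and oligodendroglial tumor")
  # Default: not shown in any OncoPrint panel
  out <- rep(NA_character_, length(broad_histology))
  # Smaller CNS histologies share the "Other CNS" panel
  out[broad_histology %in% other_cns] <- "Other CNS"
  # The three large histologies keep their own name as the panel
  keep <- broad_histology %in% own_panel
  out[keep] <- broad_histology[keep]
  out
}

# Shortened version of other_cns_broad_histologies
other_cns <- c("Ependymal tumor", "Meningioma", "Germ cell tumor")
res <- to_oncoprint_group(c("Embryonal tumor", "Meningioma", "Benign tumor"),
                          other_cns)
print(res)
```

The two sets of histologies do not overlap, so the order of the assignments does not matter here. `case_when()` itself uses the first condition that matches.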

Declare how many colors we need.


For cancer groups that do not get their own hex code for display (i.e., due to small sample sizes), we’ll use a selection of grey colors as a palette and rely heavily on the ordering of the OncoPrint legend. Unfortunately there are over 20 “Other CNS” cancer groups that meet this criterion, so it is not feasible to have a color for each of them and they will not be included in the “Other CNS” OncoPrint.

-n_colors_display <- nrow(display_group_df)
-n_colors_cancer_group <- nrow(cancer_group_order_df)

Make a named list color key where histologies are the names.

-# Set seed so the colors are consistent upon re-run
-set.seed(2021)
+greys_df <- palette_df %>%
+  filter(cancer_group_display == "Other",
+         !is.na(oncoprint_group),
+         oncoprint_group != "Other CNS")
 
-# Sample from the 18 colors for display_group
-subset_colors_display <- sample(color_palette_display, n_colors_display)
-names(subset_colors_display) <- display_order_df$display_group
-
-# Sample from the 62 colors for cancer_group
-subset_colors_cancer_group <- sample(color_palette_cancer_group, n_colors_cancer_group)
-names(subset_colors_cancer_group) <- cancer_group_order_df$cancer_group

Remove <NA> from subset_colors_cancer_group

-# We will assign a gray color for NA below
-subset_colors_cancer_group <- subset_colors_cancer_group[!is.na(names(subset_colors_cancer_group))]

We want Other tumor and the Benign in display_group to both always be gray.

-subset_colors_display[names(subset_colors_display) == 'other tumor'] <- "#808080" 
-subset_colors_display[names(subset_colors_display) == 'benign'] <-  "#D3D3D3"

Normal biospecimens should not get plotted in display_group, so we will set their hex code to black.

-subset_colors_display[names(subset_colors_display) == 'normal'] <- "#000000"

Use the pie function to preview the display_group colors.

-pie(rep(1, n_colors_display), 
-    col = subset_colors_display, 
-    labels = names(subset_colors_display))

Use the pie function to preview the cancer_group colors.

-pie(rep(1, n_colors_cancer_group), 
-    col = subset_colors_cancer_group, 
-    labels = names(subset_colors_cancer_group))
+# Sample the greys sequential palette from color brewer
+set.seed(2021)
+greys_df <- greys_df %>%
+  mutate(oncoprint_hex = sample(brewer.pal(nrow(greys_df), "Greys")))
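
One caveat with `brewer.pal()` is that the `Greys` palette only comes in sizes from 3 to 9. For `n` outside that range it warns and returns 3 or 9 colors, which would not match the number of rows in `greys_df`. If a longer grey sequence were ever needed, base R's `grDevices::gray.colors()` is one alternative. The sketch below swaps it in purely for illustration; the notebook does not use it, and the group names are hypothetical.

```r
# Hypothetical cancer groups that need a grey OncoPrint color (more than 9)
grey_groups <- c("Group A", "Group B", "Group C", "Group D",
                 "Group E", "Group F", "Group G", "Group H",
                 "Group I", "Group J", "Group K")

set.seed(2021)
# gray.colors(n) returns n distinct greys between start = 0.3 and end = 0.9;
# sample() then shuffles them so similar greys are not adjacent by default
greys <- sample(grDevices::gray.colors(length(grey_groups)))
names(greys) <- grey_groups
print(greys)
```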

Add the hex codes for display_group and cancer_group to the histology_table. For tumor samples where cancer_group is NA, use gray (#808080).


Add oncoprint_hex & oncoprint_include, the latter will be FALSE when the former is NA.


Add oncoprint_hex for all other cancer groups and oncoprint_include. The latter will be FALSE when the former is NA.

-histology_table <- histology_table %>%
-  # We don't need this anymore
-  dplyr::select(-broad_histology_lower) %>%
-  # Add the hex_codes
-  dplyr::mutate(hex_codes = dplyr::recode(display_group, !!!subset_colors_display),
-        cancer_group_hex_codes = dplyr::recode(cancer_group, !!!subset_colors_cancer_group),
-                # if cancer_group is NA for tumor sample add gray color
-        cancer_group_hex_codes = dplyr::if_else(is.na(cancer_group_hex_codes) & sample_type=="Tumor"
-                                                 ,"#808080",
-                                                 cancer_group_hex_codes)
-            ) %>% 
-  # Restore capitalization so its pretty for labeling
-  dplyr::mutate(display_group = stringr::str_to_sentence(display_group),
-                # Deal with CNS exception
-                display_group = stringr::str_replace(display_group, "cns", "CNS")
-                )
+palette_df <- palette_df %>%
+  left_join(greys_df) %>%
+  # When there's an oncoprint group and a specific cancer group display color,
+  # use the cancer group display color as the oncoprint color
+  mutate(oncoprint_hex = if_else(
+    cancer_group_hex != cancer_group_other_hex & !is.na(oncoprint_group),
+    cancer_group_hex,
+    oncoprint_hex
+  ),
+  # Only when there's a specific oncoprint color -- even if that is a grey 
+  # selected only for the oncoprint -- will a cancer group be included in the
+  # oncoprint
+  oncoprint_include = if_else(
+    is.na(oncoprint_hex),
+    FALSE,
+    TRUE
+  ))
+Joining, by = c("broad_histology", "cancer_group", "broad_histology_hex", "cancer_group_hex", "broad_histology_display", "cancer_group_display", "broad_histology_order", "oncoprint_group")
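
The two rules in this step can be traced on a few hypothetical rows. In this sketch, `oncoprint_hex` starts out holding the greys sampled for `greys_df` and is `NA` elsewhere. All hex values are illustrative only. Note that `oncoprint_include` is simply `!is.na(oncoprint_hex)`.

```r
# Hypothetical rows of palette_df after the greys join
cancer_group_other_hex <- "#b5b5b5"
cancer_group_hex <- c("#a340ff", "#b5b5b5", "#b5b5b5", "#2db398")
oncoprint_group  <- c("Embryonal tumor", "Embryonal tumor", NA, "Other CNS")
oncoprint_hex    <- c(NA, "#d9d9d9", NA, NA)  # greys sampled earlier

# Use the cancer group's own color when it has one and an OncoPrint panel
use_own <- cancer_group_hex != cancer_group_other_hex & !is.na(oncoprint_group)
oncoprint_hex <- ifelse(use_own, cancer_group_hex, oncoprint_hex)

# Include a cancer group only if it ended up with an OncoPrint color
oncoprint_include <- !is.na(oncoprint_hex)
print(data.frame(oncoprint_group, oncoprint_hex, oncoprint_include))
```

The third row has no panel and no grey, so it is excluded. This matches how gray "Other CNS" groups drop out of the OncoPrint.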

Save to TSV

-readr::write_tsv(histology_table, file.path(output_dir, "histology_label_color_table.tsv"))
+readr::write_tsv(palette_df, 
+                 file.path(output_dir, 
+                           "broad_histology_cancer_group_palette.tsv"))

Session Info

sessionInfo()
R version 3.6.0 (2019-04-26)
 Platform: x86_64-pc-linux-gnu (64-bit)
 Running under: Debian GNU/Linux 9 (stretch)
@@ -3543,20 +3437,28 @@ 

attached base packages:
[1] stats graphics grDevices utils datasets methods base
+other attached packages:
+ [1] RColorBrewer_1.1-2 forcats_0.4.0 stringr_1.4.0
+ [4] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1
+ [7] tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
+[10] tidyverse_1.2.1
+
 loaded via a namespace (and not attached):
- [1] Rcpp_1.0.1 knitr_1.23 magrittr_1.5 hms_0.4.2
- [5] tidyselect_0.2.5 R6_2.4.0 rlang_0.4.0 stringr_1.4.0
- [9] highr_0.8 dplyr_0.8.3 tools_3.6.0 xfun_0.8
-[13] ellipsis_0.2.0.1 htmltools_0.3.6 yaml_2.2.0 assertthat_0.2.1
-[17] digest_0.6.20 tibble_2.1.3 crayon_1.3.4 purrr_0.3.2
-[21] readr_1.3.1 tidyr_0.8.3 base64enc_0.1-3 glue_1.3.1
-[25] evaluate_0.14 rmarkdown_1.13 stringi_1.4.3 compiler_3.6.0
-[29] pillar_1.4.2 forcats_0.4.0 jsonlite_1.6 pkgconfig_2.0.2
+ [1] Rcpp_1.0.1 cellranger_1.1.0 pillar_1.4.2 compiler_3.6.0
+ [5] tools_3.6.0 jsonlite_1.6 lubridate_1.7.4 nlme_3.1-140
+ [9] gtable_0.3.0 lattice_0.20-38 pkgconfig_2.0.2 rlang_0.4.0
+[13] cli_1.1.0 rstudioapi_0.10 haven_2.1.1 xfun_0.8
+[17] withr_2.1.2 xml2_1.2.0 httr_1.4.0 knitr_1.23
+[21] generics_0.0.2 hms_0.4.2 grid_3.6.0 tidyselect_0.2.5
+[25] glue_1.3.1 R6_2.4.0 readxl_1.3.1 modelr_0.1.4
+[29] magrittr_1.5 ellipsis_0.2.0.1 backports_1.1.4 scales_1.0.0
+[33] rvest_0.3.4 assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3
+[37] lazyeval_0.2.2 munsell_0.5.0 broom_0.5.2 crayon_1.3.4
---
title: "Mapping histology labels for plots"
output:   
  html_notebook: 
    toc: true
    toc_float: true
author: Candace Savonen for ALSF - CCDL
date: 2021
---

# Purpose: 

The histology label variables included in `pbta-histologies.tsv` from data releases are not always useful for visualizing the full set of biospecimens due to the large number of different values.
Having too many different possible values makes the colors harder to distinguish.
In addition, there are some groups that are represented by only a very few samples; giving such groups a distinct color may be counterproductive.

The goal of this notebook is to use the currently existing `broad_histology` groups from `pbta-histologies.tsv` to form 10-15 "high-level histology" group labels that can be used for plotting purposes.

## The output table

The output of this notebook is a TSV file: `palettes/histology_label_color_table.tsv` that contains the following fields:

**Copied from `pbta-histologies.tsv`**:

- `Kids_First_Biospecimen_ID`
- All the original histology label variables (`broad_histology`, `short_histology`, etc.)

**Created in this notebook**:

- `display_group` - the high-level histology labels that should be used for plotting
- `hex_codes` - the colors that correspond to `display_group`
- `cancer_group_hex_codes` - the colors that correspond to `cancer_group`
- `display_order` and `cancer_group_order` - the numeric sort orders for `display_group` and `cancer_group`

With this info, `histology_label_color_table.tsv` can be used by all plots and figures that summarize high-level data while displaying histology information. 

# How `display_group` is made:

Here's how `broad_histology` groups are [combined into the higher-level groupings of `display_group`](#declare-new-equivalent-groups).

1) "Lymphoma", "Melanocytic tumor", "Other tumor", "Metastatic tumors", and "Non-CNS tumor" are combined into an `Other tumor` group in `display_group`. 

2) `Benign tumor` and `Non-tumor` biospecimens are combined into a `Benign` group. 

3) `Other astrocytic tumor` biospecimens are combined into the existing `Low-grade astrocytic tumor` group. The biospecimens in `Other astrocytic tumor` were low-grade SEGA tumors. 

4) Anything not in the above categories gets its `broad_histology` label carried over. 
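
These rules amount to a lookup from lowercase `broad_histology` to `display_group`. The notebook implements them with `tidyr::replace_na()` and `forcats::fct_collapse()`. The base-R sketch below is only an illustration of the same mapping, and the helper name `to_display_group()` is hypothetical.

```r
# Hypothetical helper illustrating the display_group rules (not the notebook's code)
to_display_group <- function(broad_histology) {
  x <- tolower(broad_histology)
  # As in the notebook, an NA broad_histology marks a normal biospecimen
  x[is.na(x)] <- "normal"
  # Rule 1: small and non-CNS groups collapse into "other tumor"
  other_tumor <- c("lymphoma", "melanocytic tumor", "other tumor",
                   "metastatic tumors", "non-cns tumor")
  x[x %in% other_tumor] <- "other tumor"
  # Rule 2: benign tumors and non-tumors collapse into "benign"
  x[x %in% c("benign tumor", "non-tumor")] <- "benign"
  # Rule 3: other astrocytic tumors join the low-grade astrocytic group
  x[x == "other astrocytic tumor"] <- "low-grade astrocytic tumor"
  # Rule 4: everything else keeps its (lowercase) broad_histology label
  x
}

res <- to_display_group(c("Lymphoma", "Non-tumor", "Other astrocytic tumor",
                          "Embryonal tumor", NA))
print(res)
```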

# Usage

This notebook can be run via the command line from the top directory of the 
repository as follows:

```
Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', 
                              clean = TRUE)"
```

## Set Up

```{r}
# Magrittr pipe
`%>%` <- dplyr::`%>%`
```

### Directories and Files

```{r}
# Path to input directory
input_dir <- file.path("..", "data")
output_dir <- "palettes"
```

# Read in metadata 

Which variables are we keeping for this table? 

```{r}
histology_variables <- 
  c("integrated_diagnosis", 
    "Notes", 
    "harmonized_diagnosis",
    "broad_histology", 
    "short_histology",
    "cancer_group")
```

Let's read in the current release's `pbta-histologies.tsv` file. 

```{r}
metadata <-
  readr::read_tsv(file.path(input_dir, "pbta-histologies.tsv"), guess_max = 10000)
```

Now we'll select the histology variables listed above. So that capitalization differences don't get in the way, we'll also add a lowercase copy of `broad_histology` called `broad_histology_lower`. 

```{r}
working_metadata <- metadata %>% 
  dplyr::select(Kids_First_Biospecimen_ID, sample_type, histology_variables) %>% 
  dplyr::mutate(broad_histology_lower = tolower(broad_histology))
```

# Take a look at how many biospecimens per `broad_histology` group

Let's summarize `broad_histology`. 
Because the `Normal` samples don't have histologies, we'll look at just the `Tumor` samples for this summary. 

```{r}
broad_summary <- working_metadata %>% 
  dplyr::filter(sample_type == "Tumor") %>%
  dplyr::count(broad_histology_lower) %>% 
  dplyr::arrange(n) 
```

Let's print out the summary. 

```{r}
broad_summary %>% 
  knitr::kable()
```

There's a handful of very small groups (many are n = 2). 

## Declare new equivalent groups

These groups we'll combine into a non-CNS/other tumor group.

```{r}
other_tumor <- c("lymphoma", "melanocytic tumor", "other tumor", "metastatic tumors", "non-cns tumor")
```

These groups we'll combine into a benign group.

```{r}
benign <- c("benign tumor", "non-tumor")
```

Combine `Other astrocytic tumor` with the LGAT group. 

```{r}
lgat <- c("other astrocytic tumor", "low-grade astrocytic tumor")
```

# Make new `display_group`

```{r}
histology_table <- working_metadata %>% 
  dplyr::mutate(
    # NAs are really Normals
    display_group = tidyr::replace_na(broad_histology_lower, "normal"),
    # Now do the group combining
    display_group = forcats::fct_collapse(display_group,
      "low-grade astrocytic tumor" = lgat,
      "other tumor" = other_tumor,
      "benign" = benign
    ),
    # Put this as a character for later handling
    display_group = as.character(display_group)
    )
```

Print out the number of `display_group` (including `normal`)!

```{r}
display_group_df <- histology_table %>% 
  dplyr::count(display_group) %>% 
  dplyr::arrange(n)

knitr::kable(display_group_df)
```

Make this notebook stop if there are more than 18 categories in `display_group` (including `normal`). 

```{r}
if (nrow(display_group_df) > 18) {
  stop("There are more than 18 categories in `display_group`. We may want to re-evaluate the high-level histology groupings")
}
```

# Make `display_order`

Get ranks in order of big to small and make them into a new column in `display_group_df`. 
We will always want the "normal", "benign", and "other tumor" groups to come last, so we will push them to the end of the factor order. 

```{r}
display_order_df <- display_group_df %>% 
  dplyr::mutate(display_group = forcats::fct_reorder(display_group, n, .desc = TRUE) %>%
                  forcats::fct_relevel("benign", "other tumor", "normal", after = Inf),
                display_order = as.numeric(display_group)) # save the factor order for text table export
```
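
The ordering logic puts the largest group first and then a fixed tail of groups at the end. It can be illustrated without forcats. The base-R sketch below uses a hypothetical helper `display_order()` and a five-group subset of the counts in this notebook's rendered output. The chunk above does the same thing with `forcats::fct_reorder()` and `fct_relevel(after = Inf)`.

```r
# Hypothetical helper: order groups by count (descending), then move the
# tail groups to the end in the order given
display_order <- function(groups, n, tail = c("benign", "other tumor", "normal")) {
  by_size <- groups[order(n, decreasing = TRUE)]
  c(setdiff(by_size, tail), intersect(tail, groups))
}

groups <- c("normal", "embryonal tumor", "benign", "meningioma", "other tumor")
n <- c(833, 333, 76, 59, 21)

ordered <- display_order(groups, n)
print(ordered)
# Numeric position of each group, like the display_order column
print(match(groups, ordered))
```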

# Make `cancer_group_order`

`cancer_group` is a shorter form of `harmonized_diagnosis` with the following edits:

- Other, Benign tumor and Dysplasia/Gliosis, Dysplasia/Gliosis-Glial-neuronal tumor NOS removed from `cancer_group`
- Neurofibroma/Plexiform;Other updated to Neurofibroma/Plexiform
- Non-germinomatous germ cell tumor;Teratoma updated to Teratoma
- Anaplastic (malignant) meningioma, Meningothelial meningioma and Clear cell meningioma updated to Meningioma
- Embryonal Tumor with Multilayered Rosettes updated to Embryonal tumor with multilayer rosettes

Get ranks in order of big to small and make them into a new dataframe `cancer_group_order_df`. 
   
```{r}
cancer_group_order_df <- histology_table %>% 
  dplyr::count(cancer_group,name = "cancer_group_n") %>% 
  dplyr::mutate(
    cancer_group = forcats::fct_reorder(cancer_group, cancer_group_n, .desc = TRUE),
                cancer_group_order = as.numeric(cancer_group)) # save the factor order for text table export
```


Add on the `display_order` and `cancer_group_order` columns using `inner_join` 

```{r}
histology_table <- histology_table %>%
  # Join on the display orders
  dplyr::inner_join(display_order_df, by = "display_group") %>%
  # Join on the cancer_group orders
  dplyr::inner_join(cancer_group_order_df, by = "cancer_group") 
```

# Add hex codes for display_group and cancer_group

These hex codes were retrieved from http://phrogz.net/css/distinct-colors.html. The 18-color `display_group` palette uses the default settings; the `cancer_group` palette contains 72 colors.

```{r}
color_palette_display <- 
  c("#ff0000", "#cc0000", "#995200", "#bfb300", "#fffbbf", 
    "#2e7300", "#00e65c", "#00ffee", "#103d40", "#0085a6", 
    "#003380", "#4073ff", "#737899", "#70008c", "#f2b6ee", 
    "#ff40bf", "#8c0038", "#330d12"
)

color_palette_cancer_group <-
  c("#ff0000", "#f20000", "#997373", "#403030", "#330700",
    "#ff9180", "#591800", "#b2502d", "#cca799", "#ff6600",
    "#ffb380", "#7f5940", "#cc6d00", "#331b00", "#ccb499",
    "#ffaa00", "#996600", "#594316", "#ffd580", "#ffee00",
    "#998f00", "#999673", "#303300", "#fbffbf", "#ccff00",
    "#494d39", "#b5d96c", "#6a8040", "#66ff00", "#42a600",
    "#bfffbf", "#003307", "#00661b", "#00ff88", "#86b39e",
    "#00b377", "#006652", "#00ffee", "#00a7b3", "#bffbff",
    "#567173", "#00ccff", "#003d4d", "#00aaff", "#267399",
    "#0088ff", "#0042a6", "#001a40", "#bfd9ff", "#0044ff",
    "#394973", "#000e66", "#bfbfff", "#9180ff", "#5800a6",
    "#754d99", "#aa00ff", "#3a3040", "#aa86b3", "#530059",
    "#ff00ee", "#a60085", "#330022", "#ff80d5", "#ff0088",
    "#804062", "#a60042", "#590024", "#ffbfd9", "#ff0044",
    "#990014", "#ff8091"
)

```

Declare how many colors we need. 

```{r}
n_colors_display <- nrow(display_group_df)
n_colors_cancer_group <- nrow(cancer_group_order_df)
```

Make a named list color key where histologies are the names. 

```{r}
# Set seed so the colors are consistent upon re-run
set.seed(2021)

# Sample from the 18 colors for display_group
subset_colors_display <- sample(color_palette_display, n_colors_display)
names(subset_colors_display) <- display_order_df$display_group

# Sample from the 72 colors for cancer_group
subset_colors_cancer_group <- sample(color_palette_cancer_group, n_colors_cancer_group)
names(subset_colors_cancer_group) <- cancer_group_order_df$cancer_group
```
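
The chunk above builds a named character vector. Sampling without replacement ensures that no two groups share a color, and `set.seed()` makes the draw repeatable within the same R version. A minimal sketch with a short, hypothetical palette and group names:

```r
# Hypothetical 5-color palette and 3 group names, for illustration only
palette <- c("#ff0000", "#2e7300", "#0085a6", "#70008c", "#bfb300")
groups <- c("embryonal tumor", "meningioma", "benign")

set.seed(2021)
key <- sample(palette, length(groups))  # draws without replacement
names(key) <- groups

# Same seed, same draw: re-running gives an identical key
set.seed(2021)
key_again <- sample(palette, length(groups))
names(key_again) <- groups
print(identical(key, key_again))

# Look up a color by group name
print(key[["meningioma"]])
```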

Remove `<NA>` from `subset_colors_cancer_group`.

```{r}
# We will assign a gray color for NA below
subset_colors_cancer_group <- subset_colors_cancer_group[!is.na(names(subset_colors_cancer_group))]
```

We want `Other tumor` and the `Benign` in display_group to both always be gray. 

```{r}
subset_colors_display[names(subset_colors_display) == 'other tumor'] <- "#808080" 
subset_colors_display[names(subset_colors_display) == 'benign'] <-  "#D3D3D3"
```

Normal biospecimens should not get plotted in `display_group`, so we will set their hex code to black. 

```{r}
subset_colors_display[names(subset_colors_display) == 'normal'] <- "#000000"
```

Use the `pie` function to preview the `display_group` colors.

```{r}
pie(rep(1, n_colors_display), 
    col = subset_colors_display, 
    labels = names(subset_colors_display))
```

Use the `pie` function to preview the `cancer_group` colors.

```{r}
pie(rep(1, n_colors_cancer_group), 
    col = subset_colors_cancer_group, 
    labels = names(subset_colors_cancer_group))
```


Add the hex codes for `display_group` and `cancer_group` to `histology_table`.
For tumor samples where `cancer_group` is `NA`, use gray (`#808080`).

```{r}
histology_table <- histology_table %>%
  # We don't need this anymore
  dplyr::select(-broad_histology_lower) %>%
  # Add the hex codes
  dplyr::mutate(
    hex_codes = dplyr::recode(display_group, !!!subset_colors_display),
    cancer_group_hex_codes = dplyr::recode(cancer_group, !!!subset_colors_cancer_group),
    # If cancer_group is NA for a tumor sample, use gray
    cancer_group_hex_codes = dplyr::if_else(is.na(cancer_group_hex_codes) & sample_type == "Tumor",
                                            "#808080",
                                            cancer_group_hex_codes)
  ) %>%
  # Restore capitalization so it's pretty for labeling
  dplyr::mutate(display_group = stringr::str_to_sentence(display_group),
                # Deal with CNS exception
                display_group = stringr::str_replace(display_group, "cns", "CNS")
                )
```
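
`dplyr::recode(x, !!!key)` looks each value up in the named vector `key`. The base-R sketch below shows the same lookup followed by the gray fallback for tumor samples. The helper `lookup_hex()` and the sample rows are hypothetical, and the two hex codes are borrowed from the newer `cancer_group` palette.

```r
# Hypothetical helper: named-vector lookup plus a gray fallback for tumors
lookup_hex <- function(cancer_group, sample_type, key, gray = "#808080") {
  # Indexing a named vector by an NA (or unmatched) name returns NA
  hex <- unname(key[cancer_group])
  # Tumor samples without a color get gray; other samples keep NA
  ifelse(is.na(hex) & sample_type == "Tumor", gray, hex)
}

key <- c("Medulloblastoma" = "#a340ff", "Ependymoma" = "#2200ff")
res <- lookup_hex(c("Ependymoma", NA, NA),
                  c("Tumor", "Tumor", "Normal"),
                  key)
print(res)
```

One difference from `recode()` is how values with no match in `key` are handled. `recode()` leaves them unchanged, but this lookup turns them into `NA`, so they become gray for tumor samples.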

## Save to TSV 

```{r}
readr::write_tsv(histology_table, file.path(output_dir, "histology_label_color_table.tsv"))
```

# Session Info

```{r}
sessionInfo()
```

+
---
title: "Create a minimal palette for displaying multiple disease labels"
output:   
  html_notebook: 
    toc: true
    toc_float: true
author: Candace Savonen, Krutika Gaonkar, and Jaclyn Taroni
date: 2021
---

## Purpose

There are multiple "disease labels" in the `pbta-histologies.tsv` file, including (from most broad to most narrow) `broad_histology`, `cancer_group`, and `harmonized_diagnosis`.
For context, it is helpful to note that an individual `cancer_group` will be nested under a single `broad_histology` and that `cancer_group` is a shorter form of `harmonized_diagnosis` with the following edits:

- Other, Benign tumor and Dysplasia/Gliosis, Dysplasia/Gliosis-Glial-neuronal tumor NOS removed from `cancer_group`
- Neurofibroma/Plexiform;Other updated to Neurofibroma/Plexiform
- Non-germinomatous germ cell tumor;Teratoma updated to Teratoma
- Anaplastic (malignant) meningioma, Meningothelial meningioma and Clear cell meningioma updated to Meningioma
- Embryonal Tumor with Multilayered Rosettes updated to Embryonal tumor with multilayer rosettes

It is often useful to use color to indicate disease label in a plot where multiple groups are visualized and we cannot rely heavily on labels (e.g., scatter plots).
Unfortunately, there are too many potential labels for us to generate an effective color palette (i.e., one made up of sufficiently distinct colors).
In addition, some groupings will contain very few samples.

The purpose of this notebook is to create color palettes for the following:

* `broad_histology` values, where a `broad_histology` contains at least one `cancer_group` with n >= 10
* `cancer_group` values with n >= 10

**Note: This is tied to `release-v21-20210820`.**

### Background

You may find [#1174](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174) to be helpful context.

## Usage

This notebook can be run via the command line from the top directory of the repository as follows:

```
Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', 
                              clean = TRUE)"
```

## Set Up

```{r}
library(tidyverse)
library(RColorBrewer)
```

### Directories and Files

```{r}
# Path to input directory
input_dir <- file.path("..", "data", "release-v21-20210820")
output_dir <- "palettes"
```

## Read in metadata 

Let's read in the `pbta-histologies.tsv` file from `release-v21-20210820`.

```{r}
histologies_df <-
  readr::read_tsv(file.path(input_dir, "pbta-histologies.tsv"), guess_max = 10000)
```

## Identify values to include in palettes

We will use `cancer_group` with n >= 10 to guide what values to include in both our `cancer_group` and `broad_histology` palettes.

```{r}
included_labels_df <- histologies_df %>% 
  # Exclude normal samples
  filter(sample_type == "Tumor") %>%
  # Filter to unique sample--"disease label" pairs
  select(sample_id, 
         broad_histology, 
         cancer_group) %>% 
  distinct() %>%
  # Count samples (i.e., unique sample_id values) per label pair
  group_by(broad_histology, cancer_group) %>% 
  tally() %>%
  # Keep pairs with at least 10 samples and drop NA cancer_group values
  filter(n >= 10, 
         !is.na(cancer_group))

included_labels_df
```

So the unique values for `broad_histology` and `cancer_group` above are what we need to take into account for our palette.
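
To make the inclusion rule concrete: a `cancer_group` is kept when it has at least 10 unique samples. A `broad_histology` gets a palette entry only if one of its cancer groups is kept. The toy data below are hypothetical, and the sketch uses base R's `aggregate()` in place of the dplyr pipeline above.

```r
# Hypothetical sample-level data: one row per unique sample--label pair
toy_samples <- data.frame(
  sample_id = sprintf("S%02d", 1:25),
  broad_histology = c(rep("Embryonal tumor", 15), rep("Germ cell tumor", 10)),
  cancer_group = c(rep("Medulloblastoma", 12), rep("CNS Embryonal tumor", 3),
                   rep("Teratoma", 4), rep(NA, 6)),
  stringsAsFactors = FALSE
)

# Count samples per broad_histology--cancer_group pair;
# the formula interface drops rows where cancer_group is NA
counts <- aggregate(sample_id ~ broad_histology + cancer_group,
                    data = toy_samples, FUN = length)
names(counts)[names(counts) == "sample_id"] <- "n"

# Keep pairs with n >= 10
included <- counts[counts$n >= 10, ]
print(included)
print(unique(included$broad_histology))
```

In this toy example, `Germ cell tumor` has 10 samples in total but is still left out. Its largest cancer group has only 4 samples.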

## Create palettes

Outside of this notebook, we've done quite a bit of work to identify suitable palettes using http://phrogz.net/css/distinct-colors.html as a reference/starting point.
Check out the discussion on [#1174](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1174)!

### `broad_histology`

```{r}
broad_histology_df <- data.frame(
  broad_histology = c("Benign tumor",
                      "Diffuse astrocytic and oligodendroglial tumor",
                      "Embryonal tumor",
                      "Ependymal tumor",
                      "Germ cell tumor",
                      "Low-grade astrocytic tumor",
                      "Meningioma",
                      "Mesenchymal non-meningothelial tumor",
                      "Neuronal and mixed neuronal-glial tumor",
                      "Tumor of cranial and paraspinal nerves",
                      "Tumors of sellar region"), 
  broad_histology_hex = c("#590024",
                          "#ff80e5",
                          "#220040",
                          "#2200ff",
                          "#0074d9",
                          "#8f8fbf",
                          "#2db398",
                          "#7fbf00",
                          "#685815",
                          "#ffaa00",
                          "#b2502d"),
  stringsAsFactors = FALSE
)

# value for "other" histologies
broad_histology_other_hex <- "#808080"
```

Now we'll create a legend with `legend()` (h/t [this StackOverflow answer](https://stackoverflow.com/questions/48966645/how-can-i-create-a-legend-without-a-plot-in-r/48966924)).

```{r}
plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", 
     xlim = 0:1, ylim = 0:1)
legend("topleft", 
       legend = c(broad_histology_df$broad_histology, "Other"),
       col = c(broad_histology_df$broad_histology_hex, 
               broad_histology_other_hex),
       pch = 15, pt.cex = 2, cex = 1, bty = "n")
mtext("Broad Histology", at = 0.135, cex = 1.5)
```

### `cancer_group`

There are 17 `cancer_group` values that we need to account for.
These are best used in conjunction with _labels_ in figures, but are intended to allow readers to "track" labels _across figures_.

Where there's a 1:1 mapping between `broad_histology` and `cancer_group`, the hex codes will be the same.

```{r}
cancer_group_df <- data.frame(
  cancer_group = c("Choroid plexus papilloma",
                   "Diffuse intrinsic pontine glioma",
                   "Diffuse midline glioma",
                   "High-grade glioma astrocytoma",
                   "Atypical Teratoid Rhabdoid Tumor",
                   "CNS Embryonal tumor",
                   "Medulloblastoma",
                   "Ependymoma",
                   "Teratoma",
                   "Ganglioglioma",
                   "Low-grade glioma astrocytoma",
                   "Meningioma",
                   "Ewing sarcoma",
                   "Dysembryoplastic neuroepithelial tumor",
                   "Neurofibroma Plexiform",
                   "Schwannoma",
                   "Craniopharyngioma"),
  cancer_group_hex = c("#4d2635",
                       "#bf0099",
                       "#ff40d9",
                       "#ffccf5",
                       "#4d0d85",
                       "#b08ccf",
                       "#a340ff",
                       "#2200ff",
                       "#058aff",
                       "#8c8cff",
                       "#000080",
                       "#2db398",
                       "#9fbf60",
                       "#614e01",
                       "#e6ac39",
                       "#ab7200",
                       "#b33000"),
  stringsAsFactors = FALSE
)
# Value for "other" groups
cancer_group_other_hex <- "#b5b5b5"
```
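Plotting functions that accept a manual palette typically expect a named character vector that maps each label to its color. A minimal base R sketch of that conversion, using a toy subset of the table above (the variable names here are illustrative only):

```r
# Toy subset of the cancer group palette (hex codes copied from above)
toy_palette_df <- data.frame(
  cancer_group = c("Ependymoma", "Meningioma", "Craniopharyngioma"),
  cancer_group_hex = c("#2200ff", "#2db398", "#b33000"),
  stringsAsFactors = FALSE
)

# Named character vector: names are labels, values are hex codes
cancer_group_colors <- setNames(toy_palette_df$cancer_group_hex,
                                toy_palette_df$cancer_group)

# Sanity checks: col2rgb() errors on an invalid color specification,
# and each label should map to exactly one color
invisible(grDevices::col2rgb(cancer_group_colors))
stopifnot(!anyDuplicated(names(cancer_group_colors)))

cancer_group_colors[["Meningioma"]]  # "#2db398"
```

With ggplot2, a vector like this can be passed as `scale_color_manual(values = cancer_group_colors)`; because the vector is named, colors are matched to labels by name rather than by position.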

And again, we'll create a legend with `legend()`.

```{r}
plot(NULL, xaxt = "n", yaxt = "n", bty = "n", ylab = "", xlab = "", 
     xlim = 0:1, ylim = 0:1)
legend("topleft", 
       legend = c(cancer_group_df$cancer_group, "Other"),
       col = c(cancer_group_df$cancer_group_hex, cancer_group_other_hex),
       pch = 15, pt.cex = 1.5, cex = 0.75, bty = "n")
mtext("Cancer Group", at = 0.0625, cex = 1)
```

### Output

We can create a data frame that contains both palettes with two left joins and then fill the remaining NA values with a single (gray) hex code per column (`r broad_histology_other_hex` for `broad_histology`, `r cancer_group_other_hex` for `cancer_group`), except that a missing `cancer_group` keeps a missing hex code.

```{r}
palette_df <- histologies_df %>%
  # Exclude normal samples
  filter(sample_type == "Tumor") %>%
  # Filter to unique broad histology--cancer group pairs
  select(broad_histology, 
         cancer_group) %>% 
  distinct() %>%
  # Add broad histology palette
  left_join(broad_histology_df, by = "broad_histology") %>%
  # Add cancer group palette
  left_join(cancer_group_df, by = "cancer_group") %>%
  # Fill all other values with gray colors
  replace_na(list(broad_histology_hex = broad_histology_other_hex,
                  cancer_group_hex = cancer_group_other_hex)) %>%
  # The exception being - if cancer_group == NA, so should cancer_group_hex!
  mutate(cancer_group_hex = if_else(is.na(cancer_group), 
                                    NA_character_, 
                                    cancer_group_hex)) %>%
  # Sort by broad_histology for easy browsing
  arrange(broad_histology)
```

And now let's take a look!

```{r}
palette_df
```
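For readers less familiar with dplyr and tidyr, the join-and-fill logic above can be sketched in base R on toy data, with `merge(..., all.x = TRUE)` standing in for `left_join()`. This is an illustration of the logic rather than the notebook's code; note that, unlike `left_join()`, `merge()` re-sorts rows by the join key.

```r
# Toy data: observed (broad_histology, cancer_group) pairs
pairs_df <- data.frame(
  broad_histology = c("Meningioma", "Chordoma", "Pre-cancerous lesion"),
  cancer_group = c("Meningioma", "Chordoma", NA),
  stringsAsFactors = FALSE
)
# Toy palettes that only cover some of the labels
bh_palette <- data.frame(broad_histology = "Meningioma",
                         broad_histology_hex = "#2db398",
                         stringsAsFactors = FALSE)
cg_palette <- data.frame(cancer_group = "Meningioma",
                         cancer_group_hex = "#2db398",
                         stringsAsFactors = FALSE)

# Two left joins: all.x = TRUE keeps every observed pair
joined <- merge(pairs_df, bh_palette, by = "broad_histology", all.x = TRUE)
joined <- merge(joined, cg_palette, by = "cancer_group", all.x = TRUE)

# Fill unmatched labels with one gray per column
joined$broad_histology_hex[is.na(joined$broad_histology_hex)] <- "#808080"
joined$cancer_group_hex[is.na(joined$cancer_group_hex)] <- "#b5b5b5"

# ...except that a missing cancer_group keeps a missing hex code
joined$cancer_group_hex[is.na(joined$cancer_group)] <- NA

joined
```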

### Add display names for convenience

When multiple values share the same color, it can be helpful to have a separate value for the legend; e.g., for all `#808080` broad histologies, we may want to display `Other`.
We'll add a couple of columns for legend-making convenience.

```{r}
palette_df <- palette_df %>%
  mutate(broad_histology_display = if_else(broad_histology_hex == broad_histology_other_hex,
                                           "Other",
                                           broad_histology),
         cancer_group_display = if_else(cancer_group_hex == cancer_group_other_hex,
                                        "Other",
                                        cancer_group))
```
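A toy base R sketch of how the display columns behave, including the `NA` case: comparing a missing hex code to the gray returns `NA`, so the display name is also missing. Here `ifelse()` stands in for `dplyr::if_else()`, and the values are made up for illustration.

```r
other_hex <- "#b5b5b5"
# Toy rows: two groups share the catch-all gray, one has no cancer group
cancer_group <- c("Meningioma", "Germinoma", "Sarcoma", NA)
cancer_group_hex <- c("#2db398", "#b5b5b5", "#b5b5b5", NA)

# NA == "#b5b5b5" is NA, and ifelse() returns NA for an NA condition,
# so a missing hex code gives a missing display name
cancer_group_display <- ifelse(cancer_group_hex == other_hex,
                               "Other",
                               cancer_group)
cancer_group_display  # "Meningioma" "Other" "Other" NA

# One legend entry per display name: the shared-gray rows collapse to "Other"
legend_df <- unique(data.frame(label = cancer_group_display,
                               hex = cancer_group_hex,
                               stringsAsFactors = FALSE))
legend_df <- legend_df[!is.na(legend_df$label), ]
legend_df
```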

### Add `broad_histology_order` 

Previously, we had a concept known as `display_order` where we ordered categories based on their number of samples (from large to small).
Now that we've dropped `display_group`, let's apply this same concept to `broad_histology`.

```{r}
broad_histology_order_df <- histologies_df %>%  
  # Exclude normal samples
  filter(sample_type == "Tumor",
         # Only count histologies that we'll have a hex code for
         broad_histology %in% included_labels_df$broad_histology) %>%
  # Filter to unique sample--broad_histology pairs
  select(sample_id, 
         broad_histology) %>%
  distinct() %>%
  # Count samples within a broad histology
  count(broad_histology) %>%
  # Add Other placeholder
  bind_rows(data.frame(broad_histology = "Other",
                       n = 0, 
                       stringsAsFactors = FALSE)) %>%
  # Reorder based on sample size except Benign tumor and Other should come last
  # And then add numeric column with the order
  mutate(broad_histology = forcats::fct_reorder(broad_histology,
                                                n,
                                                .desc = TRUE) %>%
           forcats::fct_relevel("Benign tumor",
                                "Other",
                                after = Inf),
         broad_histology_order = as.numeric(broad_histology)) %>%
  # No longer require the sample size
  select(-n)

broad_histology_order_df
```
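The same ordering rule (sort by sample size, then force `Benign tumor` and `Other` to the end, as `forcats::fct_relevel(after = Inf)` does above) can be sketched in base R. The counts below are made up for illustration and do not come from `pbta-histologies.tsv`.

```r
# Toy sample counts per broad histology (not real data)
counts <- c("Low-grade astrocytic tumor" = 250,
            "Embryonal tumor" = 120,
            "Benign tumor" = 300,
            "Ependymal tumor" = 90,
            "Other" = 0)

# Levels ordered from largest to smallest sample size...
ordered_levels <- names(sort(counts, decreasing = TRUE))
# ...then the catch-all groups are moved to the end, in this order
last_levels <- c("Benign tumor", "Other")
ordered_levels <- c(setdiff(ordered_levels, last_levels), last_levels)

# The numeric order is each label's position among the factor levels
broad_histology <- factor(names(counts), levels = ordered_levels)
order_by_label <- setNames(as.integer(broad_histology), names(counts))
order_by_label
```

Even though `Benign tumor` has the largest toy count, it is placed after every sized group, just before `Other`.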

And now we're ready to add this to the palette data frame.

```{r}
palette_df <- palette_df %>%
  left_join(broad_histology_order_df, 
            by = c("broad_histology_display" = "broad_histology"))
```

### Add `oncoprint_group` and `oncoprint_hex`

For most plots that make use of the `cancer_group` palette, such as box or violin plots, we will rely heavily on labels, so using the same gray hex code for multiple groups will not be a problem.

We will have four individual OncoPrint panels; many `broad_histology` values are grouped together into the `Other CNS` panel, which you can see [here](https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/d31c927a27813ec0b8032fbe768002f31723636f/analyses/oncoprint-landscape/02-plot-oncoprint.R#L181).
We can move this information into our palette data frame.

```{r}
# Taken from the oncoprint plotting script as of this writing
# See permalink above
other_cns_broad_histologies <- c(
  "Ependymal tumor",
  "Tumors of sellar region",
  "Neuronal and mixed neuronal-glial tumor",
  "Tumor of cranial and paraspinal nerves",
  "Meningioma",
  "Mesenchymal non-meningothelial tumor",
  "Germ cell tumor",
  "Choroid plexus tumor",
  "Histiocytic tumor",
  "Tumor of pineal region",
  "Metastatic tumors",
  "Other astrocytic tumor",
  "Lymphoma",
  "Melanocytic tumor",
  "Other tumor"
)

palette_df <- palette_df %>%
  mutate(oncoprint_group = case_when(
    broad_histology %in% other_cns_broad_histologies ~ "Other CNS",
    broad_histology %in% c(
      "Low-grade astrocytic tumor",
      "Embryonal tumor",
      "Diffuse astrocytic and oligodendroglial tumor"
    ) ~ broad_histology,
    TRUE ~ NA_character_
  ))
```
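`case_when()` evaluates its conditions in order, uses the first one that matches, and falls back to `TRUE ~ NA_character_`. A base R sketch of the same mapping makes that precedence explicit; `other_cns`, `own_panel`, and `assign_oncoprint_group()` are hypothetical, shortened stand-ins for the lookup vectors above. It also checks that no broad histology appears in both lists, since the first rule (`Other CNS`) would otherwise win silently.

```r
# Hypothetical, shortened versions of the two lookup vectors
other_cns <- c("Ependymal tumor", "Meningioma", "Germ cell tumor")
own_panel <- c("Low-grade astrocytic tumor", "Embryonal tumor",
               "Diffuse astrocytic and oligodendroglial tumor")

# A histology listed in both vectors would always land in "Other CNS"
stopifnot(length(intersect(other_cns, own_panel)) == 0)

assign_oncoprint_group <- function(broad_histology) {
  # Default: NA, i.e., not assigned to any OncoPrint panel
  group <- rep(NA_character_, length(broad_histology))
  # First rule: grouped into the "Other CNS" panel
  group[broad_histology %in% other_cns] <- "Other CNS"
  # Second rule (only for rows not already assigned): own panel
  in_own <- broad_histology %in% own_panel & is.na(group)
  group[in_own] <- broad_histology[in_own]
  group
}

assign_oncoprint_group(c("Meningioma", "Embryonal tumor", "Non-tumor"))
# "Other CNS" "Embryonal tumor" NA
```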

For cancer groups that do not get their own hex code for display (i.e., due to small sample sizes), we'll use a selection of grey colors as a palette and rely heavily on the ordering of the OncoPrint legend.
Unfortunately, there are over 20 "Other CNS" cancer groups that meet this criterion, so it is not feasible to give each of them a color, and they will not be included in the "Other CNS" OncoPrint.

```{r}
greys_df <- palette_df %>%
  filter(cancer_group_display == "Other",
         !is.na(oncoprint_group),
         oncoprint_group != "Other CNS")

# Randomly shuffle the ColorBrewer "Greys" sequential palette, one color per row
set.seed(2021)
greys_df <- greys_df %>%
  mutate(oncoprint_hex = sample(brewer.pal(nrow(greys_df), "Greys")))
```
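`RColorBrewer::brewer.pal()` returns between 3 and 9 colors for the sequential `"Greys"` palette, with a warning when more or fewer are requested. The `mutate()` above would therefore fail if `greys_df` ever had fewer than 3 or more than 9 rows, because the vector length would no longer match the number of rows. One possible workaround, sketched below, swaps in base R's `grDevices::gray.colors()`, which can generate any number of greys. `shuffled_greys()` is a hypothetical helper, not part of the notebook.

```r
# Hypothetical alternative to sample(brewer.pal(n, "Greys")):
# grDevices::gray.colors() returns n greys for any n >= 1
shuffled_greys <- function(n, seed = 2021) {
  # Note: set.seed() resets the global random number generator state
  set.seed(seed)
  # start/end are grey levels in [0, 1]; stopping at 0.85 avoids pure white,
  # which would be invisible on a white background
  greys <- grDevices::gray.colors(n, start = 0.1, end = 0.85)
  # Random permutation of the shades, as sample() does in the notebook
  sample(greys)
}

shuffled_greys(12)
```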

Next, we add `oncoprint_hex` for all other cancer groups, along with `oncoprint_include`, which will be `FALSE` when `oncoprint_hex` is `NA`.

```{r}
palette_df <- palette_df %>%
  # Add the shuffled grey colors; join on the label pair, which is unique in
  # palette_df, rather than implicitly on every shared column
  left_join(select(greys_df, broad_histology, cancer_group, oncoprint_hex),
            by = c("broad_histology", "cancer_group")) %>%
  mutate(
    # When there's an oncoprint group and a specific cancer group display
    # color, use the cancer group display color as the oncoprint color
    oncoprint_hex = if_else(
      cancer_group_hex != cancer_group_other_hex & !is.na(oncoprint_group),
      cancer_group_hex,
      oncoprint_hex
    ),
    # Only when there's a specific oncoprint color -- even if that is a grey
    # selected only for the oncoprint -- will a cancer group be included in
    # the oncoprint
    oncoprint_include = !is.na(oncoprint_hex)
  )
```
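The `if_else()` condition above relies on R's three-valued logic: comparing a missing hex code yields `NA`, but `NA & FALSE` is `FALSE`, so rows with neither a cancer group color nor an oncoprint group fall through to the grey column. A toy base R sketch of how the two rules resolve, with `ifelse()` standing in for `dplyr::if_else()` and made-up values:

```r
other_hex <- "#b5b5b5"
cancer_group_hex <- c("#2200ff", "#b5b5b5", "#b5b5b5", NA)
oncoprint_group  <- c("Other CNS", "Embryonal tumor", "Other CNS", NA)
# Toy greys, as if joined from greys_df (NA where no grey was assigned)
grey_hex         <- c(NA, "#737373", NA, NA)

# Rule 1: a cancer group with its own color in an OncoPrint panel keeps it.
# For the last row, NA != other_hex is NA, but NA & FALSE is FALSE.
use_own <- cancer_group_hex != other_hex & !is.na(oncoprint_group)
oncoprint_hex <- ifelse(use_own, cancer_group_hex, grey_hex)

# Rule 2: include a row only if it ended up with some OncoPrint color
oncoprint_include <- !is.na(oncoprint_hex)

data.frame(cancer_group_hex, oncoprint_group, oncoprint_hex, oncoprint_include)
```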

### Save to TSV 

```{r}
readr::write_tsv(palette_df, 
                 file.path(output_dir, 
                           "broad_histology_cancer_group_palette.tsv"))
```
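Because the hex codes begin with `#`, downstream code that reads this file with base R should use `read.delim()`, whose default is `comment.char = ""`, rather than `read.table()`, whose default `comment.char = "#"` would drop everything after the first `#` on a line. A minimal round-trip sketch with a toy table and a temporary file (not the real palette):

```r
# Toy palette table with a hex code and missing values
toy_palette <- data.frame(
  cancer_group = c("Meningioma", NA),
  cancer_group_hex = c("#2db398", NA),
  stringsAsFactors = FALSE
)

tsv_path <- tempfile(fileext = ".tsv")
# Write missing values as the string "NA", as readr::write_tsv() does by default
write.table(toy_palette, tsv_path, sep = "\t", quote = FALSE,
            row.names = FALSE, na = "NA")

# read.delim() does not treat "#" as a comment character, and its default
# na.strings = "NA" turns the "NA" strings back into missing values
palette_back <- read.delim(tsv_path, stringsAsFactors = FALSE)
palette_back
```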

## Session Info

```{r}
sessionInfo()
```

diff --git a/figures/palettes/broad_histology_cancer_group_palette.tsv b/figures/palettes/broad_histology_cancer_group_palette.tsv
new file mode 100644
index 0000000000..4a017f9d10
--- /dev/null
+++ b/figures/palettes/broad_histology_cancer_group_palette.tsv
@@ -0,0 +1,62 @@
+broad_histology	cancer_group	broad_histology_hex	cancer_group_hex	broad_histology_display	cancer_group_display	broad_histology_order	oncoprint_group	oncoprint_hex	oncoprint_include
+Benign tumor	NA	#590024	NA	Benign tumor	NA	11	NA	NA	FALSE
+Benign tumor	Choroid plexus papilloma	#590024	#4d2635	Benign tumor	Choroid plexus papilloma	11	NA	NA	FALSE
+Benign tumor	Atypical choroid plexus papilloma	#590024	#b5b5b5	Benign tumor	Other	11	NA	NA	FALSE
+Benign tumor	Adenoma	#590024	#b5b5b5	Benign tumor	Other	11	NA	NA	FALSE
+Chordoma	Chordoma	#808080	#b5b5b5	Other	Other	12	NA	NA	FALSE
+Choroid plexus tumor	Choroid plexus carcinoma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Choroid plexus tumor	Choroid plexus cyst	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Diffuse astrocytic and oligodendroglial tumor	High-grade glioma astrocytoma	#ff80e5	#ffccf5	Diffuse astrocytic and oligodendroglial tumor	High-grade glioma astrocytoma	2	Diffuse astrocytic and oligodendroglial tumor	#ffccf5	TRUE
+Diffuse astrocytic and oligodendroglial tumor	Diffuse midline glioma	#ff80e5	#ff40d9	Diffuse astrocytic and oligodendroglial tumor	Diffuse midline glioma	2	Diffuse astrocytic and oligodendroglial tumor	#ff40d9	TRUE
+Diffuse astrocytic and oligodendroglial tumor	Diffuse intrinsic pontine glioma	#ff80e5	#bf0099	Diffuse astrocytic and oligodendroglial tumor	Diffuse intrinsic pontine glioma	2	Diffuse astrocytic and oligodendroglial tumor	#bf0099	TRUE
+Diffuse astrocytic and oligodendroglial tumor	Oligodendroglioma	#ff80e5	#b5b5b5	Diffuse astrocytic and oligodendroglial tumor	Other	2	Diffuse astrocytic and oligodendroglial tumor	#525252	TRUE
+Embryonal tumor	Medulloblastoma	#220040	#a340ff	Embryonal tumor	Medulloblastoma	3	Embryonal tumor	#a340ff	TRUE
+Embryonal tumor	CNS Embryonal tumor	#220040	#b08ccf	Embryonal tumor	CNS Embryonal tumor	3	Embryonal tumor	#b08ccf	TRUE
+Embryonal tumor	Atypical Teratoid Rhabdoid Tumor	#220040	#4d0d85	Embryonal tumor	Atypical Teratoid Rhabdoid Tumor	3	Embryonal tumor	#4d0d85	TRUE
+Embryonal tumor	Embryonal tumor with multilayer rosettes	#220040	#b5b5b5	Embryonal tumor	Other	3	Embryonal tumor	#737373	TRUE
+Embryonal tumor	Ganglioneuroblastoma	#220040	#b5b5b5	Embryonal tumor	Other	3	Embryonal tumor	#252525	TRUE
+Embryonal tumor	CNS neuroblastoma	#220040	#b5b5b5	Embryonal tumor	Other	3	Embryonal tumor	#F0F0F0	TRUE
+Embryonal tumor	Neuroblastoma	#220040	#b5b5b5	Embryonal tumor	Other	3	Embryonal tumor	#BDBDBD	TRUE
+Ependymal tumor	Ependymoma	#2200ff	#2200ff	Ependymal tumor	Ependymoma	4	Other CNS	#2200ff	TRUE
+Germ cell tumor	Teratoma	#0074d9	#058aff	Germ cell tumor	Teratoma	10	Other CNS	#058aff	TRUE
+Germ cell tumor	Germinoma	#0074d9	#b5b5b5	Germ cell tumor	Other	10	Other CNS	NA	FALSE
+Germ cell tumor	Germinoma-Teratoma	#0074d9	#b5b5b5	Germ cell tumor	Other	10	Other CNS	NA	FALSE
+Histiocytic tumor	Rosai-Dorfman disease	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Histiocytic tumor	Langerhans Cell histiocytosis	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Histiocytic tumor	Juvenile xanthogranuloma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Low-grade astrocytic tumor	Low-grade glioma astrocytoma	#8f8fbf	#000080	Low-grade astrocytic tumor	Low-grade glioma astrocytoma	1	Low-grade astrocytic tumor	#000080	TRUE
+Low-grade astrocytic tumor	Ganglioglioma	#8f8fbf	#8c8cff	Low-grade astrocytic tumor	Ganglioglioma	1	Low-grade astrocytic tumor	#8c8cff	TRUE
+Low-grade astrocytic tumor	Subependymal Giant Cell Astrocytoma	#8f8fbf	#b5b5b5	Low-grade astrocytic tumor	Other	1	Low-grade astrocytic tumor	#969696	TRUE
+Low-grade astrocytic tumor	Diffuse fibrillary astrocytoma	#8f8fbf	#b5b5b5	Low-grade astrocytic tumor	Other	1	Low-grade astrocytic tumor	#000000	TRUE
+Low-grade astrocytic tumor	Pleomorphic xanthoastrocytoma	#8f8fbf	#b5b5b5	Low-grade astrocytic tumor	Other	1	Low-grade astrocytic tumor	#D9D9D9	TRUE
+Low-grade astrocytic tumor	Pilocytic astrocytoma	#8f8fbf	#b5b5b5	Low-grade astrocytic tumor	Other	1	Low-grade astrocytic tumor	#FFFFFF	TRUE
+Lymphoma	CNS Burkitt's lymphoma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Melanocytic tumor	Melanocytic tumor	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Meningioma	Meningioma	#2db398	#2db398	Meningioma	Meningioma	8	Other CNS	#2db398	TRUE
+Mesenchymal non-meningothelial tumor	Sarcoma	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Mesenchymal non-meningothelial tumor	Ewing sarcoma	#7fbf00	#9fbf60	Mesenchymal non-meningothelial tumor	Ewing sarcoma	9	Other CNS	#9fbf60	TRUE
+Mesenchymal non-meningothelial tumor	Hemangioblastoma	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Mesenchymal non-meningothelial tumor	Rhabdomyosarcoma	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Mesenchymal non-meningothelial tumor	Fibromyxoid lesion	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Mesenchymal non-meningothelial tumor	Cavernoma	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Mesenchymal non-meningothelial tumor	Myofibroblastoma	#7fbf00	#b5b5b5	Mesenchymal non-meningothelial tumor	Other	9	Other CNS	NA	FALSE
+Metastatic tumors	Metastatic secondary tumors-Neuroblastoma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Metastatic tumors	Metastatic secondary tumors	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Diffuse leptomeningeal glioneuronal tumor	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Dysembryoplastic neuroepithelial tumor	#685815	#614e01	Neuronal and mixed neuronal-glial tumor	Dysembryoplastic neuroepithelial tumor	6	Other CNS	#614e01	TRUE
+Neuronal and mixed neuronal-glial tumor	Rosette-forming glioneuronal tumor	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Dysplasia Gliosis-Glial-neuronal tumor NOS	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Glial-neuronal tumor NOS	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Neurocytoma	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Neuronal and mixed neuronal-glial tumor	Desmoplastic infantile astrocytoma and ganglioglioma	#685815	#b5b5b5	Neuronal and mixed neuronal-glial tumor	Other	6	Other CNS	NA	FALSE
+Non-CNS tumor	Myxoid spindle cell tumor	#808080	#b5b5b5	Other	Other	12	NA	NA	FALSE
+Non-tumor	Epilepsy	#808080	#b5b5b5	Other	Other	12	NA	NA	FALSE
+Non-tumor	Arteriovenous malformation	#808080	#b5b5b5	Other	Other	12	NA	NA	FALSE
+Non-tumor	Reactive connective tissue	#808080	#b5b5b5	Other	Other	12	NA	NA	FALSE
+Other tumor	Ganglioneuroma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Pre-cancerous lesion	NA	#808080	NA	Other	NA	12	NA	NA	FALSE
+Tumor of cranial and paraspinal nerves	Schwannoma	#ffaa00	#ab7200	Tumor of cranial and paraspinal nerves	Schwannoma	5	Other CNS	#ab7200	TRUE
+Tumor of cranial and paraspinal nerves	Neurofibroma Plexiform	#ffaa00	#e6ac39	Tumor of cranial and paraspinal nerves	Neurofibroma Plexiform	5	Other CNS	#e6ac39	TRUE
+Tumor of cranial and paraspinal nerves	Malignant peripheral nerve sheath tumor	#ffaa00	#b5b5b5	Tumor of cranial and paraspinal nerves	Other	5	Other CNS	NA	FALSE
+Tumor of pineal region	Pineoblastoma	#808080	#b5b5b5	Other	Other	12	Other CNS	NA	FALSE
+Tumors of sellar region	Craniopharyngioma	#b2502d	#b33000	Tumors of sellar region	Craniopharyngioma	7	Other CNS	#b33000	TRUE