This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

update figures README (#1172)
* update figures README

- add figure theme
- add statistics info

* Update figures/README.md

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update figures/README.md

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update figures/README.md

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update figures/README.md

add `r`

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update figures/README.md

add `r`

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

* Update README.md

add r and bash to quoted text

* Update figures/README.md

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>

Co-authored-by: Jaclyn Taroni <jaclyn.n.taroni@gmail.com>
Jo Lynne Rokita and jaclyn-taroni authored Sep 2, 2021
1 parent d3a7edf commit 4240cc6
Showing 1 changed file with 63 additions and 11 deletions.
74 changes: 63 additions & 11 deletions figures/README.md
@@ -12,7 +12,8 @@ We recommend [using the download script](https://github.com/AlexsLemonade/OpenPB

See [these instructions](https://github.com/AlexsLemonade/OpenPBTA-analysis#docker-image) for setting up the project Docker container.
Briefly, the latest version of the project Docker image, which is updated upon commit to `master`, can be obtained and run via:
```

```bash
docker pull ccdlopenpbta/open-pbta:latest
docker run \
-e PASSWORD=<password> \
@@ -26,7 +27,7 @@ You may choose to use [`docker exec`](https://docs.docker.com/engine/reference/c

This script runs **_all_** the intermediate steps needed to generate figures starting with the original data files.

```
```bash
bash figures/generate-figures.sh
```

@@ -76,10 +77,10 @@ To see a summary of what colors are used for histology labeling, see [`mapping-h

**Step 1)** Read in color palette and select the pertinent columns

There's some extra columns in `histology_label_color_table.tsv` that you don't need for plotting per se but are more record-keeping purposes.
With the code chunk below, we only import the four columns we need and then do a factor reorder to make sure the `display_group` is in the order declared by `display_order`.
There are some extra columns in `histology_label_color_table.tsv` that you don't need for plotting per se but that are kept for record-keeping purposes.
With the code chunk below, you can import the columns you need (for example, `Kids_First_Biospecimen_ID, display_group, display_order, hex_codes` or `Kids_First_Biospecimen_ID, cancer_group, cancer_group_order, cancer_group_hex_codes`) and then do a factor reorder to make sure the `display_group` (or `cancer_group`) is in the order declared by `display_order` (or `cancer_group_order`).

```
```r
# Import standard color palettes for project
histology_label_mapping <- readr::read_tsv(
file.path(figures_dir, "palettes", "histology_label_color_table.tsv")
@@ -93,7 +94,7 @@ histology_label_mapping <- readr::read_tsv(
**Step 2)** Use `dplyr::inner_join` using `Kids_First_Biospecimen_ID` to join by so you can add on the `hex_codes` and `display_group` for each biospecimen.
`display_order` specifies what order the `display_group`s should be displayed.

```
```r
# Read in the metadata
metadata <- readr::read_tsv(metadata_file, guess_max = 10000) %>%
dplyr::inner_join(histology_label_mapping, by = "Kids_First_Biospecimen_ID")
@@ -105,7 +106,7 @@ Using the `ggplot2::scale_fill_identity()` or `ggplot2::scale_color_identity()`
For base R plots, you should be able to supply the `hex_codes` column as your `col` argument.
`display_group` should be used as the labels in the plot.
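
For the base R case, a minimal sketch might look like the following. The toy data frame, its group names, and its hex codes are invented for illustration and are not taken from the project palette; only the column names `display_group` and `hex_codes` come from the palette file described here.

```r
# Toy stand-in for the joined metadata (values are invented for illustration)
toy_metadata <- data.frame(
  display_group = c("Group A", "Group A", "Group B",
                    "Group C", "Group C", "Group C"),
  hex_codes = c("#1b9e77", "#1b9e77", "#d95f02",
                "#7570b3", "#7570b3", "#7570b3"),
  stringsAsFactors = FALSE
)

# Count samples per group while keeping each group's hex code alongside it
group_counts <- aggregate(
  count ~ display_group + hex_codes,
  data = transform(toy_metadata, count = 1),
  FUN = sum
)

# Supply the hex codes directly as `col` and the group names as labels
barplot(
  height = group_counts$count,
  names.arg = group_counts$display_group,
  col = group_counts$hex_codes,
  ylab = "count"
)
```

Because the colors travel with the rows of the summarized data frame, each bar keeps its group's color regardless of the order in which the groups are drawn.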

```
```r
metadata %>%
dplyr::group_by(display_group, hex_codes) %>%
dplyr::summarize(count = dplyr::n()) %>%
@@ -120,15 +121,15 @@ metadata %>%

You may want to remove the `na_color` at the end of the list depending on whether your data include `NA`s or if the plotting function you are using has the `na_color` supplied separately.

```
```r
gradient_col_palette <- readr::read_tsv(
file.path(figures_dir, "palettes", "gradient_color_palette.tsv")
)
```

If we need the `NA` color separately, as for `ComplexHeatmap`, which has a separate argument for the color of `NA` values, we can extract it as follows.

```
```r
na_color <- gradient_col_palette %>%
dplyr::filter(color_names == "na_color")

@@ -142,7 +143,7 @@ In this example, we are building a `colorRamp2` function based on a regular inte
However, depending on your data's distribution a regular interval based palette might not represent your data well on the plot.
You can provide any numeric vector to color code a palette using `circlize::colorRamp2` as long as that numeric vector is the same length as the palette itself.
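
As one possible alternative to regular intervals, the breaks could be placed at quantiles of the data. This is only a sketch under stated assumptions: `toy_values` and `toy_palette` are hypothetical stand-ins for `df$variable` and the hex codes in the gradient palette, and quantile breaks are an illustration rather than the project's chosen approach.

```r
# Hypothetical stand-ins: a skewed variable and a five-color palette
set.seed(2021)
toy_values <- rexp(200, rate = 1)
toy_palette <- c("#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b")

# One break per palette color, placed at evenly spaced quantiles
quantile_breaks <- as.numeric(quantile(
  toy_values,
  probs = seq(0, 1, length.out = length(toy_palette))
))

# The breaks vector must be the same length as the palette
col_fun <- circlize::colorRamp2(quantile_breaks, toy_palette)

# col_fun maps numeric values to color strings
col_fun(c(min(toy_values), median(toy_values), max(toy_values)))
```

With skewed data, quantile breaks spend more of the palette on the range where most values fall, at the cost of a color scale that is no longer linear in the variable.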

```
```r
gradient_col_val <- seq(from = min(df$variable), to = max(df$variable),
length.out = nrow(gradient_col_palette))

@@ -154,7 +155,7 @@ col_fun <- circlize::colorRamp2(gradient_col_val,
This step depends on how your main plotting function would like the data supplied.
For example, `ComplexHeatmap` wants a function to be supplied to their `col` argument.

```
```r
# Apply to variable directly and make a new column
df <- df %>%
dplyr::mutate(color_key = col_fun(variable))
@@ -178,3 +179,54 @@ The script can be called from anywhere in this repository (will look for the `.g
The hex codes table in `figures/README.md` and its swatches should also be updated by running the `swatches_table` function at the end of the script and copying and pasting its output into the appropriate place in the table.

The histology color palette file is created by running `Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)"`.


### Overall figure theme

In general, we will use the `ggpubr` package with `ggtheme = theme_pubr()` and the `simpsons` color palette from the `ggsci` package, since it has 16 levels and can accommodate the levels in groups such as `molecular_subtype`.

To view the palette:
```r
scales::show_col(ggsci::pal_simpsons("springfield")(16))
```

For comparisons of two or more groups, we will use violin plots or boxplots with jitter.


### Statistics

Some modules perform group-wise comparisons.
For the manuscript, we may want to output tables of the statistics and/or print the statistical test and p-value directly on the plot.
We use the functions `ggpubr::compare_means()` and `ggpubr::stat_compare_means()` for this.
Below are the default tests, parameters, and method options for 2 groups or [more than two groups](http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/#compare-more-than-two-groups) for your convenience.
Caution: the default p-values on the plots are uncorrected.

|                                            | 2 groups                                              | 3+ groups                                                             |
|--------------------------------------------|-------------------------------------------------------|-----------------------------------------------------------------------|
| Default test (method)                      | Wilcoxon                                              | Kruskal-Wallis                                                        |
| Allowed methods                            | "wilcox.test" (non-parametric), "t.test" (parametric) | "kruskal.test" (non-parametric), "anova" (parametric)                 |
| Default multiple testing (p.adjust.method) | NA                                                    | yes, but not Bonferroni                                               |
| Allowed p.adjust.method                    | NA                                                    | "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none" |
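
To output a table of statistics, a minimal sketch with `ggpubr::compare_means()` might look like the following. The toy data frame, the choice of `"BH"`, and the output file name are assumptions made for illustration only.

```r
library(ggpubr)

# Toy data, invented for illustration: three groups with shifted means
set.seed(2021)
toy_df <- data.frame(
  var_x = rep(c("a", "b", "c"), each = 10),
  var_y = c(rnorm(10, mean = 0), rnorm(10, mean = 1), rnorm(10, mean = 2))
)

# Global test across all three groups (a single row of results)
global_stats <- compare_means(var_y ~ var_x, data = toy_df,
                              method = "kruskal.test")

# Pairwise Wilcoxon tests; the `p.adj` column holds the adjusted p-values
pairwise_stats <- compare_means(
  var_y ~ var_x,
  data = toy_df,
  method = "wilcox.test",
  p.adjust.method = "BH"
)

# Write the table to a file (hypothetical file name)
readr::write_tsv(pairwise_stats, "pairwise_stats.tsv")
```

The returned table keeps the unadjusted `p` next to `p.adj`, so either can be reported explicitly rather than relying on the uncorrected value printed on a plot.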

Below is an example for creating a violin plot with boxplot, jitter, and appropriate statistics.

```r
library(ggpubr)

# Use Kruskal-Wallis for 3+ groups and Wilcoxon for 2 groups
if (length(unique(df$var_x)) > 2) {
  method <- "kruskal.test"
} else {
  method <- "wilcox.test"
}

p <- ggviolin(df, x = "var_x", y = "var_y",
              color = "var_color",
              palette = "simpsons",
              order = c("a", "b", "c"),
              add = c("boxplot", "jitter"),
              ggtheme = theme_pubr()) +
  # Add the test name and its (uncorrected) p-value to the plot
  stat_compare_means(method = method, label.y = 1.2, label.x.npc = "center") +
  xlab("xlab_text") +
  ylab("ylab_text") +
  rremove("legend")
```
