
Commit 335dd81

Added structure plot to pancreas_annotate analysis.
1 parent bc266b0 commit 335dd81


1 file changed: +50 -4 lines changed


analysis/pancreas_annotate.Rmd (+50 -4)
````diff
@@ -4,12 +4,13 @@ author: Peter Carbonetto
 output: workflowr::wflow_html
 ---
 
-Here we re-examine some of the [matrix factorization results from
-the pancreas CEL-seq2 data](pancreas_another_look.html),
+Here we re-examine some of the
+[matrix factorization results from the pancreas CEL-seq2 data](pancreas_another_look.html),
 with the goal of understanding how best to *annotate* the pancreas
 factors. As we will see, there isn't a single "one-size-fits-all"
-strategy that works best, so it is suggested that several annotation
-strategies be explored.
+strategy that works best, and so we recommend exploring different
+annotation strategies. Also, careful *interpretation* of the matrix
+factorization results is discussed.
 
 The plotting functions used in this analysis are from
 [fastTopics][fastTopics].
````
````diff
@@ -35,6 +36,51 @@ Set the seed for reproducibility.
 set.seed(1)
 ```
 
+Load the CEL-Seq2 pancreas data and the outputs generated by running
+the `compute_pancreas_celseq2_factors.R` script.
+
+```{r load-data-celseq2}
+load("../data/pancreas.RData")
+load("../output/pancreas_celseq2_factors.RData")
+i <- which(sample_info$tech == "celseq2")
+sample_info <- sample_info[i,]
+counts <- counts[i,]
+sample_info <- transform(sample_info,celltype = factor(celltype))
+```
+
+We will first focus on the non-negative matrix factorization (NMF)
+produced by flashier.
+
 ## Structure plot
 
+The Structure plot (also shown in the previous analysis) shows that
+many of the factors correspond closely to the cell types estimated in
+the published analysis:
+
+```{r structure-plot-flashier-nmf, fig.height=2, fig.width=8, results="hide", message=FALSE}
+celltype <- sample_info$celltype
+celltype <-
+  factor(celltype,
+         c("acinar","ductal","activated_stellate","quiescent_stellate",
+           "endothelial","macrophage","mast","schwann","alpha","beta",
+           "delta","gamma","epsilon"))
+L <- fl_nmf_ldf$L
+k <- ncol(L)
+colnames(L) <- paste0("k",1:k)
+structure_plot(L[,-1],grouping = celltype,gap = 10,perplexity = 70,n = Inf) +
+  labs(y = "membership",fill = "factor",color = "factor")
+```
+
+Note that the first factor was omitted in the Structure plot because
+it is a "baseline" factor, and not particularly interesting to look
+at.
+
+**A note about interpretation:** For visualization purposes, the
+columns of the L matrix—the "membership matrix"—were scaled so that
+the largest membership for a given factor (column) is always
+exactly 1. However, this *normalization is arbitrary*. Therefore, *it
+is not meaningful to compare memberships across factors (i.e., colors
+in the Structure plot); it is only meaningful to compare memberships
+within a given factor (color in the Structure plot).*
+
 [fastTopics]: https://github.com/stephenslab/fastTopics/
````
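The interpretation note added in this commit says the column scaling of L is arbitrary. The following is a minimal base-R sketch of why. It uses small simulated matrices, not the pancreas data or the flashier fit. It assumes the usual NMF form in which the data are approximated by L times the transpose of a second non-negative matrix, here called `F_mat` (a hypothetical name). Under that assumption, any positive rescaling of a column of L can be undone in the matching column of `F_mat` without changing the fit. Within-factor comparisons survive such a rescaling, while cross-factor comparisons need not.

```r
# Toy illustration with simulated matrices (NOT the pancreas data or the
# flashier fit): the per-factor scale of the membership matrix L is arbitrary.
set.seed(1)
n <- 6  # number of cells (rows of L)
m <- 4  # number of genes (rows of F_mat)
k <- 3  # number of factors

# Non-negative memberships L (n x k) and gene weights F_mat (m x k).
# F_mat is a hypothetical name for the second NMF matrix.
L     <- matrix(rexp(n * k), n, k)
F_mat <- matrix(rexp(m * k), m, k)

# Normalization 1 (the choice described in the note): divide each column of
# L by its maximum so the largest membership per factor is exactly 1, and
# multiply the matching column of F_mat by the same constant to compensate.
d_max <- apply(L, 2, max)
L_max <- sweep(L, 2, d_max, "/")
F_max <- sweep(F_mat, 2, d_max, "*")

# Normalization 2 (an equally valid alternative): columns of L sum to 1.
d_sum <- colSums(L)
L_sum <- sweep(L, 2, d_sum, "/")
F_sum <- sweep(F_mat, 2, d_sum, "*")

# Both rescalings leave the fitted matrix L %*% t(F_mat) unchanged, so the
# fit alone cannot say which normalization is "right".
fit     <- L %*% t(F_mat)
fit_max <- L_max %*% t(F_max)
fit_sum <- L_sum %*% t(F_sum)
stopifnot(isTRUE(all.equal(fit, fit_max)), isTRUE(all.equal(fit, fit_sum)))

# Within a factor, the two versions differ only by a positive constant, so
# comparisons among cells for the same factor (same color) are unaffected.
ratio <- L_max / L_sum
stopifnot(all(apply(ratio, 2, function(x) diff(range(x))) < 1e-10))

# Across factors, the factor with the largest membership in each cell may
# change with the normalization, so comparing colors is not meaningful.
print(cbind(cell           = 1:n,
            top_factor_max = apply(L_max, 1, which.max),
            top_factor_sum = apply(L_sum, 1, which.max)))
```

Whether the two "top factor" columns in the printed table actually disagree depends on the simulated values. The point is that nothing in the fit pins that comparison down, whereas the within-column ratios are fixed.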
