@@ -4,12 +4,13 @@ author: Peter Carbonetto
output: workflowr::wflow_html
---

- Here we re-examine some of the [matrix factorization results from
- the pancreas CEL-seq2 data](pancreas_another_look.html),
+ Here we re-examine some of the
+ [matrix factorization results from the pancreas CEL-seq2 data](pancreas_another_look.html),
with the goal of understanding how best to *annotate* the pancreas
factors. As we will see, there isn't a single "one-size-fits-all"
- strategy that works best, so it is suggested that several annotation
- strategies be explored.
+ strategy that works best, and so we recommend exploring different
+ annotation strategies. We also discuss careful *interpretation* of
+ the matrix factorization results.

The plotting functions used in this analysis are from
[fastTopics][fastTopics].
@@ -35,6 +36,51 @@ Set the seed for reproducibility.
set.seed(1)
```

+ Load the CEL-Seq2 pancreas data and the outputs generated by running
+ the `compute_pancreas_celseq2_factors.R` script.
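+
+ ```{r load-data-celseq2}
+ # Load the pancreas data, and the matrix factorizations generated by
+ # the compute_pancreas_celseq2_factors.R script.
+ load("../data/pancreas.RData")
+ load("../output/pancreas_celseq2_factors.RData")
+ # Keep only the cells (rows of counts) sequenced using CEL-Seq2.
+ i <- which(sample_info$tech == "celseq2")
+ sample_info <- sample_info[i,]
+ counts <- counts[i,]
+ # Convert the cell-type labels to a factor.
+ sample_info <- transform(sample_info,celltype = factor(celltype))
+ ```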
+
+ We will first focus on the non-negative matrix factorization (NMF)
+ produced by flashier.
+
## Structure plot

+ The Structure plot (also shown in the previous analysis) shows that
+ many of the factors correspond closely to the cell types estimated in
+ the published analysis:
+
+ ```{r structure-plot-flashier-nmf, fig.height=2, fig.width=8, results="hide", message=FALSE}
+ # Set the order in which the cell types appear in the Structure plot.
+ celltype <- sample_info$celltype
+ celltype <-
+   factor(celltype,
+          c("acinar","ductal","activated_stellate","quiescent_stellate",
+            "endothelial","macrophage","mast","schwann","alpha","beta",
+            "delta","gamma","epsilon"))
+ # Name the factors k1, k2, ..., and omit the first (baseline) factor
+ # from the plot.
+ L <- fl_nmf_ldf$L
+ k <- ncol(L)
+ colnames(L) <- paste0("k",1:k)
+ structure_plot(L[,-1],grouping = celltype,gap = 10,perplexity = 70,n = Inf) +
+   labs(y = "membership",fill = "factor",color = "factor")
+ ```
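One simple way to check how closely each factor tracks the published cell types (an illustrative sketch, not necessarily the annotation approach used in this analysis) is to average the memberships of each factor within each cell type, then label each factor by the cell type with the largest average. The toy matrix `L_toy` and labels `celltype_toy` below are made up for illustration; the same code could be applied to `L[,-1]` and `celltype` from the chunk above.

```r
# Toy membership matrix (rows = cells, columns = factors); the values
# are made up for illustration.
L_toy <- rbind(c(1.0,0.1,0.0),
               c(0.8,0.3,0.0),
               c(0.1,1.0,0.2),
               c(0.0,0.9,0.4))
colnames(L_toy) <- paste0("k",1:3)
celltype_toy <- factor(c("alpha","alpha","beta","beta"))

# Average membership of each factor (column) within each cell type;
# rows of the result are cell types, columns are factors.
mean_membership <- apply(L_toy,2,function (x) tapply(x,celltype_toy,mean))
print(round(mean_membership,digits = 3))

# Label each factor by the cell type with the largest average membership.
factor_label <- rownames(mean_membership)[apply(mean_membership,2,which.max)]
names(factor_label) <- colnames(mean_membership)
print(factor_label)
```

Because each label depends only on comparisons within a single factor's column, it does not change if the columns of L are rescaled.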
+
+ Note that the first factor was omitted from the Structure plot
+ because it is a "baseline" factor, which is not particularly
+ interesting to look at.
+
+ **A note about interpretation:** For visualization purposes, the
+ columns of the L matrix (the "membership matrix") were scaled so that
+ the largest membership for a given factor (column) is always
+ exactly 1. However, this *normalization is arbitrary*: rescaling a
+ column of L can be offset by the inverse rescaling of the
+ corresponding factor, leaving the overall fit unchanged. Therefore,
+ *it is not meaningful to compare memberships across factors (i.e.,
+ colors in the Structure plot); it is only meaningful to compare
+ memberships within a given factor (color in the Structure plot).*
+
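The base R sketch below illustrates why the normalization is arbitrary. For simplicity, it assumes the fit is the product of a membership matrix and the transpose of a factor matrix (called `L0` and `F0` here); the toy values are made up and are not the flashier output. Rescaling the columns of the membership matrix, with the inverse rescaling applied to the matching columns of the factor matrix, leaves the fitted values unchanged.

```r
# Toy factorization (values made up): the fit is L0 %*% t(F0).
L0 <- rbind(c(2.0,0.5),
            c(4.0,0.0),
            c(1.0,3.0))
F0 <- rbind(c(0.2,1.0),
            c(0.5,0.1),
            c(0.0,0.4),
            c(1.0,0.0))

# Scale each column of L0 so that its largest entry is exactly 1, and
# multiply the matching column of F0 by the same constant.
s1 <- apply(L0,2,max)
L1 <- sweep(L0,2,s1,"/")
F1 <- sweep(F0,2,s1,"*")

# Any other positive scaling also works, e.g., this arbitrary one.
s2 <- c(10,0.1)
L2 <- sweep(L0,2,s2,"/")
F2 <- sweep(F0,2,s2,"*")

# All three versions give the same fitted values (up to rounding error).
fit0 <- L0 %*% t(F0)
print(max(abs(fit0 - L1 %*% t(F1))))
print(max(abs(fit0 - L2 %*% t(F2))))
```

In all three versions the ratios of memberships within a column are the same, but how the memberships in factor 1 compare to those in factor 2 depends entirely on the chosen scaling.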
[fastTopics]: https://github.com/stephenslab/fastTopics/