Added structure plot to pancreas_annotate analysis.

pcarbo · pcarbo · commit 335dd81ff064 · 2025-02-19T15:12:44.000-06:00
diff --git a/analysis/pancreas_annotate.Rmd b/analysis/pancreas_annotate.Rmd
@@ -4,12 +4,13 @@ author: Peter Carbonetto
 output: workflowr::wflow_html
 ---
 
-Here we re-examine some of the [matrix factorization results from
-the pancreas CEL-seq2 data](pancreas_another_look.html),
+Here we re-examine some of the
+[matrix factorization results from the pancreas CEL-seq2 data](pancreas_another_look.html),
 with the goal of understanding how best to *annotate* the pancreas
 factors. As we will see, there isn't a single "one-size-fits-all"
-strategy that works best, so it is suggested that several annotation
-strategies be explored.
+strategy that works best, so we recommend exploring several
+annotation strategies. We also discuss how to carefully *interpret*
+the matrix factorization results.
 
 The plotting functions used in this analysis are from
 [fastTopics][fastTopics].
@@ -35,6 +36,51 @@ Set the seed for reproducibility.
 set.seed(1)
 ```
 
+Load the CEL-Seq2 pancreas data and the outputs generated by running
+the `compute_pancreas_celseq2_factors.R` script.
+
+```{r load-data-celseq2}
+load("../data/pancreas.RData")
+load("../output/pancreas_celseq2_factors.RData")
+i           <- which(sample_info$tech == "celseq2")
+sample_info <- sample_info[i,]
+counts      <- counts[i,]
+sample_info <- transform(sample_info,celltype = factor(celltype))
+```
+
+We will first focus on the non-negative matrix factorization (NMF)
+produced by flashier.
+
 ## Structure plot
 
+The Structure plot (also shown in the previous analysis) shows that
+many of the factors correspond closely to the cell types estimated in
+the published analysis:
+
+```{r structure-plot-flashier-nmf, fig.height=2, fig.width=8, results="hide", message=FALSE}
+celltype <- sample_info$celltype
+celltype <-
+ factor(celltype,
+        c("acinar","ductal","activated_stellate","quiescent_stellate",
+          "endothelial","macrophage","mast","schwann","alpha","beta",
+          "delta","gamma","epsilon"))
+L <- fl_nmf_ldf$L
+k <- ncol(L)
+colnames(L) <- paste0("k",1:k)
+structure_plot(L[,-1],grouping = celltype,gap = 10,perplexity = 70,n = Inf) +
+  labs(y = "membership",fill = "factor",color = "factor")
+```
+
+Note that the first factor is omitted from the Structure plot (the
+`L[,-1]` in the code above) because it is a "baseline" factor and
+is not particularly interesting to look at.
+
+**A note about interpretation:** For visualization purposes, the
+columns of the L matrix (the "membership matrix") were scaled so that
+the largest membership for a given factor (column) is always
+exactly 1.  However, this *normalization is arbitrary*. Therefore, *it
+is not meaningful to compare memberships across factors (i.e., colors
+in the Structure plot); it is only meaningful to compare memberships
+within a given factor (color in the Structure plot).*
+
 [fastTopics]: https://github.com/stephenslab/fastTopics/