Commit 1f93686

Added an 'interpretable' annotation plot to the pancreas_annotate analysis.
1 parent b36508b commit 1f93686

4 files changed

+250
-37
lines changed

docs/pancreas_annotate.html

+250-37
Original file line number | Diff line number | Diff line change
@@ -300,7 +300,7 @@ <h4 class="author">Peter Carbonetto</h4>
300300
<div class="tab-content">
301301
<div id="summary" class="tab-pane fade in active">
302302
<p>
303-
<strong>Last updated:</strong> 2025-02-18
303+
<strong>Last updated:</strong> 2025-02-19
304304
</p>
305305
<p>
306306
<strong>Checks:</strong> <span
@@ -432,15 +432,15 @@ <h4 class="author">Peter Carbonetto</h4>
432432
<div class="panel panel-default">
433433
<div class="panel-heading">
434434
<p class="panel-title">
435-
<a data-toggle="collapse" data-parent="#workflowr-checks" href="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree371108b3bde59420be37c4794d9787c13f0c1d77targetblank371108ba">
435+
<a data-toggle="collapse" data-parent="#workflowr-checks" href="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetreeb36508bf5db1e33dca218ea14dc03f3683da85b7targetblankb36508ba">
436436
<span class="glyphicon glyphicon-ok text-success"
437437
aria-hidden="true"></span> <strong>Repository version:</strong>
438-
<a href="https://github.com/stephenslab/single-cell-jamboree/tree/371108b3bde59420be37c4794d9787c13f0c1d77" target="_blank">371108b</a>
438+
<a href="https://github.com/stephenslab/single-cell-jamboree/tree/b36508bf5db1e33dca218ea14dc03f3683da85b7" target="_blank">b36508b</a>
439439
</a>
440440
</p>
441441
</div>
442442
<div
443-
id="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree371108b3bde59420be37c4794d9787c13f0c1d77targetblank371108ba"
443+
id="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetreeb36508bf5db1e33dca218ea14dc03f3683da85b7targetblankb36508ba"
444444
class="panel-collapse collapse">
445445
<div class="panel-body">
446446
<p>
@@ -450,7 +450,7 @@ <h4 class="author">Peter Carbonetto</h4>
450450
</p>
451451
<p>
452452
The results in this page were generated with repository version
453-
<a href="https://github.com/stephenslab/single-cell-jamboree/tree/371108b3bde59420be37c4794d9787c13f0c1d77" target="_blank">371108b</a>.
453+
<a href="https://github.com/stephenslab/single-cell-jamboree/tree/b36508bf5db1e33dca218ea14dc03f3683da85b7" target="_blank">b36508b</a>.
454454
See the <em>Past versions</em> tab to see a history of the changes made
455455
to the R Markdown and HTML files.
456456
</p>
@@ -519,6 +519,92 @@ <h4 class="author">Peter Carbonetto</h4>
519519
Rmd
520520
</td>
521521
<td>
522+
<a href="https://github.com/stephenslab/single-cell-jamboree/blob/b36508bf5db1e33dca218ea14dc03f3683da85b7/analysis/pancreas_annotate.Rmd" target="_blank">b36508b</a>
523+
</td>
524+
<td>
525+
Peter Carbonetto
526+
</td>
527+
<td>
528+
2025-02-19
529+
</td>
530+
<td>
531+
wflow_publish("pancreas_annotate.Rmd", verbose = TRUE, view = FALSE)
532+
</td>
533+
</tr>
534+
<tr>
535+
<td>
536+
Rmd
537+
</td>
538+
<td>
539+
<a href="https://github.com/stephenslab/single-cell-jamboree/blob/a8d45762ba596b777dd984b9a6e8ca52385073ef/analysis/pancreas_annotate.Rmd" target="_blank">a8d4576</a>
540+
</td>
541+
<td>
542+
Peter Carbonetto
543+
</td>
544+
<td>
545+
2025-02-19
546+
</td>
547+
<td>
548+
A few edits to the text of the pancreas_annotate analysis.
549+
</td>
550+
</tr>
551+
<tr>
552+
<td>
553+
Rmd
554+
</td>
555+
<td>
556+
<a href="https://github.com/stephenslab/single-cell-jamboree/blob/9dd5b4a4eeace7f6e633c05cf49036c4105a4b73/analysis/pancreas_annotate.Rmd" target="_blank">9dd5b4a</a>
557+
</td>
558+
<td>
559+
Peter Carbonetto
560+
</td>
561+
<td>
562+
2025-02-19
563+
</td>
564+
<td>
565+
Added annotation plots for flashier NMF result to pancreas_annotate
566+
analysis.
567+
</td>
568+
</tr>
569+
<tr>
570+
<td>
571+
Rmd
572+
</td>
573+
<td>
574+
<a href="https://github.com/stephenslab/single-cell-jamboree/blob/335dd81ff06478b6c84002e65dbd8bd8a200a59e/analysis/pancreas_annotate.Rmd" target="_blank">335dd81</a>
575+
</td>
576+
<td>
577+
Peter Carbonetto
578+
</td>
579+
<td>
580+
2025-02-19
581+
</td>
582+
<td>
583+
Added structure plot to pancreas_annotate analysis.
584+
</td>
585+
</tr>
586+
<tr>
587+
<td>
588+
html
589+
</td>
590+
<td>
591+
<a href="https://rawcdn.githack.com/stephenslab/single-cell-jamboree/e77173868c42138ec9de2b78146ab92fed1c4963/docs/pancreas_annotate.html" target="_blank">e771738</a>
592+
</td>
593+
<td>
594+
Peter Carbonetto
595+
</td>
596+
<td>
597+
2025-02-18
598+
</td>
599+
<td>
600+
First build of the pancreas_annotate analysis.
601+
</td>
602+
</tr>
603+
<tr>
604+
<td>
605+
Rmd
606+
</td>
607+
<td>
522608
<a href="https://github.com/stephenslab/single-cell-jamboree/blob/371108b3bde59420be37c4794d9787c13f0c1d77/analysis/pancreas_annotate.Rmd" target="_blank">371108b</a>
523609
</td>
524610
<td>
@@ -542,8 +628,9 @@ <h4 class="author">Peter Carbonetto</h4>
542628
href="pancreas_another_look.html">matrix factorization results from the
543629
pancreas CEL-seq2 data</a>, with the goal of understanding how best to
544630
<em>annotate</em> the pancreas factors. As we will see, there isn’t a
545-
single “one-size-fits-all” strategy that works best, so it is suggested
546-
that several annotation strategies be explored.</p>
631+
single “one-size-fits-all” strategy that works best, and so we recommend
632+
exploring different annotation strategies. Also, careful interpretation
633+
of the matrix factorization is discussed.</p>
547634
<p>The plotting functions used in this analysis are from <a
548635
href="https://github.com/stephenslab/fastTopics/">fastTopics</a>.</p>
549636
<p>First, load the packages needed for this analysis.</p>
@@ -554,8 +641,132 @@ <h4 class="author">Peter Carbonetto</h4>
554641
library(cowplot)</code></pre>
555642
<p>Set the seed for reproducibility.</p>
556643
<pre class="r"><code>set.seed(1)</code></pre>
644+
<p>Load the CEL-Seq2 pancreas data and the outputs generated by running
645+
the <code>compute_pancreas_celseq2_factors.R</code> script.</p>
646+
<pre class="r"><code>load(&quot;../data/pancreas.RData&quot;)
647+
load(&quot;../output/pancreas_celseq2_factors.RData&quot;)
648+
i &lt;- which(sample_info$tech == &quot;celseq2&quot;)
649+
sample_info &lt;- sample_info[i,]
650+
counts &lt;- counts[i,]
651+
sample_info &lt;- transform(sample_info,celltype = factor(celltype))</code></pre>
652+
<p>We will first focus on the non-negative matrix factorization (NMF)
653+
produced by flashier.</p>
557654
<div id="structure-plot" class="section level2">
558655
<h2>Structure plot</h2>
656+
<p>The Structure plot (also shown in the previous analysis) shows that
657+
many of the factors correspond closely to the cell-type assignments that
658+
were estimated in the published analysis:</p>
659+
<pre class="r"><code>celltype &lt;- sample_info$celltype
660+
celltype &lt;-
661+
factor(celltype,
662+
c(&quot;acinar&quot;,&quot;ductal&quot;,&quot;activated_stellate&quot;,&quot;quiescent_stellate&quot;,
663+
&quot;endothelial&quot;,&quot;macrophage&quot;,&quot;mast&quot;,&quot;schwann&quot;,&quot;alpha&quot;,&quot;beta&quot;,
664+
&quot;delta&quot;,&quot;gamma&quot;,&quot;epsilon&quot;))
665+
L &lt;- fl_nmf_ldf$L
666+
colnames(L) &lt;- paste0(&quot;k&quot;,1:9)
667+
structure_plot(L[,-1],grouping = celltype,gap = 10,perplexity = 70,n = Inf) +
668+
labs(y = &quot;membership&quot;,fill = &quot;factor&quot;,color = &quot;factor&quot;)</code></pre>
669+
<p><img src="figure/pancreas_annotate.Rmd/structure-plot-flashier-nmf-1.png" width="960" style="display: block; margin: auto;" /></p>
670+
<p>Note that the first factor was omitted in the Structure plot because
671+
it is a “baseline” factor, and not particularly interesting to look
672+
at.</p>
673+
<p><strong>A note about interpretation:</strong> For visualization
674+
purposes, the columns of the L matrix—the “membership matrix”—were
675+
scaled so that the largest membership for a given factor (column) was
676+
always exactly 1. However, please note <em>this normalization is
677+
arbitrary</em>. Therefore, <em>it is not meaningful to compare
678+
memberships across factors (i.e., colors in the Structure plot); it is
679+
only meaningful to compare memberships within a given factor (a single
680+
color in the Structure plot).</em></p>
681+
</div>
682+
<div id="annotating-the-factors-by-driving-genes"
683+
class="section level2">
684+
<h2>Annotating the factors by “driving genes”</h2>
685+
<p>To illustrate annotating the factors, let’s focus on factors 4, 5 and
686+
6—these are the factors that largely capture the islet cells (alpha,
687+
beta, <em>etc</em>). Let’s consider two different selection strategies:
688+
(i) choosing genes <span class="math inline">\(j\)</span> with the
689+
largest <span class="math inline">\(f_{jk}\)</span>; (ii) choosing genes
690+
<span class="math inline">\(j\)</span> with the largest differences
691+
<span class="math inline">\(f_{jk} - f_{jk&#39;}\)</span> with other
692+
factors <span class="math inline">\(k&#39;\)</span> (“distinctive
693+
genes”). These two selection strategies are implemented in the
694+
<code>annotation_heatmap</code> function:</p>
695+
<pre class="r"><code>F &lt;- fl_nmf_ldf$F
696+
colnames(F) &lt;- paste0(&quot;k&quot;,1:9)
697+
kset &lt;- paste0(&quot;k&quot;,4:6)
698+
p1 &lt;- annotation_heatmap(F,n = 8,dims = kset,
699+
select_features = &quot;largest&quot;,
700+
font_size = 9) +
701+
labs(title = &quot;select_features = \&quot;largest\&quot;&quot;) +
702+
theme(plot.title = element_text(face = &quot;plain&quot;,size = 9))
703+
p2 &lt;- annotation_heatmap(F,n = 8,dims = kset,
704+
select_features = &quot;distinctive&quot;,
705+
compare_dims = kset,
706+
font_size = 9) +
707+
labs(title = &quot;select_features = \&quot;distinctive\&quot;&quot;) +
708+
theme(plot.title = element_text(face = &quot;plain&quot;,size = 9))
709+
plot_grid(p1,p2,nrow = 1,ncol = 2)</code></pre>
710+
<p><img src="figure/pancreas_annotate.Rmd/annotation-plot-flashier-nmf-1.png" width="720" style="display: block; margin: auto;" /></p>
711+
<pre><code># Features selected for plot: INS IAPP SCGN SLC30A8 ABCC8 G6PC2 NPTX2 HADH GCG CHGB TM4SF4 TTR SCG2 SCG5 ALDH1A1 PCSK2 SST RBP4 PCSK1 CPE PPY SEC11C ISL1
712+
c(&quot;INS&quot;, &quot;IAPP&quot;, &quot;SCGN&quot;, &quot;SLC30A8&quot;, &quot;ABCC8&quot;, &quot;G6PC2&quot;, &quot;NPTX2&quot;,
713+
&quot;HADH&quot;, &quot;GCG&quot;, &quot;CHGB&quot;, &quot;TM4SF4&quot;, &quot;TTR&quot;, &quot;SCG2&quot;, &quot;SCG5&quot;, &quot;ALDH1A1&quot;,
714+
&quot;PCSK2&quot;, &quot;SST&quot;, &quot;RBP4&quot;, &quot;PCSK1&quot;, &quot;CPE&quot;, &quot;PPY&quot;, &quot;SEC11C&quot;, &quot;ISL1&quot;
715+
)
716+
# Features selected for plot: INS IAPP NPTX2 MAFA MEG3 ADCYAP1 PFKFB2 DLK1 GCG GC TTR TM4SF4 FAP LOXL4 ALDH1A1 CRYBA2 SST AQP3 PPY LEPR EGR1 RBP4 DPYSL3 AKAP12
717+
c(&quot;INS&quot;, &quot;IAPP&quot;, &quot;NPTX2&quot;, &quot;MAFA&quot;, &quot;MEG3&quot;, &quot;ADCYAP1&quot;, &quot;PFKFB2&quot;,
718+
&quot;DLK1&quot;, &quot;GCG&quot;, &quot;GC&quot;, &quot;TTR&quot;, &quot;TM4SF4&quot;, &quot;FAP&quot;, &quot;LOXL4&quot;, &quot;ALDH1A1&quot;,
719+
&quot;CRYBA2&quot;, &quot;SST&quot;, &quot;AQP3&quot;, &quot;PPY&quot;, &quot;LEPR&quot;, &quot;EGR1&quot;, &quot;RBP4&quot;, &quot;DPYSL3&quot;,
720+
&quot;AKAP12&quot;)</code></pre>
721+
<p>Strategy (i) picks out some canonical marker genes for islet cells
722+
such as <em>INS</em> for beta cells and <em>GCG</em> for alpha cells.
723+
But it also picks out other genes that are highly expressed in multiple
724+
islet cell types, such as <em>TTR</em> and <em>CHGB</em>. Strategy (ii)
725+
focusses more strongly on genes that distinguish one cell type from
726+
another, and as a result marker genes such as <em>MAFA</em> (beta cells)
727+
and <em>GC</em> (alpha cells) are ranked more highly with this
728+
strategy.</p>
729+
<p>The better strategy will depend on the setting and on the goals of
730+
the analysis, which is why the <code>annotation_heatmap</code> function
731+
provides both options. These selection strategies can also reveal
732+
complementary insights and so in many situations it may be better to use
733+
both.</p>
734+
<div id="a-more-interpretable-annotation-plot" class="section level3">
735+
<h3>A more interpretable annotation plot</h3>
736+
<p>Above we sounded a note of caution about interpreting elements of L
737+
across factors/columns. The same applies to the F matrix. To provide a
738+
more even footing, above we employed the simple heuristic of scaling the
739+
columns of F so that the maximum element in each column was 1. That was
740+
helpful for selecting “distinctive” genes, but made the effect sizes
741+
difficult to interpret. To produce more easily interpretable effect
742+
sizes, we recommend visualizing this F matrix (in this code, fl is a
743+
“flash” object, e.g., the return value from a call to
744+
<code>flashier::flash()</code>):</p>
745+
<pre class="r"><code>out &lt;- ldf(fl)
746+
F &lt;- with(out,F %*% diag(D))</code></pre>
747+
<p>This is what this rescaled F matrix looks like for the pancreas
748+
data:</p>
749+
<pre class="r"><code>genes &lt;- c(&quot;INS&quot;,&quot;IAPP&quot;,&quot;NPTX2&quot;,&quot;MAFA&quot;,&quot;MEG3&quot;,&quot;ADCYAP1&quot;,&quot;PFKFB2&quot;,
750+
&quot;DLK1&quot;,&quot;GCG&quot;,&quot;GC&quot;,&quot;TTR&quot;,&quot;TM4SF4&quot;,&quot;FAP&quot;,&quot;LOXL4&quot;,&quot;ALDH1A1&quot;,
751+
&quot;CRYBA2&quot;,&quot;SST&quot;,&quot;AQP3&quot;,&quot;PPY&quot;,&quot;LEPR&quot;,&quot;EGR1&quot;,&quot;RBP4&quot;,&quot;DPYSL3&quot;,
752+
&quot;AKAP12&quot;)
753+
F &lt;- with(fl_nmf_ldf,F %*% diag(D))
754+
colnames(F) &lt;- paste0(&quot;k&quot;,1:9)
755+
annotation_heatmap(F,select_features = genes,font_size = 9)</code></pre>
756+
<p><img src="figure/pancreas_annotate.Rmd/annotation-plot-flashier-nmf-2-1.png" width="390" style="display: block; margin: auto;" /></p>
757+
<p>Visually, this plot looks quite similar to before, but now the effect
758+
sizes are on a different scale. With this rescaling, the effect sizes
759+
have the following interpretation:</p>
760+
<p><span class="math inline">\(f_{jk}\)</span> is (approximately) the
761+
<em>log-fold change</em> (LFC) of gene <span
762+
class="math inline">\(j\)</span> in a cell <span
763+
class="math inline">\(i\)</span> with the largest membership in factor
764+
<span class="math inline">\(k\)</span> (<span
765+
class="math inline">\(l_{ik} =
766+
1\)</span>) relative to a cell <span
767+
class="math inline">\(i&#39;\)</span> with no membership in factor <span
768+
class="math inline">\(k\)</span> (<span
769+
class="math inline">\(l_{i&#39;k} = 0\)</span>).</p>
559770
<br>
560771
<p>
561772
<button type="button" class="btn btn-default btn-workflowr btn-workflowr-sessioninfo" data-toggle="collapse" data-target="#workflowr-sessioninfo" style="display: block;">
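Note on the hunk above: the two gene-selection strategies described in the "Annotating the factors by driving genes" section can be sketched in base R on a toy matrix. This is only an illustration under stated assumptions. The helper names `select_largest` and `select_distinctive` and the toy values are hypothetical, and "distinctive" is read here as the difference from the largest value among the other factors, which may not match what `annotation_heatmap` does internally.

```r
# Toy gene-by-factor matrix (made-up values, not from the pancreas data).
Fmat <- rbind(
  geneA = c(k1 = 0.9, k2 = 0.8, k3 = 0.7),  # high in every factor
  geneB = c(k1 = 0.6, k2 = 0.0, k3 = 0.1),  # mostly specific to k1
  geneC = c(k1 = 0.1, k2 = 0.7, k3 = 0.0),
  geneD = c(k1 = 0.0, k2 = 0.1, k3 = 0.5))

# Strategy (i): the n genes with the largest f_jk in factor k.
# (Hypothetical helper, not a fastTopics function.)
select_largest <- function(Fmat, k, n) {
  names(sort(Fmat[, k], decreasing = TRUE))[seq_len(n)]
}

# Strategy (ii): the n genes with the largest gap between f_jk and the
# largest f_jk' among the other factors k'. This definition of the gap
# is an assumption made for illustration.
select_distinctive <- function(Fmat, k, n) {
  others <- setdiff(colnames(Fmat), k)
  gap <- Fmat[, k] - apply(Fmat[, others, drop = FALSE], 1, max)
  names(sort(gap, decreasing = TRUE))[seq_len(n)]
}

top_largest     <- select_largest(Fmat, "k1", 1)
top_distinctive <- select_distinctive(Fmat, "k1", 1)
```

In this toy case strategy (i) picks the broadly expressed gene, while strategy (ii) picks the gene that separates `k1` from the other factors, which mirrors the *TTR*/*CHGB* versus *MAFA*/*GC* contrast discussed in the text.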
@@ -583,39 +794,41 @@ <h2>Structure plot</h2>
583794
# [1] stats graphics grDevices utils datasets methods base
584795
#
585796
# other attached packages:
586-
# [1] cowplot_1.1.3 ggplot2_3.5.0 fastTopics_0.7-20 flashier_1.0.55
797+
# [1] cowplot_1.1.3 ggplot2_3.5.0 fastTopics_0.7-21 flashier_1.0.55
587798
# [5] ebnm_1.1-34 Matrix_1.6-5
588799
#
589800
# loaded via a namespace (and not attached):
590-
# [1] tidyselect_1.2.1 viridisLite_0.4.2 dplyr_1.1.4
591-
# [4] fastmap_1.1.1 lazyeval_0.2.2 promises_1.2.1
592-
# [7] digest_0.6.34 lifecycle_1.0.4 invgamma_1.1
593-
# [10] magrittr_2.0.3 compiler_4.3.3 rlang_1.1.3
594-
# [13] sass_0.4.8 progress_1.2.3 tools_4.3.3
595-
# [16] utf8_1.2.4 yaml_2.3.8 data.table_1.15.2
596-
# [19] knitr_1.45 prettyunits_1.2.0 htmlwidgets_1.6.4
597-
# [22] scatterplot3d_0.3-44 plyr_1.8.9 RColorBrewer_1.1-3
598-
# [25] Rtsne_0.17 workflowr_1.7.1 withr_3.0.0
599-
# [28] purrr_1.0.2 grid_4.3.3 fansi_1.0.6
600-
# [31] git2r_0.33.0 colorspace_2.1-0 scales_1.3.0
601-
# [34] gtools_3.9.5 cli_3.6.2 rmarkdown_2.26
602-
# [37] crayon_1.5.2 generics_0.1.3 RcppParallel_5.1.7
603-
# [40] httr_1.4.7 reshape2_1.4.4 pbapply_1.7-2
604-
# [43] cachem_1.0.8 stringr_1.5.1 splines_4.3.3
605-
# [46] parallel_4.3.3 softImpute_1.4-1 vctrs_0.6.5
606-
# [49] jsonlite_1.8.8 hms_1.1.3 mixsqp_0.3-54
607-
# [52] ggrepel_0.9.5 irlba_2.3.5.1 horseshoe_0.2.0
608-
# [55] trust_0.1-8 plotly_4.10.4 jquerylib_0.1.4
609-
# [58] tidyr_1.3.1 glue_1.7.0 uwot_0.2.2.9000
610-
# [61] stringi_1.8.3 Polychrome_1.5.1 gtable_0.3.4
611-
# [64] later_1.3.2 quadprog_1.5-8 munsell_0.5.0
612-
# [67] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.7
613-
# [70] truncnorm_1.0-9 R6_2.5.1 rprojroot_2.0.4
614-
# [73] evaluate_0.23 lattice_0.22-5 RhpcBLASctl_0.23-42
615-
# [76] SQUAREM_2021.1 ashr_2.2-66 httpuv_1.6.14
616-
# [79] bslib_0.6.1 Rcpp_1.0.12 deconvolveR_1.2-1
617-
# [82] whisker_0.4.1 xfun_0.42 fs_1.6.3
618-
# [85] pkgconfig_2.0.3</code></pre>
801+
# [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.1
802+
# [4] dplyr_1.1.4 fastmap_1.1.1 lazyeval_0.2.2
803+
# [7] promises_1.2.1 digest_0.6.34 lifecycle_1.0.4
804+
# [10] invgamma_1.1 magrittr_2.0.3 compiler_4.3.3
805+
# [13] rlang_1.1.3 sass_0.4.8 progress_1.2.3
806+
# [16] tools_4.3.3 utf8_1.2.4 yaml_2.3.8
807+
# [19] data.table_1.15.2 knitr_1.45 labeling_0.4.3
808+
# [22] prettyunits_1.2.0 htmlwidgets_1.6.4 scatterplot3d_0.3-44
809+
# [25] plyr_1.8.9 RColorBrewer_1.1-3 Rtsne_0.17
810+
# [28] workflowr_1.7.1 withr_3.0.0 purrr_1.0.2
811+
# [31] grid_4.3.3 fansi_1.0.6 git2r_0.33.0
812+
# [34] colorspace_2.1-0 scales_1.3.0 gtools_3.9.5
813+
# [37] cli_3.6.2 rmarkdown_2.26 crayon_1.5.2
814+
# [40] generics_0.1.3 RcppParallel_5.1.7 httr_1.4.7
815+
# [43] reshape2_1.4.4 pbapply_1.7-2 cachem_1.0.8
816+
# [46] stringr_1.5.1 splines_4.3.3 parallel_4.3.3
817+
# [49] softImpute_1.4-1 vctrs_0.6.5 jsonlite_1.8.8
818+
# [52] hms_1.1.3 mixsqp_0.3-54 ggrepel_0.9.5
819+
# [55] irlba_2.3.5.1 horseshoe_0.2.0 trust_0.1-8
820+
# [58] plotly_4.10.4 jquerylib_0.1.4 tidyr_1.3.1
821+
# [61] glue_1.7.0 uwot_0.2.2.9000 stringi_1.8.3
822+
# [64] Polychrome_1.5.1 gtable_0.3.4 later_1.3.2
823+
# [67] quadprog_1.5-8 munsell_0.5.0 tibble_3.2.1
824+
# [70] pillar_1.9.0 htmltools_0.5.7 truncnorm_1.0-9
825+
# [73] R6_2.5.1 rprojroot_2.0.4 evaluate_0.23
826+
# [76] lattice_0.22-5 highr_0.10 RhpcBLASctl_0.23-42
827+
# [79] SQUAREM_2021.1 ashr_2.2-66 httpuv_1.6.14
828+
# [82] bslib_0.6.1 Rcpp_1.0.12 deconvolveR_1.2-1
829+
# [85] whisker_0.4.1 xfun_0.42 fs_1.6.3
830+
# [88] pkgconfig_2.0.3</code></pre>
831+
</div>
619832
</div>
620833
</div>
621834
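Note on the "A more interpretable annotation plot" section added in this commit: the rescaling `F %*% diag(D)` and the log-fold change reading of the result can be checked on a toy example in base R. This is a minimal sketch, not the analysis code. The matrices are random, the variable names are hypothetical, and it assumes the fitted values are on a log-expression scale, which depends on how the counts were transformed before the factorization.

```r
# Toy example in base R; all matrices are made up for illustration.
set.seed(1)
n <- 6; m <- 5; K <- 3
L0 <- matrix(runif(n * K), n, K)  # unnormalized memberships (cells x K)
F0 <- matrix(runif(m * K), m, K)  # unnormalized gene effects (genes x K)

# Scale each column so its largest element is exactly 1, keeping the
# removed scale factors in d.
sl <- apply(L0, 2, max)
sf <- apply(F0, 2, max)
L  <- sweep(L0, 2, sl, "/")
Fm <- sweep(F0, 2, sf, "/")
d  <- sl * sf

# Fold the scale factors back into the gene-side matrix, mirroring
# F %*% diag(D) in the analysis.
Fd <- Fm %*% diag(d)

# Both parameterizations give the same fitted values.
fit0 <- L0 %*% t(F0)
fit1 <- L %*% t(Fd)

# Two hypothetical cells that differ only in their membership in factor k:
# one with full membership (1), one with none (0).
k <- 2
cell_full <- c(0.3, 1, 0.2)
cell_none <- c(0.3, 0, 0.2)
lfc <- drop(Fd %*% cell_full) - drop(Fd %*% cell_none)
```

Here `lfc` equals column `k` of `Fd`, which is the sense in which the rescaled effects can be read as (approximate) log-fold changes between a cell with membership 1 and a cell with membership 0 in factor `k`, all other memberships being equal.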
