@@ -300,7 +300,7 @@ <h4 class="author">Peter Carbonetto</h4>
 <div class="tab-content">
 <div id="summary" class="tab-pane fade in active">
 <p>
-<strong>Last updated:</strong> 2025-02-18
+<strong>Last updated:</strong> 2025-02-19
 </p>
 <p>
 <strong>Checks:</strong> <span
@@ -432,15 +432,15 @@ <h4 class="author">Peter Carbonetto</h4>
 <div class="panel panel-default">
 <div class="panel-heading">
 <p class="panel-title">
-<a data-toggle="collapse" data-parent="#workflowr-checks" href="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree371108b3bde59420be37c4794d9787c13f0c1d77targetblank371108ba">
+<a data-toggle="collapse" data-parent="#workflowr-checks" href="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetreeb36508bf5db1e33dca218ea14dc03f3683da85b7targetblankb36508ba">
 <span class="glyphicon glyphicon-ok text-success"
 aria-hidden="true"></span> <strong>Repository version:</strong>
-<a href="https://github.com/stephenslab/single-cell-jamboree/tree/371108b3bde59420be37c4794d9787c13f0c1d77" target="_blank">371108b</a>
+<a href="https://github.com/stephenslab/single-cell-jamboree/tree/b36508bf5db1e33dca218ea14dc03f3683da85b7" target="_blank">b36508b</a>
 </a>
 </p>
 </div>
 <div
-id="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree371108b3bde59420be37c4794d9787c13f0c1d77targetblank371108ba"
+id="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetreeb36508bf5db1e33dca218ea14dc03f3683da85b7targetblankb36508ba"
 class="panel-collapse collapse">
 <div class="panel-body">
 <p>
@@ -450,7 +450,7 @@ <h4 class="author">Peter Carbonetto</h4>
 </p>
 <p>
 The results in this page were generated with repository version
-<a href="https://github.com/stephenslab/single-cell-jamboree/tree/371108b3bde59420be37c4794d9787c13f0c1d77" target="_blank">371108b</a>.
+<a href="https://github.com/stephenslab/single-cell-jamboree/tree/b36508bf5db1e33dca218ea14dc03f3683da85b7" target="_blank">b36508b</a>.
 See the <em>Past versions</em> tab to see a history of the changes made
 to the R Markdown and HTML files.
 </p>
@@ -519,6 +519,92 @@ <h4 class="author">Peter Carbonetto</h4>
 Rmd
 </td>
 <td>
+<a href="https://github.com/stephenslab/single-cell-jamboree/blob/b36508bf5db1e33dca218ea14dc03f3683da85b7/analysis/pancreas_annotate.Rmd" target="_blank">b36508b</a>
+</td>
+<td>
+Peter Carbonetto
+</td>
+<td>
+2025-02-19
+</td>
+<td>
+wflow_publish("pancreas_annotate.Rmd", verbose = TRUE, view = FALSE)
+</td>
+</tr>
+<tr>
+<td>
+Rmd
+</td>
+<td>
+<a href="https://github.com/stephenslab/single-cell-jamboree/blob/a8d45762ba596b777dd984b9a6e8ca52385073ef/analysis/pancreas_annotate.Rmd" target="_blank">a8d4576</a>
+</td>
+<td>
+Peter Carbonetto
+</td>
+<td>
+2025-02-19
+</td>
+<td>
+A few edits to the text of the pancreas_annotate analysis.
+</td>
+</tr>
+<tr>
+<td>
+Rmd
+</td>
+<td>
+<a href="https://github.com/stephenslab/single-cell-jamboree/blob/9dd5b4a4eeace7f6e633c05cf49036c4105a4b73/analysis/pancreas_annotate.Rmd" target="_blank">9dd5b4a</a>
+</td>
+<td>
+Peter Carbonetto
+</td>
+<td>
+2025-02-19
+</td>
+<td>
+Added annotation plots for flashier NMF result to pancreas_annotate
+analysis.
+</td>
+</tr>
+<tr>
+<td>
+Rmd
+</td>
+<td>
+<a href="https://github.com/stephenslab/single-cell-jamboree/blob/335dd81ff06478b6c84002e65dbd8bd8a200a59e/analysis/pancreas_annotate.Rmd" target="_blank">335dd81</a>
+</td>
+<td>
+Peter Carbonetto
+</td>
+<td>
+2025-02-19
+</td>
+<td>
+Added structure plot to pancreas_annotate analysis.
+</td>
+</tr>
+<tr>
+<td>
+html
+</td>
+<td>
+<a href="https://rawcdn.githack.com/stephenslab/single-cell-jamboree/e77173868c42138ec9de2b78146ab92fed1c4963/docs/pancreas_annotate.html" target="_blank">e771738</a>
+</td>
+<td>
+Peter Carbonetto
+</td>
+<td>
+2025-02-18
+</td>
+<td>
+First build of the pancreas_annotate analysis.
+</td>
+</tr>
+<tr>
+<td>
+Rmd
+</td>
+<td>
 <a href="https://github.com/stephenslab/single-cell-jamboree/blob/371108b3bde59420be37c4794d9787c13f0c1d77/analysis/pancreas_annotate.Rmd" target="_blank">371108b</a>
 </td>
 <td>
@@ -542,8 +628,9 @@ <h4 class="author">Peter Carbonetto</h4>
 href="pancreas_another_look.html">matrix factorization results from the
 pancreas CEL-seq2 data</a>, with the goal of understanding how best to
 <em>annotate</em> the pancreas factors. As we will see, there isn’t a
-single “one-size-fits-all” strategy that works best, so it is suggested
-that several annotation strategies be explored.</p>
+single “one-size-fits-all” strategy that works best, and so we recommend
+exploring different annotation strategies. Also, careful interpretation
+of the matrix factorization is discussed.</p>
 <p>The plotting functions used in this analysis are from <a
 href="https://github.com/stephenslab/fastTopics/">fastTopics</a>.</p>
 <p>First, load the packages needed for this analysis.</p>
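The analysis added in this commit contrasts two ways of choosing “driving genes” for a factor k: (i) the genes with the largest f_jk, and (ii) “distinctive” genes, whose f_jk most exceeds their values in the other factors. The base-R sketch below illustrates the difference on an invented gene-by-factor matrix. The helpers `select_largest()` and `select_distinctive()` are hypothetical, written only for this illustration; they are not fastTopics functions. The distinctive score used here (f_jk minus the largest f_jk' over the other factors, after scaling each column of F to have maximum 1) is one plausible reading of strategy (ii), not necessarily what `annotation_heatmap` computes internally.

```r
# Toy gene-by-factor matrix F (rows = genes, columns = factors); the
# numbers are invented. geneA is high in every factor, whereas geneB is
# high mainly in factor k1.
F <- rbind(geneA = c(10, 9, 9),
           geneB = c( 6, 1, 1),
           geneC = c( 1, 8, 2),
           geneD = c( 2, 1, 7))
colnames(F) <- c("k1", "k2", "k3")

# Heuristic rescaling: divide each column by its maximum so that the
# largest element in every column is exactly 1.
F <- sweep(F, 2, apply(F, 2, max), "/")

# Strategy (i), hypothetical helper: the n genes with the largest f_jk.
select_largest <- function (F, k, n) {
  head(rownames(F)[order(F[, k], decreasing = TRUE)], n)
}

# Strategy (ii), hypothetical helper: the n genes with the largest
# difference between f_jk and the largest f_jk' over the other factors k'.
select_distinctive <- function (F, k, n) {
  others <- setdiff(colnames(F), k)
  score  <- F[, k] - apply(F[, others, drop = FALSE], 1, max)
  head(rownames(F)[order(score, decreasing = TRUE)], n)
}

select_largest(F, "k1", 2)       # "geneA" "geneB"
select_distinctive(F, "k1", 2)   # "geneB" "geneA"
```

With these toy numbers, geneA (high in every factor, playing the role of a gene such as TTR) tops the “largest” ranking for k1, while geneB (specific to k1, playing the role of a marker such as MAFA) tops the “distinctive” ranking.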
@@ -554,8 +641,132 @@ <h4 class="author">Peter Carbonetto</h4>
 library(cowplot)</code></pre>
 <p>Set the seed for reproducibility.</p>
 <pre class="r"><code>set.seed(1)</code></pre>
+<p>Load the CEL-Seq2 pancreas data and the outputs generated by running
+the <code>compute_pancreas_celseq2_factors.R</code> script.</p>
+<pre class="r"><code>load("../data/pancreas.RData")
+load("../output/pancreas_celseq2_factors.RData")
+i <- which(sample_info$tech == "celseq2")
+sample_info <- sample_info[i,]
+counts <- counts[i,]
+sample_info <- transform(sample_info,celltype = factor(celltype))</code></pre>
+<p>We will first focus on the non-negative matrix factorization (NMF)
+produced by flashier.</p>
 <div id="structure-plot" class="section level2">
 <h2>Structure plot</h2>
+<p>The Structure plot (also shown in the previous analysis) shows that
+many of the factors correspond closely to the cell-type assignments that
+were estimated in the published analysis:</p>
+<pre class="r"><code>celltype <- sample_info$celltype
+celltype <-
+factor(celltype,
+c("acinar","ductal","activated_stellate","quiescent_stellate",
+"endothelial","macrophage","mast","schwann","alpha","beta",
+"delta","gamma","epsilon"))
+L <- fl_nmf_ldf$L
+colnames(L) <- paste0("k",1:9)
+structure_plot(L[,-1],grouping = celltype,gap = 10,perplexity = 70,n = Inf) +
+labs(y = "membership",fill = "factor",color = "factor")</code></pre>
+<p><img src="figure/pancreas_annotate.Rmd/structure-plot-flashier-nmf-1.png" width="960" style="display: block; margin: auto;" /></p>
+<p>Note that the first factor was omitted in the Structure plot because
+it is a “baseline” factor, and not particularly interesting to look
+at.</p>
+<p><strong>A note about interpretation:</strong> For visualization
+purposes, the columns of the L matrix—the “membership matrix”—were
+scaled so that the largest membership for a given factor (column) was
+always exactly 1. However, please note <em>this normalization is
+arbitrary</em>. Therefore, <em>it is not meaningful to compare
+memberships across factors (i.e., colors in the Structure plot); it is
+only meaningful to compare memberships within a given factor (a single
+color in the Structure plot).</em></p>
681
+ </ div >
682
+ < div id ="annotating-the-factors-by-driving-genes "
683
+ class ="section level2 ">
684
+ < h2 > Annotating the factors by “driving genes”</ h2 >
685
+ < p > To illustrate annotating the factors, let’s focus on factors 4, 5 and
686
+ 6—these are the factors that largely capture the islet cells (alpha,
687
+ beta, < em > etc</ em > ). Let’s consider two different selection strategies:
688
+ (i) choosing genes < span class ="math inline "> \(j\)</ span > with the
689
+ largest < span class ="math inline "> \(f_{jk}\)</ span > ; (ii) choosing genes
690
+ < span class ="math inline "> \(j\)</ span > with the largest differences
691
+ < span class ="math inline "> \(f_{jk} - f_{jk'}\)</ span > with other
692
+ factors < span class ="math inline "> \(k'\)</ span > (“distinctive
693
+ genes”). These two selection strategies are implemented in the
694
+ < code > annotation_heatmap</ code > function:</ p >
695
+ < pre class ="r "> < code > F <- fl_nmf_ldf$F
696
+ colnames(F) <- paste0("k",1:9)
697
+ kset <- paste0("k",4:6)
698
+ p1 <- annotation_heatmap(F,n = 8,dims = kset,
699
+ select_features = "largest",
700
+ font_size = 9) +
701
+ labs(title = "select_features = \"largest\"") +
702
+ theme(plot.title = element_text(face = "plain",size = 9))
703
+ p2 <- annotation_heatmap(F,n = 8,dims = kset,
704
+ select_features = "distinctive",
705
+ compare_dims = kset,
706
+ font_size = 9) +
707
+ labs(title = "select_features = \"distinctive\"") +
708
+ theme(plot.title = element_text(face = "plain",size = 9))
709
+ plot_grid(p1,p2,nrow = 1,ncol = 2)</ code > </ pre >
710
+ < p > < img src ="figure/pancreas_annotate.Rmd/annotation-plot-flashier-nmf-1.png " width ="720 " style ="display: block; margin: auto; " /> </ p >
711
+ < pre > < code > # Features selected for plot: INS IAPP SCGN SLC30A8 ABCC8 G6PC2 NPTX2 HADH GCG CHGB TM4SF4 TTR SCG2 SCG5 ALDH1A1 PCSK2 SST RBP4 PCSK1 CPE PPY SEC11C ISL1
712
+ c("INS", "IAPP", "SCGN", "SLC30A8", "ABCC8", "G6PC2", "NPTX2",
713
+ "HADH", "GCG", "CHGB", "TM4SF4", "TTR", "SCG2", "SCG5", "ALDH1A1",
714
+ "PCSK2", "SST", "RBP4", "PCSK1", "CPE", "PPY", "SEC11C", "ISL1"
715
+ )
716
+ # Features selected for plot: INS IAPP NPTX2 MAFA MEG3 ADCYAP1 PFKFB2 DLK1 GCG GC TTR TM4SF4 FAP LOXL4 ALDH1A1 CRYBA2 SST AQP3 PPY LEPR EGR1 RBP4 DPYSL3 AKAP12
717
+ c("INS", "IAPP", "NPTX2", "MAFA", "MEG3", "ADCYAP1", "PFKFB2",
718
+ "DLK1", "GCG", "GC", "TTR", "TM4SF4", "FAP", "LOXL4", "ALDH1A1",
719
+ "CRYBA2", "SST", "AQP3", "PPY", "LEPR", "EGR1", "RBP4", "DPYSL3",
720
+ "AKAP12")</ code > </ pre >
721
+ < p > Strategy (i) picks out some canonical marker genes for islet cells
722
+ such as < em > INS</ em > for beta cells and < em > GCG</ em > for alpha cells.
723
+ But it also picks out other genes that are highly expressed in multiple
724
+ islet cell types, such as < em > TTR</ em > and < em > CHGB</ em > . Strategy (ii)
725
+ focusses more strongly on genes that distinguish one cell type from
726
+ another, and as a result marker genes such as < em > MAFA</ em > (beta cells)
727
+ and < em > GC</ em > (alpha cells) are ranked more highly with this
728
+ strategy.</ p >
729
+ < p > The better strategy will depend on the setting and on the goals of
730
+ the analysis, which is why the < code > annotation_heatmap</ code > function
731
+ provides both options. These selection strategies can also reveal
732
+ complementary insights and so in many situations it may be better to use
733
+ both.</ p >
734
+ < div id ="a-more-interpretable-annotation-plot " class ="section level3 ">
735
+ < h3 > A more interpretable annotation plot</ h3 >
736
+ < p > Above we sounded a note of caution about interpreting elements of L
737
+ across factors/columns. The same applies to the F matrix. To provide a
738
+ more even footing, above we employed the simple heuristic of scaling the
739
+ columns of F so that the maximum element in each column was 1. That was
740
+ helpful for selecting “distinctive” gene, but made the effect sizes
741
+ difficult to interpret. To produce more easily interpretable effect
742
+ sizes, we recommend visualizing this F matrix (in this code, fl is a
743
+ “flash” object, e.g., the return value from a call to
744
+ < code > flashier::flash()</ code > ):</ p >
745
+ < pre class ="r "> < code > out <- ldf(fl)
746
+ F <- with(out,F %*% diag(D))</ code > </ pre >
747
+ < p > This is what this rescaled F matrix looks like for the pancreas
748
+ data:</ p >
749
+ < pre class ="r "> < code > genes <- c("INS","IAPP","NPTX2","MAFA","MEG3","ADCYAP1","PFKFB2",
750
+ "DLK1","GCG","GC","TTR","TM4SF4","FAP","LOXL4","ALDH1A1",
751
+ "CRYBA2","SST","AQP3","PPY","LEPR","EGR1","RBP4","DPYSL3",
752
+ "AKAP12")
753
+ F <- with(fl_nmf_ldf,F %*% diag(D))
754
+ colnames(F) <- paste0("k",1:9)
755
+ annotation_heatmap(F,select_features = genes,font_size = 9)</ code > </ pre >
756
+ < p > < img src ="figure/pancreas_annotate.Rmd/annotation-plot-flashier-nmf-2-1.png " width ="390 " style ="display: block; margin: auto; " /> </ p >
757
+ < p > Visually, this plot looks quite similar to before, but now the effect
758
+ sizes are on a different scale. With this rescaling, the effect sizes
759
+ have the following interpretation:</ p >
760
+ < p > < span class ="math inline "> \(f_{jk}\)</ span > is (approximately) the
761
+ < em > log-fold change</ em > (LFC) of gene < span
762
+ class ="math inline "> \(j\)</ span > in a cell < span
763
+ class ="math inline "> \(i\)</ span > with the largest membership in factor
764
+ < span class ="math inline "> \(k\)</ span > (< span
765
+ class ="math inline "> \(l_{ik} =
766
+ 1\)</ span > ) relative to a cell < span
767
+ class ="math inline "> \(i'\)</ span > with no membership in factor < span
768
+ class ="math inline "> \(k\)</ span > (< span
769
+ class ="math inline "> \(l_{i'k} = 0\)</ span > ).</ p >
559
770
< br >
560
771
< p >
561
772
< button type ="button " class ="btn btn-default btn-workflowr btn-workflowr-sessioninfo " data-toggle ="collapse " data-target ="#workflowr-sessioninfo " style ="display: block; ">
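The log-fold-change reading of the rescaled F matrix can be checked on a toy example. In the base-R sketch below, L, D and F are invented numbers, not the pancreas results. It shows two things: absorbing D into F (as in `F %*% diag(D)`) leaves the fitted matrix `L %*% diag(D) %*% t(F)` unchanged, and, because each column of L has maximum 1, entry (j, k) of the rescaled F equals the difference in the fitted value of gene j between a cell with l_ik = 1 and an otherwise identical cell with l_ik = 0. Reading that difference as an LFC assumes the factorization was fit to log-transformed counts; the diff does not show the transformation, so treat this as an assumption.

```r
# Toy factorization X ~ L %*% diag(D) %*% t(F); all numbers are invented.
# Columns of L (cells x factors) are scaled so their largest element is 1,
# the same normalization used for the Structure plot.
L <- rbind(cell1 = c(1.0, 0.0),
           cell2 = c(0.0, 1.0),
           cell3 = c(0.5, 0.2))
# F is genes x factors; D holds one scale factor per factor.
F <- rbind(gene1 = c(0.8, 0.1),
           gene2 = c(0.2, 0.9))
D <- c(2.5, 4.0)

# Rescaled F, analogous to F <- with(fl_nmf_ldf, F %*% diag(D)).
Fd <- F %*% diag(D)

# Absorbing D into F does not change the fitted values.
fit1 <- L %*% diag(D) %*% t(F)
fit2 <- L %*% t(Fd)
all.equal(fit1, fit2)                    # TRUE

# Two hypothetical cells that differ only in their membership in factor 1.
cell_on  <- c(1, 0.3)   # l_i1 = 1
cell_off <- c(0, 0.3)   # l_i1 = 0
# Difference in the fitted value of gene1 between the two cells.
# Assumption: if the data were log-transformed before fitting, this
# difference is a log-fold change.
diff_gene1 <- sum(cell_on * Fd["gene1", ]) - sum(cell_off * Fd["gene1", ])
all.equal(diff_gene1, Fd["gene1", 1])    # TRUE
```

The same reasoning explains the caution about comparing across factors: the split of scale between L, D and F is a normalization choice, so only the product, or F after fixing L's column maxima at 1, carries a unit that can be read directly.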
@@ -583,39 +794,41 @@ <h2>Structure plot</h2>
 # [1] stats graphics grDevices utils datasets methods base
 #
 # other attached packages:
-# [1] cowplot_1.1.3 ggplot2_3.5.0 fastTopics_0.7-20 flashier_1.0.55
+# [1] cowplot_1.1.3 ggplot2_3.5.0 fastTopics_0.7-21 flashier_1.0.55
 # [5] ebnm_1.1-34 Matrix_1.6-5
 #
 # loaded via a namespace (and not attached):
-# [1] tidyselect_1.2.1 viridisLite_0.4.2 dplyr_1.1.4
-# [4] fastmap_1.1.1 lazyeval_0.2.2 promises_1.2.1
-# [7] digest_0.6.34 lifecycle_1.0.4 invgamma_1.1
-# [10] magrittr_2.0.3 compiler_4.3.3 rlang_1.1.3
-# [13] sass_0.4.8 progress_1.2.3 tools_4.3.3
-# [16] utf8_1.2.4 yaml_2.3.8 data.table_1.15.2
-# [19] knitr_1.45 prettyunits_1.2.0 htmlwidgets_1.6.4
-# [22] scatterplot3d_0.3-44 plyr_1.8.9 RColorBrewer_1.1-3
-# [25] Rtsne_0.17 workflowr_1.7.1 withr_3.0.0
-# [28] purrr_1.0.2 grid_4.3.3 fansi_1.0.6
-# [31] git2r_0.33.0 colorspace_2.1-0 scales_1.3.0
-# [34] gtools_3.9.5 cli_3.6.2 rmarkdown_2.26
-# [37] crayon_1.5.2 generics_0.1.3 RcppParallel_5.1.7
-# [40] httr_1.4.7 reshape2_1.4.4 pbapply_1.7-2
-# [43] cachem_1.0.8 stringr_1.5.1 splines_4.3.3
-# [46] parallel_4.3.3 softImpute_1.4-1 vctrs_0.6.5
-# [49] jsonlite_1.8.8 hms_1.1.3 mixsqp_0.3-54
-# [52] ggrepel_0.9.5 irlba_2.3.5.1 horseshoe_0.2.0
-# [55] trust_0.1-8 plotly_4.10.4 jquerylib_0.1.4
-# [58] tidyr_1.3.1 glue_1.7.0 uwot_0.2.2.9000
-# [61] stringi_1.8.3 Polychrome_1.5.1 gtable_0.3.4
-# [64] later_1.3.2 quadprog_1.5-8 munsell_0.5.0
-# [67] tibble_3.2.1 pillar_1.9.0 htmltools_0.5.7
-# [70] truncnorm_1.0-9 R6_2.5.1 rprojroot_2.0.4
-# [73] evaluate_0.23 lattice_0.22-5 RhpcBLASctl_0.23-42
-# [76] SQUAREM_2021.1 ashr_2.2-66 httpuv_1.6.14
-# [79] bslib_0.6.1 Rcpp_1.0.12 deconvolveR_1.2-1
-# [82] whisker_0.4.1 xfun_0.42 fs_1.6.3
-# [85] pkgconfig_2.0.3</code></pre>
+# [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.1
+# [4] dplyr_1.1.4 fastmap_1.1.1 lazyeval_0.2.2
+# [7] promises_1.2.1 digest_0.6.34 lifecycle_1.0.4
+# [10] invgamma_1.1 magrittr_2.0.3 compiler_4.3.3
+# [13] rlang_1.1.3 sass_0.4.8 progress_1.2.3
+# [16] tools_4.3.3 utf8_1.2.4 yaml_2.3.8
+# [19] data.table_1.15.2 knitr_1.45 labeling_0.4.3
+# [22] prettyunits_1.2.0 htmlwidgets_1.6.4 scatterplot3d_0.3-44
+# [25] plyr_1.8.9 RColorBrewer_1.1-3 Rtsne_0.17
+# [28] workflowr_1.7.1 withr_3.0.0 purrr_1.0.2
+# [31] grid_4.3.3 fansi_1.0.6 git2r_0.33.0
+# [34] colorspace_2.1-0 scales_1.3.0 gtools_3.9.5
+# [37] cli_3.6.2 rmarkdown_2.26 crayon_1.5.2
+# [40] generics_0.1.3 RcppParallel_5.1.7 httr_1.4.7
+# [43] reshape2_1.4.4 pbapply_1.7-2 cachem_1.0.8
+# [46] stringr_1.5.1 splines_4.3.3 parallel_4.3.3
+# [49] softImpute_1.4-1 vctrs_0.6.5 jsonlite_1.8.8
+# [52] hms_1.1.3 mixsqp_0.3-54 ggrepel_0.9.5
+# [55] irlba_2.3.5.1 horseshoe_0.2.0 trust_0.1-8
+# [58] plotly_4.10.4 jquerylib_0.1.4 tidyr_1.3.1
+# [61] glue_1.7.0 uwot_0.2.2.9000 stringi_1.8.3
+# [64] Polychrome_1.5.1 gtable_0.3.4 later_1.3.2
+# [67] quadprog_1.5-8 munsell_0.5.0 tibble_3.2.1
+# [70] pillar_1.9.0 htmltools_0.5.7 truncnorm_1.0-9
+# [73] R6_2.5.1 rprojroot_2.0.4 evaluate_0.23
+# [76] lattice_0.22-5 highr_0.10 RhpcBLASctl_0.23-42
+# [79] SQUAREM_2021.1 ashr_2.2-66 httpuv_1.6.14
+# [82] bslib_0.6.1 Rcpp_1.0.12 deconvolveR_1.2-1
+# [85] whisker_0.4.1 xfun_0.42 fs_1.6.3
+# [88] pkgconfig_2.0.3</code></pre>
+</div>
 </div>
 </div>