@@ -432,15 +432,15 @@ <h4 class="author">Peter Carbonetto</h4>
432
432
< div class ="panel panel-default ">
433
433
< div class ="panel-heading ">
434
434
< p class ="panel-title ">
435
- < a data-toggle ="collapse " data-parent ="#workflowr-checks " href ="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree9440e05bc58ce8e42a1d61a79ffdfd18896e60cbtargetblank9440e05a ">
435
+ < a data-toggle ="collapse " data-parent ="#workflowr-checks " href ="#strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree6cf9aa720cc16dc543e93b36d57bc1a85555860dtargetblank6cf9aa7a ">
436
436
< span class ="glyphicon glyphicon-ok text-success "
437
437
aria-hidden ="true "> </ span > < strong > Repository version:</ strong >
438
- < a href ="https://github.com/stephenslab/single-cell-jamboree/tree/9440e05bc58ce8e42a1d61a79ffdfd18896e60cb " target ="_blank "> 9440e05 </ a >
438
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/tree/6cf9aa720cc16dc543e93b36d57bc1a85555860d " target ="_blank "> 6cf9aa7 </ a >
439
439
</ a >
440
440
</ p >
441
441
</ div >
442
442
< div
443
- id ="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree9440e05bc58ce8e42a1d61a79ffdfd18896e60cbtargetblank9440e05a "
443
+ id ="strongRepositoryversionstrongahrefhttpsgithubcomstephenslabsinglecelljamboreetree6cf9aa720cc16dc543e93b36d57bc1a85555860dtargetblank6cf9aa7a "
444
444
class ="panel-collapse collapse ">
445
445
< div class ="panel-body ">
446
446
< p >
@@ -450,7 +450,7 @@ <h4 class="author">Peter Carbonetto</h4>
450
450
</ p >
451
451
< p >
452
452
The results in this page were generated with repository version
453
- < a href ="https://github.com/stephenslab/single-cell-jamboree/tree/9440e05bc58ce8e42a1d61a79ffdfd18896e60cb " target ="_blank "> 9440e05 </ a > .
453
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/tree/6cf9aa720cc16dc543e93b36d57bc1a85555860d " target ="_blank "> 6cf9aa7 </ a > .
454
454
See the < em > Past versions</ em > tab to see a history of the changes made
455
455
to the R Markdown and HTML files.
456
456
</ p >
@@ -464,12 +464,19 @@ <h4 class="author">Peter Carbonetto</h4>
464
464
were generated:
465
465
</ p >
466
466
< pre > < code >
467
+ Ignored files:
468
+ Ignored: analysis/figure/
469
+
467
470
Untracked files:
471
+ Untracked: analysis/temp.R
468
472
Untracked: data/GSE132188_adata.h5ad.h5
469
473
Untracked: data/Immune_ALL_human.h5ad
470
474
Untracked: data/pancreas_endocrine.RData
471
475
Untracked: data/pancreas_endocrine_alldays.h5ad
472
476
477
+ Unstaged changes:
478
+ Modified: code/annotation_plots.R
479
+
473
480
</ code > </ pre >
474
481
< p >
475
482
Note that any generated files, e.g. HTML, png, CSS, etc., are not
@@ -519,6 +526,57 @@ <h4 class="author">Peter Carbonetto</h4>
519
526
Rmd
520
527
</ td >
521
528
< td >
529
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/blob/6cf9aa720cc16dc543e93b36d57bc1a85555860d/analysis/pancreas_annotate.Rmd " target ="_blank "> 6cf9aa7</ a >
530
+ </ td >
531
+ < td >
532
+ Peter Carbonetto
533
+ </ td >
534
+ < td >
535
+ 2025-02-20
536
+ </ td >
537
+ < td >
538
+ wflow_publish("pancreas_annotate.Rmd", verbose = TRUE, view = FALSE)
539
+ </ td >
540
+ </ tr >
541
+ < tr >
542
+ < td >
543
+ Rmd
544
+ </ td >
545
+ < td >
546
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/blob/105293f8f21a4fa50160bc5f09a98a67b30c9dd7/analysis/pancreas_annotate.Rmd " target ="_blank "> 105293f</ a >
547
+ </ td >
548
+ < td >
549
+ Peter Carbonetto
550
+ </ td >
551
+ < td >
552
+ 2025-02-20
553
+ </ td >
554
+ < td >
555
+ Removed newsgroups_annotate.R.
556
+ </ td >
557
+ </ tr >
558
+ < tr >
559
+ < td >
560
+ html
561
+ </ td >
562
+ < td >
563
+ < a href ="https://rawcdn.githack.com/stephenslab/single-cell-jamboree/66b44ce7013ab2f76c1f3627d66e58b09b53d603/docs/pancreas_annotate.html " target ="_blank "> 66b44ce</ a >
564
+ </ td >
565
+ < td >
566
+ Peter Carbonetto
567
+ </ td >
568
+ < td >
569
+ 2025-02-20
570
+ </ td >
571
+ < td >
572
+ Added plots for semi-NMF to pancreas_annotate analysis.
573
+ </ td >
574
+ </ tr >
575
+ < tr >
576
+ < td >
577
+ Rmd
578
+ </ td >
579
+ < td >
522
580
< a href ="https://github.com/stephenslab/single-cell-jamboree/blob/9440e05bc58ce8e42a1d61a79ffdfd18896e60cb/analysis/pancreas_annotate.Rmd " target ="_blank "> 9440e05</ a >
523
581
</ td >
524
582
< td >
@@ -734,13 +792,15 @@ <h4 class="author">Peter Carbonetto</h4>
734
792
single “one-size-fits-all” strategy that works best, and so we recommend
735
793
exploring different annotation strategies. Also, careful interpretation
736
794
of the matrix factorization is discussed.</ p >
737
- < p > The plotting functions used in this analysis are from < a
738
- href ="https://github.com/stephenslab/fastTopics/ "> fastTopics</ a > .</ p >
795
+ < p > A side benefit of this investigation is to illustrate some useful
796
+ plotting strategies, including the < code > annotation_heatmap()</ code >
797
+ function from the < a href ="https://github.com/stephenslab/fastTopics/ "> fastTopics</ a > package.</ p >
739
798
< p > First, load the packages needed for this analysis.</ p >
740
799
< pre class ="r "> < code > library(Matrix)
library(flashier)
library(fastTopics)
library(ggplot2)
+ library(ggrepel)
library(cowplot)</ code > </ pre >
745
805
< p > Set the seed for reproducibility.</ p >
746
806
< pre class ="r "> < code > set.seed(1)</ code > </ pre >
@@ -898,11 +958,15 @@ <h2>Annotating the factors by “driving genes”</h2>
898
958
< p > Strategy (i) picks out some canonical marker genes for islet cells
899
959
such as < em > INS</ em > for beta cells and < em > GCG</ em > for alpha cells.
900
960
But it also picks out other genes that are highly expressed in multiple
901
- islet cell types, such as < em > TTR </ em > and < em > CHGB </ em > . Strategy (ii)
961
+ islet cell types, such as < em > SCGN </ em > and < em > TTR </ em > . Strategy (ii)
902
962
focusses more strongly on genes that distinguish one cell type from
903
963
another, and as a result marker genes such as < em > MAFA</ em > (beta cells)
904
964
and < em > GC</ em > (alpha cells) are ranked more highly with this
905
965
strategy.</ p >
966
+ < p > Below, we take a closer look at the ranking of the genes based on
967
+ these two strategies, and suggest another simple visualization which
968
+ could be useful. (See: “A closer look at ranking genes by largest vs.
969
+ distinctive”.)</ p >
906
970
< p > The better strategy will depend on the setting and on the goals of
907
971
the analysis, which is why the < code > annotation_heatmap</ code > function
908
972
provides both options. These selection strategies can also reveal
@@ -1000,6 +1064,43 @@ <h2>Annotating the semi-NMF</h2>
1000
1064
perplexity = 70,n = Inf) +
1001
1065
labs(y = "membership",fill = "factor",color = "factor")</ code > </ pre >
1002
1066
< p > < img src ="figure/pancreas_annotate.Rmd/structure-plot-flashier-snmf-1.png " width ="960 " style ="display: block; margin: auto; " /> </ p >
1067
+ < p >
1068
+ < button type ="button " class ="btn btn-default btn-xs btn-workflowr btn-workflowr-fig " data-toggle ="collapse " data-target ="#fig-structure-plot-flashier-snmf-1 ">
1069
+ Past versions of structure-plot-flashier-snmf-1.png
1070
+ </ button >
1071
+ </ p >
1072
+ < div id ="fig-structure-plot-flashier-snmf-1 " class ="collapse ">
1073
+ < div class ="table-responsive ">
1074
+ < table class ="table table-condensed table-hover ">
1075
+ < thead >
1076
+ < tr >
1077
+ < th >
1078
+ Version
1079
+ </ th >
1080
+ < th >
1081
+ Author
1082
+ </ th >
1083
+ < th >
1084
+ Date
1085
+ </ th >
1086
+ </ tr >
1087
+ </ thead >
1088
+ < tbody >
1089
+ < tr >
1090
+ < td >
1091
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/blob/66b44ce7013ab2f76c1f3627d66e58b09b53d603/docs/figure/pancreas_annotate.Rmd/structure-plot-flashier-snmf-1.png " target ="_blank "> 66b44ce</ a >
1092
+ </ td >
1093
+ < td >
1094
+ Peter Carbonetto
1095
+ </ td >
1096
+ < td >
1097
+ 2025-02-20
1098
+ </ td >
1099
+ </ tr >
1100
+ </ tbody >
1101
+ </ table >
1102
+ </ div >
1103
+ </ div >
1003
1104
< p > As a result, we would expect that the factors themselves would tend
1004
1105
to pick more “distinctive” features; for example, factor 8 capturing
1005
1106
expression specific to delta, gamma and epsilon cells doesn’t need to
@@ -1026,6 +1127,43 @@ <h2>Annotating the semi-NMF</h2>
1026
1127
theme(plot.title = element_text(face = "plain",size = 9))
1027
1128
plot_grid(p1,p2,nrow = 1,ncol = 2)</ code > </ pre >
1028
1129
< p > < img src ="figure/pancreas_annotate.Rmd/annotation-plot-flashier-snmf-1.png " width ="720 " style ="display: block; margin: auto; " /> </ p >
1130
+ < p >
1131
+ < button type ="button " class ="btn btn-default btn-xs btn-workflowr btn-workflowr-fig " data-toggle ="collapse " data-target ="#fig-annotation-plot-flashier-snmf-1 ">
1132
+ Past versions of annotation-plot-flashier-snmf-1.png
1133
+ </ button >
1134
+ </ p >
1135
+ < div id ="fig-annotation-plot-flashier-snmf-1 " class ="collapse ">
1136
+ < div class ="table-responsive ">
1137
+ < table class ="table table-condensed table-hover ">
1138
+ < thead >
1139
+ < tr >
1140
+ < th >
1141
+ Version
1142
+ </ th >
1143
+ < th >
1144
+ Author
1145
+ </ th >
1146
+ < th >
1147
+ Date
1148
+ </ th >
1149
+ </ tr >
1150
+ </ thead >
1151
+ < tbody >
1152
+ < tr >
1153
+ < td >
1154
+ < a href ="https://github.com/stephenslab/single-cell-jamboree/blob/66b44ce7013ab2f76c1f3627d66e58b09b53d603/docs/figure/pancreas_annotate.Rmd/annotation-plot-flashier-snmf-1.png " target ="_blank "> 66b44ce</ a >
1155
+ </ td >
1156
+ < td >
1157
+ Peter Carbonetto
1158
+ </ td >
1159
+ < td >
1160
+ 2025-02-20
1161
+ </ td >
1162
+ </ tr >
1163
+ </ tbody >
1164
+ </ table >
1165
+ </ div >
1166
+ </ div >
1029
1167
< pre > < code > # Features selected for plot: GCG TTR TM4SF4 GC CHGB PCSK2 MALAT1 IGFBP7 INS IAPP HADH NPTX2 MAFA RBP4 PCSK1 SCD5 SST AQP3 PPY LEPR DPYSL3 AKAP12
1030
1168
c("GCG", "TTR", "TM4SF4", "GC", "CHGB", "PCSK2", "MALAT1", "IGFBP7",
1031
1169
"INS", "IAPP", "HADH", "NPTX2", "MAFA", "RBP4", "PCSK1", "SCD5",
@@ -1035,6 +1173,68 @@ <h2>Annotating the semi-NMF</h2>
1035
1173
"INS", "IAPP", "MAFA", "NPTX2", "ADCYAP1", "PFKFB2", "MEG3",
1036
1174
"DLK1", "SST", "AQP3", "LEPR", "AKAP12", "MTUS1", "EGR1", "PPY",
1037
1175
"S100A6")</ code > </ pre >
1176
+ < p > Note that the F matrix in the semi-NMF allows for both positive and
1177
+ negative log-fold changes.</ p >
1178
+ </ div >
1179
+ < div id ="a-closer-look-at-ranking-genes-by-largest-vs.-distinctive "
1180
+ class ="section level2 ">
1181
+ < h2 > A closer look at ranking genes by largest vs. distinctive</ h2 >
1182
+ < p > Above, we compared gene selection strategies for some annotation
1183
+ heatmaps of NMF results. Here we visualize how these two different
1184
+ strategies result in two different gene rankings. This visualization
1185
+ may be useful on its own to annotate the factors.</ p >
1186
+ < p > First we define a couple of functions used to create some plots.</ p >
1187
+ < p > This function computes the “least extreme” (l.e.) effect differences
1188
+ for a non-negative effects matrix:</ p >
1189
+ < pre class ="r "> < code > compute_le_diff <- function (effects_matrix,
+                              compare_dims = seq(1,ncol(effects_matrix))) {
+   m <- ncol(effects_matrix)
+   out <- effects_matrix
+   for (i in 1:m) {
+     dims <- setdiff(compare_dims,i)
+     # drop = FALSE keeps a matrix when only one comparison column remains.
+     out[,i] <- effects_matrix[,i] - apply(effects_matrix[,dims,drop = FALSE],1,max)
+   }
+   return(out)
+ }</ code > </ pre >
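As a small illustration of the “least extreme” difference, the sketch below applies the same computation to a made-up 3 x 3 effects matrix. The gene names and values are hypothetical and are not taken from the pancreas data. The function is restated so the example is self-contained; the use of drop = FALSE is an assumption added here so that the sketch also works when only one comparison column remains.

```r
# Toy illustration only: the genes and effect values are invented.
compute_le_diff <- function (effects_matrix,
                             compare_dims = seq(1,ncol(effects_matrix))) {
  m <- ncol(effects_matrix)
  out <- effects_matrix
  for (i in 1:m) {
    dims <- setdiff(compare_dims,i)
    # Subtract the largest effect among the other columns.
    out[,i] <- effects_matrix[,i] -
      apply(effects_matrix[,dims,drop = FALSE],1,max)
  }
  return(out)
}

# geneA is specific to k1, geneB is shared by k1 and k2,
# and geneC is mostly specific to k3.
effects <- rbind(geneA = c(k1 = 3,k2 = 0,k3 = 0),
                 geneB = c(k1 = 2,k2 = 2,k3 = 0),
                 geneC = c(k1 = 0,k2 = 1,k3 = 4))
lediff <- compute_le_diff(effects)
print(lediff)
```

In this toy matrix, geneB has an l.e. difference of 0 in both k1 and k2: its effect is large in each factor but not distinctive to either, whereas geneA and geneC keep a difference of 3 in their own factors.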
1199
+ < p > This function will be used to create the scatterplots:</ p >
1200
+ < pre class ="r "> < code > distinctive_genes_scatterplot <- function (effects_matrix, k,
+                                            effect_quantile_prob = 0.999,
+                                            lediff_quantile_prob = 0.999) {
+   lediff <- compute_le_diff(effects_matrix)
+   genes <- rownames(effects_matrix)
+   pdat <- data.frame(gene = genes,
+                      effect = effects_matrix[,k],
+                      lediff = lediff[,k])
+   effect_quantile <- quantile(pdat$effect,effect_quantile_prob)
+   lediff_quantile <- quantile(pdat$lediff,lediff_quantile_prob)
+   i <- which(pdat$effect < effect_quantile & pdat$lediff < lediff_quantile)
+   pdat[i,"gene"] <- NA
+   return(ggplot(pdat,aes(x = effect,y = lediff,label = gene)) +
+          geom_point(color = "dodgerblue") +
+          geom_hline(yintercept = 0,color = "magenta",linetype = "dotted",
+                     linewidth = 0.5) +
+          geom_text_repel(color = "black",size = 2,
+                          fontface = "italic",segment.color = "black",
+                          segment.size = 0.25,min.segment.length = 0,
+                          max.overlaps = Inf,na.rm = TRUE) +
+          labs(x = "log-fold change",y = "l.e. difference") +
+          theme_cowplot(font_size = 9))
+ }</ code > </ pre >
1223
+ < p > Now we compare the two different gene rankings in the scatterplots
1224
+ for factors 4, 5 and 6 of the flashier NMF result:</ p >
1225
+ < pre class ="r "> < code > F <- fl_nmf_ldf$F
+ colnames(F) <- paste0("k",1:9)
+ kset <- paste0("k",4:6)
+ p1 <- distinctive_genes_scatterplot(F[,kset],"k4") + ggtitle("factor k4")
+ p2 <- distinctive_genes_scatterplot(F[,kset],"k5") + ggtitle("factor k5")
+ p3 <- distinctive_genes_scatterplot(F[,kset],"k6") + ggtitle("factor k6")
+ print(plot_grid(p1,p2,p3,nrow = 1,ncol = 3))</ code > </ pre >
1232
+ < p > < img src ="figure/pancreas_annotate.Rmd/distinctive-gene-scatterplots-flashier-nmf-1.png " width ="900 " style ="display: block; margin: auto; " /> </ p >
1233
+ < p > It is clear from these scatterplots that the rankings are very
1234
+ different, and strikingly so for factor 5 representing alpha cells. This
1235
+ means that many of the top-ranked genes for factor 5 (largest increases
1236
+ in expression) also show very large increases in other islet cells,
1237
+ e.g., < em > SCG5</ em > .</ p >
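One simple way to check how far the two rankings disagree is sketched below. It is an illustration only and is not part of the original analysis: the effect and lediff vectors are invented values standing in for the corresponding columns of pdat, and the gene labels g1 to g5 are hypothetical.

```r
# Hypothetical values standing in for pdat$effect and pdat$lediff.
effect <- c(g1 = 5.0,g2 = 4.5,g3 = 4.0,g4 = 1.0,g5 = 0.5)
lediff <- c(g1 = 0.2,g2 = -1.0,g3 = 3.5,g4 = 0.8,g5 = 0.4)

# Rank the genes from largest to smallest under each criterion.
by_effect <- names(sort(effect,decreasing = TRUE))
by_lediff <- names(sort(lediff,decreasing = TRUE))

# Genes that appear among the top 2 under both criteria.
top_both <- intersect(by_effect[1:2],by_lediff[1:2])
print(by_effect)
print(by_lediff)
print(top_both)
```

With these invented values the two top-2 lists share no genes, which mirrors the situation described above where genes with the largest effects are not the most distinctive ones.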
1038
1238
< br >
1039
1239
< p >
1040
1240
< button type ="button " class ="btn btn-default btn-workflowr btn-workflowr-sessioninfo " data-toggle ="collapse " data-target ="#workflowr-sessioninfo " style ="display: block; ">
@@ -1062,8 +1262,8 @@ <h2>Annotating the semi-NMF</h2>
1062
1262
# [1] stats graphics grDevices utils datasets methods base
1063
1263
#
1064
1264
# other attached packages:
1065
- # [1] cowplot_1.1.3 ggplot2_3.5.0 fastTopics_0.7-24 flashier_1.0.55
1066
- # [5] ebnm_1.1-34 Matrix_1.6-5
1265
+ # [1] cowplot_1.1.3 ggrepel_0.9.5 ggplot2_3.5.0 fastTopics_0.7-24
1266
+ # [5] flashier_1.0.55 ebnm_1.1-34 Matrix_1.6-5
1067
1267
#
1068
1268
# loaded via a namespace (and not attached):
1069
1269
# [1] tidyselect_1.2.1 viridisLite_0.4.2 farver_2.1.1
@@ -1083,19 +1283,18 @@ <h2>Annotating the semi-NMF</h2>
1083
1283
# [43] reshape2_1.4.4 pbapply_1.7-2 cachem_1.0.8
1084
1284
# [46] stringr_1.5.1 splines_4.3.3 parallel_4.3.3
1085
1285
# [49] softImpute_1.4-1 vctrs_0.6.5 jsonlite_1.8.8
1086
- # [52] hms_1.1.3 mixsqp_0.3-54 ggrepel_0.9.5
1087
- # [55] irlba_2.3.5.1 horseshoe_0.2.0 trust_0.1-8
1088
- # [58] plotly_4.10.4 jquerylib_0.1.4 tidyr_1.3.1
1089
- # [61] glue_1.7.0 uwot_0.2.2.9000 stringi_1.8.3
1090
- # [64] Polychrome_1.5.1 gtable_0.3.4 later_1.3.2
1091
- # [67] quadprog_1.5-8 munsell_0.5.0 tibble_3.2.1
1092
- # [70] pillar_1.9.0 htmltools_0.5.7 truncnorm_1.0-9
1093
- # [73] R6_2.5.1 rprojroot_2.0.4 evaluate_0.23
1094
- # [76] lattice_0.22-5 highr_0.10 RhpcBLASctl_0.23-42
1095
- # [79] SQUAREM_2021.1 ashr_2.2-66 httpuv_1.6.14
1096
- # [82] bslib_0.6.1 Rcpp_1.0.12 deconvolveR_1.2-1
1097
- # [85] whisker_0.4.1 xfun_0.42 fs_1.6.3
1098
- # [88] pkgconfig_2.0.3</ code > </ pre >
1286
+ # [52] hms_1.1.3 mixsqp_0.3-54 irlba_2.3.5.1
1287
+ # [55] horseshoe_0.2.0 trust_0.1-8 plotly_4.10.4
1288
+ # [58] jquerylib_0.1.4 tidyr_1.3.1 glue_1.7.0
1289
+ # [61] uwot_0.2.2.9000 stringi_1.8.3 Polychrome_1.5.1
1290
+ # [64] gtable_0.3.4 later_1.3.2 quadprog_1.5-8
1291
+ # [67] munsell_0.5.0 tibble_3.2.1 pillar_1.9.0
1292
+ # [70] htmltools_0.5.7 truncnorm_1.0-9 R6_2.5.1
1293
+ # [73] rprojroot_2.0.4 evaluate_0.23 lattice_0.22-5
1294
+ # [76] highr_0.10 RhpcBLASctl_0.23-42 SQUAREM_2021.1
1295
+ # [79] ashr_2.2-66 httpuv_1.6.14 bslib_0.6.1
1296
+ # [82] Rcpp_1.0.12 deconvolveR_1.2-1 whisker_0.4.1
1297
+ # [85] xfun_0.42 fs_1.6.3 pkgconfig_2.0.3</ code > </ pre >
1099
1298
</ div >
1100
1299
</ div >
1101
1300