Merge branch 'master' into devel

js2264 · Nov 29, 2023 · f420d32 · f420d32
2 parents 3b8043f + 12fa956
commit f420d32
Show file tree

Hide file tree

Showing 6 changed files with 1,148 additions and 0 deletions.
diff --git a/60_network_learning.Rmd b/60_network_learning.Rmd
diff --git a/61_network_comparison.Rmd b/61_network_comparison.Rmd
@@ -0,0 +1,335 @@
+# Network comparison {#network-comparison}
+
+```{r setup, echo=FALSE, results="asis"}
+library(rebook)
+chapterPreamble()
+```
+
+```{r, echo=FALSE}
+# The networks shown in this chapter take several minutes to generate. 
+# Therefore, the network objects have been stored in a folder and are loaded 
+# here to reduce the time needed to knit the book.
+load("general/network_data/networks_diet.RData")
+```
+
+This chapter assumes that you are already familiar with how to construct and analyze a single microbial network, which is explained in Chapter \@ref(network-learning).
+
+Since microbial interactions are likely to change between conditions, such as between patients and healthy individuals or between different environmental states, identifying network differences between groups is often an integral secondary analysis step. Differences can be detected visually by plotting the networks side by side, or quantitatively using differential network analysis tools.
+
+Two approaches for comparing networks between two conditions are considered in this chapter: 
+
+- **Differential network analysis**, which analyzes differences in network metrics and network structure. 
+- **Differential association analysis**, which focuses on differences in the strength of individual associations.
+
+See Section \@ref(netcomp-methods) for further details on these two approaches and the methods used in this chapter.
+
+Here we use [NetCoMi](https://github.com/stefpeschel/NetCoMi) [@peschel2021netcomi] for network comparison, which includes several **differential network analysis approaches** as well as functionality for *differential association analysis**, i.e., generating a differential network. How to install NetCoMi from GitHub is explained in chapter \@ref(network-learning). 
+
+The **PeerJ data set** [@potbhare2022skin] containing skin microbial profiles for 58 subjects is again used in this chapter. The dataset also includes information on the subjects' geographic location, gender, age and diet. Whether the **skin microbiome differs** between people with **different diets** is an interesting question that will be explored in this chapter.
+
+## Data preparation
+
+We perform the same data preprocessing steps as in chapter \@ref(network-learning).
+
+```{r load_packages, message=FALSE, warning=FALSE}
+library(NetCoMi)
+library(mia)
+```
+
+```{r load_data}
+data("peerj13075", package = "mia")
+tse0 <- peerj13075
+```
+
+```{r preprocessing}
+# Agglomerate to genus level
+tse <- agglomerateByRank(tse0, rank = "genus") 
+
+# Add relative abundances
+tse <- transformAssay(tse, 
+                      assay.type = "counts", 
+                      method = "relabundance",
+                      MARGIN = "samples") 
+
+# Filter by prevalence
+tse <- subsetByPrevalentFeatures(tse,
+                                 prevalence = 0.2,
+                                 detection = 0,
+                                 assay.type = "relabundance")
+
+# Add log10-transformed abundances
+tse <- transformAssay(tse, method = "log10", pseudocount = 1)
+
+# Add clr-transformed abundances
+tse <- transformAssay(tse, method = "clr", pseudocount = 1)
+```
+
+Based on "Diet", the `tse` object is then split into two groups: One with mixed diet subjects, and one with vegetarian subjects. Both subsets have nearly the same sample size and are therefore comparable.
+
+```{r split_data}
+table(tse$Diet)
+tse_list <- splitOn(tse, f = "Diet", use_names = TRUE, MARGIN = 2)
+```
+
+## Network learning and analysis
+
+The approach starts again with network construction and analysis, but this time we pass the two data sets to `netConstruct` to perform a network comparison.
+
+The `rep.num` argument is set to 10 to perform only 10 repetitions in the model selection approach. This speeds up the permutation tests performed later, and has a negligible effect for this data set.
+
+```{r netConstruct_diet, eval=FALSE}
+spring_net_diet <- netConstruct(data = tse_list$Mixed,
+                                data2 = tse_list$Veg,
+                                taxRank = "genus",
+                                filtTax = "highestFreq",
+                                filtTaxPar = list(highestFreq  = 100),
+                                measure = "spring",
+                                measurePar = list(nlambda = 20, 
+                                                  rep.num = 10,
+                                                  thresh = 0.05,
+                                                  Rmethod = "approx"),
+                                sparsMethod = "none", 
+                                dissFunc = "signed",
+                                verbose = 3,
+                                seed = 13075)
+```
+
+
+All network measures are now computed for both networks. Also, both GCMs are plotted together with a third matrix containing the differences between the GCMs and significance codes that express if the differences are significantly different from zero.
+
+```{r netAnalyze_diet, fig.width=10, fig.height=10}
+spring_netprops_diet <- netAnalyze(spring_net_diet, 
+                                   clustMethod = "cluster_fast_greedy",
+                                   hubPar = "eigenvector",
+                                   normDeg = FALSE)
+```
+
+In both of the networks, some graphlet correlations are significantly different from zero. However, none of the correlations are significantly different between the groups.
+
+```{r summary_diet}
+summary(spring_netprops_diet, groupNames = c("Mixed diet", "Vegetarian"))
+```
+
+For each centrality measure, the five nodes with the highest centrality in each group are plotted by default.
+
+We notice some differences in the network properties. The differential network analysis performed in the next section will show if the differences are significant.
+
+## Differential network analysis
+
+### Visual comparison
+
+We start with a visual comparison of the two networks using NetCoMi's plot function. The same configuration as in chapter \@ref(network-learning) is used.
+
+```{r network_plot_diet_difflay, fig.width=17, fig.height=8}
+plot(spring_netprops_diet,
+     repulsion = 0.97,
+     rmSingles = TRUE,
+     labelScale = FALSE,
+     nodeSize = "eigenvector",
+     nodeSizeSpread = 2,
+     nodeColor = "cluster", 
+     sameColThresh = 2,
+     hubBorderCol = "darkgray",
+     cexNodes = 2,
+     edgeTranspHigh = 20,
+     title1 = "Mixed diet", 
+     title2 = "Vegetarian",
+     showTitle = TRUE,
+     cexTitle = 2,
+     mar = c(1, 4, 4, 4))
+
+# Overlay a transparent plot on which the legend is plotted
+par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), mar=c(0, 0, 0, 0), new=TRUE)
+plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
+
+legend(-0.2, -0.9, cex = 1.5, title = "estimated correlation:",
+       legend = c("+","-"), lty = 1, lwd = 3, col = c("#009900","red"), 
+       bty = "n", horiz = TRUE)
+```
+
+The layout is computed separately for each network, making it difficult to visually compare certain associations. It is therefore recommended to use the same layout for both groups (argument `sameLayout`). Instead of simply copying one layout to the other network, we set `layoutGroup` to "union". This ensures that the nodes are placed as optimally as possible for both networks.
+
+```{r network_plot_diet_samelay, fig.width=17, fig.height=8}
+plot(spring_netprops_diet,
+     sameLayout = TRUE,
+     repulsion = 0.95,
+     rmSingles = "inboth",
+     labelScale = FALSE,
+     nodeSize = "eigenvector",
+     nodeSizeSpread = 2,
+     nodeColor = "cluster", 
+     sameColThresh = 2,
+     hubBorderCol = "darkgray",
+     cexNodes = 2,
+     edgeTranspHigh = 20,
+     title1 = "Mixed diet", 
+     title2 = "Vegetarian",
+     showTitle = TRUE,
+     cexTitle = 2,
+     mar = c(1, 4, 4, 4))
+
+# Add legend
+par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), mar=c(0, 0, 0, 0), new=TRUE)
+plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
+legend(-0.2, -0.8, cex = 1.7, title = "estimated correlation:",
+       legend = c("+","-"), lty = 1, lwd = 3, col = c("#009900","red"), 
+       bty = "n", horiz = TRUE)
+```
+
+**A few notes:**
+
+-   Differences in the edge weights can now be seen at first glance. For example, Serratia and Citrobacter are strongly associated in the mixed diet group, but not at all in the vegetarian group.
+-   Clusters must share at least two nodes (`sameColThresh` argument) to be colored equally in both networks, which is why the color of some clusters differs between the groups.
+-   The clustering generally differs markedly. In particular, the cluster assignment of many of the nodes in the largest connected component differs between the two groups.
+
+As in Chapter \@ref(network-learning), we also generate a network plot using phylum names to color the nodes and mclr-transformed abundances to scale node sizes.
+
+```{r, message=FALSE, warning=FALSE}
+library(RColorBrewer)
+```
+
+```{r network_plot_diet_phyla, fig.width=17, fig.height=8}
+# Generate vector with phylum names for node coloring
+phyla <- as.factor(rowData(tse)$phylum)
+names(phyla) <- rowData(tse)$genus
+
+# Create color vector
+colvec <- RColorBrewer::brewer.pal(length(levels(phyla)), "Set3")
+
+p_diet <- plot(spring_netprops_diet,
+               sameLayout = TRUE,
+               repulsion = 0.95,
+               rmSingles = "inboth",
+               labelScale = FALSE,
+               nodeSize = "clr",
+               nodeColor = "feature", 
+               featVecCol = phyla, 
+               colorVec =  colvec,
+               nodeTransp = 20,
+               sameColThresh = 2,
+               highlightHubs = FALSE,
+               cexNodes = 2,
+               edgeTranspHigh = 20,
+               title1 = "Mixed diet", 
+               title2 = "Vegetarian",
+               showTitle = TRUE,
+               cexTitle = 2,
+               mar = c(1, 4, 4, 4))
+
+# Add legends
+# Colors used in the legend should be equally transparent as in the plot
+col_transp <- colToTransp(colvec, 20)
+
+par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0), mar=c(0, 0, 0, 0), new=TRUE)
+plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
+legend(-0.15, -0.8, cex = 1.7, title = "estimated correlation:",
+       legend = c("+","-"), lty = 1, lwd = 3, col = c("#009900","red"), 
+       bty = "n", horiz = TRUE)
+legend(-0.15, 1.3, cex = 1.7, pt.cex = 2.5, title = "Phylum:", 
+       legend=levels(phyla), col = col_transp, bty = "n", pch = 16) 
+```
+
+### Quantitative comparison
+
+`netCompare()` enables a quantitative network comparison using comparative measures such as Jaccard's Index, Adjusted Rand Index, and permutation tests.
+
+To test for statistical significance of differences in network properties, we perform permutation tests with 1000 permutations. Multiple CPU cores are used to save run time. The association matrices estimated for all permutations are stored in an external file. We will reuse them later when performing differential association analysis. They could also be used to rerun `netCompare()` with different parameter settings.
+
+Note that unless running on a cluster with considerably more CPU cores, a network comparison with permutation tests may take several hours. You should test the code below with a small number of permutations to make sure it works before applying it to your data.
+
+```{r netcomp_diet_permute, eval=FALSE}
+spring_netcomp_diet <- netCompare(spring_netprops_diet, 
+                                  permTest = TRUE,
+                                  nPerm = 1000,
+                                  cores = 6,
+                                  seed = 13075,
+                                  storeAssoPerm = TRUE,
+                                  fileStoreAssoPerm = "general/network_data/spring_assoPerm",
+                                  verbose = TRUE)
+```
+
+```{r summary_netcomp_diet}
+summary(spring_netcomp_diet, 
+        groupNames = c("Mixed diet", "Vegetarian"),
+        numbNodes = 5)
+```
+
+**Interpreting some results:**
+
+-   Almost all global network properties are significantly different between the groups (for $\alpha=0.1$), thus reflecting the different overall network structure we already have seen in the network plots.
+-   For the Jaccard index of degree, closeness, and eigenvector centrality, the probability P(\>=Jacc) is significant, meaning that the sets of the most central nodes are quite similar for these three measures. The Jaccard index for the hub nodes, on the other hand, is low because the two networks share only one hub node ("Erwinia").
+-   As indicated by some similarities in the clusterings, the adjusted Rand index (ARI) of the whole network is significantly different from zero and thus from random clustering. The ARI of the largest connected component (LCC), however, is close to zero due to the different clusterings in the LCC.
+-   The two GCD values are significantly different from zero, indicating substantial differences in the overall network structures.
+-   All nodes are also tested for having significantly different centrality (only the five nodes with the highest absolute difference are shown in the summary). For $\alpha=0.05$, some nodes have a significantly different closeness centrality, and for $\alpha=0.1$ also a significantly different eigenvector centrality. Most of these nodes have a high centrality in the one group, but are not connected in the other group.
+
+## Differential association analysis
+
+The `diffnet()` function provides statistical tests to assess whether the associations themselves are significantly different between the two groups. `NetCoMi` also provides a plot function to generate a differential network, where two nodes are connected if they are differentially associated between the groups.
+
+Since we have already computed the permutation association matrices before, we can reuse them here (argument `fileLoadAssoPerm`).
+
+The local false discovery rate is controlled at level 0.2 to account for multiplicity.
+
+```{r diffnet_diet, eval=FALSE, message=FALSE, results='hide'}
+spring_diffnet <- diffnet(spring_net_diet,
+                          diffMethod = "perm",
+                          fileLoadAssoPerm = "general/network_data/spring_assoPerm",
+                          adjust = "lfdr")
+```
+
+```{r hist_pvals, out.width='80%'}
+sum(spring_diffnet$pAdjustVec < 0.05)
+sum(spring_diffnet$pvalsVec < 0.05)
+```
+
+Some of the unadjusted p-values are below the usual 5% significance level. However, none of the differences remain significant after adjusting for multiple testing so that the differential network would be empty.
+
+To demonstrate the interpretation of a differential network, we set `adjust` to "none", which is actually statistically incorrect.
+
+```{r diffnet_diet_unadjusted, message=FALSE, results='hide'}
+spring_diffnet_unadj <- diffnet(spring_net_diet,
+                                pvalsVec = spring_diffnet$pvalsVec,
+                                diffMethod = "perm",
+                                alpha = 0.05,
+                                adjust = "none")
+```
+
+The `diffnet` object it now plotted using NetCoMi's plot function. 
+
+```{r diffnet_plot, fig.width=20, fig.height=14}
+plot(spring_diffnet_unadj, 
+     cexLabels = 2,
+     cexNodes = 0.7, 
+     cexLegend = 2.5,
+     cexTitle = 3,
+     mar = c(3,2,5,15),
+     legendGroupnames = c("Mixed diet", "Vegetarian"),
+     legendPos = c(1.2,1.5),
+     legendArgs = list(lwd = 4),
+     fade = FALSE)
+```
+
+Edge colors represent the direction of the associations in the two groups. For example, if two OTUs are positively correlated in the mixed diet group and uncorrelated in the vegetarian group (such as Serratia and Citrobacter), the edge color is dark green.
+
+```{r, eval=FALSE, echo=FALSE}
+save(spring_diffnet, spring_net_diet, spring_netcomp_diet, 
+     file = "general/network_data/networks_diet.RData")
+```
+
+
+## Network comparison methods {#netcomp-methods}
+
+While many approaches exist for the detection of **differential correlations**, e.g. [@yu2019new; @mckenzie2016dgca; @siska2017differential], the literature on the more general case of **differential association** detection is scarce. @bhuva2019differential compare various methods in a simulation study, which again includes many differential correlation approaches, but also more general methods such as latent differential graphical models. @gill2010statistical introduce an approach to analyze whether the connectivity of individual nodes is different between two groups using permutation tests, which is applicable to any kind of association. @he2019statistical propose a test to infer the differential network structure for two conditional dependence networks.
+
+Performing **differential network analysis** is challenging because network measures do not follow classical statistical distributions. @shojaie2021differential provide an overview of differential network analysis methods, but focus only on changes in edge sets. @lichtblau2017comparative compare differential network analysis methods that incorporate multiple local and global network measures. @jardim2019bionetstat present a tool "BioNetStat" for differential analysis of biological networks, which is able to compare certain network measures between groups. 
+
+The [NetCoMi](https://github.com/stefpeschel/NetCoMi) package used for network comparison in this chapter includes the following **differential network analysis approaches:**:
+
+- **Permutation approach** to test global network measures (e.g., transitivity, connectivity, or average path length) as well as centrality measures for group differences.
+- **Jaccard** index to assess the similarity between sets of most central nodes
+- **Adjusted Rand index** to assess the similarity between clusterings
+- **Graphlet Correlation Distance** (GCM)
+
+See [@peschel2021netcomi] for an explanation of the first three approaches. The GCM was proposed by @yaverouglu2014revealing.
+
+Two methods (Fisher's z-test [@fisher1970statistical] and the Discordant method [@siska2016discordant]) are available for identifying differential correlations, and **permutation tests** for the more general case of identifying **differential associations**. See [@peschel2021netcomi] for details. NetCoMi offers also a function for plotting a differential network. 
diff --git a/general/network_data/networks.RData b/general/network_data/networks.RData
diff --git a/general/network_data/networks_diet.RData b/general/network_data/networks_diet.RData
diff --git a/general/network_data/spring_networks.RData b/general/network_data/spring_networks.RData
diff --git a/inst/pages/images/FigureNetworkLearning.png b/inst/pages/images/FigureNetworkLearning.png