forked from microbiome/OMA
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path13-microbiome-exploration.Rmd
166 lines (131 loc) · 5.75 KB
/
13-microbiome-exploration.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
# Microbiome Exploration {#microbiome-exploration}
```{r setup, echo=FALSE, results="asis"}
library(rebook)
chapterPreamble()
```
This chapter focuses on the exploration of microbiome data and establish
commonly used descriptors of a microbiome. The main difference to quality
control is that the exploration assumes the technical aspects of the dataset
have been investigated to your satisfaction. Generally speaking at this point
you should be quite certain, that the dataset doesn't suffer from severe
technical biases or you should at least be aware of potential problems.
In reality you might need to go back and forth between QC and exploration,
since you discover through exploration of your dataset technical aspects you
need to check.
```{r, message=FALSE}
library(mia)
data("GlobalPatterns")
se <- GlobalPatterns
```
## Prevalence
Prevalence is a measurements, which describes in how many samples certain
microbes were detected.
Investigating the prevalence of microbes allows you either to focus on changes,
which pertain to most of the samples, or to focus on less often found microbes,
which are nonetheless abundantly found in a small number of samples.
On the raw data, the population prevalence (frequency) at a 1% relative
abundance threshold (`detection = 1/100` and `as_relative = TRUE`), can look
like this. The low prevalence in this examples can be explained by rather
different sample types as well as the in depth nature of the data.
```{r exploration-prevalence}
head(getPrevalence(se, detection = 1/100, sort = TRUE, as_relative = TRUE))
```
The `detection` and `as_relative` can also be used to access, how many samples
do pass a threshold for raw counts. Here the population prevalence (frequency)
at the absolute abundance threshold (`as_relative = FALSE`) at read count 1
(`detection = TRUE`) is accessed.
```{r concepts_prevalence2}
head(getPrevalence(se, detection = 1, sort = TRUE, abund_values = "counts",
as_relative = FALSE))
```
Note that, if the output should used for subsetting or storing the data in
the `rowData`, set `sort = FALSE`.
### Prevalent microbiota analysis
To investigate the microbiome data at a selected taxonomic levels, two
approaches are available.
First the data can be agglomerated to the taxonomic level and `getPrevalence`
be used on the result.
```{r}
altExp(se,"Phylum") <- agglomerateByRank(se, "Phylum")
head(getPrevalence(altExp(se,"Phylum"), detection = 1/100, sort = TRUE,
abund_values = "counts", as_relative = TRUE))
```
Alternatively the `rank` argument can be set, to perform the agglomeration on
the fly.
```{r}
altExp(se,"Phylum") <- agglomerateByRank(se, "Phylum")
head(getPrevalence(se, rank = "Phylum", detection = 1/100, sort = TRUE,
abund_values = "counts", as_relative = TRUE))
```
The difference in the naming scheme, is that by default `na.rm = TRUE` is used
for agglomeration in `getPrevalence`, whereas the default for
`agglomerateByRank` is `FALSE` to prevent accidental data loss.
If you only need the names of the prevalent taxa, `getPrevalentTaxa` is
available. This returns the taxa that exceed the given prevalence and detection
thresholds.
```{r core-members, message=FALSE, warning=FALSE, eval = FALSE}
getPrevalentTaxa(se, detection = 0, prevalence = 50/100)
prev <- getPrevalentTaxa(se, detection = 0, prevalence = 50/100,
rank = "Phylum", sort = TRUE)
prev
```
Note, that the `detection` and `prevalence` thresholds are not the same, since
`detection` can be applied to relative counts or absolute couts depending on
whether `as_relative` is set `TRUE` or `FALSE`
TODO
See also related functions for the analysis of rare and variable taxa
(rareMembers; rareAbundance; lowAbundance).
### Plotting prevalence
To plot the prevalence, the data is first added to the `rowData`.
```{r}
rowData(altExp(se,"Phylum"))$prevalence <-
getPrevalence(altExp(se,"Phylum"), detection = 1/100, sort = FALSE,
abund_values = "counts", as_relative = TRUE)
```
Then it can be plotted via the plotting functions from the `scater` package.
```{r, message=FALSE, warning=FALSE}
library(scater)
plotRowData(altExp(se,"Phylum"), "prevalence", colour_by = "Phylum")
```
Additionally, the prevalence can be plotted on the taxonomic tree using the
`miaViz` package.
```{r}
altExps(se) <- splitByRanks(se)
altExps(se) <-
lapply(altExps(se),
function(y){
rowData(y)$prevalence <-
getPrevalence(y, detection = 1/100, sort = FALSE,
abund_values = "counts", as_relative = TRUE)
y
})
top_phyla <- getTopTaxa(altExp(se,"Phylum"),
method="prevalence",
top=10L,
abund_values="counts")
top_phyla_mean <- getTopTaxa(altExp(se,"Phylum"),
method="mean",
top=10L,
abund_values="counts")
x <- unsplitByRanks(se, ranks = taxonomyRanks(se)[1:6])
x <- addTaxonomyTree(x)
```
After some preparation the data is assembled and can be plotted via
`plotRowTree`.
```{r plot-prev-prev, message=FALSE, fig.cap="Prevalence of top phyla as jduged by prevalence"}
library(miaViz)
plotRowTree(x[rowData(x)$Phylum %in% top_phyla,],
edge_colour_by = "Phylum",
tip_colour_by = "prevalence",
node_colour_by = "prevalence")
```
```{r plot-prev-mean, message=FALSE, fig.cap="Prevalence of top phyla as jduged by prevalence"}
plotRowTree(x[rowData(x)$Phylum %in% top_phyla_mean,],
edge_colour_by = "Phylum",
tip_colour_by = "prevalence",
node_colour_by = "prevalence")
```
## Session Info {-}
```{r sessionInfo, echo=FALSE, results='asis'}
prettySessionInfo()
```