-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathalpha-div.Rmd
175 lines (147 loc) · 4.75 KB
/
alpha-div.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
title: "alpha-div"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{alpha-div}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
message = FALSE
)
```
```{r setup, message=FALSE}
library(MiscMetabar)
data(data_fungi)
```
### Alpha diversity analysis
#### Hill number
Numerous metrics of diversity exist.
Hill numbers [^Hill73] is a kind of general framework for alpha diversity index.
[^Hill73]: Hill MO. 1973. Diversity and evenness: a unifying notation and its consequences. Ecology 54, 427-473.
```{r}
renyi_res <- vegan::renyi(data_fungi@otu_table)
head(renyi_res)
```
#### Test for difference in diversity (hill number)
One way to keep into account for difference in the number of sequences per samples is to use a Tukey test on a linear model with the square roots of the number of sequence as the first explanatory variable of the linear model [^Balint15].
[^Balint15]: Bálint M et al. 2015. Relocation, high-latitude warming and host genetic identity shape the foliar fungal microbiome of poplars. Molecular Ecology 24, 235-248. https://doi.org/10.1111/mec.13018
```{r, fig.cap="Hill number 1"}
p <- MiscMetabar::hill_pq(data_fungi, fact = "Height")
p$plot_Hill_0
```
```{r, fig.cap="Result of the Tuckey post-hoc test"}
p$plot_tuckey
```
See also the [tutorial](https://microbiome.github.io/tutorials/Alphadiversity.html) of the microbiome package for an alternative using the non-parametric [Kolmogorov-Smirnov test](https://www.rdocumentation.org/packages/dgof/versions/1.2/topics/ks.test) for two-group comparisons when there are no relevant covariates.
## Alpha diversity using package `MicrobiotaProcess`
```{r}
library("MicrobiotaProcess")
clean_pq(subset_samples_pq(data_fungi, !is.na(data_fungi@sam_data$Height))) %>%
as.MPSE() %>%
mp_cal_alpha() %>%
mp_plot_alpha(.group = "Height")
```
## Effect of samples variables on alpha diversity using automated model selection and multimodel inference with (G)LMs
From the help of glmulti package :
> glmulti finds what are the n best models (the confidence set of models) among all possible models (the candidate set, as specified by the user). Models are fitted with the specified fitting function (default is glm) and are ranked with the specified Information Criterion (default is aicc). The best models are found either through exhaustive screening of the candidates, or using a genetic algorithm, which allows very large candidate sets to be adressed. The output can be used for model selection, variable selection, and multimodel inference.
```{r}
library("glmulti")
formula <- "Hill_0 ~ Hill_1 + Abundance + Time + Height"
res_glmulti <-
glmutli_pq(data_fungi, formula = formula, level = 1)
res_glmulti
ggplot(data = res_glmulti, aes(x = estimates, y = variable)) +
geom_point(
size = 2,
alpha = 1,
show.legend = FALSE
) +
geom_vline(
xintercept = 0,
linetype = "dotted",
linewidth = 1
) +
geom_errorbar(
aes(xmin = estimates - alpha, xmax = estimates + alpha),
width = 0.8,
position = position_dodge(width = 0.8),
alpha = 0.7,
show.legend = FALSE
) +
geom_label(aes(label = nb_model), nudge_y = 0.3, size = 3) +
xlab("Standardized estimates") +
ylab(formula)
ggplot(data = res_glmulti, aes(
x = importance,
y = as.factor(variable),
fill = estimates
)) +
geom_bar(
stat = "identity",
show.legend = FALSE,
alpha = 0.8
) +
xlim(c(0, 1)) +
geom_label(aes(label = nb_model, x = 0.1),
size = 3,
fill = "white"
) +
scale_fill_viridis_b() +
xlab("Importance") +
ylab(formula)
```
```{r}
formula <- "Hill_0 ~ Abundance + Time + Height"
res_glmulti_interaction <-
glmutli_pq(data_fungi, formula = formula, level = 2)
res_glmulti_interaction
ggplot(data = res_glmulti_interaction, aes(x = estimates, y = variable)) +
geom_point(
size = 2,
alpha = 1,
show.legend = FALSE
) +
geom_vline(
xintercept = 0,
linetype = "dotted",
linewidth = 1
) +
geom_errorbar(
aes(xmin = estimates - alpha, xmax = estimates + alpha),
width = 0.8,
position = position_dodge(width = 0.8),
alpha = 0.7,
show.legend = FALSE
) +
geom_label(aes(label = nb_model), nudge_y = 0.3, size = 3) +
xlab("Standardized estimates") +
ylab(formula)
ggplot(data = res_glmulti_interaction, aes(
x = importance,
y = as.factor(variable),
fill = estimates
)) +
geom_bar(
stat = "identity",
show.legend = FALSE,
alpha = 0.8
) +
xlim(c(0, 1)) +
geom_label(aes(label = nb_model, x = 0.1),
size = 3,
fill = "white"
) +
scale_fill_viridis_b() +
xlab("Importance") +
ylab(formula)
```
# Session information
```{r}
sessionInfo()
```