-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbootstrap.jmd
147 lines (121 loc) · 6.07 KB
/
bootstrap.jmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
# Parametric bootstrap for linear mixed-effects models
Julia is well-suited to implementing bootstrapping and other simulation-based methods for statistical models.
The `parametricbootstrap` function in the [MixedModels package](https://github.com/JuliaStats/MixedModels.jl) provides an efficient parametric bootstrap for linear mixed-effects models.
## The parametric bootstrap
[Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is a family of procedures
for generating sample values of a statistic, allowing for visualization of the distribution of the
statistic or for inference from this sample of values.
A _parametric bootstrap_ is used with a parametric model, `m`, that has been fit to data.
The procedure is to simulate `n` response vectors from `m` using the estimated parameter values
and refit `m` to these responses in turn, accumulating the statistics of interest at each iteration.
The parameters of a `LinearMixedModel` object are the fixed-effects
parameters, `β`, the standard deviation, `σ`, of the per-observation noise, and the covariance
parameter, `θ`, that defines the variance-covariance matrices of the random effects.
For example, a simple linear mixed-effects model for the `Dyestuff` data in the [`lme4`](http://github.com/lme4/lme4)
package for [`R`](https://www.r-project.org) is fit by
```{julia;term=true}
using DataFrames
using Gadfly # graphics
using MixedModels
using Random # set seed for random number generator
using RCall # communicate with r
```
```{julia;term=true}
dyestuff = MixedModels.dataset(:dyestuff)
m1 = fit(MixedModel, @formula(yield ~ 1 + (1 | batch)), dyestuff)
```
To bootstrap the model parameters, first initialize a random number generator
```{julia;term=true}
const rng = MersenneTwister(1234321);
```
then create a bootstrap sample
```{julia;term=true}
samp = parametricbootstrap(rng, 10_000, m1);
propertynames(samp)
```
As shown above, the sample has several named properties, which allow for convenient extraction of information. For example, a density plot of the estimates of `σ`, the residual standard deviation, can be created as
```{julia;label="sigmadensity";fig_width=7;fig_height=4;fig_ext=".svg"}
plot(x=samp.σ, Geom.density, Guide.xlabel("Parametric bootstrap estimates of σ"))
```
For the estimates of the intercept parameter, the `getproperty` extractor must be used
```{julia;label="betadensity";fig_width=7;fig_height=4;fig_ext=".svg"}
plot(x = first.(samp.β), Geom.density, Guide.xlabel("Parametric bootstrap estimates of β₁"))
```
The `σs` property contains the estimates of the standard deviation of the random effects in a hierarchical format.
```{julia;term=true}
typeof(samp.σs)
```
This is to allow for random effects associated with more than one grouping factor.
If we only have one grouping factor for random effects, which is the case here, we can use the `first` extractor, as in
```{julia;term=true}
first(samp.σs)
```
or, to carry this one step further,
```{julia;label="sigma1density";fig_width=7;fig_height=4;fig_ext=".svg"}
plot(x=first.(first(samp.σs)), Geom.density,
Guide.xlabel("Parametric bootstrap estimates of σ₁"))
```
Notice that this density plot has a spike, or mode, at zero.
Although this mode appears to be diffuse, this is an artifact of the way that density plots are created.
In fact, it is a pulse, as can be seen from a histogram.
```{julia;label="sigma1histogram";fig_width=7;fig_height=4;fig_ext=".svg"}
plot(x=first.(first(samp.σs)), Geom.histogram,
Guide.xlabel("Parametric bootstrap estimates of σ₁"))
```
A value of zero for the standard deviation of the random effects is an example of a *singular* covariance.
It is easy to detect the singularity in the case of a scalar random-effects term.
However, it is not as straightforward to detect singularity in vector-valued random-effects terms.
For example, if we bootstrap a model fit to the `sleepstudy` data
```{julia;term=true}
sleepstudy = MixedModels.dataset(:sleepstudy)
m2 = fit(
MixedModel,
@formula(reaction ~ 1+days+(1+days|subj)),
sleepstudy,
)
```
```{julia;term=true}
samp2 = parametricbootstrap(rng, 10_000, m2, use_threads=true);
```
the singularity can be exhibited as a standard deviation of zero or as a correlation of $\pm1$.
The `σρs` property of the sample is a vector of named tuples
```{julia;term=true}
σρ = first(samp2.σρs);
typeof(σρ)
```
where the first element of the tuple is itself a tuple of standard deviations and the second (also the last) element of the tuple is the correlation.
A histogram of the estimated correlations from the bootstrap sample has a spike at `+1`.
```{julia;label="rhohistogram";fig_width=7;fig_height=4;fig_ext=".svg"}
ρs = first.(last.(σρ))
plot(x = ρs, Geom.histogram,
Guide.xlabel("Parametric bootstrap samples of correlation of random effects"))
```
or, as a count,
```{julia;term=true}
sum(ρs .≈ 1)
```
Close examination of the histogram shows a few values of `-1`.
```{julia;term=true}
sum(ρs .≈ -1)
```
Furthermore there are even a few cases where the estimate of the standard deviation of the random effect for the intercept is zero.
```{julia;term=true}
sum(first.(first.(first.(σρ))) .≈ 0)
```
There is a general condition to check for singularity of an estimated covariance matrix or matrices in a bootstrap sample.
The parameter optimized in the estimation is `θ`, the relative covariance parameter.
Some of the elements of this parameter vector must be non-negative and, when one of these components is approximately zero, one of the covariance matrices will be singular.
The boundary values are available as
```{julia;term=true}
samp2.m.optsum.lowerbd
```
so the check on singularity becomes
```{julia;term=true}
sum(θ -> any(θ .≈ samp2.m.optsum.lowerbd), samp2.θ)
```
The `issingular` method for a `LinearMixedModel` object that tests if a parameter vector `θ` corresponds to a boundary or singular fit.
The default value of `θ` is that from the model but another value can be given as the second argument.
This operation is encapsulated in a method for the `issingular` function.
```{julia;term=true}
sum(issingular(samp2))
```