---
title: "Linear Mixed Models (LMMs) - Part 2"
author: "Joshua F. Wiley"
date: "`r Sys.Date()`"
output:
  tufte::tufte_html:
    toc: true
    number_sections: true
---
Download the raw `R` markdown code here
[https://jwiley.github.io/MonashHonoursStatistics/LMM2.rmd](https://jwiley.github.io/MonashHonoursStatistics/LMM2.rmd).
These are the `R` packages we will use.
```{r setup}
options(digits = 4)
## new packages are lme4, lmerTest, and multilevelTools
library(data.table)
library(JWileymisc)
library(extraoperators)
library(lme4)
library(lmerTest)
library(multilevelTools)
library(visreg)
library(ggplot2)
library(ggpubr)
library(haven)
## load data collection exercise data
## merged is a merged long dataset of baseline and daily data
dm <- as.data.table(read_sav("Merged.sav"))
```
# Random Intercepts and Slopes
## Theory / Conceptual
Thus far we have learned about estimating LMMs where only the
intercept is a random effect, although we've added fixed effect
between and within predictors, looked at diagnostics, interpretations,
etc.
In addition to allowing intercepts to differ by person, we can also
allow regression slopes to differ by person. In LMMs, we always
include a random intercept. That means if we also include a random
slope, we will have two random effects. With two (or more) random
effects, because they are assumed to be a random variable with a
normal distribution, we can also look at how the random effects
correlate with each other. That is, just as how you could look at how
two random variables, say age and stress, correlate with each other,
you can look at how any two random effects correlate with each other
(e.g., how a random intercept and random slope correlate).
When working with random slopes, we also need to slightly modify our
understanding of the assumptions and notation for LMMs.
Let $\mathbf{G}$ be a square, $q$ x $q$ covariance matrix, where $q$
is the number of random effects in the model. In our simplest of
examples, there is only one random effect, the random intercept, so $q
= 1$ and the random effect covariance matrix is:
$$
\mathbf{G} =
\begin{bmatrix}
\sigma^{2}_{int} \\
\end{bmatrix}
$$
If we had a random intercept only, then we assume:
$$ u_{0j} \sim \mathcal{N}(0, \mathbf{G}) $$
This is the same assumption we covered in the introduction to LMMs,
however the subtle change in notation sets us up for more complexity.
By using $\mathbf{G}$ we are replacing a single standard
deviation/variance with a covariance matrix. The benefit of the matrix
is that a covariance matrix can be small, like a 1 x 1 matrix or
bigger like a 2 x 2 or $q$ x $q$ matrix for any number of random
effects.
So what is the implication of this change? We do not *just* assume
that each random effect follows a normal distribution. In fact, for
LMMs, the assumption is that the random effects follow a
**multivariate** normal distribution^[for more info, see here:
https://en.wikipedia.org/wiki/Multivariate_normal_distribution].
The multivariate normal distribution (MVN) is a distribution for
multiple variables. If $q$ variables follow a MVN, then each of the
$q$ variables will itself follow a univariate normal distribution.
However, just because each variable individually follows a
univariate normal distribution does not mean that the $q$ variables
together follow a MVN. In other words, MVN implies univariate normal,
but univariate normal does not imply MVN.
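A small simulation can make this last point concrete. The sketch below is only an illustration with made-up object names (`demo_x`, `demo_s`, `demo_y`). It builds two variables that are each standard normal, yet their sum is exactly zero about half the time. A normally distributed sum could not behave that way, so the pair cannot be MVN.

```{r}
## illustrative simulation only; all object names are hypothetical
set.seed(1234)
demo_n <- 10000
demo_x <- rnorm(demo_n)                             # standard normal
demo_s <- sample(c(-1, 1), demo_n, replace = TRUE)  # random sign, independent of x
demo_y <- demo_s * demo_x                           # also standard normal, by symmetry

## each variable on its own looks normal (mean near 0, SD near 1)
c(mean = mean(demo_y), sd = sd(demo_y))

## but the sum is exactly 0 whenever the sign is -1,
## so the pair (x, y) cannot be multivariate normal
mean(demo_x + demo_y == 0)
```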
Like the univariate normal distribution, the MVN has two basic
parameters, the mean and variance. However, unlike a univariate normal
distribution, for the MVN the mean is a vector of means, one for
each variable, and the variance is a $q$ x $q$ covariance
matrix.
Back to LMMs, what this means is that if we have a random intercept
and a random slope, $q = 2$ and we'd have:
$$
\mathbf{G} =
\begin{bmatrix}
\sigma^{2}_{int} & \sigma_{int,slope} \\
\sigma_{int,slope} & \sigma^{2}_{slope} \\
\end{bmatrix}
$$
The variances are on the diagonal and the covariance (the
unstandardized correlation between the intercept and the slope) is on
the off diagonal.
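To see how the pieces of $\mathbf{G}$ relate, here is a minimal sketch that builds a 2 x 2 covariance matrix from two standard deviations and a correlation. The numbers are hypothetical, chosen only for illustration, and the object names are made up.

```{r}
## hypothetical values chosen only for illustration
demo_sd_int <- 2.0     # SD of the random intercept
demo_sd_slope <- 0.25  # SD of the random slope
demo_r <- -0.4         # intercept-slope correlation

## covariance = correlation * SD(intercept) * SD(slope)
demo_cov <- demo_r * demo_sd_int * demo_sd_slope

demo_G <- matrix(
  c(demo_sd_int^2, demo_cov,
    demo_cov, demo_sd_slope^2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("int", "slope"), c("int", "slope")))
demo_G

## standardizing the covariance matrix recovers the correlation
cov2cor(demo_G)
```

Going from the covariance back to the correlation is the same calculation in reverse: divide the off diagonal element by the product of the two standard deviations.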
The usual practice in LMMs is to freely estimate both the variances
and covariances (correlations) for any random effects. We can,
however, also assume that the random effects are uncorrelated, that
is, assume that the random effects follow a multivariate normal
distribution like this:
$$
\mathbf{G} =
\begin{bmatrix}
\sigma^{2}_{int} & 0 \\
0 & \sigma^{2}_{slope} \\
\end{bmatrix}
$$
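In `lme4` formula syntax, a double bar (`||`) requests this uncorrelated structure. The sketch below is an illustration only: it uses the `sleepstudy` data that ships with `lme4` as a stand-in rather than our course dataset, and the object names are made up. Note that `||` behaves as described for numeric predictors such as `Days`; with factor predictors it may not remove all correlations.

```{r}
library(lme4)
## sleepstudy ships with lme4; used here only as a stand-in dataset
data(sleepstudy, package = "lme4")

## correlated random intercept and slope (covariance freely estimated)
demo_cor <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
## uncorrelated random intercept and slope (covariance fixed at 0)
demo_uncor <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

VarCorr(demo_cor)    # includes an intercept-slope correlation
VarCorr(demo_uncor)  # no correlation is estimated
```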
Although random slopes are different in some respects from a random
intercept, in most ways they are comparable. In both cases we assume a
normal distribution, with a mean (the "fixed effect" part, the average
slope across people) and a standard deviation (how much variability
there is in the slope across people). We estimate the mean and
standard deviation, but we *can* get BLUPs for the individual slopes
as well, which are predictions about what the slope is for any
specific individual. While a random intercept only impacts the level
of lines, a random slope will impact both the level and the slope of
the lines, shown graphically in the following figure.
```{r, fig.width = 7, fig.height = 10, fig.cap = "Figure showing a random intercept or random intercept and slope"}
ggarrange(
ggplot(expand.grid(x = c(0, 10), y = c(0, 10)), aes(x, y)) +
geom_point(colour = "white") +
geom_abline(intercept = 1, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = 2, slope = 1, colour = "yellow", size = 1) +
geom_abline(intercept = 3, slope = 1, colour = "blue", size = 1) +
geom_abline(intercept = 4, slope = 1, colour = "purple", size = 1) +
theme_pubr() +
coord_cartesian(xlim = c(0, 9), ylim = c(0, 10), expand=FALSE) +
ggtitle("Random Intercepts"),
ggplot(expand.grid(x = c(0, 10), y = c(0, 10)), aes(x, y)) +
geom_point(colour = "white") +
geom_abline(intercept = 1, slope = 1, colour = "black", size = 1) +
geom_abline(intercept = 2, slope = .5, colour = "yellow", size = 1) +
geom_abline(intercept = 3, slope = 1.5, colour = "blue", size = 1) +
geom_abline(intercept = 4, slope = .5, colour = "purple", size = 1) +
theme_pubr() +
coord_cartesian(xlim = c(0, 9), ylim = c(0, 10), expand=FALSE) +
ggtitle("Random Slopes"),
ncol = 1, nrow = 2)
```
To evaluate the distributional assumptions of LMMs with multiple
random effects, the ideal test is to evaluate whether the random
effects come from a MVN. Visualizing MVN distributions is not easy,
especially with more than 2 dimensions, so instead we use another
approach.
The Mahalanobis distance measures the distance between a point and a
space defined by a vector of means and a covariance matrix. The
"point" can be unidimensional or multidimensional and if multidimensional
is not limited to only two dimensions. That is, the Mahalanobis
distance can compute the distance between a multidimensional point and
a multidimensional distribution. This is very convenient as it scales
to however complex (many random effects) or simple (one random effect)
a model we may have. If, for each row of data, we calculate its
Mahalanobis distance, those squared distances will follow a $\chi^{2}$
(chi-squared) distribution with degrees of freedom equal to the number
of dimensions ($p$), that is a $\chi^2$ distribution with $df =
p$ if the raw data were MVN. This allows us to compare our observed
Mahalanobis distances to a chi-squared distribution and if they are a
close match, it indicates the data on which the Mahalanobis distances
were calculated were MVN. Thus, we can use Mahalanobis distances to
evaluate whether a set of variables have a MVN distribution.
They also can help identify multivariate outliers. We will see
examples of this when we look at diagnostics for LMMs with multiple
random effects.
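The logic can be tried out on simulated data before we meet it in the diagnostics. The following is a minimal sketch, assuming bivariate normal data generated with a Cholesky factor. All object names are made up, and base `R`'s `mahalanobis()` function is used, which returns the *squared* distances.

```{r}
## illustrative simulation; all object names are hypothetical
set.seed(2024)
demo_n <- 500
demo_p <- 2

## simulate correlated bivariate normal data via a Cholesky factor
demo_Sigma <- matrix(c(1, .5, .5, 1), nrow = 2)
demo_Z <- matrix(rnorm(demo_n * demo_p), ncol = demo_p)
demo_X <- demo_Z %*% chol(demo_Sigma)

## mahalanobis() returns the *squared* distances
demo_d2 <- mahalanobis(demo_X, center = colMeans(demo_X), cov = cov(demo_X))

## compare observed quantiles to chi-squared quantiles with df = p
demo_probs <- c(.25, .50, .75, .95)
rbind(
  observed = quantile(demo_d2, probs = demo_probs),
  chisq = qchisq(demo_probs, df = demo_p))
```

If the data are MVN, the two rows should be close. Large gaps in the upper quantiles are what a multivariate outlier would look like.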
## Random Slopes in `R`
Random slopes can only be estimated for predictors that vary within
units. Using the merged daily data collection exercise dataset, we can
look at mood predicting stress. First, we will take a look at a model
with a random intercept and `mood` as a fixed effect only.
```{r}
## get rid of the haven_labeled class type for stress
dm[, stress := as.numeric(stress)]
m0 <- lmer(stress ~ mood + (1 | ID),
data = dm)
summary(m0)
plot(modelDiagnostics(m0), ncol = 2, nrow = 2, ask = FALSE)
```
From the results, we can see that on average, higher mood is
associated with lower stress. Mood has not been separated into a
between or within component, so we cannot say whether it's driven by
people who have higher mood on average having lower stress or days
with higher than usual mood having lower than usual stress, or both.
We can see that the residuals (top left graph panel) and random
intercept by ID (bottom left graph panel) are about normally
distributed with only modest if any outliers and we can see that the
homogeneity of variance assumption is approximately met at least in
the residuals vs fitted/predicted values (top right).
Next, we add mood as a random slope by adding it inside the
parentheses before ID, `(mood | ID)`.
```{r}
m1 <- lmer(stress ~ mood + (mood | ID),
data = dm)
summary(m1)
```
The output is fairly similar to that of the fixed effect slope and random
intercept only model but there is another random effect under the
groups ID for `mood`. We get the variance and standard deviation of
`mood` as well as the correlation between the random intercept and the
random slope. The correlation is
`r as.data.table(VarCorr(m1))[3, sdcor]` indicating that people who
have a higher level of stress when mood is 0 (the intercept) also tend
to have a more negative slope of stress on mood. Note that although
the fixed effect slope estimate for mood is about the same between the
two models, the standard error is larger so the p-value is higher
(further away from 0) for the model with a fixed and random slope of
`mood` than the model with only a fixed slope of `mood`. This is a
fairly common pattern.
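If it is unclear where the correlation comes from in the `VarCorr()` output, the sketch below may help. It is an illustration only, fit to the `sleepstudy` data from `lme4` as a stand-in, with made-up object names. It converts the variance components to a data frame, where the `vcov` column holds variances and covariances and the `sdcor` column holds standard deviations and correlations.

```{r}
library(lme4)
data(sleepstudy, package = "lme4")  # stand-in data, not the course dataset

demo_fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## one row per variance / covariance parameter
demo_vc <- as.data.frame(VarCorr(demo_fit))
demo_vc

## the intercept-slope correlation is the row where both
## var1 and var2 are filled in
demo_vc[!is.na(demo_vc$var2), "sdcor"]
```

With one random intercept and one random slope, that row is the third one, which is why the third row is the one indexed when reporting the correlation.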
Now we can look at model diagnostic plots. We use 2 columns and 3 rows
as there are more plots now, but otherwise we use the usual
`modelDiagnostics()` function.
```{r, fig.width = 7, fig.height = 10}
plot(modelDiagnostics(m1), ncol = 2, nrow = 3, ask = FALSE)
```
In the figure, we see our familiar density and QQ deviates plot for
the residuals (top left). We also see univariate density plots (black
lines) and univariate normal distributions (blue dashed lines) for the
random intercept alone (`ID : (Intercept)`) and the random slope of mood
alone (`ID : mood`). The naming convention is to include the
coefficient name, intercept or mood for the mood slope, prefixed by
what the variable name is for the random effect, here ID. Both of the
univariate random effect distribution plots can be interpreted as
usual and are shown on the middle row, left and right panels. What is
technically graphed are the BLUPs and at the bottom we can get a five
number summary to help see the minimum, 25th percentile, 50th
percentile (median), 75th percentile, and maximum, which can give us
some sense of the spread of the random intercept and slope. For
example, for the random slope of mood, the maximum BLUP is 0,
indicating that the highest predicted individual slope for anyone is
0. This tells us that no one is expected to have *higher* stress if
they have higher mood. We can also see that 50% of people have BLUPs
between -0.38 and -0.27, giving us a sense of the spread of the random
slopes.
The new graph is on the bottom row, left side and it evaluates whether
the random effects by ID follow a MVN or not by using the Mahalanobis
distances. The observed density is the black line. The theoretical
density again is in dashed blue line. Here the theoretical density
**is not** a normal density, but a chi-squared density with $df = p =
2$. In this case, we can see that it's not a terrible fit between the
observed and theoretical chi-squared density, suggesting the MVN
assumption is reasonably well met. We also can see, however, that
there are some relatively extreme distances, with the maximum at 10.84
being quite a bit higher than the next nearest. It is not flagged as an
extreme value, though, partly because there are only about 50
people.
In practice, I may not actually make any changes here. For the sake of
illustration, however, and to show how to do it, we will use a more
stringent criterion for extreme values. Rather than defining an extreme
value as the top and bottom 0.5% of the theoretical distribution,
let's use the top and bottom 1% of the theoretical distribution.
Because I know this will yield some extreme values to remove, I am
saving the results of `modelDiagnostics()` in an object, `m1.diag`
which I can plot but I can also use to identify which rows / IDs in
the dataset are extreme and should / could be dropped.
The new figure shows a few extreme values in the random effect
BLUPs: a low value on the intercept, a couple of slopes that are too
high (near 0), and at least one MVN extreme value, the 10.84 Mahalanobis
distance.
```{r, fig.width = 7, fig.height = 10}
m1.diag <- modelDiagnostics(m1, ev.perc = .01)
plot(m1.diag, ncol = 2, nrow = 3, ask=FALSE)
```
To consider removing these cases, we need to identify them.
The `m1.diag` object in `R` has several subparts, but we are
particularly interested here in the `extremeValues` subpart, which is
itself a little data table, shown in the following.
```{r}
m1.diag$extremeValues
```
In this extreme values data table, we can see a few columns. The most
important columns are:
- `ID` this is the ID variable from the dataset and will be helpful
later.
- `Index` this is the row number in the dataset used in the LMM and
can be used to identify specific rows of data that are extreme.
- `EffectType` this indicates what type of effect the extreme value
was identified on.
The first column, `stress`, just shows the outcome variable score for
reference, which may help but is not always that useful.
The name of the first column will depend on the name of the outcome
variable.
In this dataset, we can see that there are three extreme values
identified on the residuals. For all the random effects, because the
BLUPs are calculated per person, not per observation, if a person is
classified as an extreme value, then all observations belonging to
that specific ID will be shown. That is because at the random effect
level, a person as a whole unit is either extreme or not and if
extreme the only "choice" would be to exclude / modify that entire person.
Because there are multiple types of extreme values, we could have some
choice in how to address them. We could remove any extreme values, or
remove one at a time and re-run the model to see if anything else
remained extreme or not. For example, we can see that ID 24 is an
extreme value on the multivariate test as well as the random
intercept. Dropping ID 24 might change the rest enough that say some
of the other residuals are no longer extreme. Conversely, we could
drop some of the extreme residuals, specific 'weird' days and see if
that happens to address any of the random effects. In this case, it is
not too likely as the extreme residuals come from different
participants (IDs 20, 30, 51) than do the extreme values on the random
effects.
A relatively intense approach would be to drop all of these extreme
values and re-estimate the LMM in the dataset with these rows / IDs
excluded. Here are different ways that we could do that.
The first approach uses the `Index` column from the extreme values
data table and then we pass that to our dataset, `dm` and use the
minus sign to indicate we want to drop those rows of data.
Note that because the same rows are extreme on a few different
measures, we use the `unique()` function so that we only drop each row
once. This never hurts to use: if all rows are already unique it's
fine, but it helps if there are duplicate extreme values (e.g., Index
80 is extreme on the random intercept and MVN).
```{r}
dm.noev1 <- dm[-unique(m1.diag$extremeValues$Index)]
```
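The indexing logic is easier to see on a toy example. The sketch below uses a plain base `R` data frame with made-up values rather than our data table, so treat it as an illustration of the idea only. It also shows one caution that applies to base `R` data frames: negating an *empty* index drops every row, so a guard is useful when it is possible that no extreme values were found.

```{r}
## toy data frame standing in for the real dataset (values are made up)
demo_df <- data.frame(ID = rep(1:3, each = 2), y = c(5, 6, 50, 7, 8, 9))

## pretend row 3 was flagged twice (e.g., on two different effect types)
demo_index <- c(3, 3)

## unique() collapses the duplicates; the minus sign drops the rows
demo_df[-unique(demo_index), ]

## caution for base R data frames: an empty index drops *everything*
demo_none <- integer(0)
nrow(demo_df[-demo_none, ])

## a guard that keeps all rows when nothing was flagged
nrow(demo_df[setdiff(seq_len(nrow(demo_df)), demo_none), ])
```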
Another approach is useful if we only want to drop selected IDs. For
example, we could decide that we want to only get rid of ID 24, the
MVN outlier. Here we use the `%nin%` operator which takes a variable
on the left and returns `TRUE` if it is not in (nin) the value/vector
of values on the right hand side. This operator comes from the
`extraoperators` package so we need to have that loaded.
```{r}
dm.noev2 <- dm[ID %nin% 24]
```
Supposing we wanted to remove IDs 24, 26, and 51, we could use the
same approach but instead of listing one ID, we'd list all three as a
vector, like this:
```{r}
dm.noev3 <- dm[ID %nin% c(24, 26, 51)]
```
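For readers who want to see what "not in" does without the `extraoperators` package, a base `R` equivalent is to negate `%in%`. This is a small sketch on a made-up vector of IDs, not on our dataset.

```{r}
## made-up vector of IDs, one entry per row of an imaginary dataset
demo_ids <- c(20, 24, 24, 26, 30, 51)

## base R equivalent of "not in": negate %in%
demo_keep <- !(demo_ids %in% c(24, 26, 51))
demo_keep

## rows that would be kept
demo_ids[demo_keep]
```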
If we wanted to only exclude one type of effect, say only the MVN
extreme values, we could use the row indices but pre-select specific
effect types. We can do this because the extreme values dataset is a
regular data table, so we can operate on it the same as we would on
any dataset. Let's just confirm that we can subset the extreme values
dataset to only give us the MVN extreme values.
```{r}
m1.diag$extremeValues[EffectType == "Multivariate Random Effect ID"]
```
Now that that works, we can use the same approach we did to exclude
all extreme values, but using this subsetted dataset. We replace:
`m1.diag$extremeValues` with
`m1.diag$extremeValues[EffectType == "Multivariate Random Effect ID"]`
and then we use just the `Index` column as before by writing:
`$Index`.
```{r}
dm.noev4 <- dm[-unique(m1.diag$extremeValues[EffectType == "Multivariate Random Effect ID"]$Index)]
```
If you wanted even more customized options, you could achieve those by
making the subsetting of the extreme values dataset fancier (e.g.,
picking multiple but not all effect types, etc.)^[This is one of the
benefits of `R` and an approach where everything is an object. We can
use the output from one function, `modelDiagnostics()`, and because it
returns an object, a dataset of extreme values, we can operate on it
as we want and then use those results to subset our main dataset for
analysis.].
At this point, we can re-run our analysis using the revised dataset.
For the sake of example, I will just use our first dataset that
excluded all extreme values from any type of effect, `dm.noev1`.
```{r, fig.width = 7, fig.height = 10}
m1noev <- lmer(stress ~ mood + (mood | ID),
data = dm.noev1)
summary(m1noev)
m1noev.diag <- modelDiagnostics(m1noev, ev.perc = .01)
plot(m1noev.diag, ncol = 2, nrow = 3, ask=FALSE)
```
The results look fairly similar, although the correlation between the
random intercept and slope has dropped from
`r as.data.table(VarCorr(m1))[3, sdcor]`
to
`r as.data.table(VarCorr(m1noev))[3, sdcor]`
and the average slope of mood on stress, the fixed effect part, is
somewhat stronger going from
`r fixef(m1)[["mood"]]` to
`r fixef(m1noev)[["mood"]]`.
The diagnostics also look improved.
A shorter trick to re-running the same model on a new dataset is to
use the `update()` function. The `update()` function takes an existing
model, and you can update that model in different ways. Here, we will
update the model by just using a new dataset, here the dataset where
we only excluded the MVN extreme value, `dm.noev2`. This is just to
illustrate how easy it is even without knowing the formula for a model
to just update an old model on a different dataset. This is very
helpful for running models with and without extreme values or with and
without some participants who perhaps did not follow the study
protocol correctly, etc. The benefit of using `update()` is that even
if you have lots of predictors in your model, you don't have to type
them all again and you are guaranteed to have the same model as
before, just with a new dataset, no chance for typos and forgetting a
predictor or covariate.
```{r}
m1noev2 <- update(m1, data = dm.noev2)
summary(m1noev2)
```
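The same idea works for other model types that support `update()`. As a minimal sketch, here it is with an ordinary linear regression on the built in `mtcars` data. The model and the subset are chosen only for illustration.

```{r}
## toy example with a built-in dataset and an ordinary linear model
demo_lm <- lm(mpg ~ wt + hp, data = mtcars)

## refit the identical formula on a subset (here, dropping 8 cylinder cars)
demo_lm2 <- update(demo_lm, data = subset(mtcars, cyl != 8))

## same formula, fewer observations
formula(demo_lm2)
c(full = nobs(demo_lm), subset = nobs(demo_lm2))
```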
In the previous topic, we saw adding both between and within person
variants of a predictor into a LMM as fixed effects. Now let's take a
look at doing that with both fixed and random slopes.
First, we need to create a between and within person version of our
predictor, `mood`. Note that this only works because mood was measured
each day. It would not work if `mood` was measured only once, as it would
already be a between person variable. Note also that mean deviations
only make sense for continuous predictors. If `mood` was binary, it
would probably not make sense to look at the average and the deviations
from the average.
```{r}
dm[, c("Bmood", "Wmood") := meanDeviations(mood), by = ID]
```
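Conceptually, the between person part is each person's own mean and the within person part is the daily deviation from that mean. The sketch below reproduces that idea in base `R` on a tiny made-up dataset. It is an assumption about what the decomposition amounts to, not the internals of `meanDeviations()`, and the column names ending in `_demo` are hypothetical.

```{r}
## toy long-format data: 2 people, 3 days each (values are made up)
demo_long <- data.frame(
  ID = rep(c("a", "b"), each = 3),
  mood = c(1, 2, 3, 4, 6, 8))

## between person part: each person's own mean, repeated on every row
demo_long$Bmood_demo <- ave(demo_long$mood, demo_long$ID, FUN = mean)
## within person part: daily deviation from the person's own mean
demo_long$Wmood_demo <- demo_long$mood - demo_long$Bmood_demo

demo_long
```

The within person deviations sum to zero for each person, and adding the two parts back together returns the original scores.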
Now we can fit a LMM with `Bmood` and `Wmood` as fixed effects and a
random intercept and random slope for `Wmood`. Note that the random
intercept is included automatically, without us having to ask for it
explicitly. We also look at the model diagnostics and confidence
intervals.
```{r, fig.width = 7, fig.height = 10}
m2 <- lmer(stress ~ Bmood + Wmood +
(Wmood | ID),
data = dm)
summary(m2)
m2.diag <- modelDiagnostics(m2)
plot(m2.diag, ncol = 2, nrow = 3, ask=FALSE)
m2.ci <- confint(m2, oldNames = FALSE)
m2.ci
```
In the output, we now see the random slope is for `Wmood` and under
fixed effects we have both `Bmood` and `Wmood`. In this instance both
the sign and the magnitude of the fixed effects for the slope of
`Bmood` and `Wmood` on stress are about the same. However, this does
not have to be true. In fact, it's even possible for the between person
effect to have one sign and the within person effect to have the
opposite sign for the slope^[A conceptual example of why the between
and within might differ. Imagine that you look at the relationship
between exercise and wellbeing. You might find that people who
exercise more on average have better average wellbeing. However, if
people exercise more than usual for them, they might over exercise and
that could be associated with worse wellbeing. Because overexercising is
relative to an individual's own fitness level, it only shows up at the
within person level. Although opposite signs are relatively rare, it often
happens that the magnitude of effects varies across the between person
and within person effects.].
Visually, all the diagnostics look relatively OK, although there are a
few extreme values (now using the default top and bottom 0.5% of the
theoretical distribution definition) on the residuals and on the
random intercept. In practice, it would be worthwhile to consider
whether these extreme values should be addressed somehow or if you are
comfortable proceeding as is.
The profile likelihood confidence intervals take a few seconds to
complete and generate a few warnings related to the fact that the
lower bound for the intercept-slope correlation is -1, the lowest
possible correlation.
### Sample Write Up
A linear mixed model was fit to
`r nobs(m2)` stress scores from
`r as.integer(ngrps(m2))` people. The intraclass correlation
coefficient of stress, the outcome, was
`r iccMixed("stress", id = "ID", data = dm)$ICC[1]` and of mood, the
predictor was
`r iccMixed("mood", id = "ID", data = dm)$ICC[1]`,
indicating that about 40% of the total variance in stress and about
30% of the total variance in mood exists between people with the
remaining due to daily fluctuations within people.
The fixed effect intercept revealed that the
estimated average [95% CI] stress was
`r fixef(m2)[["(Intercept)"]]`
`r sprintf("[%0.2f, %0.2f]", m2.ci[5, 1], m2.ci[5, 2])`, when `Bmood`
and `Wmood` are zero.
However, there were individual differences, with the standard
deviation for the random intercept being
`r as.data.table(VarCorr(m2))[1, sdcor]`.
Assuming the random intercepts follow a normal distribution,
we expect most people (about 68%) to fall within one standard deviation
of the mean, which in these data would be somewhere between:
`r fixef(m2)[["(Intercept)"]] + c(-1, 1) * as.data.table(VarCorr(m2))[1, sdcor]`.
There was also a significant fixed effect of average mood on stress,
such that each one unit higher average mood was associated
with a `r fixef(m2)[["Bmood"]]` unit difference in stress,
95% CI = `r sprintf("[%0.2f, %0.2f]", m2.ci[6, 1], m2.ci[6, 2])`.
There was also a significant fixed effect of within person mood on
stress, such that on days where people were one unit higher on mood
than their own average, their stress was expected to differ by
`r fixef(m2)[["Wmood"]]`,
95% CI = `r sprintf("[%0.2f, %0.2f]", m2.ci[7, 1], m2.ci[7, 2])`,
that same day, on average.
However, there were individual differences, with the standard
deviation for the random slope being
`r as.data.table(VarCorr(m2))[2, sdcor]`,
indicating that people differ in the slope of
within person mood on stress.
Assuming the random slopes follow a normal distribution,
we expect most people to fall within one standard deviation of the
mean, which in these data would indicate that most people are expected
to have a within person slope of stress on mood between:
`r fixef(m2)[["Wmood"]] + c(-1, 1) * as.data.table(VarCorr(m2))[2, sdcor]`.
Finally, there was a negative correlation between the random intercept
and slope,
`r as.data.table(VarCorr(m2))[3, sdcor]`
indicating that compared to the population averages, people who had a
relatively higher stress when mood was 0 also tended to have a
more negative within person association between mood and stress.
## Solving Convergence/Estimation Issues
We have talked a little about convergence before, where algorithms iterate
through cycles and try to find the best parameter estimates. Sometimes
this process fails or runs into problems, which we may broadly call
convergence or estimation issues.
Sometimes these do not make too big of a difference, sort of false
positives, but sometimes these issues may severely impact a model's
results. If you see warnings about convergence or estimation, it's best
to address and resolve them before being confident in the model
results.
Let's create a between and within version of the daily variable,
`energy` and look at a fixed and random slope LMM predicting stress
from energy.
```{r}
dm[, c("Benergy", "Wenergy") := meanDeviations(energy), by = ID]
mest <- lmer(stress ~ Benergy + Wenergy +
(Wenergy | ID),
data = dm)
```
When we estimate the model, we get a message about a boundary
(singular) fit with the suggestion to see `?isSingular`.
This is a helpful suggestion and provides more details on potential
approaches to resolving it. If we run
`?isSingular` in the `R` console we will get a help page with some more
information, which you **should** go and read now.
Armed with that knowledge, we can run a `summary()` on our model to
learn a bit more. It's not always clear, but in this case we can see a
correlation of -1 between the random intercept and slope that is
causing the singularity.
```{r}
summary(mest)
```
Our options for resolving this are listed in `?isSingular`. In this
case, a simple path forward is to remove some random effects. We
basically always keep a random intercept in our LMMs, so the only
candidate to remove is the random slope for `Wenergy`. Doing that we
no longer get the singularity warning.
```{r}
mest2 <- lmer(stress ~ Benergy + Wenergy +
(1 | ID),
data = dm)
summary(mest2)
```
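The singularity check can also be run in code rather than read off the message. The sketch below uses `isSingular()` from `lme4` on a model fit to the `sleepstudy` data, which stands in for our data purely for illustration, and the object name is made up. It returns `TRUE` when a variance is estimated at essentially zero or a correlation at essentially -1 or +1.

```{r}
library(lme4)
data(sleepstudy, package = "lme4")  # stand-in data, not the course dataset

demo_fit_rs <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

## logical flag: is the fit on the boundary of the parameter space?
isSingular(demo_fit_rs)

## the estimates to inspect when the answer is TRUE
VarCorr(demo_fit_rs)
```

Running the same check on a model before and after simplifying its random effects is a quick way to confirm that the simplification resolved the issue.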
If you have multiple random slopes, it can be tricky to decide which
one to drop.
Now let's look at a convergence issue. For this, we'll use another
dataset, the `aces_daily` data. First we set up the data (from the
`JWileymisc` package) and then fit a model with a random slope of
stress predicting negative affect.
```{r}
data(aces_daily)
mconv <- lmer(NegAff ~ STRESS + (1 + STRESS | UserID),
data = aces_daily)
```
In this case, we get a warning message that the model did not
converge. This is not a singularity issue. This means that the
algorithm that tries to find the best parameter estimates tried and
failed to find estimates it's confident are 'best'. For now, we won't
worry too much about what the 'best' actually means. In this case,
sometimes using a different optimizer, basically the algorithm that
goes about trying to find the best estimates, or changing its options
can help. We can control the underlying algorithms using the
`lmerControl()` function. In this unit, we are not going to talk much
about the different options but just show one alternate you could try
if you run into convergence problems.
```{r}
## use the control function to pick a different algorithm
## and adjust the options to be a bit stricter
## may take longer but also may have a better chance of converging
strictControl <- lmerControl(optCtrl = list(
algorithm = "NLOPT_LN_NELDERMEAD",
xtol_abs = 1e-12,
ftol_abs = 1e-12))
## now fit our model
mconv2 <- lmer(NegAff ~ STRESS + (1 + STRESS | UserID),
data = aces_daily,
control = strictControl)
```
In this case, the change in algorithm and options worked and the
model now converges without warnings. Sometimes a model will still give
warnings and nothing you know how to try will resolve them. In
those cases, you'd either want to consult an expert or simplify the
model. Very often, much as with singularity issues, convergence issues
stem from the random effects in a model. If you remove some of the
random effects (at the extreme, keeping a random intercept only) you are likely
to resolve the convergence issues, although you may give up aspects of
the model you wanted to keep.
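
Before simplifying, one more thing you could try is a different optimizer, and then check whether any convergence messages remain. The following is a minimal sketch on the `sleepstudy` data from `lme4` (an assumption for illustration, as this model converges either way); the names `fit_nlopt` and `fit_bobyqa` are hypothetical.

```{r}
library(lme4)

## illustrative model on lme4's built-in sleepstudy data
fit_nlopt <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

## refit the same model with the bobyqa optimizer and
## allow more function evaluations before giving up
fit_bobyqa <- update(fit_nlopt,
  control = lmerControl(
    optimizer = "bobyqa",
    optCtrl = list(maxfun = 1e5)))

## convergence messages stored with each fit;
## NULL means no convergence warnings were recorded
fit_nlopt@optinfo$conv$lme4$messages
fit_bobyqa@optinfo$conv$lme4$messages
```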
Sometimes, too, the convergence issues may not make much
difference. Here we can compare the results from the model that did not
quite converge with those from the model that did.
```{r}
summary(mconv)
summary(mconv2)
```
In this instance, the only apparent difference is in the fourth decimal
place of the standard deviation of the random intercept.
While the first algorithm did not quite achieve estimates it was
confident are the best, it was really very close. Sometimes, however,
the estimates could be far off. If you cannot get a model that
converges, though, you have no comparison, so it's hard to know whether
the non-convergence was an issue or not. That is why it's generally
best to resolve a non-convergence issue if possible and to be
very cautious about interpreting a model that did not converge.
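
Rather than reading two summaries side by side, the estimates can also be compared numerically. This is a sketch only, using two fits of the same model to the `sleepstudy` data from `lme4` with different optimizers; the object names are illustrative, and the same calls could be applied to any pair of fits of one model.

```{r}
library(lme4)

## two fits of the same illustrative model, different optimizers
fit_a <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)
fit_b <- update(fit_a,
  control = lmerControl(optimizer = "Nelder_Mead"))

## fixed effects side by side with their absolute difference
cbind(a = fixef(fit_a), b = fixef(fit_b),
  abs_diff = abs(fixef(fit_a) - fixef(fit_b)))

## random effect SDs and correlations side by side
vc_a <- as.data.frame(VarCorr(fit_a))
vc_b <- as.data.frame(VarCorr(fit_b))
data.frame(vc_a[, c("grp", "var1", "var2")],
  a = vc_a$sdcor, b = vc_b$sdcor,
  abs_diff = abs(vc_a$sdcor - vc_b$sdcor))

## log likelihoods; for the same model and data,
## the higher value is the better optimum
c(a = logLik(fit_a), b = logLik(fit_b))
```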
# Graphing LMMs
We can graph results from LMMs much as we do for linear regressions or
GLMs. There are a few nuances, however. In LMMs, people sometimes
refer to marginal or conditional effects. In some ways, these
correspond to the idea of fixed or random effects.
Marginal predictions are based only on the averages, basically the fixed
effects portion of the model. Conditional predictions incorporate
**both** the fixed and random effects portions of the model.

Because the conditional predictions incorporate random effects, they
can only be made for participants in the existing data: it is impossible to
know what a new person would look like (e.g., if you recruited one
more participant, would their BLUP be like ID 1, ID 2, or ID XX?).
Therefore, conditional predictions are only made from the data used to
build the model. Marginal predictions, though, could be made for new data
to answer a question like: what would the model predict, on
average, the stress score would be if mood was 2 points above average
on a day? That would not tell you how any single individual is
predicted to be, but it does indicate what, on average, the prediction
would be from the model. Additionally, because conditional predictions
vary by ID, we get different conditional predictions for each
participant in the dataset.
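
The distinction can be seen directly with `predict()`, where the `re.form` argument controls which random effects are included. The following is a minimal sketch on the `sleepstudy` data from `lme4`, not the data from this unit; `fit_pred` and the other object names are illustrative.

```{r}
library(lme4)

## illustrative model on lme4's built-in sleepstudy data
fit_pred <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

## marginal predictions: fixed effects only (re.form = NA)
pred_marginal <- predict(fit_pred, re.form = NA)

## conditional predictions: fixed effects plus each
## participant's own random intercept and slope
## (re.form = NULL, the default)
pred_conditional <- predict(fit_pred, re.form = NULL)

head(data.frame(
  Subject = sleepstudy$Subject,
  Days = sleepstudy$Days,
  marginal = pred_marginal,
  conditional = pred_conditional))

## marginal predictions also work for new data that
## belongs to no participant in the sample
new_days <- data.frame(Days = c(0, 5, 9))
predict(fit_pred, newdata = new_days, re.form = NA)
```
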
We will look at examples of both of these briefly.

## Marginal

Graphs based on the marginal predictions from the model are probably
the default and most common approach. These graphs show the average
association or predictions from the model and basically graph the
fixed effects part of the model. They are interpreted nearly
identically to linear regression graphs.
To start, we will use our model with between person and within person
mood predicting stress with a random intercept and random slope for
within person mood, called `m2`. We use the `visreg()` function to
graph the results with the x axis being `Bmood` and the y axis the
predicted stress scores.
```{r, error = TRUE}
m2 <- lmer(stress ~ Bmood + Wmood +
(Wmood | ID),
data = dm)
visreg(m2, xvar = "Bmood")
```
This, however, results in an error message about a differing number of
rows. It's not clear from the error message, but since everything else
in the code is right, we can infer the error has something to
do with the underlying data. We can use the `str()` function to see
the class in `R` of each of our variables.
```{r}
str(dm[, .(stress, Bmood, Wmood, ID)])
```
Here we can see that `Wmood` has the class haven labelled. As this is
not a common class (common classes in `R` are logical, integer,
factor, numeric, character), it's a reasonable guess that this is causing
the trouble. `visreg()` may not know how to work with haven labelled
variables. We don't need the labels, so we can convert `Wmood` to
a numeric variable and try again. Because the model stores the data it
was fit on, after updating `Wmood` we need to re-run our LMM. The results will not
change but hopefully now `visreg()` will work as expected.
```{r}
dm[, Wmood := as.numeric(Wmood)]
m2 <- lmer(stress ~ Bmood + Wmood +
(Wmood | ID),
data = dm)
visreg(m2, xvar = "Bmood",
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Average Mood",
ylab = "Predicted Stress",
line = list(col = "black", size = 1)) +
theme_pubr()
```
The graph shows us how average mood (between person mood) and stress
are associated on average. We made a number of customizations to the
graph: not showing partial residuals, dropping the rug plot, defining
our own x and y axis labels, changing the line colour to black
(the default is blue), and using the `pubr` theme.

We can make the same plot for within person mood. We just change the
`xvar` option to be `Wmood`. We also make a slightly fancier x axis
label. Note the use of `"\n"`, which tells `R` that we want a line break
in the x axis label.
```{r}
visreg(m2, xvar = "Wmood",
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress",
line = list(col = "black", size = 1)) +
theme_pubr()
```
This figure shows the negative fixed effect coefficient for `Wmood` graphically.

## Conditional

We can use the `visreg()` function to graph conditional predictions as
well. We will make a few modifications compared to the marginal
graph.
1. We specify `by = "ID"` to indicate that
   we want a separate line for each ID.
2. We specify particular `breaks`. These are the breaks for the `by`
   variable and in this case indicate which specific IDs should be
   plotted. To see the available IDs, we look at `unique(dm$ID)`. I've
   chosen a few IDs to show as an example. You could pick others.
3. We specify the `re.form` argument. If we want to incorporate all
   random effects in the predictions, we should specify the same
   formula for random effects as we did in the model. If we want to
   incorporate only some of the random effects, we can specify a
   modified formula. We'll look at some examples a bit later.

With those changes we get the following figure. Each ID is shown
in a separate plot panel.
```{r, fig.fullwidth = TRUE}
unique(dm$ID)
visreg(m2, xvar = "Wmood", by = "ID",
breaks = c(5, 6, 11, 24, 26, 37),
re.form = ~ (Wmood | ID),
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress") +
theme_pubr() +
ggtitle("Conditional Random Intercept and Slope")
```
Note that these are not raw regression lines per person. They are
conditional predictions from the LMM, so they include shrinkage
of the random intercepts and slopes towards the overall sample average
intercept and slope. Nevertheless, we can see that there are quite
large differences in the within person association between mood and
stress for, say, ID 5 compared to ID 37. They are not the same, and
although in the marginal plots section earlier we saw that on average
there is a negative association between within person mood and stress,
not everyone has a negative predicted slope.
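
To get a feel for shrinkage, the conditional slopes from an LMM can be set beside slopes from a separate ordinary regression per person. This is a sketch under the assumption that the `sleepstudy` data from `lme4` is an acceptable stand-in for the data in this unit; the object names are illustrative.

```{r}
library(lme4)

## illustrative model on lme4's built-in sleepstudy data
fit_shrink <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

## raw slope per participant: a separate ordinary
## regression for each one, with no shrinkage
ols_slope <- sapply(split(sleepstudy, sleepstudy$Subject),
  function(d) coef(lm(Reaction ~ Days, data = d))[["Days"]])

## conditional slope per participant from the LMM
## (fixed effect plus that participant's random slope)
lmm_slope <- coef(fit_shrink)$Subject[names(ols_slope), "Days"]

## side by side, matched on participant
head(data.frame(ols_slope, lmm_slope))

## spread of the slopes under each approach; with shrinkage
## the LMM slopes typically vary less than the raw slopes
c(ols = sd(ols_slope), lmm = sd(lmm_slope))
```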
If we did not want to incorporate all random effects in the
predictions, we could modify the `re.form`. This does not change the
LMM we fit. It merely changes which random effects are incorporated
into the predictions from the model and then plotted.
For example, we could include only the random intercept, which gives us the
following figure:
```{r, fig.fullwidth = TRUE}
visreg(m2, xvar = "Wmood", by = "ID",
breaks = c(5, 6, 11, 24, 26, 37),
re.form = ~ (1 | ID),
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress") +
theme_pubr() +
ggtitle("Conditional Random Intercept")
```
What this figure shows is each individual ID's own predicted
intercept, combined with the average (marginal) slope for within person
mood and stress, which is the same for all IDs.
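
What an intercept only `re.form` does can be written out with `predict()`. The sketch below uses the `sleepstudy` data from `lme4` as a stand-in (an assumption, not the data from this unit) and checks that the predictions equal the fixed effects part plus each participant's random intercept; all object names are illustrative.

```{r}
library(lme4)

## illustrative model on lme4's built-in sleepstudy data
fit_re <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

## conditional on the random intercept only
pred_int <- predict(fit_re, re.form = ~ (1 | Subject))

## the same thing by hand: fixed effects part plus
## each participant's random intercept deviation
re_int <- ranef(fit_re)$Subject[
  as.character(sleepstudy$Subject), "(Intercept)"]
by_hand <- predict(fit_re, re.form = NA) + re_int

## compare the two sets of predictions
all.equal(unname(pred_int), unname(by_hand))
```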
Finally, if we leave out the `re.form` argument altogether, we get
the marginal intercept and marginal slope. This is just
our marginal graph repeated for each ID, a fairly useless figure,
but it highlights what happens if you forget to include `re.form` in
your conditional graph.
```{r, fig.fullwidth = TRUE}
visreg(m2, xvar = "Wmood", by = "ID",
breaks = c(5, 6, 11, 24, 26, 37),
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress") +
theme_pubr() +
ggtitle("Unconditional (marginal) only")
```
If you don't want to look at specific IDs too carefully but instead
want to get some sense of the overall variation, or you have a small
number of IDs, you can overlay all the lines together by specifying
`overlay = TRUE`.
```{r}
visreg(m2, xvar = "Wmood", by = "ID",
breaks = c(5, 6, 11, 24, 26, 37),
re.form = ~ (Wmood | ID),
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress",
overlay = TRUE) +
theme_pubr()
```
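An overlaid plot of this kind can also be built by hand from the conditional predictions. This is a sketch of an alternative using `ggplot2` directly, not what `visreg()` does internally, and it uses the `sleepstudy` data from `lme4` rather than the data from this unit; the object names are illustrative.

```{r}
library(lme4)
library(ggplot2)

## illustrative model on lme4's built-in sleepstudy data
fit_lines <- lmer(Reaction ~ Days + (Days | Subject),
  data = sleepstudy)

## conditional predictions for every observation
plot_data <- data.frame(
  Subject = sleepstudy$Subject,
  Days = sleepstudy$Days,
  predicted = predict(fit_lines, re.form = NULL))

## one line per participant, all drawn in the same panel
p_lines <- ggplot(plot_data,
  aes(x = Days, y = predicted, group = Subject)) +
  geom_line(alpha = 0.5) +
  labs(x = "Days", y = "Predicted Reaction Time")
p_lines
```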
To see the overall variation in conditional intercepts and slopes, we
could set the breaks equal to all the IDs by using:
`breaks = unique(dm$ID)`.
```{r}
visreg(m2, xvar = "Wmood", by = "ID",
breaks = unique(dm$ID),
re.form = ~ (Wmood | ID),
partial = FALSE, rug = FALSE,
gg = TRUE,
xlab = "Within Mood\n(deviations from own average)",
ylab = "Predicted Stress",
overlay = TRUE) +
theme_pubr()
```
With so many different people, we probably do not care about or want
the legend. We can turn that off in `ggplot2` by adding no position to