This repository has been archived by the owner on Sep 20, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathkwdyz11.qmd
408 lines (328 loc) · 13 KB
/
kwdyz11.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
---
title: "RePsychLing Kliegl et al. (2010)"
jupyter: julia-1.8
---
## Background
We take the `kwdyz11.arrow` dataset [@Kliegl2010] from an experiment looking at three effects of visual cueing under four different cue-target relations (CTRs).
Two horizontal rectangles are displayed above and below a central fixation point or they displayed in vertical orientation to the left and right of the fixation point.
Subjects react to the onset of a small visual target occuring at one of the four ends of the two rectangles.
The target is cued validly on 70% of trials by a brief flash of the corner of the rectangle at which it appears; it is cued invalidly at the three other locations 10% of the trials each.
We specify three contrasts for the four-level factor CTR that are derived from spatial, object-based, and attractor-like features of attention.
They map onto sequential differences between appropriately ordered factor levels. At the level of fixed effects, there is the noteworthy result, that the attraction effect was estimated at 2 ms, that is clearly not significant.
Nevertheless, there was a highly reliable variance component (VC) estimated for this effect.
Moreover, the reliable individual differences in the attraction effect were negatively correlated with those in the spatial effect.
Unfortunately, a few years after the publication, we determined that the reported LMM is actually singular and that the singularity is linked to a theoretically critical correlation parameter (CP) between the spatial effect and the attraction effect.
Fortunately, there is also a larger dataset `kkl15.arrow` from a replication and extension of this study [@Kliegl2015], analyzed with `kkl15.jl` notebook.
The critical CP (along with other fixed effects and CPs) was replicated in this study.
A more comprehensive analysis was reported in the parsimonious mixed-model paper [@Bates2015].
Data and R scripts are also available in [R-package RePsychLing](https://github.com/dmbates/RePsychLing/tree/master/data/).
In this and the complementary `kkl15.jl` scripts, we provide some corresponding analyses with _MixedModels.jl_.
## Packages
```{julia}
#| code-fold: true
using Arrow
using AlgebraOfGraphics
using CairoMakie
using CategoricalArrays
using Chain
using DataFrames
using DataFrameMacros
using MixedModels
using MixedModelsMakie
using ProgressMeter
using Random
using StatsBase
using Statistics
using AlgebraOfGraphics: density
using AlgebraOfGraphics: boxplot
CairoMakie.activate!(; type="svg")
ProgressMeter.ijulia_behavior(:clear);
```
## Read data, compute and plot densities and means
```{julia}
#| code-fold: true
dat = @chain "./data/kwdyz11.arrow" begin
Arrow.Table
DataFrame
select(
:subj =>
(s -> categorical(string.('S', lpad.(s, 2, '0')))) => :Subj,
:tar => categorical => :CTR,
:rt,
:rt => (x -> log.(x)) => :lrt,
)
end
levels!(dat.CTR, ["val", "sod", "dos", "dod"])
describe(dat)
```
We recommend to code the levels/units of random factor / grouping variable not as a number, but as a string starting with a letter and of the same length for all levels/units.
We also recommend to sort levels of factors into a meaningful order, that is overwrite the default alphabetic ordering.
This is also a good place to choose alternative names for variables in the context of the present analysis.
The LMM analysis is based on log-transformed reaction times `lrt`, indicated by a _boxcox()_ check of model residuals.
With the exception of diagnostic plots of model residuals, the analysis of untransformed reaction times did not lead to different results and exhibited the same problems of model identification [see @Kliegl2010].
Comparative density plots of all response times by cue-target relation, @fig-comparativedensity, show the times for valid cues to be faster than for the other conditions.
```{julia}
#| code-fold: true
#| fig-cap: "Comparative density plots of log reaction time for different cue-target relations."
#| label: fig-comparativedensity
draw(
data(dat) *
mapping(
:lrt => "log(Reaction time [ms])";
color=:CTR =>
renamer("val" => "valid cue", "sod" => "some obj/diff pos", "dos" => "diff obj/same pos", "dod" => "diff obj/diff pos") => "Cue-target relation",
) *
density(),
)
```
An alternative visualization without overlap of the conditions can be accomplished with ridge plots.
**To be done**
For the next set of plots we average subjects' data within the four experimental conditions.
This table could be used as input for a repeated-measures ANOVA.
```{julia}
dat_subj = combine(
groupby(dat, [:Subj, :CTR]),
:rt => length => :n,
:rt => mean => :rt_m,
:lrt => mean => :lrt_m,
)
```
```{julia}
#| code-fold: true
#| label: fig-logrtboxplots
#| fig-cap: "Comparative boxplots of log response time by cue-target relation."
boxplot(
dat_subj.CTR.refs,
dat_subj.lrt_m;
orientation=:horizontal,
show_notch=true,
axis=(;
yticks=(
1:4,
[
"valid cue",
"same obj/diff pos",
"diff obj/same pos",
"diff obj/diff pos",
],
),
),
figure=(; resolution=(800, 300)),
)
```
Mean of log reaction times for four cue-target relations. Targets appeared at (a) the cued position (valid) in a rectangle, (b) in the same rectangle cue, but at its other end, (c) on the second rectangle, but at a corresponding horizontal/vertical physical distance, or (d) at the other end of the second rectangle, that is $\sqrt{2}$ of horizontal/vertical distance diagonally across from the cue, that is also at larger physical distance compared to (c).
A better alternative to the boxplot is a dotplot. It also displays subjects' condition means.
**To be done**
## Linear mixed model
```{julia}
contrasts = Dict(
:CTR => SeqDiffCoding(; levels=["val", "sod", "dos", "dod"]),
:Subj => Grouping(),
)
m1 = let
form = @formula(lrt ~ 1 + CTR + (1 + CTR | Subj))
fit(MixedModel, form, dat; contrasts)
end
```
```{julia}
VarCorr(m1)
```
```{julia}
issingular(m1)
```
LMM `m1` is not fully supported by the data; it is overparameterized.
This is also visible in the PCA: only three, not four PCS are needed to account for all the variance and covariance in the random-effect structure.
The problem is the +.93 CP for spatial `sod` and attraction `dod` effects.
```{julia}
first(MixedModels.PCA(m1))
```
## Diagnostic plots of LMM residuals
Do model residuals meet LMM assumptions? Classic plots are
- Residual over fitted
- Quantiles of model residuals over theoretical quantiles of normal distribution
### Residual-over-fitted plot
The slant in residuals show a lower and upper boundary of reaction times, that is we have have too few short and too few long residuals. Not ideal, but at least width of the residual band looks similar across the fitted values, that is there is no evidence for heteroskedasticity.
```{julia}
#| code-fold: true
#| fig-cap: "Residuals versus the fitted values for model m1 of the log response time."
#| label: fig-resvsfittedm1
CairoMakie.activate!(; type="png")
set_aog_theme!()
draw(
data((; f=fitted(m1), r=residuals(m1))) *
mapping(:f => "Fitted values", :r => "Residual from model m1") *
visual(Scatter);
)
```
With many observations the scatterplot is not that informative.
Contour plots or heatmaps may be an alternative.
```{julia}
#| code-fold: true
#| fig-cap: Heatmap of residuals versus fitted values for model m1
#| label: fig-resvsfittedhm
CairoMakie.activate!(; type="png")
draw(
data((; f=fitted(m1), r=residuals(m1))) *
mapping(
:f => "Fitted log response time", :r => "Residual from model m1"
) *
density();
)
```
### Q-Q plot
The plot of quantiles of model residuals over corresponding quantiles of the normal distribution should yield a straight line along the main diagonal.
```{julia}
qqnorm(residuals(m1); qqline=:none)
```
### Observed and theoretical normal distribution
The violation of expectation is again due to the fact that the distribution of residuals is much narrower than expected from a normal distribution, as shown in @fig-standresdens.
Overall, it does not look too bad.
```{julia}
#| code-fold: true
#| fig-cap: "Kernel density plot of the standardized residuals from model m1 compared to a Gaussian"
#| label: fig-standresdens
let
n = nrow(dat)
dat_rz = DataFrame(;
value=vcat(residuals(m1) ./ std(residuals(m1)), randn(n)),
curve=vcat(fill.("residual", n), fill.("normal", n)),
)
draw(
data(dat_rz) *
mapping(:value => "Standardized residuals"; color=:curve) *
density(; bandwidth=0.1);
)
end
```
## Conditional modes
Now we move on to visualizations that are based on model parameters and subjects' data, that is "predictions" of the LMM for subject's GM and experimental effects. Three important plots are
- Overlay
- Caterpillar
- Shrinkage
### Overlay
The first plot overlays shrinkage-corrected conditional modes of the random effects with within-subject-based and pooled GMs and experimental effects.
**To be done**
### Caterpillar plot
The caterpillar plot, @fig-m1caterpillar, also reveals the high correlation between spatial `sod` and attraction `dod` effects.
```{julia}
#| code-fold: true
#| fig-cap: "Prediction intervals on the random effects for Subj in model m1"
#| label: fig-m1caterpillar
caterpillar!(
Figure(; resolution=(800, 1000)), ranefinfo(m1, :Subj); orderby=2
)
```
### Shrinkage plot
@fig-m1shrinkage provides more evidence for a problem with the visualization of the spatial `sod` and attraction `dod` CP.
The corresponding panel illustrates an *implosion* of conditional modes.
```{julia}
#| code-fold: true
#| fig-cap: "Shrinkage plot of the conditional means of the random effects for model m1"
#| label: fig-m1shrinkage
shrinkageplot!(Figure(; resolution=(1000, 1000)), m1)
```
## Parametric bootstrap
Here we
- generate a bootstrap sample
- compute shortest covergage intervals for the LMM parameters
- plot densities of bootstrapped parameter estimates for residual, fixed effects, variance components, and correlation parameters
### Generate a bootstrap sample
We generate 2500 samples for the 15 model parameters (4 fixed effect, 4 VCs, 6 CPs, and 1 residual).
```{julia}
#| code-fold: true
Random.seed!(1234321)
samp = parametricbootstrap(2500, m1; hide_progress=true)
dat2 = DataFrame(samp.allpars)
first(dat2, 10)
```
```{julia}
nrow(dat2) # 2500 estimates for each of 15 model parameters
```
### Shortest coverage interval
The upper limit of the interval for the critical CP `CTR: sod, CTR: dod` is hitting the upper wall of a perfect correlation.
This is evidence of singularity.
The other intervals do not exhibit such pathologies; they appear to be ok.
```{julia}
#| code-fold: true
DataFrame(shortestcovint(samp))
```
### Comparative density plots of bootstrapped parameter estimates
#### Residual
```{julia}
#| code-fold: true
#| label: fig-residstddevdens
draw(
data(@subset(dat2, :type == "σ", ismissing(:names))) *
mapping(:value => "Residual standard deviation") *
density();
)
```
#### Fixed effects (w/o GM)
The shortest coverage interval for the `GM` ranges from 376 to 404 ms.
To keep the plot range small we do not inlcude its density here.
```{julia}
#| code-fold: true
#| fig-cap: "Comparative density plots of the fixed-effects parameters for model m1"
#| label: fig-m1fixedeffdens
labels = [
"CTR: sod" => "spatial effect",
"CTR: dos" => "object effect",
"CTR: dod" => "attraction effect",
"(Intercept)" => "grand mean",
]
draw(
data(@subset(dat2, :type == "β" && :names ≠ "(Intercept)")) *
mapping(
:value => "Experimental effect size [ms]";
color=:names => renamer(labels) => "Experimental effects",
) *
density();
)
```
The densitiies correspond nicely with the shortest coverage intervals.
#### Variance components (VCs)
```{julia}
#| code-fold: true
#| fig-cap: "Comparative density plots of the variance components for model m1"
#| label: fig-m1varcompdens
draw(
data(@subset(dat2, :type == "σ" && :group == "Subj")) *
mapping(
:value => "Standard deviations [ms]";
color=:names => renamer(labels) => "Variance components",
) *
density();
)
```
The VC are all very nicely defined.
#### Correlation parameters (CPs)
```{julia}
#| code-fold: true
#| label: fig-m1corrdens
#| fig-cap: "Comparative density plots of the correlation parameters for model m1"
let
labels = [
"(Intercept), CTR: sod" => "GM, spatial",
"(Intercept), CTR: dos" => "GM, object",
"CTR: sod, CTR: dos" => "spatial, object",
"(Intercept), CTR: dod" => "GM, attraction",
"CTR: sod, CTR: dod" => "spatial, attraction",
"CTR: dos, CTR: dod" => "object, attraction",
]
draw(
data(@subset(dat2, :type == "ρ")) *
mapping(
:value => "Correlation";
color=:names => renamer(labels) => "Correlation parameters",
) *
density();
)
end
```
Two of the CPs stand out positively.
First, the correlation between GM and the spatial effect is well defined.
Second, as discussed throughout this script, the CP between spatial and attraction effect is close to the 1.0 border and clearly not well defined.
Therefore, this CP will be replicated with a larger sample in script `kkl15.jl` [@Kliegl2015].
# References
::: {#refs}
:::