# Complexity in fitting Linear Mixed Models
Linear mixed-effects models are increasingly used for the analysis of data from experiments in fields like psychology where several subjects are each exposed to each of several different items. In addition to a response, which here will be assumed to be on a continuous scale, such as a *response time*, a number of experimental conditions are systematically varied during the experiment. In the language of statistical experimental design the latter variables are called *experimental factors* whereas factors like `Subject` and `Item` are *blocking factors*. That is, these are known sources of variation that usually are not of interest by themselves but still should be accounted for when looking for systematic variation in the response.
# An example data set
The data from experiment 2 in *[Kronmueller and Barr (2007)](https://doi.org/10.1016/j.jml.2006.05.002)* are available in `.rds` (R Data Set) format in the file `kb07_exp2_rt.rds` in the [github repository](https://github.com/dalejbarr/kronmueller-barr-2007) provided by Dale Barr. Files in this format can be loaded using the *[RData](https://github.com/JuliaData/RData.jl)* package for **[Julia](https://julialang.org)**.
## Loading the data in Julia
Attach the packages to be used.
```julia; label="a217767d-beb7-4f13-a5d4-dd4ad4f52822"
using BenchmarkTools
using CSV, DataFrames, GLM # data input and manipulation - lm and glm
using Gadfly # grammar of graphics in Julia
using MixedModels
using RCall, RData # communicate with R, load .rda, .rds, .RData files
using Statistics # basic stat summaries: mean, median, ...
```
```julia; label="825d5660-e406-4e40-8ac5-42a22a9acf8b"
kb07 = load("../kronmueller-barr-2007/kb07_exp2_rt.rds");
first(kb07, 20)
```
```julia; label="b68a6fb0-c7e6-4783-9021-961bc0dd685b"
describe(kb07)
```
The blocking factors are `subj` and `item`, with 56 and 32 levels respectively. There are three experimental factors, each with two levels: `spkr` (speaker), `prec` (precedence), and `load` (cognitive load). The response time, `rt_raw`, is measured in milliseconds. A few very large values, such as the maximum of nearly 16 seconds, could skew the results; these are truncated in the `rt_trunc` column. In addition, three erroneously recorded responses (values of 300 ms.) have been dropped, resulting in a slight imbalance in the data.
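As a quick sanity check on this description, the numbers of levels can be computed directly. This is just a sketch, using the column names shown by `describe` above.
```julia; label=levelcounts
length(unique(kb07.subj)), length(unique(kb07.item))  # expect 56 subjects and 32 items
```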
A table of mean responses and standard deviations for the combinations of the experimental factors, as shown in Table 3 of the paper and on the data repository, can be reproduced as
```julia; label="3b5d4560-3e9e-40eb-9379-bb6cd3304013"
cellmeans = by(kb07, [:spkr, :prec, :load],
meanRT = :rt_trunc => mean, sdRT = :rt_trunc => std, n = :rt_trunc => length,
semean = :rt_trunc => x -> std(x)/sqrt(length(x))
)
```
The slight imbalance in the cell counts, `n`, is the result of the three removed response times.
An interaction plot of the cell means shows that the main effect of `prec` is the dominant effect. In the plot below the levels of `prec` are distinguished by the plotting symbol.
```julia;label="264e5571-2c4e-4e71-b665-21e2bf833194";fig_ext=".svg"
plot(cellmeans, x=:load, y=:meanRT, color=:spkr, shape=:prec,
    Geom.point,
    Guide.xlabel("Cognitive load"),
    Guide.ylabel("Mean response time (ms)"))
```
```julia;label=ggplot2plot
RCall.ijulia_setdevice(MIME("image/svg+xml"))
R"""
require(ggplot2, quietly=TRUE)
ggplot($cellmeans, aes(x=load, y=meanRT, color=spkr,group=spkr:prec, linetype=prec)) + geom_point() + geom_line()
"""
```
## Loading the data in R
```julia; label="3b61426f-22c9-4b0d-93cf-7d315305dbc0"
R"""
kb07 <- readRDS("../kronmueller-barr-2007/kb07_exp2_rt.rds")
str(kb07)
"""
```
The positions of the missing observations can be determined from
```julia; label="c969b50b-1f56-4b55-bac8-062bbc03f8bb"
R"(subjitemtbl <- xtabs(~ subj + item, kb07))"
```
```julia; label="7df4f4c7-af71-4def-9243-d2ec66fa2c87"
R"table(subjitemtbl)"
```
All of the experimental factors vary within subject and within item, as can be verified by examining the frequency tables for the experimental and grouping factors. For example
```julia; label="4dc7c2eb-6317-4179-b6fb-25d686808762"
R"xtabs(~ spkr + subj, kb07)"
```
# Formulating a simple model
## Attaching the required R package
```julia; label="8f3302d5-9b53-41d8-8967-c1ac5d18ac33"
R"require(lme4, quietly=TRUE)";
```
## Formula and model for simple, scalar random effects
A simple model with main-effects for each of the experimental factors and with random effects for subject and for item is described by the formula `rt_trunc ~ 1 + spkr + prec + load + (1|subj) + (1|item)`. In the *MixedModels* package, which uses the formula specifications from the *[StatsModels](https://github.com/JuliaStats/StatsModels.jl)* package, a formula must be wrapped in a call to the `@formula` macro.
```julia;label="b75a7b72-e736-4297-841c-29f56b69ee75"
f1 = @formula(rt_trunc ~ 1 + spkr + prec + load + (1|subj) + (1|item));
```
In R a formula can stand alone.
```julia; label="23342a51-9904-4043-bfe3-64a333d5b569"
R"f1 <- rt_trunc ~ 1 + spkr + prec + load + (1|subj) + (1|item)";
```
```julia; label="b5e23c28-0079-4636-a842-43372799a628"
m1 = fit(MixedModel, f1, kb07)
```
The first fit of such a model can take several seconds because the Just-In-Time (JIT) compiler must analyze and compile a considerable amount of code. (All of the code in the *MixedModels* package is Julia code.) Subsequent fits of this or similar models are much faster.
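To see the effect of compilation, a second fit of the same model can be timed directly. This is only a rough sketch; the numbers depend on the machine and the state of the Julia session.
```julia; label=refittime
@time fit(MixedModel, f1, kb07);  # a repeat fit avoids most of the compilation overhead
```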
The comparable model fit with lme4 in R is
```julia; label="55d58b5c-98b5-415f-932d-e44cea162714"
R"""
m1 <- lmer(f1, kb07, REML=FALSE)
summary(m1, corr=FALSE)
"""
```
The estimated coefficients for the fixed effects are different from those in the Julia fit because the reference level for `load` is different; the conversion to a factor (a `CategoricalArray` in Julia) is done slightly differently in the two languages.
## Assigning contrasts
For two-level experimental factors, such as `prec`, `spkr` and `load`, in a (nearly) balanced design such as this it is an advantage to use a $\pm1$ encoding of the levels in the model matrix. This is known as "assigning contrasts" in both R and Julia.
In R one assigns a contrasts specification to the factor itself. The "Helmert" contrast type produces a $\pm1$ encoding for a factor with two levels.
```julia; label="46cd8179-0e68-4433-a33e-1f6d13270589"
R"""
contrasts(kb07$spkr) <- contr.helmert(2)
contrasts(kb07$prec) <- contr.helmert(2)
contrasts(kb07$load) <- contr.helmert(2)
system.time(m1 <- lmer(f1, kb07, REML=FALSE))
"""
```
```julia; label="93a8563c-a0d3-4364-81cd-1214abf59504"
R"summary(m1, corr=FALSE)"
```
The change in coding results in the estimated coefficients (and standard errors) for the experimental factors being half their previous values, because of the $\pm1$ coding. Furthermore, the `(Intercept)` estimate is now a "typical" response over all the conditions. It would be the sample mean if the experiment were completely balanced.
```julia; label="7103d3b3-42a9-4308-badf-74833561f38b"
R"mean(kb07$rt_trunc)"
```
In Julia the contrasts are specified in a dictionary that is passed as an argument to the model constructor.
```julia; label="0f7391b1-290a-4039-8756-4b56defd5afd"
const HC = HelmertCoding();
const contrasts = Dict(:spkr => HC, :prec => HC, :load=> HC);
```
```julia; label="ac02350d-fa34-4551-8d53-236b9e2fb22a"
m1 = fit(MixedModel, f1, kb07, contrasts=contrasts)
```
The model matrix for the fixed effects is
```julia; label="1eb9a6f4-9874-489a-a87a-ee092ab1d873"
Int.(m1.X) # convert to integers for cleaner printing
```
An advantage of the $\pm1$ encoding is that $X'X$ is nearly a multiple of the identity matrix. Furthermore, any interactions of these two-level factors will also have a $\pm1$ encoding.
```julia; label="4956ff06-f8c9-4e3c-a812-386c5901425e"
Int.(m1.X'm1.X)
```
A benchmark of this fit shows that it is quite fast - on the order of a few milliseconds.
```julia; label="662a5f1d-d9f3-479d-a19e-0b1301dcc44a"
m1bmk = @benchmark fit(MixedModel, $f1, $kb07, contrasts = $contrasts)
```
# Model construction versus model optimization
The `m1` object is created in the call to the constructor function, `LinearMixedModel`, and then the parameters are optimized or fit in the call to `fit!`. Usually the process of fitting a model takes longer than creating the numerical representation but, for simple models like this one, the creation time can be a significant portion of the overall running time.
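The two steps can also be carried out explicitly. This sketch reproduces the fit of `m1` using the same constructor and `fit!` calls that are benchmarked below.
```julia; label=twostepfit
m1a = LinearMixedModel(f1, kb07, contrasts=contrasts)  # create the numerical representation only
fit!(m1a)                                              # optimize the parameters in place
```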
```julia; label="57e003b2-8097-413e-8038-5efb71a78fac"
bm1construct = @benchmark LinearMixedModel($f1, $kb07, contrasts=$contrasts)
```
```julia; label="4b8855ef-d1fc-4808-8849-3039ede2229a"
bm1fit = @benchmark fit!($m1)
```
# Factors affecting the time to optimize the parameters
The optimization process is summarized in the `optsum` property of the model.
```julia; label="383cf486-0d79-4916-be9c-7e38bb6a34cf"
m1.optsum
```
For this model there are two parameters to be optimized because the objective function, negative twice the log-likelihood, can be *profiled* with respect to all the other parameters. (See section 3 of *[Bates et al. 2015](https://www.jstatsoft.org/article/view/v067i01)* for details.) Both these parameters must be non-negative (i.e. both have a lower bound of zero) and both have an initial value of one. After 28 function evaluations an optimum is declared according to the function value tolerance, either $10^{-8}$ in absolute terms or $10^{-12}$ relative to the current value.
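The pieces of this summary can also be pulled out of the `optsum` object itself; a brief sketch, assuming its `lowerbd`, `initial` and `final` fields:
```julia; label=optsumfields
m1.optsum.lowerbd, m1.optsum.initial, m1.optsum.final  # lower bounds, starting values, optimum
```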
The optimization itself has a certain amount of setup and summary time but the majority of the time is spent in the evaluation of the objective - the profiled log-likelihood.
Each function evaluation is of the form
```julia; label="acb04b8e-5d64-4214-b8f4-d3a4ba347304"
θ1 = m1.θ;
m1objective = @benchmark objective(updateL!(setθ!($m1, $θ1)))
```
On this machine 28 function evaluations, each taking around 50 microseconds, give a total function-evaluation time of at least 1.4 ms., which is practically all of the time required to fit the model.
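That accounting can be reproduced from the quantities above; a rough sketch (benchmark times are recorded in nanoseconds):
```julia; label=evaltime
m1.optsum.feval * median(m1objective.times) / 10^6  # total objective-evaluation time in ms
```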
For this model the majority of the function-evaluation time is spent in the call to `updateL!`.
```julia; label="226ff3f2-cb95-47e0-9a99-8006f40e9c8d"
m1update = @benchmark updateL!($m1)
```
This is an operation that updates the lower Cholesky factor (often written as `L`) of a blocked sparse matrix.
There are 4 rows and columns of blocks. The first row and column correspond to the random effects for subject, the second to the random effects for item, the third to the fixed-effects parameters and the fourth to the response. Their sizes and types are
```julia; label="058438f9-5c52-45e7-a089-cbf79da9a142"
describeblocks(m1)
```
There are two lower-triangular blocked matrices in the model representation: `A` with fixed entries determined by the model and data, and `L` which is updated for each evaluation of the objective function. The type of the `A` block is given before the size and the type of the `L` block is after the size. For scalar random effects, generated by a random-effects term like `(1|G)`, the (1,1) block is always diagonal in both `A` and `L`. Its size is the number of levels of the grouping factor, `G`.
Because subject and item are crossed, the (2,1) block of `A` is dense, as is the (2,1) block of `L`. The (2,2) block of `A` is diagonal because, like the (1,1) block, it is generated from a scalar random effects term. However, the (2,2) block of `L` ends up being dense as a result of "fill-in" in the sparse Cholesky factorization. All the blocks associated with the fixed-effects or the response are stored as dense matrices but their dimensions are (relatively) small.
# Increasing the model complexity
In general, adding more terms to a model will increase the time required to fit the model. However, there is a big difference between adding fixed-effects terms and adding complexity to the random effects.
## Increasing the complexity of the fixed effects
Adding the two- and three-factor interactions to the fixed-effects terms increases the time required to fit the model.
```julia; label="e9700f1c-70fb-4948-8852-ae3c414e3350"
f2 = @formula(rt_trunc ~ 1 + spkr*prec*load + (1|subj) + (1|item));
```
```julia; label="27f9620e-a1c2-4129-a822-e7ad29b59a5d"
m2 = fit(MixedModel, f2, kb07, contrasts=contrasts)
```
(Notice that none of the interactions are statistically significant.)
```julia; label="9ed9385d-c394-44ad-b75d-493478a4d2d4"
m2bmk = @benchmark fit(MixedModel, $f2, $kb07,contrasts=$contrasts)
```
In this case the increase in fitting time is due more to the increase in the number of function evaluations needed to determine the optimum than to an increase in the evaluation time for the objective function.
```julia; label="6f9a7d92-5b47-44e8-b173-e06dcdb1fe6e"
m2.optsum.feval
```
```julia; label="fb57bcef-00e9-4e5c-baaa-d73e65a4a99a"
θ2 = m2.θ;
m2objective = @benchmark objective(updateL!(setθ!($m2, $θ2)))
```
```julia; label="86500356-654e-4040-8be4-0cc8be8b69a5"
R"""
f2 <- rt_trunc ~ 1 + spkr*prec*load + (1|subj) + (1|item)
system.time(m2 <- lmer(f2, kb07, REML=FALSE))
"""
```
```julia; label="fcf82605-4553-4ea1-9d73-4c6139104345"
R"summary(m2, corr=FALSE)"
```
## Increasing complexity of the random effects
Another way in which the model can be extended is to switch to vector-valued random effects. Sometimes this is described as having *random slopes*, so that a subject not only brings their own shift in the typical response but also their own shift in the change due to, say, `Load` versus `No Load`. Instead of just one, scalar, change associated with each subject there is an entire vector of changes in the coefficients.
A model with random slopes for each of the experimental factors, for both subject and item, is specified as
```julia; label="486d81b1-7773-41f9-a4f7-78b24cc9e941"
f3 = @formula(
rt_trunc ~ 1 + spkr*prec*load + (1+spkr+prec+load|subj) +
(1+spkr+prec+load|item)
);
```
```julia; label="50476044-96c5-4d0d-ae94-af11952e43ad"
m3 = fit(MixedModel, f3, kb07, contrasts=contrasts)
```
```julia; label="04bcef1b-663c-4c48-a770-c30b4826d608"
m3bmk = @benchmark fit(MixedModel, $f3, $kb07, contrasts=$contrasts)
```
```julia; label="79ed4770-8e37-42f2-b352-dab04afc60c8"
R"""
f3 <-
rt_trunc ~ 1 + spkr*prec*load + (1+spkr+prec+load|subj) +
(1+spkr+prec+load|item)
system.time(m3 <- lmer(f3, kb07, REML=FALSE,
control=lmerControl(calc.derivs=FALSE)))
"""
```
```julia; label="6861176a-0345-43cd-89ad-80b444951844"
R"summary(m3, corr=FALSE)"
```
There are several interesting aspects of this model fit.
First, the number of parameters optimized directly has increased substantially. What was previously a 2-dimensional optimization has now become 20-dimensional
```julia; label="787a1964-28f4-417f-a094-9ced14148b8e"
m3.optsum
```
and the number of function evaluations to convergence has gone from under 40 to over 600.
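Both claims can be checked directly from the fitted models.
```julia; label=comparedims
length(m1.θ), length(m3.θ), m1.optsum.feval, m3.optsum.feval
```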
The time required for each function evaluation has also increased considerably,
```julia; label="2ee69c0d-cd7a-490f-89b2-c2552ac3292d"
θ3 = m3.θ;
m3objective = @benchmark objective(updateL!(setθ!($m3, $θ3)))
```
resulting in much longer times for model fitting - about three-quarters of a second in Julia and over 6 seconds in R.
Notice that the estimates of the fixed-effects coefficients and their standard errors have not changed substantially except for the standard error of `prec`, which is also the largest effect.
The parameters in the optimization for this model can be arranged as two lower-triangular 4 by 4 matrices,
```julia; label="7cc502a0-7696-4996-b2db-9e6701f33514"
m3.λ[1]
```
```julia; label="b1cf832d-4e1b-451a-a132-c69294a20c02"
m3.λ[2]
```
which generate the covariance matrices for the random effects. The cumulative proportion of the variance in the principal components of these covariance matrices, available as
```julia; label="15d7379c-57fc-4d52-8c18-1481fda27a3a"
m3.rePCA
```
show that 93% of the variation in the random effects for subject is in the first principal direction and 99% in the first two principal directions. The random effects for item also have 99% of the variation in the first two principal directions.
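These relative covariance factors combine with the residual variance to give the covariance matrices of the random effects as $\sigma^2\lambda\lambda'$. A minimal sketch for the subject term, assuming the `varest` extractor for the estimate of $\sigma^2$:
```julia; label=subjcov
λ1 = m3.λ[1]
varest(m3) .* (λ1 * λ1')  # covariance matrix of the subject random effects
```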
Furthermore, the estimates of the standard deviations of the "slope" random effects are much smaller than those of the intercept random effects, except for the `prec` coefficient random effect for `item`. This suggests that the model could be reduced to `rt_trunc ~ 1 + spkr*prec*load + (1|subj) + (1+prec|item)` or even `rt_trunc ~ 1 + spkr+prec+load + (1|subj) + (1+prec|item)`.
```julia; label="8a69369e-7364-488e-8c0b-14316e582c52"
f4 = @formula(rt_trunc ~ 1 + spkr+prec+load + (1|subj) + (1+prec|item));
```
```julia; label="41a58fb3-3c29-4427-a3dc-36e4dd74e029"
m4 = fit(MixedModel, f4, kb07, contrasts=contrasts)
```
```julia; label="4bb28a01-bd5a-40b6-a409-24e7afc420a1"
m4bmk = @benchmark fit(MixedModel, $f4, $kb07, contrasts=$contrasts)
```
```julia; label="49e90bc7-cf24-4469-8d5f-e255d4907367"
m4.optsum.feval
```
```julia; label="32340782-4866-481c-b69d-8e847e88f94f"
R"""
f4 <- rt_trunc ~ 1 + spkr+prec+load + (1+prec|item) + (1|subj)
system.time(m4 <- lmer(f4, kb07, REML=FALSE, control=lmerControl(calc.derivs=FALSE)))
"""
```
```julia; label="04a92ba7-9e3a-41d4-ae6c-cab893059ab2"
R"summary(m4, corr=FALSE)"
```
```julia; label="c3714a60-9b60-4d1d-b986-2d4dcef0dd12"
θ4 = m4.θ;
m4objective = @benchmark objective(updateL!(setθ!($m4, $θ4)))
```
These two model fits can be compared with one of the information criteria, `AIC` or `BIC`, for which "smaller is better". They both indicate a preference for the smaller model, `m4`.
These criteria are values of the objective, negative twice the log-likelihood at convergence, plus a penalty that depends on the number of parameters being estimated.
Because model `m4` is a special case of model `m3`, a likelihood-ratio test could also be used. The alternative model, `m3`, will always produce an objective that is less than or equal to that from the null model, `m4`. The difference in these values is similar to the change in the residual sum of squares in a linear model fit. This objective would be called the *deviance* if there were a way of defining a saturated model, but it is not clear what that should be. However, if there were a way to define a deviance then the difference in the deviances would be the same as the difference in these objectives, which is
```julia; label="d488b9dc-29f8-48be-8ea9-f55a6cf34c6f"
diff(objective.([m3, m4]))
```
This difference is compared to a $\chi^2$ distribution with degrees of freedom corresponding to the difference in the number of parameters
```julia; label="9516ad28-2fe3-4949-940c-3c414a820c2a"
diff(dof.([m4, m3]))
```
producing a p-value of about 14%.
```julia; label="d3119341-f167-43e5-bd5f-7a77908e09be"
R"pchisq(26.7108, 20, lower.tail=FALSE)"
```
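The same tail probability can be evaluated in Julia. This is a sketch assuming the Distributions package, which is not among the packages attached above.
```julia; label=chisqjulia
using Distributions       # an assumption; not loaded in the setup chunk
ccdf(Chisq(20), 26.7108)  # analogous to R's pchisq(26.7108, 20, lower.tail=FALSE)
```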
## Going maximal
The "maximal" model as proposed by *[Barr et al., 2013](https://www.sciencedirect.com/science/article/pii/S0749596X12001180)* would include all possible interactions of experimental and grouping factors.
```julia; label="2a3d112e-0e21-40c9-a11e-7421fa2d4d8e"
f5 = @formula(rt_trunc ~ 1 + spkr*prec*load + (1+spkr*prec*load|subj) + (1+spkr*prec*load|item));
```
```julia; label="6ed09133-6f69-484b-aba5-0b7df5c17e22"
m5 = fit(MixedModel, f5, kb07, contrasts=contrasts)
```
```julia; label="99678241-e57e-44e8-a00c-5fbec0307b21"
m5bmk = @benchmark fit(MixedModel, $f5, $kb07, contrasts=$contrasts)
```
As is common in models with high-dimensional vector-valued random effects, the dominant portion of the variation is in the first few principal components
```julia; label="4a336cba-52f2-4a5f-9575-5732a34b5af"
m5.rePCA
```
For both the subjects and the items practically all the variation of these 8-dimensional random effects is in the first 4 principal components.
The dimension of $\theta$, the parameters in the optimization, increases considerably
```julia; label="62b06fb4-11e2-444c-8146-ade3a5bbfd14"
θ5 = m5.θ;
length(θ5)
```
Of these 72 parameters, 36 (the elements of a lower-triangular $8\times 8$ matrix) are estimated from variation between items, yet there are only 32 items.
Because the dimension of the optimization problem is much larger, the number of function evaluations to convergence increases correspondingly.
```julia; label="9d4d4eb2-795a-4744-87f6-cf21322a32ee"
m5.optsum.feval
```
Also, each function evaluation requires more time
```julia; label="8acdc87b-8821-4631-816e-8b5f7bc572f0"
m5objective = @benchmark objective(updateL!(setθ!($m5, $θ5)))
```
almost all of which is for the call to `updateL!`.
```julia; label="472ac8c0-f230-4cc4-987f-9bd8ae69bb37"
@benchmark updateL!($m5)
```
This model takes a long time to fit using lme4.
```julia; label="5de55af1-4d1c-406e-9165-41a34fc31dc9"
R"""
f5 <- rt_trunc ~ 1 + spkr*prec*load + (1+spkr*prec*load|subj) +
(1+spkr*prec*load|item)
system.time(m5 <- lmer(f5, kb07, REML=FALSE, control=lmerControl(calc.derivs=FALSE)))
"""
```
```julia; label="60bb1d90-ffda-48b1-a357-f5f304c3229d"
R"summary(m5, corr=FALSE)"
```
## A model of intermediate complexity
To provide more granularity in the plots of execution time shown below, fit one more model without random effects for the third-order interaction of the experimental factors.
```julia; label="576cb7e7-6174-44ee-8ee5-e1d04e91b3b6"
f6 = @formula(rt_trunc ~ 1 + spkr*prec*load +
(1+spkr+prec+load+spkr&prec+spkr&load+prec&load|subj) +
(1+spkr+prec+load+spkr&prec+spkr&load+prec&load|item));
```
```julia; label="d2a0b78f-ad03-47e3-96ba-2c8cf968474b"
m6 = fit(MixedModel, f6, kb07, contrasts=contrasts)
```
```julia; label="094bdb87-c6a5-4429-b98f-0653a4a9f99f"
m6bmk = @benchmark fit(MixedModel, $f6, $kb07, contrasts=$contrasts)
```
```julia; label="a00a2dc7-7f49-4086-bc57-0faf83a8b220"
θ6 = m6.θ;
length(θ6)
```
```julia; label="7a76c506-c7f7-42bd-81d1-d5480d0bcee5"
m6objective = @benchmark objective(updateL!(setθ!($m6, $θ6)))
```
```julia; label="28c77c0c-8b06-467b-a080-95aaa7f2853b"
@benchmark updateL!($m6)
```
# Summary of goodness of fit
Apply the goodness-of-fit measures to `m1` through `m6`, creating a data frame
```julia; label="2c25ebdb-19e7-47e2-8b20-27c5cfb0e633"
const mods = [m1, m2, m3, m4, m5, m6];
gofsumry = DataFrame(dof=dof.(mods), deviance=deviance.(mods),
AIC = aic.(mods), AICc = aicc.(mods), BIC = bic.(mods))
```
Here `dof`, or degrees of freedom, is the total number of parameters estimated in the model, and `deviance` is simply negative twice the log-likelihood at convergence, without a correction for a saturated model. All the information criteria are on a scale of "smaller is better" and all would select model 4 as "best".
```julia; label="b7b0245b-5bc7-43d5-88b3-2b05c794ca46"
map(nm -> argmin(getproperty(gofsumry,nm)), (AIC=:AIC, AICc=:AICc, BIC=:BIC))
```
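As a check on the statement that the deviance used here is simply negative twice the log-likelihood, the two can be compared directly for any of these (maximum likelihood) fits.
```julia; label=devcheck
deviance(m1) ≈ -2 * loglikelihood(m1)
```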
The benchmark times are recorded in nanoseconds. The median time in seconds to fit each model is evaluated as
```julia; label="bd5b6f70-3a07-4e76-9215-d0bd119e8235"
fits = [median(b.times) for b in [m1bmk, m2bmk, m3bmk, m4bmk, m5bmk, m6bmk]] ./ 10^9;
```
```julia; label="7c67a088-7186-46f3-9117-1a524f718a86"
nfe(m) = length(coef(m));
nre1(m) = MixedModels.vsize(first(m.reterms));
nlv1(m) = MixedModels.nlevs(first(m.reterms));
nre2(m) = MixedModels.vsize(last(m.reterms));
nlv2(m) = MixedModels.nlevs(last(m.reterms));
nθ(m) = sum(MixedModels.nθ, m.reterms);
nev = [m.optsum.feval for m in mods];
dimsumry = DataFrame(p = nfe.(mods), q1 = nre1.(mods), n1 = nlv1.(mods),
q2 = nre2.(mods), n2 = nlv2.(mods), nθ = nθ.(mods),
npar = dof.(mods), nev = nev, fitsec = fits)
CSV.write("dimsumry.csv",dimsumry)
```
In this table, `p` is the dimension of the fixed-effects vector and `q1` is the dimension of the random effects for each of the `n1` levels of the first grouping factor, while `q2` and `n2` are the corresponding values for the second grouping factor. `nθ` is the dimension of the covariance parameter vector, $\theta$, and `npar` is the total number of parameters being estimated, which is equal to `nθ + p + 1`.
`nev` is the number of function evaluations to convergence. Because this number depends on the number of parameters in the model and on several other factors, such as the settings of the convergence criteria, only its general magnitude should be considered reproducible. As described above, `fitsec` is the median time, in seconds, to fit the model.
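The stated relationship between `npar`, `nθ` and `p` can be verified from the table itself.
```julia; label=nparcheck
all(dimsumry.npar .== dimsumry.nθ .+ dimsumry.p .+ 1)  # npar = nθ + p + 1
```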
# Relationships between model complexity and fitting time
As would be expected, the time required to fit a model increases with the complexity of the model. In particular, the number of function evaluations to convergence increases with the number of parameters being optimized.
```julia; label="091f6473-04a0-4731-8e37-250e341b61a8"
plot(dimsumry, x=:nθ, y=:nev, Guide.xlabel("# of parameters in the optimization"),
Guide.ylabel("Function evaluations to convergence"))
```
The relationship is more-or-less linear, based on this small sample.
Finally, consider the time to fit the model versus the number of parameters in the optimization. On a log-log scale these form a reasonably straight line.
```julia; label="5d5f2226-9bc3-48b8-a872-c787ee864033"
plot(dimsumry, x=:nθ, y=:fitsec, Guide.xlabel("# of parameters in the optimization"),
Guide.ylabel("Median benchmark time (sec) to fit model"), Scale.y_log10,
Scale.x_log10)
```
with a slope of about 2.2
```julia; label="25ccfc8a-fbe6-4b55-a36b-1487e84c83b8"
coef(lm(@formula(log(fitsec) ~ 1 + log(nθ)), dimsumry))
```
or close to a quadratic relationship between time to fit and the number of parameters in the optimization. However, the number of parameters to optimize is itself quadratic in the dimension of the random effects vector in each random-effects term. Thus, increasing the dimension of the vector-valued random effects can cause a considerable increase in the amount of time required to fit the model.
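To make that point concrete, a $k$-dimensional random-effects term contributes $k(k+1)/2$ covariance parameters to the optimization:
```julia; label=thetagrowth
[k * (k + 1) ÷ 2 for k in (1, 2, 4, 7, 8)]  # the dimensions occurring in models m1 to m6
```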
Furthermore, the estimated covariance matrices for the high-dimensional vector-valued random effects are singular in `m3`, `m5` and `m6`, as is often the case.