further.Rmd

# Going Further

<div style = 'text-align: center'>
<i class="fas fa-rocket fa-5x" style = 'color:#990024;'></i>
</div>

<br>

This section covers topics that are generally beyond the scope of what would be covered in this introductory document, but may be given their own section over time.


## Other Distributions

As noted in the GLMM section, we are not held to use only GLM family distributions regarding the target variable.  Unfortunately, the tools you have available to do so will quickly diminish.  However, a couple packages could help in this regard with simpler random effects structures.  For example, the <span class="pack">mgcv</span> and <span class="pack">glmmTMB</span> packages allow one access to a variety of target distributions, such as student t, negative binomial, beta, zero-inflated Poisson and more.   If you're willing to go Bayesian, you'll have even more options with <span class="pack">rstanarm</span> and <span class="pack">brms</span>.  I've personally had success with ordinal, beta, truncated normal and more with <span class="pack">brms</span> in particular.

Note also that nothing says that the random effects must come from a normal distribution either.  You probably are going to need some notably strong theoretical reasons for trying something else, but it does come up for some situations.  You'll almost certainly need to use a specialized approach, as most mixed model tools do not offer that functionality out of the box.


## Other Contexts

Here is a list of some other contexts in which you can find random effects models, or extensions of mixed models into other situations.


##### Spatial models

It is often the case we want to take into account the geography of a situation. <span class="emph">Spatial random effects</span> allow one to do so in the continuous case, e.g. with latitude and longitude coordinates, as well as discrete, as with political district. Typical random effects approaches, e.g. with a state random effect, would not correlate state effects.  One might capture geography incidentally, or via cluster level variables such as 'region' indicator.  However, if you're interested in a spatial random effect, use something that can account for it specifically.


##### Survival models

Random effects models in the survival context are typically referred to as <span class="emph">frailty models</span>.  As a starting point, the <span class="pack">survival</span> package that comes with base R can do such models.


##### Item response theory

<span class="emph">Item response theory</span> models are often used with scholastic and other testing data, but are far more general than that, because they are in fact a special type of random effects model. Some IRT models can be explicitly estimated as a mixed model, e.g. with <span class="pack">lme4</span>. See Boeck et al. (2011) The Estimation of Item Response Models with the <span class="func">lmer</span> Function from the <span class="pack">lme4</span> Package in R. I also have some brief demonstration [here](http://m-clark.github.io/sem/item-response-theory.html). Paul Bürkner, the author of <span class="pack">brms</span> also has a nice [overview of IRT models](https://arxiv.org/abs/1905.09501) in the Bayesian context.


##### Multi-membership models

Sometimes observations may belong to more than one cluster of some grouping variable. For example, in a longitudinal setting some individuals may move to other cities or schools, staying in one place longer than another.  Depending on the specifics of the modeling setting, you may need to take a <span class="emph">multi-membership</span> approach to deal with this.


##### Phylogenetic models

In biology, models make take observations that are of the same species. While one can use species as an additional source of variance as in the manner we have demonstrated, the species are not independent as they may come from the same phylogenetic tree/branch.  Bayesian packages are available to do such models (e.g. <span class="pack">MCMCglmm</span> and <span class="pack">brms</span>).


##### Adjacency structures

Similar to spatial and phylogenetic models, the dependency among the groups/clusters themselves can be described in terms of a <span class="emph">markov random field/undirected graph</span>.  In simpler terms, one may think of a situation  where a binary adjacency matrix would denote connections among the nodes/cluster levels.  For example, the clustering may be due to individuals, which themselves might be friends with one another.  One way to deal with such a situation would be similar to spatial models for discrete random units.


##### Gaussian processes

<span class="emph">Gaussian processes</span> are another way to handle dependency in the data, especially over time or space. Some spatial models are in fact a special case of these. One can think of gaussian processes as adding a 'continuous category' random effect.  Consider the effect of age in many models, could that not also be a source of dependency regarding some outcomes? In *Statistical Rethinking*, McElreath has a nice chapter 'Adventures in Covariance' that gets into this a bit, and Gaussian process are widely discussed among the [Stan developers](https://mc-stan.org/docs/2_19/stan-users-guide/gaussian-process-regression.html).  


##### Surveys & Mr. P

Clustering is often a result of sampling design.  Often one would use a survey design approach for proper inference in such situations, and you can use mixed models with survey weights.  However, <span class="emph">multi-level regression with post-stratification</span>, or Mr. P, is an alternative mixed model approach that can potentially lead to better results in the same setting without weighting. One might even be able to [generalize from a sample of Xbox players](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/04/forecasting-with-nonrepresentative-polls.pdf) to the national level!


##### Post-hoc comparisons and multiple testing 

This is not an issue I'm personally all that concerned with, but a lot of folks seem to be.  The 'problem' is that one has a lot of p-values for some model or across a set of models, and is worried about spurious claims of significance.  If one were truly worried about it, they'd be doing different models that would incorporate some sort of regularization, rather than attempting some p-value hack afterwards.  Didn't we talk about [regularization][Comparison to many regressions] somewhere?  Yep, you can use a mixed model approach to compare groups instead, specifying the grouping as a random effect. See [Gelman](http://www.stat.columbia.edu/~gelman/research/published/multiple2f.pdf) for details.


##### Growth mixture models

Often people will assume *latent* clusters of individuals within the data, with model effects differing by these latent groups also. Sometimes called <span class="emph">latent trajectory models</span>, these are conceptually adding a cluster analysis to the mixed model setting for longitudinal data.  While common in structural equation modeling, packages like <span class="pack">flexmix</span> can keep you in the standard model setting, which would be preferable.

##### Embeddings and Neural Nets

Much is made about using neural nets to create word, sentence or other *embeddings*, essentially taking a string of text and converting it to numeric form.  But think about what we've been doing here. We've taken a categorical/string variable like student id, and converted to numeric form (a random effect). Not so complicated a notion really.

Beyond that, one can essentially conduct a mixed model as one would a neural net for standard tabular data, because a neural net incorporates regularization on all parameters, and in particular those group/cluster effects. So if your data was properly coded, you could set up your neural net to produce something very similar to what we have in the matrix form of a random effect model.  This is similar to what mgcv actually does for random effects estimation. By default though, the information extracted from the groups would be done so in a much different way, and would be able to pick up on inter-group correlations as well (much like a spatial random effects model).


## Nonlinear Mixed Effects Models

Earlier we used the <span class="pack">nlme</span> package.  The acronym stands for **nonlinear** mixed effects models.  In this case, we are assuming a specific functional form for a predictor.  A common example is a logistic growth curve[^lgc], and one could use a function like <span class="func">SSlogis</span>.

In other cases we do not specify the functional form, and take a more non-parametric approach.  Here's where the powerful <span class="pack">mgcv</span> package comes in, and there are few if any that have its capabilities for <span class="emph">generalized additive models</span> combined with standard random effects approaches. Depending on the approach you take, you can even get <span class="pack">nlme</span> or <span class="pack">lme4</span> output along with the GAM results.  Highly recommended.

I would also recommend <span class="pack" style = "">brms</span>, which has specific functionality for nonlinear models in general, including IRT models, as well as additive models in the vein of <span class="pack" style = "">mgcv</span>, as it uses the same constructor functions that come with that package. It might be your best bet whether you have a specific nonlinear functional form or not.


## Connections

The incorporation of spatial random effects, additive models, and mixed models altogether under one modeling roof is sometimes referred to as <span class="emph">structured additive regression</span> models, or STARs.  The <span class="pack">mgcv</span> package is at least one place where you can pull this off.  But the notion of a *random effect* is a broad one, and we might think of many such similar effects to add to a model.  

As mentioned previously, thinking of parameters as random, instead of fixed, essentially puts one in the Bayesian mindset.  Moving to that world for your modeling will open up many doors, including expanding your mixed model options.


[^lgc]: Not to be confused with latent growth curve models or logistic regression.