-
Notifications
You must be signed in to change notification settings - Fork 65
/
09-spatial-autocorrelation.Rmd
130 lines (88 loc) · 5.4 KB
/
09-spatial-autocorrelation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
# Spatial autocorrelation
# Goals
- Learn to identify and deal with spatial autocorrelation through correlation structures in nlme
- Very briefly introduce the idea of GAMs for dealing with spatial data
# Loading the data
We will work with an example data set from the nlme package. This data set is also used as an example in the accompanying textbook by Pinheiro and Bates.
```{r}
library(nlme)
library(tidyverse)
d <- nlme::Wheat2
glimpse(d)
```
```{r}
ggplot(d, aes(longitude, latitude, size = yield, colour = variety)) +
geom_point()
```
This data set represents yield from various varieties of wheat collected at different latitudes and longitudes.
# Fitting a model with `nlme::gls`
So far we have used the `lme` function from the nlme package to fit linear mixed effect models. This package also has the function `gls`, which lets you use the features of nlme when fitting linear models without random effects.
```{r}
m1 <- gls(yield ~ variety - 1, data = d)
m1
```
We can extract and plot the residuals (by colour) spatially ourselves.
```{r}
d$res <- as.numeric(residuals(m1))
ggplot(d, aes(longitude, latitude, colour = res)) +
geom_point(size = 5) + scale_color_gradient2()
```
We can clearly see that the residuals have spatial clumping patterns. Why is this a problem?
The nlme package comes with a built-in function for plotting a semivariogram. This represents half the average squared difference between residuals at increasing distances from each other. So this is inversely related to correlation. Small values mean that the residuals are very similar to each other. This is a common plot in spatial statistics and there are plenty of references on the topic you can find online.
```{r}
plot(Variogram(m1, form = ~ latitude + longitude, data = d))
```
If we look at the semivariogram, we can take a guess at good initial starting values for a range value and nugget effect. For the `nlme::corSpher()` correlation structure, the nugget represents the intercept value and the range represents the distance at which the semivariogram reaches 1.
By looking at the above semivariogram we can eyeball what those values should be. We will give the correlation function starting values close to what we
expect.
```{r}
m2 <- update(m1,
corr = corSpher(c(30, 0.2), form = ~ latitude + longitude, nugget = TRUE))
m2
```
In fact, this is an example where if we don't give the correlation structure decent starting values it comes up with the wrong answer:
```{r}
m3 <- update(m1,
corr = corSpher(form = ~ latitude + longitude, nugget = TRUE))
m3
```
Let's try plotting the residuals spatially and making another semivariogram. Note again how it is critical that we use `type = "normalized"` in order to incorporate the correlation structure into the residual calculations.
```{r}
d$res2 <- as.numeric(residuals(m2, type = "normalized"))
ggplot(d, aes(longitude, latitude, colour = res2)) + geom_point(size = 5) +
scale_color_gradient2()
plot(Variogram(m2, form = ~ latitude + longitude, resType = "normalized", data = d))
```
That looks much better.
We can also compare the models with AIC:
```{r}
bbmle::AICtab(m1, m2)
```
So we are estimating 2 extra parameters as part of the spatial correlation structure, but the AIC strongly supports adding this model complexity.
We have used just one of a number of spatial correlation structures available in the nlme package. In fact, in their textbook (Mixed-Effects Models in S and S-PLUS), Pinheiro and Bates illustrate that the `nlme::corRatio()` structure provides a slightly better fit to this data set. If you are considering a spatial correlation structure, take a look at your various options: `corExp, corGaus, corLin, corRatio, corSpher`, linked to from `?nlme::corSpatial`
# A quick version with GAMs
A disadvantage of modeling spatial correlation this way is that there is no easy way to extract a spatial surface of predictions. An alternative and common way of dealing with spatial autocorrelation is to model the spatial process as a two-dimensional smooth term with a GAM. GAMs are beyond the scope of this workshop, but it's important to know that they exist.
GAMs are just like GLMs except that predictors can be allowed to follow a smooth wiggly line. And the degree of wiggliness can be determined objectively within the fitting algorithm.
As a demonstration, we will fit a version of this model with a GAM. The main package for fitting these models in R is mgcv. We will refit the `gls` model with maximum likelihood so that we can compare it to the GAM.
```{r}
library(mgcv)
m1_ml <- gls(yield ~ variety - 1, data = d, method = "ML")
m_gam1 <- gam(yield ~ variety - 1, data = d) # the same
bbmle::AICtab(m_gam1, m1_ml)
m_gam2 <- gam(yield ~ variety - 1 + te(latitude, longitude), data = d)
bbmle::AICtab(m_gam1, m_gam2)
```
We can now make it a plot of the spatial predictions:
```{r}
plot(m_gam2, pers = TRUE)
```
And inspect the residuals spatially as before:
```{r}
d$res_gam2 <- as.numeric(residuals(m_gam2))
ggplot(d, aes(longitude, latitude, colour = res_gam2)) +
geom_point(size = 5) + scale_color_gradient2()
```
# Additional information
Pinheiro, J. C., and D. M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. Springer-Verlag, New York, NY, USA.
Zuur, A., E. Ieno, N. Walker, A. Saveliev, and G. Smith. 2009. Mixed effects models and extensions in ecology with R. Springer.
Wood, S. N. 2006. Generalized Additive Models: An Introduction with R. Chapman and Hall, New York.