---
output:
md_document:
variant: markdown_github
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
fig.path = "README-",
message = FALSE,
warning = FALSE
)
```
# allinone
> All-in-one model-based custom predictions for species in Alberta
## Install
```{r eval=FALSE}
if (!require("remotes"))
install.packages("remotes")
remotes::install_github("ABbiodiversity/allinone")
```
You will need data and coefficients from the [ABbiodiversity/allinone-coefs](https://github.com/ABbiodiversity/allinone-coefs) repository. Clone or download the contents in zip format and extract into a folder (`dir` in the example).
```{r}
dir <- "~/repos/allinone-coefs"
```
If you don't need the spatial raster files from the ABbiodiversity/allinone-coefs repository, the `ai_download_coefs()` function will grab the coefficients for you and you'll be ready to roll:
```{r}
library(allinone)
# ai_download_coefs()
ai_load_coefs()
```
## Command line usage
See all the `r nrow(ai_species())` species that we have coefficients for:
```{r}
tab <- ai_species()
str(tab)
```
Here is the number of species by group:
```{r}
data.frame(table(tab$Group))
```
### Predictor data
We use an example data set that shows you how to organize the data:
```{r}
## example data to see what is needed and how it is formatted
load(system.file("extdata/example.RData", package="allinone"))
## space climate data frame + veg/soil classes
str(spclim)
## veg+HF composition data matrix
colnames(p_veghf)
## soil+HF composition data matrix
colnames(p_soilhf)
```
### Predict for a species
You need to define the species ID (use the `tab` object to look it up) and the bootstrap ID (`i`). The bootstrap ID can be between 1 and 100 (only 1 is available for mammals and habitat elements).
```{r}
## define species and bootstrap id
spp <- "AlderFlycatcher"
i <- 1
```
#### Composition data
You can use composition data, i.e. the areas or proportions of different landcover types (columns) in each spatial unit (rows). The corresponding relative abundance values are returned in matrix format:
```{r}
## use composition
z1 <- ai_predict(spp,
spclim=spclim,
veghf=p_veghf,
soilhf=p_soilhf,
i=i)
str(z1)
```
#### Sector effects
Such a matrix format is ideal when further aggregation is to be performed on the output, e.g. when calculating sector effects. The example uses only the current landscape and shows how to use the model weights (`wN`) to average the north and south results in the overlap zone:
```{r}
## sector effects
library(mefa4)
lt <- ai_classes()
ltn <- nonDuplicated(lt$north, Label, TRUE)
lts <- nonDuplicated(lt$south, Label, TRUE)
Nn <- groupSums(z1$north, 2, ltn[colnames(z1$north), "Sector"])
Ns <- groupSums(z1$south, 2, lts[colnames(z1$south), "Sector"])
Ns <- cbind(Ns, Forestry=0)
N <- spclim$wN * Nn + (1-spclim$wN) * Ns
colSums(Nn)
colSums(Ns)
colSums(N)
```
#### Classified landcover data
We have classified landcover data when making predictions for single polygons (these are aggregated in the composition data case). We can provide `veghf` and `soilhf` as vectors of these classes.
Make sure that the class names are consistent with the column names in the example data matrices for the north and south, respectively.
The function now returns a list of vectors:
```{r}
## use land cover classes
z2 <- ai_predict(spp,
spclim=spclim,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i)
str(z2)
## averaging predictions
avg2 <- spclim$wN * z2$north + (1-spclim$wN) * z2$south
str(avg2)
```
### Prediction uncertainty
We can use the bootstrap distribution to calculate uncertainty (i.e. confidence intervals, CI):
```{r eval=FALSE}
v <- NULL
for (i in 1:20) {
zz <- ai_predict(spp,
spclim=spclim,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i)
v <- cbind(v, spclim$wN * zz$north + (1-spclim$wN) * zz$south)
}
t(apply(v[25:30,], 1, quantile, c(0.5, 0.05, 0.95)))
##           50%         5%        95%
## 25 0.31590764 0.26543390 0.35751442
## 26 0.05219450 0.03893699 0.07041478
## 27 0.09576693 0.07895616 0.11126043
## 28 0.04139684 0.03238818 0.05872012
## 29 0.12563301 0.09836205 0.15970154
## 30 0.06754963 0.05384576 0.08798115
```
This gives the median and the 90% CI. Bootstrap-based uncertainty is currently not available for mammals and habitat elements.
### Predict for multiple species
Once the predictors are organized, loop over the species IDs from `tab` and store the results in an organized fashion, as sketched below.
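Here is a minimal sketch of such a loop. It is not part of the package: the `SpeciesID` column name is an assumption (check `str(tab)` for the actual ID column), and the averaged predictions are collected into a matrix with one column per species:
```{r eval=FALSE}
## minimal sketch, not part of the package: the SpeciesID column
## name is an assumption, check str(tab) for the actual ID column
res <- list()
for (spp in tab$SpeciesID) {
  z <- ai_predict(spp,
    spclim=spclim,
    veghf=spclim$veghf,
    soilhf=spclim$soilhf,
    i=1)
  ## average the north and south results in the overlap zone
  res[[spp]] <- spclim$wN * z$north + (1-spclim$wN) * z$south
}
## one row per spatial unit, one column per species
res <- do.call(cbind, res)
```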
## Spatial data manipulation
The variables in the `spclim` object can be extracted from the raster layers stored in the [ABbiodiversity/allinone-coefs](https://github.com/ABbiodiversity/allinone-coefs) repository. Clone or download the contents in zip format and extract into a folder (the `dir` variable defined earlier).
```{r eval=FALSE}
library(sf)
library(raster)
## we have some coordinates (long/lat in degrees)
XY <- spclim[,c("POINT_X", "POINT_Y")]
head(XY)
## make a sf data frame
xy <- sf::st_as_sf(XY, coords = c("POINT_X", "POINT_Y"))
## set CRS
xy <- st_set_crs(xy, "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
## variable names
vn <- c("AHM", "FFP", "MAP", "MAT", "MCMT", "MWMT",
"NSR1CentralMixedwood", "NSR1DryMixedwood", "NSR1Foothills",
"NSR1Mountain", "NSR1North", "NSR1Parkland", "NSR1Shield",
"pAspen", "PeaceRiver", "PET", "pWater", "wN")
## link to the rasters & create a stack
rl <- list()
for (j in vn)
rl[[j]] <- suppressWarnings(raster(
file.path(dir, "spatial", paste0(j, ".tif"))))
rl <- stack(rl)
## transform xy to Alberta TM
xy <- st_transform(xy, proj4string(rl))
plot(rl[["pAspen"]], axes=FALSE, box=FALSE)
plot(xy$geometry, add=TRUE, pch=3)
## extract info
e <- extract(rl, xy)
## absolute difference should be tiny
max(colSums(abs(spclim[,vn] - e[,vn])))
## we need long/lat as in XY too
spclim2 <- data.frame(XY, e)
## let's see what we get
str(ai_predict(spp,
spclim=spclim2,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i))
```
## Batch processing
For batch processing, look into the `inst/docker` folder. The API can be deployed using Docker or Docker Compose.
### Request
Pass a JSON object (Content-Type: application/json) as the request body with the longitude, latitude, veg/soil/HF classes, the species ID, and the bootstrap ID (all of these as arrays inside an object).
```json
{
"long":[-112.6178,-110.9309,-112.8359,-113.3896,-113.8632,-112.299,
-110.5805,-111.7806,-112.3541,-113.4144,-110.2329,-114.6634,-111.5857,
-114.2688,-114.4383,-110.4444,-113.6355,-113.5874,-113.7964,-113.9445,
-119.9851,-112.7504,-113.5144,-112.7433,-118.0259,-116.2997,-114.9535,
-111.387,-111.8138,-111.1588],
"lat":[49.5851,51.3653,49.3464,51.344,
49.5951,51.3239,50.2184,50.9328,49.5974,49.3468,53.4814,53.8818,53.1679,
51.1549,51.3357,52.3465,51.7699,53.748,52.7971,52.1959,59.2693,56.3071,
58.269,56.1541,56.3535,59.8075,53.046,58.2175,58.7773,55.0714],
"veghf":["Crop","RoughP","Crop","Crop","TameP","GrassHerb","GrassHerb",
"GrassHerb","Crop","Crop","Crop","Crop","Crop","Rural","Crop","Crop","TameP",
"Crop","Crop","Crop","TreedFenR","TreedBog4","TreedSwamp","TreedBog5",
"CCDeciduousR","TreedFen8","CCDeciduous2","PineR","ShrubbyFen","Mixedwood2"],
"soilhf":["Crop","RoughP","Crop","Crop","TameP","Blowout","ThinBreak",
"RapidDrain","Crop","Crop","Crop","Crop","Crop","Rural","Crop","Crop",
"TameP","Crop","Crop","Crop","RapidDrain","RapidDrain","RapidDrain",
"RapidDrain","RapidDrain","RapidDrain","RoughP","RapidDrain","RapidDrain",
"RapidDrain"],
"spp":["AlderFlycatcher"],
"i":[1]
}
```
### Response
The response is a JSON (application/json) array with the model-averaged relative abundance values:
```json
[0,0.0001,0,0.0001,0.0002,0,0,0,0,0.0001,0.0044,0.0105,0.0063,
0.0045,0.005,0.0055,0.0059,0.0128,0.0116,0.0082,0.1826,0.0326,
0.072,0.0269,0.3291,0.0538,0.1034,0.0447,0.1266,0.0681]
```
### Docker workflow
A Docker workflow is ideal because the size of the application with all the required spatial layers and coefficients is relatively small.
#### Building the image
Change directory to where the Dockerfile is, then build and push the image:
```bash
cd inst/docker
export REGISTRY="psolymos"
export TAG="allinone:latest"
## build the image
docker build -t $REGISTRY/$TAG .
docker push $REGISTRY/$TAG
```
#### Deploying
The Docker image can be deployed anywhere. The machine needs to have the Docker Engine installed. Pull the Docker image to the machine you want to run the API on:
```bash
docker pull $REGISTRY/$TAG
```
In production, run the image with port mapping (`-p host:container`) in the background (`-d`):
```bash
docker run -d --name aiapi -p 8080:8080 $REGISTRY/$TAG
```
Now use `curl` to make a request and test the API. It can take a single value or a vector of values for `long`, `lat`, `veghf`, and `soilhf`. It is best to provide both `veghf` and `soilhf` and let the model averaging do its job, i.e. use the `"UNK"` soil class in the north. The weights (`wN`) are also extracted from a raster.
```bash
## vector input with model averaging
curl http://localhost:8080/ -d \
'{"long":[-112.6178,-111.1588],"lat":[49.5851,55.0714],"veghf":["Crop","Mixedwood2"],"soilhf":["Crop","RapidDrain"],"spp":["AlderFlycatcher"],"i":[1]}'
# [0,0.0681]
## single valued input with averaging
curl http://localhost:8080/ -d \
'{"long":[-111.1588],"lat":[55.0714],"veghf":["Mixedwood2"],"soilhf":["RapidDrain"],"spp":["AlderFlycatcher"],"i":[2]}'
# [0.0605]
```
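The same kind of request can be sent from R. Here is a minimal sketch, assuming the httr and jsonlite packages are installed and the API is running on port 8080 as above:
```{r eval=FALSE}
## minimal sketch, assuming the httr and jsonlite packages are installed
library(httr)
library(jsonlite)
body <- list(
  long = -111.1588,
  lat = 55.0714,
  veghf = "Mixedwood2",
  soilhf = "RapidDrain",
  spp = "AlderFlycatcher",
  i = 1)
## auto_unbox = FALSE keeps the length-1 values as JSON arrays
r <- POST("http://localhost:8080/",
  body = toJSON(body, auto_unbox = FALSE),
  content_type_json())
## parse the JSON array of relative abundance values
fromJSON(content(r, as = "text"))
```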
See the logs:
```bash
docker logs aiapi
# [INFO] This is the All-in-one API
# [INFO] Packages loaded
# [INFO] Raster stack loaded
# [INFO] Loading coefs
# [INFO] 2021-07-24 06:36:14 Starting server: http://0.0.0.0:8080
# [INFO] Making predictions for species AlderFlycatcher (birds) i=1
```
Kill and remove the container:
```bash
docker kill aiapi && docker rm aiapi
```
It is advised to add the Docker background process to systemd or to use Docker Compose in production. Docker Compose handles container restarts better and can run multiple replicas (simple round-robin load balancing, which requires a reverse proxy such as Nginx or Caddy).
See the `inst/docker/docker-compose.yml` file for details. Deploy with `docker-compose up -d`, which publishes the app on port 8080.
If the containers are already running and the configuration or images have changed since the containers were created, running `docker-compose up` again picks up the changes. Tear down with `docker-compose down`.
This Docker image is not very lean because the parent image is large. If image size is a concern, start from a smaller parent image and install only the necessary dependencies.