---
output:
md_document:
variant: markdown_github
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
fig.path = "README-",
message = FALSE,
warning = FALSE
)
```
# allinone
> All-in-one model-based custom predictions for species in Alberta
## Install
```{r eval=FALSE}
if (!require("remotes"))
install.packages("remotes")
remotes::install_github("ABbiodiversity/allinone")
```
You will need data and coefficients from the [ABbiodiversity/allinone-coefs](https://github.com/ABbiodiversity/allinone-coefs) repository. Clone or download the contents in zip format and extract into a folder (`dir` in the example).
```{r}
dir <- "~/repos/allinone-coefs"
```
If you don't need the spatial raster files from the ABbiodiversity/allinone-coefs repository, the `ai_download_coefs()` function will grab the coefficients for you and you'll be ready to roll:
```{r}
library(allinone)
# ai_download_coefs()
ai_load_coefs()
```
## Command line usage
See all the `r nrow(ai_species())` species that we have coefficients for:
```{r}
tab <- ai_species()
str(tab)
```
Here is the number of species by group:
```{r}
data.frame(table(tab$Group))
```
### Predictor data
We use an example data set that shows you how to organize the data:
```{r}
## example data to see what is needed and how it is formatted
load(system.file("extdata/example.RData", package="allinone"))
## space climate data frame + veg/soil classes
str(spclim)
## veg+HF composition data matrix
colnames(p_veghf)
## soil+HF composition data matrix
colnames(p_soilhf)
```
### Predict for a species
You need to define the species ID (use the `tab` object to look it up) and the bootstrap ID (`i`). The bootstrap ID can be between 1 and 100 (only 1 is available for mammals and habitat elements).
```{r}
## define species and bootstrap id
spp <- "AlderFlycatcher"
i <- 1
```
#### Composition data
You can use composition data, i.e. the areas or proportions of different landcover types (columns) in each spatial unit (rows). The corresponding relative abundance values are returned in matrix format:
```{r}
## use composition
z1 <- ai_predict(spp,
spclim=spclim,
veghf=p_veghf,
soilhf=p_soilhf,
i=i)
str(z1)
```
#### Sector effects
Such a matrix format is ideal when further aggregation is to be performed on the output, e.g. when calculating sector effects. The example uses only the current landscape and shows how to use the model weights (`wN`) to average the north and south results in the overlap zone:
```{r}
## sector effects
library(mefa4)
lt <- ai_classes()
ltn <- nonDuplicated(lt$north, Label, TRUE)
lts <- nonDuplicated(lt$south, Label, TRUE)
Nn <- groupSums(z1$north, 2, ltn[colnames(z1$north), "Sector"])
Ns <- groupSums(z1$south, 2, lts[colnames(z1$south), "Sector"])
Ns <- cbind(Ns, Forestry=0)
N <- spclim$wN * Nn + (1-spclim$wN) * Ns
colSums(Nn)
colSums(Ns)
colSums(N)
```
#### Classified landcover data
We have classified landcover data when making predictions for single polygons (these are aggregated in the composition data case). We can provide `veghf` and `soilhf` as vectors of these classes.
Make sure that the class names are consistent with the column names in the example data matrices for the north and south, respectively.
The function now returns a list of vectors:
```{r}
## use land cover classes
z2 <- ai_predict(spp,
spclim=spclim,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i)
str(z2)
## averaging predictions
avg2 <- spclim$wN * z2$north + (1-spclim$wN) * z2$south
str(avg2)
```
### Prediction uncertainty
We can use the bootstrap distribution to calculate uncertainty (i.e. confidence intervals, CI):
```{r eval=FALSE}
v <- NULL
for (i in 1:20) {
zz <- ai_predict(spp,
spclim=spclim,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i)
v <- cbind(v, spclim$wN * zz$north + (1-spclim$wN) * zz$south)
}
t(apply(v[25:30,], 1, quantile, c(0.5, 0.05, 0.95)))
##           50%         5%        95%
## 25 0.31590764 0.26543390 0.35751442
## 26 0.05219450 0.03893699 0.07041478
## 27 0.09576693 0.07895616 0.11126043
## 28 0.04139684 0.03238818 0.05872012
## 29 0.12563301 0.09836205 0.15970154
## 30 0.06754963 0.05384576 0.08798115
```
This gives the median and the 90% CI. Bootstrap-based uncertainty is currently not available for mammals and habitat elements.
### Predict for multiple species
Once the predictors are organized, loop over the species IDs from `tab` and store the results in an organized fashion, as sketched below.
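Here is a minimal sketch of such a loop. It is not part of the package: the `SpeciesID` column name is an assumption (check `str(tab)` for the actual ID column), and the averaged predictions are collected into a matrix with one column per species:
```{r eval=FALSE}
## minimal sketch, not part of the package: the SpeciesID column
## name is an assumption, check str(tab) for the actual ID column
res <- list()
for (spp in tab$SpeciesID) {
  z <- ai_predict(spp,
    spclim=spclim,
    veghf=spclim$veghf,
    soilhf=spclim$soilhf,
    i=1)
  ## average the north and south results in the overlap zone
  res[[spp]] <- spclim$wN * z$north + (1-spclim$wN) * z$south
}
## one row per spatial unit, one column per species
res <- do.call(cbind, res)
```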
## Spatial data manipulation
The variables in the `spclim` object can be extracted from the raster layers stored in the [ABbiodiversity/allinone-coefs](https://github.com/ABbiodiversity/allinone-coefs) repository. Clone or download the contents in zip format and extract into a folder (the `dir` variable defined earlier).
```{r eval=FALSE}
library(sf)
library(raster)
## we have some coordinates (long/lat in degrees)
XY <- spclim[,c("POINT_X", "POINT_Y")]
head(XY)
## make a sf data frame
xy <- sf::st_as_sf(XY, coords = c("POINT_X", "POINT_Y"))
## set CRS
xy <- st_set_crs(xy, "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0")
## variable names
vn <- c("AHM", "FFP", "MAP", "MAT", "MCMT", "MWMT",
"NSR1CentralMixedwood", "NSR1DryMixedwood", "NSR1Foothills",
"NSR1Mountain", "NSR1North", "NSR1Parkland", "NSR1Shield",
"pAspen", "PeaceRiver", "PET", "pWater", "wN")
## link to the rasters & create a stack
rl <- list()
for (j in vn)
rl[[j]] <- suppressWarnings(raster(
file.path(dir, "spatial", paste0(j, ".tif"))))
rl <- stack(rl)
## transform xy to Alberta TM
xy <- st_transform(xy, proj4string(rl))
plot(rl[["pAspen"]], axes=FALSE, box=FALSE)
plot(xy$geometry, add=TRUE, pch=3)
## extract info
e <- extract(rl, xy)
## absolute difference should be tiny
max(colSums(abs(spclim[,vn] - e[,vn])))
## we need long/lat as in XY too
spclim2 <- data.frame(XY, e)
## let's see what we get
str(ai_predict(spp,
spclim=spclim2,
veghf=spclim$veghf,
soilhf=spclim$soilhf,
i=i))
```
## Batch processing
For batch processing, look into the `inst/docker` folder. The API can be deployed using Docker or Docker Compose.
### Request
Pass a JSON object (Content-Type: application/json) as the request body with the longitude, latitude, veg/soil/HF classes, the species ID, and the bootstrap ID (all of these as arrays inside an object).
```json
{
"long":[-112.6178,-110.9309,-112.8359,-113.3896,-113.8632,-112.299,
-110.5805,-111.7806,-112.3541,-113.4144,-110.2329,-114.6634,-111.5857,
-114.2688,-114.4383,-110.4444,-113.6355,-113.5874,-113.7964,-113.9445,
-119.9851,-112.7504,-113.5144,-112.7433,-118.0259,-116.2997,-114.9535,
-111.387,-111.8138,-111.1588],
"lat":[49.5851,51.3653,49.3464,51.344,
49.5951,51.3239,50.2184,50.9328,49.5974,49.3468,53.4814,53.8818,53.1679,
51.1549,51.3357,52.3465,51.7699,53.748,52.7971,52.1959,59.2693,56.3071,
58.269,56.1541,56.3535,59.8075,53.046,58.2175,58.7773,55.0714],
"veghf":["Crop","RoughP","Crop","Crop","TameP","GrassHerb","GrassHerb",
"GrassHerb","Crop","Crop","Crop","Crop","Crop","Rural","Crop","Crop","TameP",
"Crop","Crop","Crop","TreedFenR","TreedBog4","TreedSwamp","TreedBog5",
"CCDeciduousR","TreedFen8","CCDeciduous2","PineR","ShrubbyFen","Mixedwood2"],
"soilhf":["Crop","RoughP","Crop","Crop","TameP","Blowout","ThinBreak",
"RapidDrain","Crop","Crop","Crop","Crop","Crop","Rural","Crop","Crop",
"TameP","Crop","Crop","Crop","RapidDrain","RapidDrain","RapidDrain",
"RapidDrain","RapidDrain","RapidDrain","RoughP","RapidDrain","RapidDrain",
"RapidDrain"],
"spp":["AlderFlycatcher"],
"i":[1]
}
```
### Response
The response is a JSON (application/json) array with the model-averaged relative abundance values:
```json
[0,0.0001,0,0.0001,0.0002,0,0,0,0,0.0001,0.0044,0.0105,0.0063,
0.0045,0.005,0.0055,0.0059,0.0128,0.0116,0.0082,0.1826,0.0326,
0.072,0.0269,0.3291,0.0538,0.1034,0.0447,0.1266,0.0681]
```
### Docker workflow
A Docker workflow is ideal because the size of the application with all the required spatial layers and coefficients is relatively small.
#### Building the image
Change directory to where the Dockerfile is, then build and push the image:
```bash
cd inst/docker
export REGISTRY="psolymos"
export TAG="allinone:latest"
## build the image
docker build -t $REGISTRY/$TAG .
docker push $REGISTRY/$TAG
```
#### Deploying
The Docker image can be deployed anywhere. The machine needs to have the Docker Engine installed. Pull the Docker image to the machine you want to run the API on:
```bash
docker pull $REGISTRY/$TAG
```
In production, run the image with port mapping (`-p host:container`) in the background (`-d`):
```bash
docker run -d --name aiapi -p 8080:8080 $REGISTRY/$TAG
```
Now use `curl` to make a request and test the API. It can take a single value or a vector of values for `long`, `lat`, `veghf`, and `soilhf`. It is best to provide both `veghf` and `soilhf` and let the model averaging do its job, i.e. use the `"UNK"` soil class in the north. The weights (`wN`) are also extracted from a raster.
```bash
## vector input with model averaging
curl http://localhost:8080/ -d \
'{"long":[-112.6178,-111.1588],"lat":[49.5851,55.0714],"veghf":["Crop","Mixedwood2"],"soilhf":["Crop","RapidDrain"],"spp":["AlderFlycatcher"],"i":[1]}'
# [0,0.0681]
## single valued input with averaging
curl http://localhost:8080/ -d \
'{"long":[-111.1588],"lat":[55.0714],"veghf":["Mixedwood2"],"soilhf":["RapidDrain"],"spp":["AlderFlycatcher"],"i":[2]}'
# [0.0605]
```
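The same kind of request can be sent from R. Here is a minimal sketch, assuming the httr and jsonlite packages are installed and the API is running on port 8080 as above:
```{r eval=FALSE}
## minimal sketch, assuming the httr and jsonlite packages are installed
library(httr)
library(jsonlite)
body <- list(
  long = -111.1588,
  lat = 55.0714,
  veghf = "Mixedwood2",
  soilhf = "RapidDrain",
  spp = "AlderFlycatcher",
  i = 1)
## auto_unbox = FALSE keeps the length-1 values as JSON arrays
r <- POST("http://localhost:8080/",
  body = toJSON(body, auto_unbox = FALSE),
  content_type_json())
## parse the JSON array of relative abundance values
fromJSON(content(r, as = "text"))
```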
See the logs:
```bash
docker logs aiapi
# [INFO] This is the All-in-one API
# [INFO] Packages loaded
# [INFO] Raster stack loaded
# [INFO] Loading coefs
# [INFO] 2021-07-24 06:36:14 Starting server: http://0.0.0.0:8080
# [INFO] Making predictions for species AlderFlycatcher (birds) i=1
```
Kill and remove the container:
```bash
docker kill aiapi && docker rm aiapi
```
It is advised to add the Docker background process to systemd or to use Docker Compose in production. Docker Compose handles container restarts better and can run multiple replicas (simple round-robin load balancing, which requires a reverse proxy such as Nginx or Caddy).
See the `inst/docker/docker-compose.yml` file for details. Deploy with `docker-compose up -d`, which publishes the app on port 8080.
If the containers are already running and the configuration or images have changed since the containers were created, running `docker-compose up` again picks up the changes. Tear down with `docker-compose down`.
This Docker image is not very lean because the parent image is large. If image size is a concern, start from a smaller parent image and install only the necessary dependencies.