gpdfa_paper.Rmd

---
title: Smoothed dynamic factor analysis for identifying trends in multivariate time series
author: Eric J. Ward$^1$, Sean C. Anderson$^2$, Mary E. Hunsicker$^3$, Michael A. Litzow$^4$ 
output:
  pdf_document:
    fig_caption: yes
    includes:
      in_header: options.sty
    latex_engine: xelatex
  word_document: default
  html_document: default
  bookdown::pdf_document2:
csl: mee.csl
bibliography: gpdfa.bib
---

Running head: Smooth DFA for multivariate trend analysis

```{r setup, include=FALSE, echo=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy=FALSE, tidy.opts=list(width.cutoff=60), warning = FALSE, message = FALSE)
library(bayesdfa)
library(knitr)
library(tidyverse)
library(ggsidekick)
library(ggrepel)
library(viridis)
library(gridExtra)
library(rstan)
library(ggrepel)
library(cowplot)
```

$^1$Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle WA, 98112, USA\
email: eric.ward@noaa.gov  
$^2$Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC, V6T 6N7, Canada  
$^3$Fish Ecology Division, Northwest Fisheries Science Center, National Marine Fisheries Service,  
National Oceanic and Atmospheric Administration, 2725 Montlake Blvd E, Seattle WA, 98112, USA\
$^4$Shellfish Assessment Program, Alaska Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 301 Research Court. Kodiak, AK, 99615, USA\
  
<!-- Open Research Statement: Code to replicate these analyses is in our repository (https://github.com/fate-ewi/gpdfa) and as an R package on CRAN, (https://cran.r-project.org/web/packages/bayesdfa/index.html). All data analyzed are previously published, but also included in our repository.    -->

\break 
  
## Abstract
\setlength\parindent{24pt}

1. Ecological processes are rarely directly observable, and inference often relies on estimating hidden or latent processes. State space models have become widely used for this task, because of their ability to simultaneously estimate multiple sources of variation (natural variability and variance attributed to observation errors). For multivariate time series, a second aim is often dimension reduction, or estimating a number of latent processes that is smaller than the number of observed time series. Dynamic factor analysis (DFA) has been used for performing time series dimension reduction, where latent processes are modeled as random walks. Whereas this may be suitable for some situations, random walks may be too flexible for other cases.

2. Here, we introduce a new class of models, where latent processes are modeled as smooth functions (basis splines, penalized splines, or Gaussian process models). We implement these models in our bayesdfa R package, which uses the rstan package for fitting. After evaluating model performance with simulated data, we apply conventional models and our smooth trend models to two long-term datasets from the west coast of the United States: (1) a 35-year dataset of pelagic juvenile rockfishes and (2) a 39-year dataset of fisheries catches.

3. Our simulations demonstrate that models matching the underlying trend smoothness make better out-of-sample predictions, but this advantage diminishes with increasing levels of observation error. For both case studies, the best smooth trend models had higher predictive accuracy, and yielded more precise predictions, compared to the conventional approach.

4. The smooth trend factor models introduced here offer a new approach for state space dimension reduction of multivariate time series. These flexible Bayesian models may be particularly useful for data that are clumped in time, for data with high signal-to-noise ratios, and generally for data where the underlying trend is assumed to be relatively smooth. 
<!-- DFA represents a powerful and underutilized tool -->
<!--These Bayesian models are flexible and extendable to other classes of splines including hierarchical. -->

## Key words
Dynamic factor analysis, smooth spline, B-spline, Gaussian process, Bayesian modeling, Stan

\break 

## Introduction

Ecological data can be characterized by multiple sources of variability, including stochastic natural variation, and errors associated with data collection (observation, sampling, and measurement errors). Disentangling these sources of variability is often challenging, and necessitates the use of statistical methods, such as state space models. These approaches have become ubiquitous in ecology, particularly for time-series data [@auger-methe2020]---in part because these models allow researchers to make inferences about ecological processes that are not directly observable. Applications of these models include estimating population change over time [@clark2004], movement dynamics [@patterson2008], and understanding spatiotemporal variation [@anderson2019b].

Estimating multiple sources of variation in state space models is numerically complex, and can be constrained explicitly or implicitly in ecological models via model assumptions. For example, discrete-time state-space models of population trajectories generally assume latent population size $n_t$ at time $t$ can be approximated by an autoregressive process in log-space, $x_{t+1}=f(x_{t})+\epsilon_t$, where $f()$ represents some function, $x_{t}=\log(n_{t})$, and $\epsilon_t$ are normally distributed process deviations representing stochastic variability of the natural system [@dennis2006]. Without the autoregressive constraint, the variance of the stochastic noise $\epsilon_t$ is not estimable in the presence of an observation or data model. Separating these sources of variability is critical to generating unbiased estimates of population trends or density dependence [@knape2008a]. If inference is not dependent on parameters of ecological interest (e.g., growth rates, density dependence), a wide range of alternative semi-parametric approaches exist that can be used to model the trajectory of $x_t$, including generalized additive models [GAMs, @wood2011] and Gaussian process models [@roberts2013]. Because these models are not autoregressive with discrete time steps, the flexibility or 'wiggliness' of the model can be adjusted as part of the model fitting. In addition to their flexibility, these semi-parametric models may be better suited for situations when data are patchily distributed in time or unequally spaced, making estimation of process and observation errors more difficult.

Challenges posed by univariate time-series models also apply to multivariate models, with the additional complexity that the number of latent time series may be variable, $k=1, \ldots ,m$, where $m$ is the number of time series observed. At one extreme, $k=m$, and each time series corresponds to a unique latent process. Motivating questions in analyzing these models include estimating correlated latent processes or trends, or estimating effects of environmental covariates [@hovel2016]. At the other extreme, $k=1$, where each time series represents repeated measurements of the same process, with optional offsets included for each time series (e.g., offsets allowing for differing detectability). Applications focused on estimating a single trend from multivariate data include the development of ecological indicators. Models with intermediate numbers of latent states $1<k<m$ require mapping of time series to latent trends. These may be specified a priori [@ward2010c] or estimated within the modeling framework using dimension reduction techniques. 

Many statistical approaches have been proposed in recent years for clustering or estimating common signals in multivariate time series [@liao2005]. Examples include clustering based on similarities among time series features [@sarda-espinosa2019], identifying common patterns in the frequency domain [@holan2018], and clustering based on neural networks [@cherif2011]. Application of these methods to ecological data has been limited, in part because many of these approaches identify clusters from raw data and ignore observation error. An alternative approach that has been used in ecology to map collections of multivariate time series to latent processes, while accounting for observation error, is dynamic factor analysis (DFA) [@zuur2003b; @zuur2003c]. DFA is an extension of factor analysis for time series data, and estimates a small number of unobserved processes ('trends'), that can describe observed data. Mapping time series to trends is done via estimated factor loadings---these allow each time series to be modeled as a mixture of estimated latent trends, rather than assigning each time series to a single trend.

To date, applications of DFA models in ecology and other fields have assumed that underlying trends are modeled as a random walk, $x_{t+1}=x_{t}+\epsilon_t$. The objective of this paper is to introduce a new class of DFA models based on smooth functions, instead of autoregressive processes. Recent work has highlighted the application of hierarchical GAMs for multiple data sources [@pedersen2019]. These approaches are flexible and likely to provide similar inference to DFA for a single latent trend; however, these methods have not been extended to include more than one process. We illustrate two options for modeling smooth functions for latent trends: splines ('B-splines' or penalized 'P-splines') and Gaussian process models. We compare both approaches to conventional autoregressive DFA models using simulated data and, as real world applications, using two marine fish datasets from the west coast of the USA. All data and code for replicating our analysis are available on Github (*https://github.com/fate-ewi/gpdfa*) and Zenodo (*http://doi.org/10.5281/zenodo.4891992*), and in our existing R package 'bayesdfa' [@ward2019].

## Methods

### Dynamic Factor Model  

The basic DFA model can be written as a multivariate state space model, consisting of a latent process model and observation or data model. In its simplest form, the process model is expressed as a random walk, $\textbf{x}_{t+1}=\textbf{x}_{t}+\textbf{w}_{t}$, where $\textbf{w}_{t}\sim \mathrm{MVN}(\textbf{0},\textbf{Q})$. For identifiability constraints, the covariance matrix $\textbf{Q}$ is generally constrained to be an identity matrix [@zuur2003b; @holmes2012b]. Additional features may be incorporated into the process model, including autoregressive or moving-average coefficients, covariates, or deviations that are more extreme than that of the normal distribution [@ward2019]. The observation model in a DFA is expressed as a linear combination of trends $\textbf{x}_{t}$ and a matrix of loadings coefficients $\textbf{Z}$, $\textbf{y}_{t} = \textbf{Z}\textbf{x}_{t}+\textbf{B}\textbf{d}_{t}+\textbf{e}_{t}$. In addition to the trends and loadings, time-varying covariates $\textbf{d}_t$ may be optionally included and linked to the observations through estimated coefficients $\textbf{B}$. The vector $\textbf{e}_t$ represents residual observation error, which is typically modeled as a diagonal matrix, $\textbf{e}_{t}\sim \mathrm{MVN}(0,\textbf{R})$, although off-diagonal elements may be estimated [@holmes2020]. Further details of the Bayesian implementation of the DFA model and extensions are provided in @ward2019.

### Modeling trends as Gaussian processes

Conventional DFA models with trends modeled as random walks are flexible, but for some datasets these models may be inappropriate. If data generating processes are not well approximated by a random walk, other models may be more suitable. As a first alternative to the random walk model, we treat the trends as a Gaussian process [@roberts2013]. A discrete-time Gaussian process model of trends treats the vector representing the $k^{\mathrm{th}}$ trend as a stochastic process, where $\textbf{x}_{k}$ is drawn from a multivariate normal distribution. As data in a DFA are generally standardized (mean 0, standard deviation 1), we can assume the mean of each trend to be 0, and all inference about the Gaussian process centers around the covariance matrix, $\textbf{x}_{k} \sim \mathrm{MVN}(\textbf{0},\boldsymbol{\Sigma})$. Rather than estimate each element of $\boldsymbol{\Sigma}$ independently, smooth covariance functions or 'kernels' are chosen to represent the covariance between points in time (typical choices include the exponential, Gaussian, and Matérn functions). For the purpose of our DFA modeling, we adopt a Gaussian kernel so that the covariance between points $i$ and $j$ at times ${t}_{i}$ and ${t}_{j}$ on trend $k$ can be expressed as $\mathrm{cov}({x}_{i,k},{x}_{j,k})={\sigma}_{k}^{2}\exp\left(\frac {-{\left({t}_{i}-{t}_{j} \right)}^{2}}{2{\theta}_{k}^{2}} \right)$, where ${\sigma}_{k}$ controls the magnitude of variation, and ${\theta}_{k}$ controls how smoothly correlation decreases as time points become further apart. We allow each trend to have its own covariance parameters (${\theta}_{k}$, ${\sigma}_{k}$), allowing each to have differing degrees of smoothness. Because of potential computation issues in high dimensionality problems such as spatial models [@latimer2009b; @anderson2019b], we also allow this Gaussian process model to be expressed as a Gaussian predictive process model. The difference between the predictive process approach and the full Gaussian process model is that instead of modeling the $\textbf{x}_{t}$ themselves as random variables, random variables are modeled at a subset of locations $\textbf{x}_{k}^{*}$ (referred to as 'knots') and projected to the locations of the data $\textbf{x}_{k}$. If we assume $\textbf{x}_{k}^{*} \sim \mathrm{MVN}(\textbf{0},{\boldsymbol{\Sigma}}^{*})$, then this projection can be done as ${{{x}_{k}=\boldsymbol{\Sigma'}}_{k,{k}^{*}}\boldsymbol{\Sigma}}^{*-1}{x}_{k}^{*}$, where the matrix $\boldsymbol{\Sigma'}_{k,{k}^{*}}$ is the transpose of the matrix describing the covariance between ${x}_{k}$ and ${x}_{k}^{*}$. The location of ${k}^{*}$ can be spaced equally or depend on data; we assume that the ${k}^{*}$ are equally spaced within each time series (with the endpoints also acting as knots). 

### Modeling trends as splines

As an alternative model of latent trends in a DFA, we use a series of smoothing functions, known as basis splines ('B-splines'), or penalized basis splines ('P-splines'). These models can be thought of as a special case of Gaussian process models [@kimeldorf1970a], and offer flexibility similar to the more familiar generalized additive models [@wood2011]. Splines are represented as a series of piecewise polynomial functions, where higher order polynomials result in more flexible curves [@hastie1992]. A common choice of the order of these polynomials is a cubic or 3rd degree, and will be the focus of our implementation for DFA. An additional input to splines is the locations of the control points (knots) between polynomial segments---more knots translates into a more flexible function, but also one with more parameters to estimate. We assume knots to be uniformly distributed over the time series. Uniform knot vectors may be appropriate for data collected at regular intervals, but for observations more patchily distributed in time, defining knots based on quantiles or other metrics may be warranted. Mathematically, modeling the trends in a DFA with B-splines can be expressed as a linear combination of the recursive B-spline weights $\textbf{B}$ and estimated coefficients $\textbf{a}$, $\textbf{x}_{k}=\textbf{a}\textbf{B}$. The matrix $\textbf{B}$ is generated from the raw data prior to estimation [@rcoreteam2020]. In the DFA setting, $\textbf{B}$ is shared across trends, but for trend-specific variability, we allow the coefficients $\textbf{a}$ to have a trend-specific variance, $\textbf{a}_{k}\sim \mathrm{Normal}\left(0,{\sigma}^2_{k} \right)$. P-splines represent an extension of B-splines, with an added penalty for extra wiggliness [@eilers1996; @crainiceanu2005]; this penalty reduces the impact of the number of B-spline basis functions on model fit [@mgcvbook]. For our implementation in bayesdfa, we use a linear penalty with 2nd order difference [@eilers1996]. 

### Simulations to compare model performance

To examine the relative performance of our proposed smooth trend models versus conventional approaches, we conducted a series of simulations to investigate sensitivity to (1) departures from random walks and (2) magnitude of observation error variance. We generated sets of simulated data consisting of 20 time steps and 3 time series. Each dataset was assumed to be generated from a single trend, which we modeled either as a random walk or as a smooth trend with a B-spline. Observed time series were generated from these trends by multiplying random loadings $Z_{i} \sim \mathrm{Normal}(1, 0.1^2)$ and then adding observation error (we used 3 levels of the observation error standard deviation: $\sigma_{obs}$ = 0.25, 0.5, 1). Each set of simulated time series was then fit with the same estimation models: a conventional DFA model estimating the trend as a random walk, smooth trends approximated with a B-spline (7, 13, and 20 knots), a P-spline (13 knots), and a full-rank Gaussian process (20 knots). For each combination of trend model and observation error, we generated a total of 100 simulated datasets.

Estimation was done in a Bayesian framework using our bayesdfa R package [@ward2019]. For the spline models, we assigned priors on the weights $\textbf{a} \sim N(0,1)$. Similarly, we assigned standard half-normal priors for the Gaussian process variances $\sigma_{k} \sim N(0,1)$, and inverse Gamma priors for the scale $\theta_{k} \sim IG(3,1)$. Bayesian estimation in the bayesdfa package is done using Stan and the R package rstan [@standevelopmentteam2016a], which implements Markov chain Monte Carlo (MCMC) using the No-U Turn Sampling (NUTS) algorithm [@hoffman2014; @carpenter2017a]. The relative performance of each estimation model was done using out of sample predictive ability. For each of the simulated time series described above, we randomly held out 10% (2 of every 20 observations) as a test set. Because of the large number of models considered (600 simulated datasets, 4200 estimation models), we only ran 1 MCMC chain (3000 iterations, discarding the first half as warm up), and generated posterior predictions for the test data. The normal log density of the test set was calculated for each MCMC iteration, and the expected log pointwise predictive density (ELPD) was used to summarize these values across draws [@vehtari2017]. 

### Application: 1-trend models of juvenile fish dynamics

As a first application of smooth DFA models, we analyzed time series data of pelagic juvenile rockfishes collected in Southern California (USA). The California Cooperative Oceanic Fisheries Investigations (CalCOFI) program has been conducting quarterly research vessel surveys to collect physical and biological data since 1949, to monitor changes to the California Current Ecosystem [@bograd2003]. The CalCOFI data have been incorporated into models used to assess population status [@maccall2003], and numerous publications have used these time series as indicators of ecosystem state [@mcclatchie2008]. These types of motivating questions also present an opportunity to apply DFA with both conventional and smoothed trends to generate ecosystem state indices. For this application, we focused on the dynamics of three co-occurring species of juvenile rockfishes: aurora rockfish (*Sebastes aurora*), shortbelly rockfish (*S. jordani*), and bocaccio rockfish (*S. paucispinis*). We restricted the time series to data collected since 1985, when sampling has been consistent in space and time on fixed sampling lines [@moser2001]. Though CalCOFI cruises are done throughout the year, we were primarily interested in estimating interannual trends, and further restricted our analysis to considering spring cruises from 1-April to 22-May when densities of most rockfish species are highest [@mosek2000]. All data were retrieved using the software R [@rcoreteam2020] and the rerddap package [@chamberlain2020].

With only three time series, we focused on DFA models with one trend and a single observation error variance shared across species. Other types of models, including hierarchical GAMs [@pedersen2019] or models allowing estimated offsets, may also be useful in this type of application. Where the DFA model differs is that unlike models with random intercepts or additive terms, the DFA factor loadings $\textbf{Z}$ are multiplicative and may be close to zero. These cases may arise when a particular time series has a low signal-to-noise ratio, or if there is low correspondence with the latent trends estimated among all other time series. In addition to estimating a conventional 1-trend DFA model with a latent autoregressive process, we evaluated smooth 1-trend models (trend estimated with a B-spline, P-spline, or Gaussian process). Because we had no a priori hypotheses about the complexity of these smoothed factor models, we evaluated a range of models for each (Table 1), using equally spaced knots.

### Application: 2-trend models of commercial fisheries catches  

As a slightly more complex example of the smooth factor analysis model, we examined the performance of 2-trend models, using a dataset of commercial fisheries catches (landings) from the west coast of the USA. This dataset consists of 13 species or groups reported annually from multiple fisheries over a 39-year period (1981--2019)[@pfmc2020]. Landings on the US West Coast are dominated by Pacific hake (also Pacific whiting, *Merluccius productus*), but also include substantial catches of rockfishes (*Sebastes spp.*) and flatfishes (e.g., Dover sole, *Solea solea*). Over the course of the last four decades, these species have experienced variability associated with population dynamics and the environment, but the patterns of landings also reflects a dynamic fisheries management process. Examples of changes include temporarily closing areas to fishing to protect species of conservation concern, and implementing catch share programs. These processes, combined with environmental conditions that have been positive for many species, have resulted in many increasing populations [@warlick2018]. Given these various management and ecological changes, it is important to summarize patterns of landings, and identify common trends as indicators for management and ecosystem status [@harvey2018b].

As with our previous example, we compared conventional DFA models to those modeling the trends with smooth functions. Preliminary model comparisons with 1-trend models suggested that 2-trend models were most supported by the data, and thus will be the focus of our analysis. In addition to modeling the 2-trend model with conventional DFA, we evaluated spline and Gaussian process models with equally spaced knots (Table 1). All models included a single observation error variance, shared across time series.

### Estimation and model selection  

For each model considered in our applications, we ran 3 parallel MCMC chains for 4000 iterations each, discarding the first 50% of the samples. We assessed convergence using split-$\hat{R}$ and effective samples size [@gelman2013BDA] along with trace plots. We used the loo package to calculate the approximate ELPD [@vehtari2017], and the Leave One Out Information Criterion [LOOIC, @vehtari2017; @vehtari2020] as a model selection tool [@ward2019], which approximates leave-one-out cross-validation. Preliminary model checks using LOOIC for the models included in our analysis indicated that many models had 1--4 data points that had high Pareto-k statistics (possibly because of model-misspecification or model flexibility, @vehtari2017). To avoid re-fitting these models, we implemented moment matching in the loo package [@vehtari2020; @paananen2021].

## Results

### Simulations to compare model performance

Our simulations were designed to explore the relative performance of DFA models that estimate trends as random walks versus our proposed smooth trends, when trends depart from random walks and are corrupted by observation error. Our results suggested that when observation error is relatively high, there is little difference between the smooth trend and random walk DFA models (Fig. 1). As observation error decreases, ELPD favours smooth-trend models when the underlying trend is smooth and random walk models when the underlying trend is random walk (Fig. 1). The largest ELPD weight for the smooth trend model occurred when observation error was low (0.25) and the number of knots in the estimation model was closest to that of the simulation model ('BS7' Fig. 1 left panel). The P-spline and Gaussian process models provided weak support to the true data generating model with low observation error, but models were indistinguishable with higher levels of observation errors. Results across knots for the B-spline estimation model demonstrates that as the number of knots in the smooth trend models increases, so too does flexibility, and the smooth trend approach becomes similar to the random walk (Fig. 1 left panel). 

```{r echo=FALSE}
m = readRDS("output/calcofi_models.rds")
fit = m[[6]]
r = bayesdfa::rotate_trends(fit, conf_level = 0.9)
post_means = apply(r$Z_rot,2,mean)
post_low = apply(r$Z_rot,2,quantile,0.025)
post_hi = apply(r$Z_rot,2,quantile,0.975)
# 
# # compare 6-knot GP and BS model
# fit_bs = m[[2]]
# fit_gp = m[[7]]
# x_bs = apply(rstan::extract(fit_bs$model,"x")$x,3,mean)
# x_gp = apply(rstan::extract(fit_gp$model,"x")$x,3,mean)
```

### Application: 1-trend models of juvenile fish dynamics

For our application of smooth dynamic factor models to the CalCOFI juvenile rockfish dataset, we found that the full rank Gaussian process DFA model and B-spline model with 30 knots had slightly lower LOOIC values compared to alternative models (Table 1), and these high dimensional models performed slightly better than the conventional DFA model. Varying the number of knots for the P-spline models and Gaussian process models resulted in qualitatively similar data support (Table 1), while the predictive accuracy of the B-spline model increased with more knots. This greater flexibility allowed more complex models to better capture recent variability in rockfish densities (Fig. 2). Trend 1 can be seen as largely capturing the variability in the timeseries of aurora rockfish, which had the loading that was largest in magnitude (`r round(post_means[1],2)`, 90% credible interval = `r round(post_low[1],2)`--`r round(post_hi[1],2)`). Bocaccio rockfish also loaded positively on trend 1, though the effect was weaker (`r round(post_means[3],2)`, 90% credible interval = `r round(post_low[3],2)`--`r round(post_hi[3],2)`). The loading for shortbelly rockfish was smallest in magnitude (`r round(post_means[2],2)`, 90% credible interval = `r round(post_low[2],2)`--`r round(post_hi[2],2)`). 

### Application: 2-trend models of commercial fisheries catches  

When DFA models were applied to commercial fisheries landings data from the west coast of the USA, the model with the lowest LOOIC was the P-spline model with 30 knots (second was the B-spline model with 6 knots). The first trend exhibited nearly linear change from 1981--2001 and was relatively stationary from 2001--2019 (Fig. 3). The second trend represented change from the early 1990s, with the strongest change occurring 2010--present. Estimates of the loadings from the best model indicated many species or species groups loaded negatively on trend 1 (lingcod, sablefish, rockfishes), but arrowtooth flounder and Pacific whiting had opposite loadings (Fig. 3). Trend 2 from this model appeared to contrast species with relatively stationary catches before declining in 2010 (e.g., arrowtooth flounder, *Atheresthes stomas*) versus Petrale sole (*Eopsetta jordani*)---one of the only non-whiting species that has experienced positive catches since 2010. Predictions across all models appeared to characterize the trends of most species, and trends from the best model generated more precise predictions relative to the random walk (Fig. 4), although neither model was able to capture the variability in Pacific whiting catches since 2000.

While low dimensional Gaussian process and spline models performed similarly (Table 1), comparing higher order models demonstrates the contrast between approaches. As more knots were added to spline models, the wiggliness of the estimated trends generally increased for the B-spline models but remained smooth for the P-spline approach (Fig. 5). Like the P-spline models, trends from the Gaussian process models did not become more wiggly as more knots were added, though the credible intervals of estimated trends were wider than either of the spline approaches (Fig. 5). Estimates of $\theta_{k}$ for this Gaussian process model were relatively large (8.13, 4.78), allowing correlation between neighboring points to decrease slowly and neighboring points further away to have a larger effect. In contrast, the full rank Gaussian process model was most supported for the CalCOFI data --- this model had a relatively small value of $\theta_{k}$ = 1.15, allowing correlation between adjacent points to decrease rapidly, translating into greater flexibility.  

## Discussion

Dynamic factor analysis represents a flexible approach for using state space models to capture latent processes in multivariate time series [@zuur2003b; @zuur2003c]. For some ecological processes --- particularly those with high variability --- random walks may be too constraining, while for others, using a random walk may be overly complex. Examples of cases where random walks may overfit trends may exist when there are large temporal gaps between observations, or data are collected from systems with high signal to noise ratios. 
As alternatives to the conventional random walk, we illustrate how DFA trends may be modeled using Gaussian process models smooth functions (B-splines, P-splines). These alternatives are flexible in that their smoothness may be specified a priori by the user, and compared via model selection. As the variability of latent trends is nearly always fixed in a conventional DFA for identifiability [@zuur2003b; @holmes2012b], adopting an alternative model of the trend does not limit inference or change the meaning of other parameters (e.g., loadings). Based on our application of these approaches to simulated data, smooth trend DFA models may be better supported in situations where the data generating process is more smooth than a random walk; examples included processes that are highly autocorrelated or have large amounts of environmental forcing.

In both of our case studies, comparing smooth DFA models to conventional ones, we found that using smooth functions to model DFA trends resulted in models with higher predictive ability (as measured with LOOIC). Our two case studies contrast two datasets with different degrees of variability. The CalCOFI dataset on juvenile rockfish abundance represents data with relatively high variability --- both because of the sampling process, and because the nature of fish recruitment in space and time is stochastic. 
<!-- In comparing conventional 1-trend DFA versus smooth trend DFA models to the CalCOFI data, the conventional random walk had difficulty in capturing extremes (Fig. 2), while the spline and Gaussian process models generally did better (Table 1).  -->
Our second example consisted of applying DFA models to time series of fisheries catches; these data are generally less variable than the CalCOFI data because catches are aggregated across a large spatial area and individual vessels. Like the CalCOFI example, we found that smooth trend DFA models were better supported over the conventional random walk; however, the models receiving the most support were lower dimension models (e.g., P-spline with 30 knots, B-spline with 6 knots). For both of our case studies, knot locations were assigned uniformly, and these results would be expected to change slightly if the knot locations were adjusted. For models with missing data, or datasets with unevenly distributed replicate samples, it may be important to consider non-uniform knot locations. 

Our case studies also highlighted that predictions from smooth trend models that use splines or Gaussian processes may be nearly identical, raising the question of which approach may be better to use in practice. Spline models can give equivalent predictions to Gaussian process models with the same kernel used in our models [@kimeldorf1970a]; however, the smoothing approaches differ slightly between these models. Analysts using these methods with DFA may be more interested in applying the Gaussian process model if inference about covariance parameters is of interest, while the B-spline or P-spline models may be computationally faster in many other applications. P-splines offer the additional advantage of being less sensitive to the number of knots. 

Because of their flexibility, applications of LOOIC or related model selection tools to state-space models, including the DFA models in our analysis, may result in poor diagnostics (e.g., high Pareto-k statistics). Alternative approaches for evaluating predictive performance may be used, including the ELPD obtained via k-fold cross validation [@vehtari2017; @vehtari2020]. Rather than performing parameter estimation once per model, as was done in our analysis using the loo package, calculating ELPD is more computationally challenging because with cross-validation, a model must be fit once per fold. With this added cost comes new opportunities, in that cross validation methods specific to time series data may be more easily applied. Commonly used approximations like LOOIC represent an approximation to leave-one-out cross validation where each data point is held out in turn. An alternative approach for time series data is that the observations in each time step can be treated as a fold, and held out in turn. Extensions of this time series approach include leave-future-out cross-validation, where data points are only used to predict future observations, not historical ones [@burkner2020a]. 
<!--While the specific tool used to assess model performance may be tailored to the research questions being addressed, the types of flexible trend models included in our analysis represent a robust approach for DFA that may also be considered in hindcasting or forecasting scenarios. -->

There are a number of possible extensions to the smooth-function DFA models described in this paper.
<!--One extension would be to extend the P-spline models to implement alternative penalty terms [@mgcvbook].-->
One extension would be to further constrain the wiggliness defined by the Gaussian process rate of correlation decay ($\theta$) via a prior such as the penalized complexity (PC) prior [@simpson2017].
Such a prior which would allow one to more easily impart prior beliefs about the parameter scale.
Second, the smooth trends could themselves be hierarchical: the trends could share their wiggliness, draw smoothing parameters from a shared distribution, or share a global smoother combined with group-specific smoothers [@pedersen2019].

DFA represents a powerful and underutilized tool for dimension reduction of multivariate time series. Our extensions of conventional methods to implement smooth trends enhances the flexibility of this tool for estimating latent processes, and offers a robust approach for DFA that may also be useful in hindcasting or forecasting scenarios. 

## Author contributions  

MH and ML secured funding for the initial development of the bayesdfa package and related publications. EW and SA did the majority of the code development, though all authors were involved in testing and simulation. EW and SA wrote the initial draft of the manuscript, with all authors providing critical edits and comments on figures. All authors reviewed the final draft and gave approval for publication.

\break

## Tables 

Table 1. Leave One Out Information Criterion (LOOIC) and Expected Log Posterior Density (ELPD) with standard errors in parentheses for each of the models applied to our cases studies (CalCOFI time series of juvenile rockfishes, and the time series of commercial groundfish landings from the west coast of the USA). For each model, knots (or locations of control points) are assumed to be uniformly spaced over the time series. To aid in interpretation, the minimum LOOIC value has been subtracted from each case study and ELPD values have been subtracted from the maximum (0 for each metric reflects the most supported model).

```{r table1, echo=FALSE}

loo_calcofi = readRDS("output/calcofi_loos.rds")
loo_landings = readRDS("output/landings_loos.rds")

# 1, 2/4/6 (b-spline), 13/15/17 (ps), 7/9/11 (gp), - landings
#  - calcofi

tab = data.frame("Trend model"=c("Random walk", rep("B-spline",3), rep("P-spline",3), rep("Gaussian process",4)))

tab$Knots = c(NA, rep(seq(6,30,by=12),3),"Full rank")

indices = c(1,2,4,6, 13,15,17, 7,9,11, 12)
min_calcofi = min(round(unlist(lapply(loo_calcofi, getElement, "looic")),2)[indices])
min_landings = min(round(unlist(lapply(loo_landings, getElement, "looic")),2)[indices])

tab$`CalCOFI LOOIC` = paste0(round(unlist(lapply(loo_calcofi, getElement, "looic")[indices]) - min_calcofi,2), " (",round(unlist(lapply(loo_calcofi, getElement, "se_looic")),2)[indices],")")

max_elpd = max(round(unlist(lapply(loo_calcofi, getElement, "elpd_loo")[indices]),2))
tab$`CalCOFI ELPD` = paste0(round(max_elpd - unlist(lapply(loo_calcofi, getElement, "elpd_loo")[indices]),2), " (",round(unlist(lapply(loo_calcofi, getElement, "se_elpd_loo")),2)[indices],")")

tab$`Landings LOOIC` = paste0(round(unlist(lapply(loo_landings, getElement, "looic")[indices]) - min_landings,2), " (",round(unlist(lapply(loo_landings, getElement, "se_looic")),2)[indices],")")

max_elpd = max(round(unlist(lapply(loo_landings, getElement, "elpd_loo")[indices]),2))
tab$`Landings ELPD` = paste0(round(max_elpd - unlist(lapply(loo_landings, getElement, "elpd_loo")[indices]),2), " (",round(unlist(lapply(loo_landings, getElement, "se_elpd_loo")),2)[indices],")")

knitr::kable(tab)
```

\break

## Figure Captions 
Figure 1. Simulation results, showing the difference in expected log pointwise predictive density (ELPD) between each model and the conventional DFA model with trends estimated as random walks. Data were generated from either a B-spline with 7 knots, or a random walk, and 3 levels of observation error were explored (0.25, 0.5, 1). Results are shown for each combination of observation error and estimation model (B-spline with 7, 13, or 20 knots; P-spline with 13 knots; Gaussian Process with 20 knots, random walk). Each boxplot corresponds to 100 ELPD point estimates, and each facet represents a different data generating model (B-spline or random walk).

Figure 2. Standardized densities of juvenile shortbelly rockfish (*Sebastes  jordani*) collected in the CalCOFI survey, and estimates of latent trends for three candidate models, representing a range of flexibility in splines compared to the conventional random walk. In addition to the conventional DFA model with a latent random walk (included in all panels for reference), predictions from a full rank Gaussian process model, and B-spline model with 12 knots and 24 knots are shown. The posterior mean from each model is shown as a solid line, and 90% credible intervals are shown with ribbons. 

Figure 3. Estimated trends and loadings from the 2-trend DFA model applied to commercial groundfish landings off the west coast of the United States. The model results with lowest LOOIC is shown, a model that allows trends to be approximated with P-splines (30 knots). The posterior mean for each trend is shown, with ribbons representing 90% credible intervals. The loadings of each species on each trend are shown as points, with lines representing 90% credible intervals. 

Figure 4. Estimated landings for 2 species included in our analysis, with contrasting trends (lingcod, Pacific whiting). Posterior means and 90% credible intervals (ribbons) for two candidate models are shown: a P-spline trend model with 30 knots, and a random walk model representing the conventional DFA.

Figure 5. Estimated trends for the 2-trend model of fisheries landings on the west coast of the USA. Shown are results for the B-spline and Gaussian process models with 6 and 18 knots (or control points). Solid lines represent the posterior means and 90% credible intervals are shown as ribbons.

\break  

```{r fig1, echo=FALSE, fig.cap="", fig.height=5, fig.pos="placeHere"}

source("code/simulation_plot.r")
g
```

\break  
  
```{r fig2, echo=FALSE, fig.cap="", fig.height=7, fig.pos="placeHere"}
m = readRDS("output/calcofi_models.rds")

# standardize raw data
x = readRDS("output/calcofi_data.rds")
scaled_x = group_by(x, ts) %>%
  dplyr::mutate(obs = (obs-mean(obs,na.rm=T))/sd(obs,na.rm=T))
scaled_x$Species = c("aurora","shortbelly","bocaccio")[scaled_x$ts]
scaled_x = dplyr::filter(scaled_x, Species=="shortbelly")
scaled_x$Model = NA

# Predictions from AR(1) model
trend_df1 = dfa_fitted(m[[1]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df1$time = seq(min(x$time),max(x$time))
trend_df1$Model = "Random walk"  

trend_df3 = dfa_fitted(m[[3]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df3$time = seq(min(x$time),max(x$time))
trend_df3$Model = "B-spline"
trend_df3 = rbind(trend_df3, trend_df1)
trend_df3b = dfa_fitted(m[[13]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df3b$time = seq(min(x$time),max(x$time))
trend_df3b$Model = "P-spline"
trend_df3 = rbind(trend_df3, trend_df3b)
trend_df3$group = "Spline (n=12)"

# predictions from B-spline model
trend_df2 = dfa_fitted(m[[5]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df2$time = seq(min(x$time),max(x$time))
trend_df2$Model = "B-spline"
trend_df2 = rbind(trend_df2, trend_df1)
trend_df2b = dfa_fitted(m[[16]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df2b$time = seq(min(x$time),max(x$time))
trend_df2b$Model = "P-spline"
trend_df2 = rbind(trend_df2, trend_df2b)
trend_df2$group = "Spline (n=24)"

# predictions from GP model
trend_df4 = dfa_fitted(m[[12]], conf_level=0.9) %>% 
  dplyr::filter(ID==3)
trend_df4$time = seq(min(x$time),max(x$time))
trend_df4$Model = "Gaussian process"
trend_df4 = rbind(trend_df4, trend_df1)
trend_df4$group = "Gaussian process (full rank)"

trend_df = rbind(trend_df2, trend_df3, trend_df4)

# trend_df$Model[which(trend_df$Model!="Random walk" & trend_df$group!="Gaussian process (full rank)")] = "B-spline"
#  trend_df$Model[which(trend_df$Model!="Random walk" & trend_df$group=="Gaussian process (full rank)")] = "Gaussian process"
   
my_colors <- c(rev(viridis(n=3, end=0.8)), gplots::col2hex("grey30"))

#trend_df$Model = factor(trend_df$Model, levels = c("B-spline","P-spline","Gaussian process","Random walk"))
g1 = ggplot(trend_df, aes(time,estimate,group=Model,fill=Model,col=Model)) +
  geom_ribbon(aes(ymin=lower,ymax=upper),alpha=0.2,col=NA) +
  geom_line() +
  theme_sleek() +
  facet_wrap(~group, nrow=3, scale="free_y") + 
  theme(strip.background =element_rect(fill="white")) + 
  geom_point(data=scaled_x, aes(x=time,y=obs,group=NA),alpha=0.7,size=3,col="grey30") +
  scale_color_manual(values = my_colors) + 
  scale_fill_manual(values = my_colors) + 
  #scale_color_manual("Species",
  #  values = c("aurora" = "#440154FF", "shortbelly" = "#2A788EFF", "bocaccio" = "#7AD151FF")) +
  ylab("Standardized densities and trend") +
  xlab("Year") + 
  theme(strip.text.x = element_text(size = 12))

print(g1)
```

\break   
  
  
```{r fig3, echo=FALSE, fig.cap="", fig.height=7, fig.pos="placeHere"}
  # make plots for best model
  m = readRDS("output/landings_models.rds")
  rotated = rotate_trends(m[[17]], conf_level = 0.9)

  n_ts <- dim(rotated$Z_rot)[2]
  n_trends <- dim(rotated$Z_rot)[3]
  n_years <- dim(rotated$trends_mean)[2]
  years <- 1981:2019
  # convert to df for ggplot
  df <- data.frame(
    x = c(t(rotated$trends_mean)),
    lo = c(t(rotated$trends_lower)),
    hi = c(t(rotated$trends_upper)),
    trend = paste0("Trend ", sort(rep(seq_len(n_trends), n_years))),
    time = rep(years, n_trends)
  )

  my_colors <- c(viridis(n=3, end=0.8))[1]

  # make faceted ribbon plot of trends
  p1 <- ggplot(df, aes_string(x = "time", y = "x")) +
    geom_ribbon(aes_string(ymin = "lo", ymax = "hi"), alpha = 0.4,col=NA, fill=my_colors) +
    geom_line(col=my_colors) + 
    facet_wrap("trend",nrow=2,scale="free_y") +
    xlab("Time") + 
    ylab("Trend") + 
    theme_sleek() +
    #scale_color_manual(values = my_colors) + 
  #scale_fill_manual(values = my_colors) + 
    theme(strip.background =element_rect(fill="white")) + 
    theme(strip.text.x = element_text(size = 12))
  
# bring in raw data
d = read.csv("data/port_landings_table2.csv", stringsAsFactors = FALSE)
d = dplyr::select(d, -Year)
for(i in 1:ncol(d)) {
  d[,i] = log(as.numeric(d[,i]))
}

loadings = as.data.frame(cbind(apply(rotated$Z_rot,c(2,3),mean), apply(rotated$Z_rot,c(2,3),sd)))
names(loadings) = c("Trend1_mean","Trend2_mean","Trend1_sd","Trend2_sd")
names(d)[1] = "Pacific whiting"
names(d)[4] = "Pacific cod"
names(d)[5] = "Misc roundfish"
names(d)[8] = "Arrowtooth flounder"
names(d)[9:13] = c("Dover sole","English sole","Petrale sole","Misc flatfish","Misc groundfish")
loadings$species = names(d)

p2 = ggplot(dplyr::filter(loadings, species!="Misc roundfish",species!="Misc flatfish",species!="Misc groundfish"),
  aes(Trend1_mean,Trend2_mean,label=species)) +
  geom_point(col=my_colors) +
  geom_errorbar(aes(ymin=Trend2_mean - Trend2_sd,ymax=Trend2_mean + Trend2_sd),alpha=0.6,col=my_colors,width=0) +
  geom_errorbarh(aes(xmin=Trend1_mean - Trend1_sd,xmax=Trend1_mean + Trend1_sd),alpha=0.6,col=my_colors,height=0) +
  geom_text_repel(size=5,col=my_colors) +
  #scale_color_manual(values = my_colors) + 
  #scale_fill_manual(values = my_colors) + 
  theme_sleek() +
  xlab("Loading on trend 1") +
  ylab("Loading on trend 2")

gridExtra::grid.arrange(p1,p2,nrow=2)
```

\break  
  
  
```{r fig4, echo=FALSE, fig.pos="placeHere", fig.cap="", fig.height=7}
# pull in predictions for best model and conventional random walk

whiting_rw = data.frame("Species"="Pacific whiting",
                     year=1981:2019,
                     Model = "Random walk",
                     pred = apply(bayesdfa::predicted(m[[1]])[,,,1],3,mean),
                     lower = apply(bayesdfa::predicted(m[[1]])[,,,1],3,quantile,0.025),
                     upper = apply(bayesdfa::predicted(m[[1]])[,,,1],3,quantile,0.975))

whiting_bs = data.frame("Species"="Pacific whiting",
                     year=1981:2019,
                     Model = "P-spline",
                     pred = apply(bayesdfa::predicted(m[[17]])[,,,1],3,mean),
                     lower = apply(bayesdfa::predicted(m[[17]])[,,,1],3,quantile,0.025),
                     upper = apply(bayesdfa::predicted(m[[17]])[,,,1],3,quantile,0.975))

lingcod_rw = data.frame("Species"="Lingcod",
                     year=1981:2019,
                     Model = "Random walk",
                     pred = apply(bayesdfa::predicted(m[[1]])[,,,3],3,mean),
                     lower = apply(bayesdfa::predicted(m[[1]])[,,,3],3,quantile,0.025),
                     upper = apply(bayesdfa::predicted(m[[1]])[,,,3],3,quantile,0.975))

lingcod_bs = data.frame("Species"="Lingcod",
                     year=1981:2019,
                     Model = "P-spline",
                     pred = apply(bayesdfa::predicted(m[[17]])[,,,3],3,mean),
                     lower = apply(bayesdfa::predicted(m[[17]])[,,,3],3,quantile,0.025),
                     upper = apply(bayesdfa::predicted(m[[17]])[,,,3],3,quantile,0.975))

df = rbind(lingcod_bs, lingcod_rw, whiting_bs, whiting_rw)

# bring in raw observations
d = read.csv("data/port_landings_table2.csv", stringsAsFactors = FALSE)
obs = data.frame(year = rep(1981:2019,2),
  y = c(d[["Lingcod"]],d[["P..Whiting"]]),
  "Species"=c(rep("Lingcod",39), rep("Pacific whiting",39)))
obs = dplyr::group_by(obs, Species) %>%
  dplyr::mutate(y = (y-mean(y))/sd(y),
    Model="Random walk")

my_colors <- c(rev(viridis(n=3, end=0.8)), gplots::col2hex("grey30"))[c(3,4)]

g1 = ggplot(df, aes(year, pred, group=Model, col=Model,fill=Model)) +
  geom_ribbon(aes(ymin=lower,ymax=upper),alpha=0.3,col=NA) +
  geom_line() +
  theme_sleek() +
  xlab("") +
  ylab("Standardized landings (mt)") +
  scale_color_manual(values = my_colors) + 
  scale_fill_manual(values = my_colors) + 
  #scale_fill_viridis(end=0.8, discrete = TRUE) +
  #scale_color_viridis(end=0.8, discrete = TRUE) +
  facet_wrap(~ Species, nrow=2) +
  theme(strip.background =element_rect(fill="white")) +
  geom_point(data=obs, aes(year, y),size=2, col="grey30",alpha=0.5) + 
  theme(strip.text.x = element_text(size = 12))

print(g1)
```
  
\break  

```{r fig5, echo=FALSE, fig.pos="placeHere", fig.cap="", fig.height=7}
  m = readRDS("output/landings_models.rds")
  # make plot of trends as a function of knots
  
  # 18 knot B-spline
  r = rotate_trends(m[[4]], conf_level = 0.9)
  bs_1 = data.frame("knots"=18,
                    "trend"="Trend 1 (n=18)",
                  "mean"=r$trends_mean[1,],
                  "lo"=r$trends_lower[1,],
                  "hi"=r$trends_upper[1,],
                  "year"=1981:2019)  
  bs_2 = data.frame("knots"=18,
                    "trend"="Trend 2 (n=18)",
                  "mean"=r$trends_mean[2,],
                  "lo"=r$trends_lower[2,],
                  "hi"=r$trends_upper[2,],
                  "year"=1981:2019)  
  # 6 knot B-spline
  r = rotate_trends(m[[2]], conf_level = 0.9)
  bs_3 = data.frame("knots"=6,
                    "trend"="Trend 1 (n=6)",
                  "mean"=r$trends_mean[1,],
                  "lo"=r$trends_lower[1,],
                  "hi"=r$trends_upper[1,],
                  "year"=1981:2019)  
  bs_4 = data.frame("knots"=6,
                    "trend"="Trend 2 (n=6)",
                  "mean"=r$trends_mean[2,],
                  "lo"=r$trends_lower[2,],
                  "hi"=r$trends_upper[2,],
                  "year"=1981:2019)  
  
  # 18 knot B-spline
  r = rotate_trends(m[[15]], conf_level = 0.9)
  ps_1 = data.frame("knots"=18,
                    "trend"="Trend 1 (n=18)",
                  "mean"=r$trends_mean[1,],
                  "lo"=r$trends_lower[1,],
                  "hi"=r$trends_upper[1,],
                  "year"=1981:2019)  
  ps_2 = data.frame("knots"=18,
                    "trend"="Trend 2 (n=18)",
                  "mean"=r$trends_mean[2,],
                  "lo"=r$trends_lower[2,],
                  "hi"=r$trends_upper[2,],
                  "year"=1981:2019)  
  # 6 knot P-spline
  r = rotate_trends(m[[13]], conf_level = 0.9)
  ps_3 = data.frame("knots"=6,
                    "trend"="Trend 1 (n=6)",
                  "mean"=-r$trends_mean[1,],
                  "lo"=-r$trends_lower[1,],
                  "hi"=-r$trends_upper[1,],
                  "year"=1981:2019)  
  ps_4 = data.frame("knots"=6,
                    "trend"="Trend 2 (n=6)",
                  "mean"=-r$trends_mean[2,],
                  "lo"=-r$trends_lower[2,],
                  "hi"=-r$trends_upper[2,],
                  "year"=1981:2019)  
  
  # 18 knot B-spline
  r = rotate_trends(m[[9]], conf_level = 0.9)
  gp_1 = data.frame("knots"=18,
                    "trend"="Trend 1 (n=18)",
                  "mean"=-r$trends_mean[1,],
                  "lo"=-r$trends_lower[1,],
                  "hi"=-r$trends_upper[1,],
                  "year"=1981:2019)  
  gp_2 = data.frame("knots"=18,
                    "trend"="Trend 2 (n=18)",
                  "mean"=-r$trends_mean[2,],
                  "lo"=-r$trends_lower[2,],
                  "hi"=-r$trends_upper[2,],
                  "year"=1981:2019) 
  # 6 knot B-spline
  r = rotate_trends(m[[7]], conf_level = 0.9)
  gp_3 = data.frame("knots"=6,
                    "trend"="Trend 1 (n=6)",
                  "mean"=-r$trends_mean[1,],
                  "lo"=-r$trends_lower[1,],
                  "hi"=-r$trends_upper[1,],
                  "year"=1981:2019)  
  gp_4 = data.frame("knots"=6,
                    "trend"="Trend 2 (n=6)",
                  "mean"=-r$trends_mean[2,],
                  "lo"=-r$trends_lower[2,],
                  "hi"=-r$trends_upper[2,],
                  "year"=1981:2019) 
  
  bs = rbind(bs_3, bs_4, bs_1, bs_2)
  gp = rbind(gp_3, gp_4, gp_1, gp_2)
  ps = rbind(ps_3, ps_4, ps_1, ps_2)
  bs$model = "B-spline"
  gp$model = "Gaussian process"
  ps$model = "P-spline"
  
  #df = rbind(bs,gp,ps)
  df = rbind(bs,gp,ps)
  #df$Model = as.factor(df$model)
  my_colors <- rev(c(viridis(n=3, end=0.8)))

  df$trend = factor(df$trend, levels = c("Trend 1 (n=6)", "Trend 1 (n=18)",
                                            "Trend 2 (n=6)", "Trend 2 (n=18)"))
  p1 = ggplot(df, aes(year, mean, col=model,fill=model)) + 
    geom_ribbon(aes(ymin=lo,ymax=hi,fill=model),alpha=0.5,fill=NA,linetype=1) + 
    geom_line(size=1.1,alpha=0.7) +     
    facet_wrap(~ trend, scale="free_y") +
    theme_sleek() + 
    xlab("") + 
    ylab("") + 
    scale_color_manual(values = my_colors) + 
    scale_fill_manual(values = my_colors) + 
    theme(strip.background =element_rect(fill="white")) + 
    theme(strip.text.x = element_text(size = 12))
  print(p1)
```

\break
  
  
# References