Skip to content

Commit

Permalink
updated infantgts example
Browse files Browse the repository at this point in the history
  • Loading branch information
gcorani committed Nov 21, 2023
1 parent a238d64 commit 69a5a64
Show file tree
Hide file tree
Showing 2 changed files with 53 additions and 72 deletions.
117 changes: 45 additions & 72 deletions vignettes/bayesRecon.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,8 @@ vignette: >
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
#eval=TRUE ### !!!! set to FALSE here to render only the text !!!!
eval=FALSE ### !!!! set to FALSE here to render only the text !!!!
eval=TRUE ### !!!! set to FALSE here to render only the text !!!!
#eval=FALSE
)
set.seed(42)
```
Expand All @@ -37,21 +37,17 @@ the `bayesRecon` package. We provide three examples:

1. *Temporal hierarchy for a count time series*: we build a temporal hierarchy over a count time series, produce the base forecasts using `glarma` and reconcile them via Bottom-Up Importance Sampling (BUIS).

2. *Temporal hierarchy for a smooth time series*: we build a temporal hierarchy over a smooth time series, compute the base forecasts using `ets` and we reconcile them in closed form using Gaussian reconciliation.
2. *Temporal hierarchy for a smooth time series*: we build a temporal hierarchy over a smooth time series, compute the base forecasts using `ets` and we reconcile them in closed form using Gaussian reconciliation. The covariance matrix is diagonal.

3. *Hierarchical of smooth time series*: this is an example of a cross-sectional hierarchy. We generate the base forecasts using `ets` and we reconcile them via Gaussian reconciliation.
The covariance matrix is full and estimated via shrinkage.

4. *Cross-sectional hierarchy of count time series*
<mark> LZ </mark>


<mark>toglierei le citazioni</mark>

# Installation

The package is available at this
[CRAN page](https://cran.r-project.org/package=bayesRecon).
It can be installed and loaded with the usual commands:
The package, available at this
[CRAN page](https://cran.r-project.org/package=bayesRecon), can be installed and loaded with the usual commands:
```{r install, eval=FALSE}
install.packages('bayesRecon', dependencies = TRUE)
```
Expand All @@ -61,24 +57,20 @@ install.packages('bayesRecon', dependencies = TRUE)
library(bayesRecon)
```

# Carparts: temporal hierarchy over a count time series
We select a monthly time series of counts from the *carparts* dataset
<mark>citerei piuttosto il package expsmooth [@expsmooth_pkg]</mark>
[@hyndman2008forecasting].
The data set contains time series of sales of cars part from Jan. 1998 to Mar. 2002. We select in particular time series #2655, which we make it available as `bayesRecon::carparts_example`.
<mark>la chiamerei piuttosto carparts_example</mark>

# Temporal hierarchy over a count time series
We select a monthly time series of counts from the *carparts* dataset, available from
the expsmooth package [@expsmooth_pkg].
The data set contains time series of sales of cars part from Jan. 1998 to Mar. 2002.
For this example we select time series #2655, which we make available as `bayesRecon::carparts_example`.

This time series has a skewed distribution of values.

```{r carpart-plot, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 1**: Carpart - monthly car part sales.", fig.dim = c(6, 3)}
layout(mat = matrix(c(1, 2), nrow = 1, ncol = 2), widths = c(2, 1))
plot(carparts_example, xlab = "Time", ylab = "Car part sales", main = NULL)
hist(carparts_example, xlab = "Car part sales", main = NULL)
```
<br><br>
We divide the time series into train and test, such that the test set
contains the last 12 months.
We divide the time series into train and test; the test set contains the last 12 months.
```{r train-test}
train <- window(carparts_example, end = c(2001, 3))
test <- window(carparts_example, start = c(2001, 4))
Expand All @@ -89,29 +81,31 @@ test <- window(carparts_example, start = c(2001, 4))
<!-- 1. we build a temporal hierarchy; -->
<!-- 2. we compute the *base forecasts*) at each temporal scale. -->

We build the temporal hierarchy using the `temporal aggregation` function. We aggregate at *2-Monthly*, *Quarterly*, *4-Monthly*, *Biannual*, and *Annual* by specifying the `agg_levels` argument.
We build the temporal hierarchy using the `temporal aggregation` function.
We specify the aggregation levels using the
`agg_levels` argument; in this case they are
*2-Monthly*, *Quarterly*, *4-Monthly*, *Biannual*, and *Annual*.

``` {r temp-agg}
train.agg <- bayesRecon::temporal_aggregation(train, agg_levels = c(2, 3, 4, 6, 12))
levels <- c("Annual", "Biannual", "4-Monthly", "Quarterly", "2-Monthly", "Monthly")
names(train.agg) <- levels
```

The function returns a list of aggregated time series, ordered from the top (most aggregated) to the bottom (most disagreggated); they are plotted below.
The function returns a list of aggregated time series, ordered from the most aggregated (top of the hierarchy) to the most disagreggated (bottom of the hierarchy). We plot them below.
``` {r temp-agg-plot, dpi=300, fig.show="hold", out.width="100%", out.heigth="100%", fig.align='center', fig.cap="**Figure 2**: The aggregated time series of the temporal hierarchy.", fig.dim=c(6,3.5)}
par(mfrow = c(2, 3), mai = c(0.6, 0.6, 0.5, 0.5))
for (l in levels) {
plot(train.agg[[l]], xlab = "Time", ylab = "Car part sales", main = l)
}
```
<br><br>
We compute the *base forecasts* using `glarma` ([R CRAN](https://cran.r-project.org/package=glarma)),
We compute the *base forecasts* using the package [`glarma`](https://cran.r-project.org/package=glarma),
a package specific for forecasting count time series.
By iterating over the levels of the hierarchy,
we forecast 12 steps ahead at the monthly level, 4 steps ahead at the quarterly level, etc.
At each level, we fit a `glarma` model and we forecast using the function `glarma::forecast.glarma`.
We adopt a Poisson predictive distribution.
Each forecast is constituted by $D=20000$ integer samples.
We forecast 12 steps ahead at the monthly level, 4 steps ahead at the quarterly level, etc. by iterating over the levels of the hierarchy,
At each level, we fit a `glarma` model
with Poisson predictive distribution and we forecast;
each forecast is constituted by 20000 integer samples.


Eventually we collect the samples of the 28 predictive distributions (one at the *Annual* level, two at the *Biannual* level, etc.) in a list.
Expand Down Expand Up @@ -177,9 +171,9 @@ A <- recon.matrices$A
To reconcile using Bottom-Up Important Sampling (BUIS) we
we use the function `reconc_BUIS`.
It requires the $\mathbf{S}$ matrix, the *base forecasts*, the type (`in_type`) of the base forecasts.
The `in_type` argument can be set to "samples" if the base forecasts are in the form of samples, as in our case. Alternatively, it can be set to "params" if the base forecasts are parametric distributions; in this case we pass the the parameters to the reconciliation algorithm.
The `in_type` argument can be set to "samples" if the base forecasts are in the form of samples, as in our case. Alternatively, it can be set to "params" if the base forecasts are parametric distributions; in this case we pass the the parameters to the reconciliation algorithm. This is shown in the Gaussian reconciliation example.

("samples" in this example) and the specification of the base forecasts distributions ("discrete" in this example, since the predictive distributions are Poisson).
We need specifying the type of samples ("discrete" in this example, since the predictive distributions are Poisson).
<mark>se sono samples, perchè è necessario dare il tipo??</mark>


Expand Down Expand Up @@ -237,22 +231,20 @@ metrics
```

Note the improvements of the reconciled forecasts compared to the base forecasts.

# Temporal hierarchy over a smooth time series

We now consider a *monthly* time series (N1485) from the M3 forecasting competition [@makridakis2000m3].
<mark> citerei il pacchetto mcomp da cui l'abbiamo presa </mark>
We select a smooth monthly time series (N1485) from the M3 forecasting competition [@makridakis2000m3]. The data set is available in
the Mcomp package [@mcomp_pkg] and we make available this time series as `bayesRecon::m3_example`.

It is available from `bayesRecon::M3_example`.
<mark>lo chiamerei m3_example</mark>

```{r m3-plot, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 3**: M3 - N1485 time series.", fig.dim = c(6, 3)}
plot(M3_example$train, xlab = "Time", ylab = "y", main = "N1485")
```
<br>
We build the temporal hierarchy using the `temporal_aggregation` function.
If we do not specify the aggregation levels with the `agg_levels` argument, it generates by default all the feasible aggregation; in this case: *2-Monthly*, *Quarterly*, *4-Monthly*, *Biannual*, and *Annual*.

Without specifying `agg_levels`, the function generates by default all the feasible aggregation: 2-Monthly, Quarterly, 4-Monthly, Biannual, and Annual.

```{r m3-agg}
train.agg <- bayesRecon::temporal_aggregation(M3example$train)
Expand All @@ -261,8 +253,9 @@ names(train.agg) <- levels
```


We generate the base forecasts using `ets` from the ([forecast](https://cran.r-project.org/package=forecast)) package [@pkg_forecast]. We predict up
to 18 months ahead; the number*h* of steps ahead forecasted at each level is different. The predictive distribution is Gaussian at every level.
We generate the base forecasts using `ets` from the [forecast](https://cran.r-project.org/package=forecast) package [@pkg_forecast].
At every level we predict 18 months ahead.
All the predictive distributions are Gaussian.
```{r m3-fore}
# install.packages("forecast", dependencies = TRUE)
library(forecast)
Expand Down Expand Up @@ -292,8 +285,6 @@ for (level in train.agg) {
```

Using the function `get_reconc_matrices`, we get matrix $\mathbf{S}$.
<mark>perchè la matrice prende 18 mesi? poi
dovremmo dire come la rappresentiamo</mark>

```{r m3-rmat, dpi=300, out.width = '70%', fig.align='center', fig.cap="**Figure 4**: M3 - The aggregating matrix A (red=1, yellow=0).", fig.dim = c(8, 8)}
rmat <- get_reconc_matrices(agg_levels = c(2, 3, 4, 6, 12), h = 18)
Expand All @@ -306,7 +297,7 @@ axis(1, at=1:ncol(rmat$A), label=1:ncol(rmat$A), las=1)
axis(2, at=c(23,22,19,15,9), label=levels[1:5], las=2)
```
<br>
The function `reconc_gaussian` implements the exact closed-form Gaussian reconciliation.
The function `reconc_gaussian` implements the exact Gaussian reconciliation.
We also run `reconc_BUIS`, to check the consistency
between the two approaches.

Expand Down Expand Up @@ -334,7 +325,7 @@ round(rbind(
))
```

We now compare the *base forecasts* and the *reconciled forecasts* with their prediction intervals.
We now compare *base forecasts* and *reconciled forecasts*:

```{r m3-plotfore, dpi=300, out.width = "100%", fig.align='center', fig.cap="**Figure 5**: M3 - Base and reconciled forecasts. The black line shows the actual data (dashed in the test). The orange line is the mean of the base forecasts, the blu line is the reconciled mean. We also show the 95% prediction intervals.", fig.dim = c(6, 4)}
yhat.mu <- tail(sapply(fc, "[[", 1), 18)
Expand Down Expand Up @@ -369,26 +360,18 @@ polygon(c(xtest, rev(xtest)), c(yreconc.mu, rev(yreconc.lo95)),
```


<mark>
ma è scelta apposta per avere il fcast riconciliato perfetto?
ne sceglierei uno più realistico..
</mark>

# Gaussian reconciliation of a cross-sectional hierarchy

In this example, we consider the *infantgts* hierarchical time series [@hyndman2011optimal].
<mark> citare il pacchetto hts </mark>
It is available via `bayesRecon::infantMortality`.
In this example, we consider the hierarchical time series *infantgts* [@hyndman2011optimal], which is available
from the 'hts' package [@hts_pkg]
We make it also available as `bayesRecon::infantMortality`.

It contains counts of infant mortality (deaths) in Australia, disaggregated by state and sex (male and female).
State codes refer to: New South Wales (NSW), Victoria (VIC), Queensland (QLD), South Australia (SA), Western Australia (WA), Northern Territory (NT), Australian Capital Territory (ACT), and Tasmania (TAS).
These are *yearly* time series (1933-2003) which together constitute
a grouped time series.

We want to forecast the next year, 2004. We use the auto.arima forecasting models from the package `forecast` ([R CRAN](https://cran.r-project.org/package=forecast)) to generate Gaussian predictions for each node of the hierarchy.
<\mark> userei ets anche in questo caso <mark>
#State codes refer to: New South Wales (NSW), Victoria (VIC), Queensland (QLD), South Australia (SA), Western Australia (WA), Northern Territory (NT), Australian Capital Territory (ACT), and Tasmania (TAS).

For each model, we collect the residuals to take into account correlations.
We want to forecast one year ahead. We use 'auto.arima' from the [`forecast` ](https://cran.r-project.org/package=forecast) package.
We collect the residuals, which we will later use to compute the covariance matrix.

```{r infants-forecasts}
# install.packages("forecast", dependencies = TRUE)
Expand Down Expand Up @@ -439,18 +422,10 @@ axis(1, at=1:ncol(A), label=names(infantMortality)[12:27], las=2)
axis(2, at=c(1:11), label=rev(names(infantMortality)[1:11]), las=2)
```
<br>

We can now use Gaussian reconciliation, to ensure *coherent* forecasts.

We require an estimate of the covariance of the base forecasts. We can estimate it
with the sample covariance of the in-sample errors. However, this is often
a poor estimate due to the high variance, especially when the number of series in the hierarchy
is larger than the length of the series. @wickramasuriya2019optimal proposed an estimator
of the covariance that shrinks the off-diagonal elements towards 0.

The function `bayesRecon::schaferStrimmer_cov` implements this estimator by following
@schafer2005shrinkage to compute the optimal
shrinkage intensity.
We use the shrinkage estimator @schafer2005shrinkage
to estimate the covariance matrix of the residuals, as
in @wickramasuriya2019optimal.
The estimator is implemented by the function `bayesRecon::schaferStrimmer_cov`.

```{r infants reconc}
# means
Expand All @@ -461,7 +436,8 @@ print(paste("The estimated shrinkage intensity is", round(shrink.res$lambda_star
Sigma <- shrink.res$shrink_cov
```

The function `reconc_gaussian` implements the closed-form Gaussian reconciliation. It requires the covariance matrix and the vector of means.
We pass the covariance matrix and the vector of means to
the function `reconc_gaussian`, to perform the Gaussian reconciliation.

```{r infants-recon}
recon.gauss <- bayesRecon::reconc_gaussian(S,
Expand All @@ -472,8 +448,5 @@ recon.gauss <- bayesRecon::reconc_gaussian(S,
(A %*% recon.gauss$bottom_reconciled_mean) - recon.gauss$upper_reconciled_mean
```

# BUIS reconciliation of a cross-sectional hierarchy
<mark> LZ </mark>

# References
<div id="refs"></div>
8 changes: 8 additions & 0 deletions vignettes/references.bib
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
@Manual{mcomp_pkg,
title = {Mcomp: Data from the M-Competitions},
author = {Rob Hyndman},
year = {2018},
note = {R package version 2.8},
url = {https://CRAN.R-project.org/package=Mcomp},
}
@Article{pkg_forecast,
title = {Automatic time series forecasting: the forecast package for {R}},
author = {Rob J Hyndman and Yeasmin Khandakar},
Expand Down

0 comments on commit 69a5a64

Please sign in to comment.