[BUG] Error on using `DifferencingTransform` in `AutoRegressivePipeline` #267

d-a-bunin · 2024-03-07T20:51:29Z

🐛 Bug Report

Using DifferencingTransform in AutoRegressivePipeline raises an error:

ValueError: Test should go after the train without gaps

Some users also reported seeing a different error in similar scenarios:

ValueError: Inverse transform can be applied only to full train or test that should be in the future

Expected behavior

No error, or at least a clear explanation of why the error occurs and how to avoid it.

How To Reproduce

from loguru import logger

from etna.pipeline import AutoRegressivePipeline
from etna.models import CatBoostMultiSegmentModel
from etna.transforms import DifferencingTransform
from etna.transforms import DateFlagsTransform
from etna.metrics import MAE


def generate_ts():
    import numpy as np
    from etna.datasets import TSDataset
    from etna.datasets import generate_ar_df

    df = generate_ar_df(
        start_time="2020-01-01",
        periods=100,
        n_segments=10,
        freq="D",
        random_seed=0,
    )

    # make strictly positive
    df["target"] = np.abs(df["target"]) + 1

    df_wide = TSDataset.to_dataset(df)
    ts = TSDataset(df=df_wide, freq="D")
    return ts


def main():
    ts = generate_ts()

    transforms = [
        DateFlagsTransform(),
        DifferencingTransform(in_column="target", inplace=True)
    ]
    model = CatBoostMultiSegmentModel()
    pipeline = AutoRegressivePipeline(model=model, transforms=transforms, horizon=7)

    logger.info("Running backtest")
    metrics = [MAE()]
    _ = pipeline.backtest(ts=ts, metrics=metrics, n_folds=3, n_jobs=1)


if __name__ == "__main__":
    main()

Environment

No response

Additional context

No response

Checklist

Bug appears at the latest library version

d-a-bunin · 2024-03-28T16:20:18Z

The problem seems to lie in the interaction between the core logic of AutoRegressivePipeline and the requirements of DifferencingTransform.

DifferencingTransform requires the data passed to inverse_transform to start right after the data it saw during fit. Otherwise it can't reconstruct the values, because it doesn't know the data point that precedes the data being inverse transformed.
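To illustrate why the preceding point is needed, here is a minimal NumPy sketch of first-order differencing and its inverse. It is not etna's implementation; the helper `inverse_difference` and the sample values are hypothetical and only show the arithmetic.

```python
import numpy as np

# Toy series: the first four points play the role of train, the last two are the future.
series = np.array([3.0, 5.0, 4.0, 8.0, 9.0, 7.0])
train, future = series[:4], series[4:]

# First-order differences: diffs[i] = series[i + 1] - series[i]
diffs = np.diff(series)
future_diffs = diffs[len(train) - 1:]  # differences that belong to the future points


def inverse_difference(chunk_diffs, anchor):
    """Hypothetical helper: rebuild values from differences, given the value right before the chunk."""
    return anchor + np.cumsum(chunk_diffs)


# The chunk starts right after train, so the last train value is the correct anchor.
restored = inverse_difference(future_diffs, anchor=train[-1])

# The chunk starts one step later (a gap), but the only known anchor is still the last train value.
restored_with_gap = inverse_difference(future_diffs[1:], anchor=train[-1])
```

With no gap, `restored` equals `future`. With a gap, the last train value is no longer the point right before the chunk, so `restored_with_gap` does not match `future[1:]`.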

In AutoRegressivePipeline the transforms are fitted on the training data, as in Pipeline, and are not refitted during forecasting, so DifferencingTransform is fitted only on the training data. During forecasting, AutoRegressivePipeline forecasts `step` steps ahead on every iteration and calls inverse_transform on the forecasted piece at the end of the iteration. On the second iteration this piece follows the training data with a gap of size `step`, and as a result DifferencingTransform fails.
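The iteration pattern described above can be sketched with plain integer positions. This is a simplified model of the loop, not etna's code; `forecast_chunks` and the sizes used are hypothetical.

```python
def forecast_chunks(train_size, horizon, step):
    """Hypothetical model of the autoregressive loop.

    Returns (first_index, last_index, gap) for each forecasted piece, where gap is
    the number of points between the end of train and the start of the piece.
    """
    chunks = []
    for start in range(0, horizon, step):
        length = min(step, horizon - start)
        first = train_size + start
        gap = first - train_size
        chunks.append((first, first + length - 1, gap))
    return chunks


# Assumed sizes for illustration: 100 train points, horizon of 7, step of 3.
chunks = forecast_chunks(train_size=100, horizon=7, step=3)
gaps = [gap for _, _, gap in chunks]
```

Only the first piece has a gap of zero. Every later piece starts `step` or more points after the end of train, which is the situation an inverse transform fitted only on train cannot handle.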

I currently don't know of any obvious way to solve this problem.

d-a-bunin added the bug Something isn't working label Mar 7, 2024

d-a-bunin added this to etna board Mar 7, 2024

d-a-bunin moved this from New to Todo in etna board Mar 7, 2024

github-project-automation bot moved this to New in etna board Mar 7, 2024
