
[BUG] Segmentation fault on catboost model during forecasting with prediction intervals #1258

Open
1 task done
Mr-Geekman opened this issue May 3, 2023 · 0 comments
Labels
bug Something isn't working


@Mr-Geekman
Contributor

Mr-Geekman commented May 3, 2023

🐛 Bug Report

Making a forecast with prediction intervals using a CatBoost model can cause a segmentation fault.

The error looks like:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
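Exit code 139 follows the common POSIX shell convention of 128 plus the signal number, so it means the process was killed by signal 11 (SIGSEGV). If the reproduction script is launched from Python's subprocess module instead, the same crash is reported as a negative return code (-11). The helper below is hypothetical (not part of etna) and is only a sketch of how both forms can be decoded:

```python
import signal


def describe_exit(returncode: int) -> str:
    """Turn a process exit status into a readable description (POSIX conventions)."""
    signum = None
    if returncode < 0:
        # subprocess reports a child killed by signal N as returncode -N
        signum = -returncode
    elif returncode > 128:
        # shells and IDE run consoles report it as 128 + N
        signum = returncode - 128
    if signum is not None:
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            pass  # not a known signal number, treat it as a plain exit code
    return f"exit code {returncode}"


print(describe_exit(139))  # killed by SIGSEGV
print(describe_exit(-11))  # killed by SIGSEGV
print(describe_exit(0))    # exit code 0
```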

Expected behavior

No errors.

How To Reproduce

import numpy as np
import pandas as pd

from etna.models import CatBoostMultiSegmentModel
from etna.transforms import LagTransform, DateFlagsTransform
from etna.datasets import TSDataset
from etna.pipeline import Pipeline


def get_ts() -> TSDataset:
    rng = np.random.default_rng(0)

    periods = 100
    df1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df1["segment"] = "segment_1"
    df1["target"] = rng.uniform(10, 20, size=periods)

    df2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df2["segment"] = "segment_2"
    df2["target"] = rng.uniform(-15, 5, size=periods)

    df = pd.concat([df1, df2]).reset_index(drop=True)
    df = TSDataset.to_dataset(df)
    tsds = TSDataset(df, freq="D")

    return tsds


def main():
    ts = get_ts()
    model = CatBoostMultiSegmentModel(iterations=100)
    transforms = [DateFlagsTransform(), LagTransform(in_column="target", lags=list(range(3, 10)))]
    pipeline = Pipeline(model=model, transforms=transforms, horizon=3)

    pipeline.fit(ts)
    pipeline.forecast(prediction_interval=True)


if __name__ == "__main__":
    main()

Observations:

  • The problem happens inside the backtest, in the _forecast_backtest_pipeline method, on the second pipeline
  • If you call backtest instead of forecast, the error doesn't happen
  • If you rewrite _run_all_folds to run without parallel execution, using a list comprehension, the error remains
  • Running CatBoost with logging_level="Debug" doesn't clarify the situation
  • If you run pipeline.forecast(prediction_interval=True, num_folds=5), the error happens on fold 3
  • If you run pipeline.forecast(prediction_interval=True, num_folds=8), the error happens on fold 1
  • Changing random_seed doesn't change the fold on which the pipeline fails
  • Removing the tslogger calls from _forecast_backtest_pipeline doesn't change the error
  • Removing DateFlagsTransform from transforms stops the error
    • This may be a clue that the problem is related to categorical features
  • Removing LagTransform from transforms doesn't stop the error
  • Setting thread_count=1 doesn't stop the error
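Because the crash happens in native code, Python-level logging does not show where it occurs. One option not tried in the observations above is Python's built-in faulthandler module, which prints the Python traceback of every thread when a SIGSEGV arrives. A minimal sketch, assuming the reproduction script is saved to a file; the function name run_with_faulthandler is hypothetical:

```python
import signal
import subprocess
import sys
from typing import Tuple


def run_with_faulthandler(script_path: str) -> Tuple[bool, str]:
    """Run a script in a fresh interpreter with faulthandler enabled.

    Returns (crashed_with_sigsegv, stderr). On a segfault, stderr holds the
    Python traceback that faulthandler printed before the process died.
    """
    proc = subprocess.run(
        # "-X faulthandler" enables faulthandler at interpreter startup
        [sys.executable, "-X", "faulthandler", script_path],
        capture_output=True,
        text=True,
    )
    # On POSIX, a child killed by signal N has returncode -N.
    crashed = proc.returncode == -signal.SIGSEGV
    return crashed, proc.stderr
```

For example, calling run_with_faulthandler("repro.py") (file name hypothetical) on the reproduction script should, when the crash occurs, return True along with a traceback showing which Python frame, such as one inside _forecast_backtest_pipeline, was executing when the native code crashed.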

I haven't managed to reproduce the problem on a fresh installation, so it isn't obvious what causes it.

Environment

No response

Additional context

No response

Checklist

  • Bug appears at the latest library version
Mr-Geekman added the bug label May 3, 2023
github-project-automation bot moved this to Specification in etna board May 3, 2023
Mr-Geekman moved this from Specification to In Review in etna board Jun 2, 2023
Mr-Geekman moved this from In Review to Hold in etna board Jun 2, 2023
Mr-Geekman moved this from Hold to Specification in etna board Jun 5, 2023