
[BUG] Segmentation fault on catboost model during forecasting with prediction intervals #30

Closed
1 task done
Mr-Geekman opened this issue Aug 14, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@Mr-Geekman

Issue by Mr-Geekman
Wednesday May 03, 2023 at 13:53 GMT
Originally opened as tinkoff-ai#1258


🐛 Bug Report

Making a forecast with prediction intervals using a CatBoost model can cause a segmentation fault.

The error looks like:

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
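Exit code 139 is how the IDE's run console reports a process killed by signal 11 (128 + 11). The crash therefore happens in native code rather than as a Python exception, which is why no traceback is printed. The self-contained sketch below is not part of the original report and assumes a POSIX system; the child script is a hypothetical stand-in that sends SIGSEGV to itself. It shows that convention and shows that Python's standard faulthandler module (also enabled with python -X faulthandler) dumps the Python stack at the moment of the crash, which could help pinpoint the failing call inside _forecast_backtest_pipeline:

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the crashing script: it enables faulthandler and
# then sends SIGSEGV to itself, mimicking a segfault in native code.
CHILD_CODE = (
    "import faulthandler, os, signal\n"
    "faulthandler.enable()\n"
    "os.kill(os.getpid(), signal.SIGSEGV)\n"
)

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE], capture_output=True, text=True
)

# subprocess reports death by a signal as the negated signal number (POSIX only).
print(result.returncode)  # -11 on Linux
# Shells and IDEs report the same event as 128 + signal number.
print(128 + signal.SIGSEGV)  # 139 on Linux
# faulthandler wrote "Fatal Python error: ..." plus the Python stack to stderr.
print(result.stderr)
```

For the real reproduction, running the script as python -X faulthandler repro.py (the file name is illustrative) would print the Python frames that were active when CatBoost crashed.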

Expected behavior

No errors.

How To Reproduce

import numpy as np
import pandas as pd

from etna.models import CatBoostMultiSegmentModel
from etna.transforms import LagTransform, DateFlagsTransform
from etna.datasets import TSDataset
from etna.pipeline import Pipeline


def get_ts() -> TSDataset:
    rng = np.random.default_rng(0)

    periods = 100
    df1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df1["segment"] = "segment_1"
    df1["target"] = rng.uniform(10, 20, size=periods)

    df2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df2["segment"] = "segment_2"
    df2["target"] = rng.uniform(-15, 5, size=periods)

    df = pd.concat([df1, df2]).reset_index(drop=True)
    df = TSDataset.to_dataset(df)
    tsds = TSDataset(df, freq="D")

    return tsds


def main():
    ts = get_ts()
    model = CatBoostMultiSegmentModel(iterations=100)
    transforms = [DateFlagsTransform(), LagTransform(in_column="target", lags=list(range(3, 10)))]
    pipeline = Pipeline(model=model, transforms=transforms, horizon=3)

    pipeline.fit(ts)
    pipeline.forecast(prediction_interval=True)


if __name__ == "__main__":
    main()

Observations:

  • The problem happens inside backtest, in the _forecast_backtest_pipeline method, on the second pipeline
  • If you call backtest instead of forecast, the error doesn't happen
  • If you rewrite _run_all_folds to run sequentially with a list comprehension instead of parallel execution, the error remains
  • Running CatBoost with logging_level="Debug" doesn't clarify the situation
  • If you run pipeline.forecast(prediction_interval=True, num_folds=5), the error happens on fold 3
  • If you run pipeline.forecast(prediction_interval=True, num_folds=8), the error happens on fold 1
  • Changing random_seed doesn't change the fold on which the pipeline fails
  • Removing the tslogger calls from _forecast_backtest_pipeline doesn't change the error
  • Removing DateFlagsTransform from transforms stops the error
    • This suggests that the problem may be related to categorical features
  • Removing LagTransform from transforms doesn't stop the error
  • Setting thread_count=1 doesn't stop the error
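The observations above were collected by editing and rerunning the script by hand. Below is a minimal sketch of automating that kind of ablation. It is not part of the original report, and the helper names outcome and run_grid and the template placeholders are hypothetical. Each variant runs in a fresh interpreter, so a segfault in one variant doesn't abort the others and no state carries over between runs. The result is classified by return code. The reproduction script would have to be written as a str.format template, with any literal braces doubled.

```python
import itertools
import signal
import subprocess
import sys
from typing import Dict, Iterable, Tuple


def outcome(code: str, timeout: float = 600.0) -> str:
    """Run `code` in a fresh interpreter and classify how it ended (POSIX)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode == -signal.SIGSEGV:
        return "SIGSEGV"
    return f"exit {proc.returncode}"


def run_grid(template: str, **options: Iterable) -> Dict[Tuple, str]:
    """Fill `template` with every combination of `options`; record each outcome."""
    names = list(options)
    results = {}
    for values in itertools.product(*(options[name] for name in names)):
        code = template.format(**dict(zip(names, values)))
        results[values] = outcome(code)
    return results


# Hypothetical usage: a template of the reproduction script ending in
#   pipeline.forecast(prediction_interval=True, num_folds={num_folds})
# could be checked with run_grid(template, num_folds=[3, 5, 8]).
```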

I haven't managed to reproduce the problem on a fresh installation, so it isn't obvious what causes it.
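Because the crash didn't reproduce on a fresh installation and the Environment section below was left empty, the installed versions may matter. Below is a minimal sketch, not part of the original report, that prints a version snapshot for comparing the failing environment with a fresh one. The package list is an assumption: it covers the libraries the reproduction script imports plus scikit-learn. Adjust it as needed.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version

# Assumed list of distributions worth comparing; extend as needed.
PACKAGES = ("etna", "catboost", "numpy", "pandas", "scikit-learn")


def environment_report(packages=PACKAGES) -> dict:
    """Return Python/platform info and installed versions of `packages`."""
    report = {"python": sys.version.split()[0], "platform": platform.platform()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```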

Environment

No response

Additional context

No response

Checklist

  • Bug appears at the latest library version
@Mr-Geekman Mr-Geekman added the bug Something isn't working label Aug 14, 2023
@Mr-Geekman Mr-Geekman moved this to Specification in etna board Aug 15, 2023
@d-a-bunin
Collaborator

I haven't been able to reproduce it on new installations, so let's close it.

@d-a-bunin d-a-bunin closed this as not planned Apr 12, 2024
@github-project-automation github-project-automation bot moved this from Specification to Done in etna board Apr 12, 2024
Projects
Status: Done
Development

No branches or pull requests

2 participants