[BUG] Inconsistent state after TSDataset.train_test_split #272

Open
1 task done
d-a-bunin opened this issue Mar 18, 2024 · 0 comments · May be fixed by #545

d-a-bunin commented Mar 18, 2024

🐛 Bug Report

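After TSDataset.train_test_split, the dataset is left in an inconsistent state if any transforms were applied to it beforehand. The inconsistency is between columns and regressors: in the example below, ts_train.regressors still lists the lag features, while the columns of ts_train contain only target.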

Expected behavior

Regressors are always a subset of the columns, including in the datasets returned by train_test_split.

How To Reproduce

Code:

import pandas as pd
from etna.datasets import TSDataset
from etna.transforms import LagTransform


def main():
    df = pd.read_csv("examples/data/example_dataset.csv")
    ts = TSDataset(df=TSDataset.to_dataset(df), freq="D")

    transform = LagTransform(in_column="target", lags=[1, 2, 3], out_column="lags")
    ts.fit_transform([transform])
    print(f"TS features: {ts.columns.get_level_values('feature')}")
    print(f"TS regressors: {ts.regressors}")

    ts_train, ts_test = ts.train_test_split(test_size=24)
    print(f"TS-train features: {ts_train.columns.get_level_values('feature')}")
    print(f"TS-train regressors: {ts_train.regressors}")


if __name__ == "__main__":
    main()

Result:

TS features: Index(['lags_1', 'lags_2', 'lags_3', 'target', 'lags_1', 'lags_2', 'lags_3',
       'target', 'lags_1', 'lags_2', 'lags_3', 'target', 'lags_1', 'lags_2',
       'lags_3', 'target'],
      dtype='object', name='feature')
TS regressors: ['lags_1', 'lags_3', 'lags_2']
TS-train features: Index(['target', 'target', 'target', 'target'], dtype='object', name='feature')
TS-train regressors: ['lags_1', 'lags_3', 'lags_2']
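
The mismatch in the output above can also be checked mechanically. The following is a minimal sketch, not part of etna: missing_regressors is a hypothetical helper that takes the feature names (as returned by ts.columns.get_level_values('feature')) and the regressors list, and reports the regressors that have no matching column. The input values are copied from the output above.

```python
def missing_regressors(feature_names, regressors):
    """Return the regressors that are absent from the feature names.

    Hypothetical helper for illustration only; it is not part of etna.
    An empty result means the regressors are a subset of the columns.
    """
    present = set(feature_names)
    return sorted(r for r in regressors if r not in present)


# Values copied from the output reported in this issue.
full_features = ["lags_1", "lags_2", "lags_3", "target"] * 4
train_features = ["target"] * 4
regressors = ["lags_1", "lags_3", "lags_2"]

print(missing_regressors(full_features, regressors))
print(missing_regressors(train_features, regressors))
```

For the original dataset this prints an empty list. For the train part it prints ['lags_1', 'lags_2', 'lags_3'], which is the inconsistency described in this report.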

Environment

No response

Additional context

No response

Checklist

  • Bug appears in the latest library version
@d-a-bunin d-a-bunin added the bug Something isn't working label Mar 18, 2024
@d-a-bunin d-a-bunin moved this from New to Specification in etna board Mar 18, 2024
@d-a-bunin d-a-bunin moved this from Specification to Todo in etna board Mar 18, 2024
@d-a-bunin d-a-bunin moved this from Todo to In Progress in etna board Dec 25, 2024
@d-a-bunin d-a-bunin self-assigned this Dec 25, 2024