Model Train with Ludwig does not return panda dataframe on Colab #1163

Closed
2 tasks done
xzdandy opened this issue Sep 19, 2023 · 3 comments · Fixed by #1164
@xzdandy
Collaborator

xzdandy commented Sep 19, 2023

Search before asking

  • I have searched the EvaDB issues and found no similar bug report.

Bug

09-19-2023 05:49:28 ERROR [plan_executor:plan_executor.py:execute_plan:0182] Batch constructor not properly called.
Expected pandas.DataFrame
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/limit_executor.py", line 40, in exec
    for batch in child_executor.exec(**kwargs):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/project_executor.py", line 45, in exec
    batch = apply_project(batch, self.target_list, self.catalog())
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/executor_utils.py", line 46, in apply_project
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/executor_utils.py", line 46, in <listcomp>
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/usr/local/lib/python3.10/dist-packages/evadb/expression/function_expression.py", line 129, in evaluate
    outcomes = self._apply_function_expression(func, batch, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/evadb/expression/function_expression.py", line 188, in _apply_function_expression
    return func_args.apply_function_expression(func)
  File "/usr/local/lib/python3.10/dist-packages/evadb/models/storage/batch.py", line 173, in apply_function_expression
    return Batch(expr(self._frames))
  File "/usr/local/lib/python3.10/dist-packages/evadb/models/storage/batch.py", line 43, in __init__
    raise ValueError(
ValueError: Batch constructor not properly called.
Expected pandas.DataFrame

Environment

https://colab.research.google.com/drive/1omSVrAOQvWmGkdbGSTWyMlQA90n1eWvQ#scrollTo=_7m-QQG5U3_C

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@xzdandy xzdandy added the Bug 🐞 EVA is not working as expected label Sep 19, 2023
@xzdandy xzdandy added this to the v0.3.5 milestone Sep 19, 2023
@xzdandy xzdandy self-assigned this Sep 19, 2023
@xzdandy
Collaborator Author

xzdandy commented Sep 19, 2023

The problem cannot be reproduced in a local Linux environment. My suspicion is that the backend changes on Colab, so a Dask DataFrame is returned.

@xzdandy xzdandy linked a pull request Sep 19, 2023 that will close this issue
2 tasks
@xzdandy
Collaborator Author

xzdandy commented Sep 19, 2023

Interestingly, both Colab and the local environment have dask==2023.3.2 installed. The one difference is that distributed==2023.8.1 is installed on Colab but not locally. With a better error message, however, we can confirm that the return type of LudwigModel.predict is dask.dataframe.core.DataFrame instead of pandas.DataFrame.

09-19-2023 06:56:48 ERROR [plan_executor:plan_executor.py:execute_plan:0182] Batch constructor not properly called.
Expected pandas.DataFrame, got <class 'dask.dataframe.core.DataFrame'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/plan_executor.py", line 178, in execute_plan
    yield from output
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/limit_executor.py", line 40, in exec
    for batch in child_executor.exec(**kwargs):
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/project_executor.py", line 45, in exec
    batch = apply_project(batch, self.target_list, self.catalog())
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/executor_utils.py", line 46, in apply_project
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/usr/local/lib/python3.10/dist-packages/evadb/executor/executor_utils.py", line 46, in <listcomp>
    batches = [expr.evaluate(batch) for expr in project_list]
  File "/usr/local/lib/python3.10/dist-packages/evadb/expression/function_expression.py", line 129, in evaluate
    outcomes = self._apply_function_expression(func, batch, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/evadb/expression/function_expression.py", line 188, in _apply_function_expression
    return func_args.apply_function_expression(func)
  File "/usr/local/lib/python3.10/dist-packages/evadb/models/storage/batch.py", line 174, in apply_function_expression
    return Batch(expr(self._frames))
  File "/usr/local/lib/python3.10/dist-packages/evadb/models/storage/batch.py", line 43, in __init__
    raise ValueError(
ValueError: Batch constructor not properly called.
Expected pandas.DataFrame, got <class 'dask.dataframe.core.DataFrame'>
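
One possible workaround, sketched here under the assumption that the caller only needs an in-memory frame, is to materialize a Dask result before it reaches the Batch constructor. The helper name `to_in_memory` is hypothetical and is not part of EvaDB or Ludwig. It detects Dask objects by module name so it does not need to import dask, and it relies on `compute()`, which Dask collections provide to return a concrete result. This is not necessarily what the linked pull request does.

```python
def to_in_memory(result):
    """Return a concrete object for Dask collections, anything else unchanged.

    Hypothetical helper for illustration; not part of EvaDB or Ludwig.
    """
    module = type(result).__module__ or ""
    # Dask collections are defined under the "dask" package,
    # e.g. dask.dataframe.core.DataFrame as seen in the error above.
    is_dask = module == "dask" or module.startswith("dask.")
    if is_dask and callable(getattr(result, "compute", None)):
        # For a Dask DataFrame, compute() returns a pandas DataFrame.
        return result.compute()
    return result
```

Wrapping the value returned by the model call in `to_in_memory(...)` would leave pandas results untouched and only pay the materialization cost when a Dask frame shows up.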

@xzdandy
Collaborator Author

xzdandy commented Sep 19, 2023

Even after installing distributed==2023.8.1 locally, we still cannot reproduce the problem.
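
To compare the two environments more systematically, a small script like the following could be run both on Colab and locally. It is only a sketch: the function name `installed_versions` is made up for illustration, and the package list is simply the set of packages mentioned in this thread. It uses `importlib.metadata` from the standard library, so nothing extra needs to be installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages mentioned in this issue; extend the list as needed.
    names = ["evadb", "ludwig", "pandas", "dask", "distributed"]
    for name, ver in installed_versions(names).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Diffing the output from both machines would show whether anything other than `distributed` differs.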
