
SNOW-944062: Implementation and functionality of pivot differs from PySpark and is not user-friendly #1093

Closed
FlorianWilhelm opened this issue Oct 17, 2023 · 3 comments · Fixed by #1130
Labels
feature New feature or request

Comments

@FlorianWilhelm

What is the current behavior?

Currently, pivot is a method of a DataFrame, as shown in the docs. You thus have to pass the pivot column and even the distinct values for pivoting. It is not possible to use pivot in combination with a grouped DataFrame.

This is not compatible with how PySpark implements pivot: there, pivot is a method on a grouped DataFrame, and the distinct values don't need to be specified.
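To make the difference concrete, here is a sketch of the grouped-pivot semantics PySpark provides, simulated in plain Python on a small list of rows (the data, column names, and `grouped_pivot` helper are illustrative assumptions, not part of either library):

```python
from collections import defaultdict

def grouped_pivot(rows, group_col, pivot_col, value_col):
    """Group rows by group_col, spread pivot_col's distinct values into
    columns, and sum value_col -- like groupBy().pivot().sum() in PySpark.
    Note the pivot values are discovered from the data, not passed in."""
    pivot_values = sorted({r[pivot_col] for r in rows})
    result = defaultdict(lambda: {v: 0 for v in pivot_values})
    for r in rows:
        result[r[group_col]][r[pivot_col]] += r[value_col]
    return {k: dict(v) for k, v in result.items()}

rows = [
    {"dept": "A", "month": "JAN", "amount": 10},
    {"dept": "A", "month": "FEB", "amount": 20},
    {"dept": "B", "month": "JAN", "amount": 5},
]

# PySpark (values optional):   df.groupBy("dept").pivot("month").sum("amount")
# Snowpark at the time of this issue (no groupBy, values required):
#   df.pivot("month", ["JAN", "FEB"]).sum("amount")
print(grouped_pivot(rows, "dept", "month", "amount"))
# {'A': {'FEB': 20, 'JAN': 10}, 'B': {'FEB': 0, 'JAN': 5}}
```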

What is the desired behavior?

Implement the same behavior as in PySpark.

How would this improve snowflake-snowpark-python?

It would make it easier to migrate from PySpark to Snowpark, and PySpark's pivot behaviour is simply more user-friendly, since in most cases you group before you pivot.

References, Other Background

@FlorianWilhelm FlorianWilhelm added the feature New feature or request label Oct 17, 2023
@github-actions github-actions bot changed the title Usage of pivot differs from PySpark SNOW-944062: Usage of pivot differs from PySpark Oct 17, 2023
@FlorianWilhelm FlorianWilhelm changed the title SNOW-944062: Usage of pivot differs from PySpark Implementation and functionality of pivot differs from PySpark and is not user-friendly Oct 17, 2023
@FlorianWilhelm FlorianWilhelm changed the title Implementation and functionality of pivot differs from PySpark and is not user-friendly SNOW-944062: Implementation and functionality of pivot differs from PySpark and is not user-friendly Oct 17, 2023
@FlorianWilhelm

Thanks @sfc-gh-aalam!

@sfc-gh-aalam

@FlorianWilhelm please note that right now, passing values is still required. I haven't designed a good approach to evaluate all the distinct values lazily. For now you can use Dataframe.distinct().collect() to collect all distinct values.
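The workaround above can be sketched as follows. The distinct-value extraction is simulated here on plain collected rows; the commented lines show roughly how the same step would look against a live Snowpark session (the column name and data are assumptions for illustration):

```python
def distinct_values(rows, column):
    """Return the sorted distinct values of `column` from collected rows."""
    return sorted({row[column] for row in rows})

# Against a real Snowpark DataFrame this would be roughly:
#   values = [r[0] for r in df.select("month").distinct().collect()]
#   df.pivot("month", values).sum("amount")
# i.e. collect the distinct values eagerly, then pass them to pivot().
collected = [
    {"month": "JAN", "amount": 10},
    {"month": "FEB", "amount": 20},
    {"month": "JAN", "amount": 5},
]
print(distinct_values(collected, "month"))  # ['FEB', 'JAN']
```

The trade-off is an extra round trip to the warehouse to materialize the distinct values before the pivot query can be built.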

@FlorianWilhelm

FlorianWilhelm commented Nov 20, 2023

@sfc-gh-aalam Okay, would it then make sense to keep this issue open until this is resolved? I guess API parity with Spark is an important goal for the acceptance of Snowpark.
