
SNOW-944062: Implementation and functionality of pivot differs from PySpark and is not user-friendly #1093

Closed
FlorianWilhelm opened this issue Oct 17, 2023 · 3 comments · Fixed by #1130
Labels
feature New feature or request

Comments

@FlorianWilhelm

What is the current behavior?

Currently, pivot is a method of a DataFrame, as shown in the docs. You thus have to pass the pivot column and even the distinct values for pivoting. It is not possible to use pivot in combination with a grouped DataFrame.

This is not compatible with how PySpark implements pivot: there, pivot is a method on a grouped DataFrame, and the distinct values don't need to be specified.
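To make the difference concrete, here is a sketch of the grouped-pivot semantics PySpark provides, simulated in plain Python on a small list of rows (the data, column names, and `grouped_pivot` helper are illustrative assumptions, not part of either library):

```python
from collections import defaultdict

def grouped_pivot(rows, group_col, pivot_col, value_col):
    """Group rows by group_col, spread pivot_col's distinct values into
    columns, and sum value_col -- like groupBy().pivot().sum() in PySpark.
    Note the pivot values are discovered from the data, not passed in."""
    pivot_values = sorted({r[pivot_col] for r in rows})
    result = defaultdict(lambda: {v: 0 for v in pivot_values})
    for r in rows:
        result[r[group_col]][r[pivot_col]] += r[value_col]
    return {k: dict(v) for k, v in result.items()}

rows = [
    {"dept": "A", "month": "JAN", "amount": 10},
    {"dept": "A", "month": "FEB", "amount": 20},
    {"dept": "B", "month": "JAN", "amount": 5},
]

# PySpark (values optional):   df.groupBy("dept").pivot("month").sum("amount")
# Snowpark at the time of this issue (no groupBy, values required):
#   df.pivot("month", ["JAN", "FEB"]).sum("amount")
print(grouped_pivot(rows, "dept", "month", "amount"))
# {'A': {'FEB': 20, 'JAN': 10}, 'B': {'FEB': 0, 'JAN': 5}}
```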

What is the desired behavior?

Implement the same behavior as in PySpark.

How would this improve snowflake-snowpark-python?

It would make it easier to migrate from PySpark to Snowpark, and PySpark's pivot behaviour is simply more user-friendly, since in most cases you group before you pivot.

References, Other Background

@FlorianWilhelm FlorianWilhelm added the feature New feature or request label Oct 17, 2023
@github-actions github-actions bot changed the title Usage of pivot differs from PySpark SNOW-944062: Usage of pivot differs from PySpark Oct 17, 2023
@FlorianWilhelm FlorianWilhelm changed the title SNOW-944062: Usage of pivot differs from PySpark Implementation and functionality of pivot differs from PySpark and is not user-friendly Oct 17, 2023
@FlorianWilhelm FlorianWilhelm changed the title Implementation and functionality of pivot differs from PySpark and is not user-friendly SNOW-944062: Implementation and functionality of pivot differs from PySpark and is not user-friendly Oct 17, 2023
@FlorianWilhelm

Thanks @sfc-gh-aalam!

@sfc-gh-aalam

@FlorianWilhelm please note that right now, passing values is still required. I haven't designed a good approach to evaluate all the distinct values lazily. For now you can use Dataframe.distinct().collect() to collect all distinct values.
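The workaround above can be sketched as follows. The distinct-value extraction is simulated here on plain collected rows; the commented lines show roughly how the same step would look against a live Snowpark session (the column name and data are assumptions for illustration):

```python
def distinct_values(rows, column):
    """Return the sorted distinct values of `column` from collected rows."""
    return sorted({row[column] for row in rows})

# Against a real Snowpark DataFrame this would be roughly:
#   values = [r[0] for r in df.select("month").distinct().collect()]
#   df.pivot("month", values).sum("amount")
# i.e. collect the distinct values eagerly, then pass them to pivot().
collected = [
    {"month": "JAN", "amount": 10},
    {"month": "FEB", "amount": 20},
    {"month": "JAN", "amount": 5},
]
print(distinct_values(collected, "month"))  # ['FEB', 'JAN']
```

The trade-off is an extra round trip to the warehouse to materialize the distinct values before the pivot query can be built.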

@FlorianWilhelm

FlorianWilhelm commented Nov 20, 2023

@sfc-gh-aalam Okay, would it then make sense to keep this issue open until this is resolved? I guess API parity with Spark is an important goal for the acceptance of Snowpark.
