SNOW-870432: use_logical_type for inferring timezone in pandas df #1134

sfc-gh-aalam · 2023-11-08T23:08:10Z

Please answer these questions before submitting your pull requests. Thanks!

What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

Fixes #870432, SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date #991

Fill out the following pre-review checklist:

- I am adding a new automated test(s) to verify correctness of my new code
- I am adding new logging messages
- I am adding a new telemetry message
- I am adding new credentials
- I am adding a new dependency

Please describe how your code solves the related issue.

Port the fix from the Snowflake Python connector: set `use_logical_type` when inferring data types from Parquet files.
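A minimal sketch of what this option looks like at the connector level, assuming the connector's `write_pandas` helper in `snowflake.connector.pandas_tools` and its `use_logical_type` keyword. The wrapper name `write_frame_with_logical_types` and its arguments are hypothetical, and the other keyword values simply mirror the ones visible in the `create_dataframe` diff in this PR. It is an illustration, not the code that was merged.

```python
def write_frame_with_logical_types(conn, df, table_name):
    """Upload a pandas DataFrame, asking Snowflake to honor Parquet logical types.

    `conn` is assumed to be an open Snowflake connector connection and `df` a
    pandas DataFrame; both are supplied by the caller (hypothetical helper).
    """
    # Imported lazily so this sketch can be loaded without the connector installed.
    from snowflake.connector.pandas_tools import write_pandas

    return write_pandas(
        conn,
        df,
        table_name,
        quote_identifiers=True,
        auto_create_table=True,
        table_type="temporary",
        # With logical types enabled, datetime columns are meant to be read back
        # as timestamps rather than as raw integer epochs.
        use_logical_type=True,
    )
```

The lazy import is only a convenience for the sketch; production code would normally import at module level.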

sfc-gh-sfan · 2023-11-09T22:21:45Z

tests/integ/test_dataframe.py

@@ -1478,6 +1479,41 @@ def test_create_dataframe_with_semi_structured_data_types(session):
    )


+@pytest.mark.skipif(not is_pandas_available, reason="pandas is required")
+def test_create_dataframe_with_pandas_df(session):


Do we need to port some changes to sp connector too to support use_logical_type in stored proc?

That's right. We need to port this to sproc too

https://github.com/snowflakedb/Stored-Proc-Python-Connector/pull/140

sfc-gh-sfan · 2023-11-10T17:07:36Z

src/snowflake/snowpark/session.py

@@ -2008,6 +2014,7 @@ def create_dataframe(
                quote_identifiers=True,
                auto_create_table=True,
                table_type="temporary",
+                use_logical_type=True,


Is this a behavior change (i.e. wrong behavior -> correct behavior)? Should we highlight the implication in change log?

I'll update the changelog detailing what exactly is going to change

sfc-gh-yixie

Code looks good. One comment on the changelog.

sfc-gh-yixie · 2024-01-02T22:38:18Z

CHANGELOG.md

+  - Earlier timestamp columns without a timezone would be inferred as `LongType()` but will now be correctly inferred as `TimestampType(TimestampTimeZone.NTZ)`.
+  - Earlier timestamp columns without a timezone would be converted to nanosecond epochs, but will now correctly be maintained as timestamp values.


Are these two lines for the same logic branch? If so please consider merging them.

sure. I'll merge them
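To make the changelog wording concrete, here is a small standard-library sketch of the difference between a timezone-less timestamp and its nanosecond-epoch integer form. The function names are hypothetical and nothing here calls Snowpark; it only illustrates the kind of integer a `LongType()` column would have held compared with the timestamp value that is now preserved.

```python
from datetime import datetime, timedelta

# Reference point for epoch arithmetic on naive (timezone-less) timestamps.
EPOCH = datetime(1970, 1, 1)


def to_nanosecond_epoch(ts: datetime) -> int:
    """Convert a naive timestamp to integer nanoseconds since 1970-01-01."""
    delta = ts - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def from_nanosecond_epoch(ns: int) -> datetime:
    """Recover the naive timestamp (microsecond precision) from nanoseconds."""
    return EPOCH + timedelta(microseconds=ns // 1_000)


ts = datetime(2023, 11, 8, 23, 8, 10)
ns = to_nanosecond_epoch(ts)
print(type(ns).__name__, ns)       # an integer, no longer recognizable as a date
print(from_nanosecond_epoch(ns))   # the original timestamp value
```

A consumer that receives only the integer has to know both the unit and the epoch to interpret it, which is why keeping the column typed as a timestamp is the less error-prone outcome.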

SNOW-870432: use_logical_type for inferring timezone in pandas dfs

d1a758e

sfc-gh-aalam marked this pull request as ready for review November 9, 2023 19:43

sfc-gh-aalam requested a review from a team as a code owner November 9, 2023 19:43

sfc-gh-aalam requested review from sfc-gh-mkeller and sfc-gh-achandrasekaran November 9, 2023 19:43

sfc-gh-sfan reviewed Nov 9, 2023

sfc-gh-aalam added 2 commits November 9, 2023 15:30

changelog and dependency updates

beb7bf0

fix release number

182537f

sfc-gh-sfan reviewed Nov 10, 2023

provide additional details about the correct behavior

0ea1145

sfc-gh-aalam requested a review from sfc-gh-yixie November 10, 2023 18:46

sfc-gh-sfan approved these changes Nov 11, 2023

sfc-gh-aalam added 8 commits November 13, 2023 15:04

use session param to control behavior

ee7e756

changelog updates

99bd36f

Merge branch 'main' into aalam-SNOW-870432-use-logical-type-to-infer-timestamp-correctly

9c10caa

Merge branch 'main' into aalam-SNOW-870432-use-logical-type-to-infer-timestamp-correctly

460895e

Merge branch 'main' into aalam-SNOW-870432-use-logical-type-to-infer-timestamp-correctly

15ad5e5

merge with main

0f10c16

Merge branch 'main' into aalam-SNOW-870432-use-logical-type-to-infer-timestamp-correctly

e74d5c2

fix merge

ae9fc0f

sfc-gh-aling approved these changes Jan 2, 2024

sfc-gh-yixie reviewed Jan 2, 2024

sfc-gh-yixie requested a review from sfc-gh-mabrennan January 2, 2024 22:39

simplify changelog

c816451

sfc-gh-yixie approved these changes Jan 3, 2024

sfc-gh-aalam merged commit 4b81f5d into main Jan 3, 2024
56 of 57 checks passed

sfc-gh-aalam deleted the aalam-SNOW-870432-use-logical-type-to-infer-timestamp-correctly branch January 3, 2024 18:59

github-actions bot locked and limited conversation to collaborators Jan 3, 2024
