
SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date #991

Open
dwelden opened this issue Aug 4, 2023 · 8 comments
Labels: bug (Something isn't working), needs triage (Initial RCA is required)

Comments


dwelden commented Aug 4, 2023

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.10.5

  2. What operating system and processor architecture are you using?

    Windows-10-10.0.19044-SP0

  3. What are the component versions in the environment (pip freeze)?
    pipfreeze.txt

  4. What did you do?

import pandas as pd
from snowflake.snowpark import Session

# connection_parameters is a placeholder for the account/credentials dict
session = Session.builder.configs(connection_parameters).create()

df = pd.read_sas(source_file, format='sas7bdat', encoding='cp1252')
session.write_pandas(df, target_table, overwrite=overwrite)

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 152130 entries, 0 to 152129
Data columns (total 5 columns):
 #   Column              Non-Null Count   Dtype
---  ------              --------------   -----
 0   ProductKey          152130 non-null  object
 1   DistributionCenter  152130 non-null  object
 2   DATE                152130 non-null  datetime64[ns]
 3   Discount            152130 non-null  float64
 4   Revenue             151830 non-null  float64
dtypes: datetime64[ns](1), float64(2), object(2)
memory usage: 5.8+ MB

>>> df
                                           ProductKey DistributionCenter       DATE  Discount  Revenue
0       Javier's Deep Cleansing Hair Shampoo 2.718 dl               Cary 2013-02-10       0.0      0.0
1       Javier's Deep Cleansing Hair Shampoo 2.718 dl               Cary 2013-02-17       0.0      0.0
2       Javier's Deep Cleansing Hair Shampoo 2.718 dl               Cary 2013-02-24       0.0      0.0
3       Javier's Deep Cleansing Hair Shampoo 2.718 dl               Cary 2013-03-03       0.0      0.0
4       Javier's Deep Cleansing Hair Shampoo 2.718 dl               Cary 2013-03-10       0.0      0.0
...                                               ...                ...        ...       ...      ...
152125            Sujatha's Sensitve skin Lotion 8 dl            Toronto 2015-02-15       0.0  27027.0
152126            Sujatha's Sensitve skin Lotion 8 dl            Toronto 2015-02-22       0.0  28124.0
152127            Sujatha's Sensitve skin Lotion 8 dl            Toronto 2015-03-01       0.0  25574.0
152128            Sujatha's Sensitve skin Lotion 8 dl            Toronto 2015-03-08       0.0  25812.0
152129            Sujatha's Sensitve skin Lotion 8 dl            Toronto 2015-03-15       0.0  25532.0

[152130 rows x 5 columns]
  5. What did you expect to see?

Expected the pandas datetime column to be written to Snowflake as TIMESTAMP_NTZ. Instead, the dates do not load properly:

PRODUCTKEY  DISTRIBUTIONCENTER  DATE          DISCOUNT  REVENUE
Cleanser    Cary                Invalid Date  0         0
Shampoo     Cary                Invalid Date  0         0
...         ...                 ...           ...       ...
  6. Can you set logging to DEBUG and collect the logs?

sas2snow2.log

dwelden added the bug (Something isn't working) and needs triage (Initial RCA is required) labels Aug 4, 2023
github-actions bot changed the title to "SNOW-886649: write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date" Aug 4, 2023

dwelden commented Aug 4, 2023

The data set used is DCSKINPRODUCT.sas7bdat from SAS-Visual-Forecasting---sample-data-sets

DCSKINPRODUCT.zip

-- Query Snowflake to examine the invalid dates
select to_varchar("DATE") from dcskinproduct limit 20;

43113053-09-24 17:00:00.000
43132219-01-30 16:00:00.000
43151384-06-05 17:00:00.000
43170549-10-11 17:00:00.000
43189715-02-16 16:00:00.000
43208880-06-22 17:00:00.000
43228045-10-28 17:00:00.000
43247211-03-05 16:00:00.000
43266376-07-10 17:00:00.000
43285541-11-15 16:00:00.000
43304707-03-23 17:00:00.000
43323872-07-27 17:00:00.000
43343037-12-02 16:00:00.000
43362203-04-09 17:00:00.000
43381368-08-13 17:00:00.000
43400533-12-19 16:00:00.000
43419699-04-25 17:00:00.000
43438864-08-30 17:00:00.000
43458030-01-05 16:00:00.000
43477195-05-13 17:00:00.000

The datetime64[ns] dtype is stored internally as an int64 count of nanoseconds since 1970-01-01. Given the year values Snowflake shows here, my suspicion is that write_pandas is interpreting the value as if it were milliseconds.

>>> df['DATE'][0], df['DATE'].astype('int64')[0]
(Timestamp('2013-02-10 00:00:00'), 1360454400000000000)

-- Interpreting the int64 as milliseconds reproduces the corrupt value:
select to_varchar(timestampadd('milliseconds', 1360454400000000000, '1970-01-01'::timestamp));
43113053-09-25 00:00:00.000

-- Interpreting it as nanoseconds yields the correct timestamp:
select to_varchar(timestampadd('nanoseconds', 1360454400000000000, '1970-01-01'::timestamp));
2013-02-10 00:00:00.000
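
The same check can be done purely in pandas (a minimal sketch; the raw int64 below is the value printed above):

import pandas as pd

raw = 1360454400000000000
# unit='ns' mirrors timestampadd('nanoseconds', ...) and gives the right date
print(pd.to_datetime(raw, unit='ns'))  # 2013-02-10 00:00:00
# unit='ms' would place the date ~43 million years out, overflowing pandas'
# datetime64[ns] range (OutOfBoundsDatetime) -- the same year Snowflake shows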


dwelden commented Aug 4, 2023

For now, I have the following workaround in place, but this should not be required.

    # Map datetime columns to string (needed because of bug in write_pandas)
    from pandas.api.types import is_datetime64_dtype

    for column in df.columns.to_list():
        if is_datetime64_dtype(df[column]):
            df[column] = df[column].dt.strftime('%Y-%m-%d %H:%M:%S.%f')

ericpettengill commented:

Linking snowflake-connector-python#600 and the comment there. The new release of snowflake-connector-python adds a use_logical_type parameter.
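
A minimal sketch of the connector-level call (assuming a snowflake-connector-python release that includes the parameter; conn is a placeholder snowflake.connector connection and the table name is illustrative):

from snowflake.connector.pandas_tools import write_pandas

# use_logical_type=True makes the staged Parquet carry logical timestamp
# types, so COPY INTO interprets nanosecond values correctly
success, num_chunks, num_rows, _ = write_pandas(
    conn,
    df,
    "DCSKINPRODUCT",
    use_logical_type=True,
)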

sfc-gh-aalam (Contributor) commented:

Closing this issue, as use_logical_type is working as expected. Please open a new issue if you run into further problems.


dwelden commented Nov 14, 2023

When will this parameter be available in Snowpark? The new parameter is not available in snowflake.snowpark.Session.write_pandas in Snowpark for Python version 1.9.0 or 1.10.0:

>>> session.write_pandas(df, target_table, use_logical_type=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Session.write_pandas() got an unexpected keyword argument 'use_logical_type'
>>> from snowflake.snowpark.version import VERSION as snowpark_version
>>> print(snowpark_version)
(1, 10, 0)

sfc-gh-aalam (Contributor) commented:

It is coming in the next release.
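
(A sketch of the expected call once the parameter ships, assuming the release after 1.10.0 accepts it; session, df, and target_table are the objects from the reproduction above:)

from snowflake.snowpark.version import VERSION as snowpark_version

# guard: only pass the flag on releases expected to support it
if snowpark_version > (1, 10, 0):
    session.write_pandas(df, target_table, use_logical_type=True)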

sfc-gh-aalam reopened this Nov 14, 2023
sfc-gh-aalam (Contributor) commented:

My bad. I thought this was a snowflake-connector-python issue.


ameryisafreeelf commented Jul 31, 2024

Bump: I'm currently running into related issues. Happy to open a new issue, but I assume this one is still open for a good reason. Thanks in advance for any eyes here.

Edit:
Python deps

  • Python 3.10.13
  • pandas 2.2.2
  • snowflake 0.6.0
  • snowflake-connector-python 3.7.1
  • snowflake-snowpark-python 1.13.0

I'm successfully passing use_logical_type to write_pandas(), but at the Snowflake level this seems to allow the COPY from staged Parquet into Snowflake to write junk into tables. Most notably, I have the following case:

  • I'm taking a datetime.datetime that gets written to Parquet with use_logical_type. I believe this enables Parquet-specific encoding (under the hood, storing the timestamp as an integer offset from the Unix epoch).
  • The COPY command from Parquet to Snowflake results in Invalid Date being written to a column of type TIMESTAMP_NTZ(9).

As a user, this is hard to triage because the staged Parquet files are difficult to peek into (see the pyarrow sketch below). In my opinion, the write_pandas command should not be able to write something like Invalid Date to Snowflake in a TIMESTAMP column. It would be nice if Snowflake couldn't store that to begin with, but I assume there's a good reason for that, and it's a layer deeper than the issue at hand.

Is there a good workaround for this? My current fix is just to avoid using use_logical_type altogether.
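
For what it's worth, one way to peek at the encoding is to write the same DataFrame to a local Parquet file and inspect its schema with pyarrow (a minimal sketch, assuming pyarrow is installed and using a toy frame in place of the real data):

import pandas as pd
import pyarrow.parquet as pq

# toy stand-in for the real DataFrame; write_pandas stages files like this
df = pd.DataFrame({"DATE": pd.to_datetime(["2013-02-10"])})
df.to_parquet("sample.parquet")

# the field type shows which timestamp unit / logical type was encoded,
# which is what Snowflake's COPY must interpret
print(pq.read_schema("sample.parquet").field("DATE").type)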
