-
Notifications
You must be signed in to change notification settings - Fork 477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SNOW-884196: Snowflake connector write_pandas inserts datetime64[ns] to Snowflake as an Invalid Date #1677
Comments
Forgot to include the sample data from the dataframe showing the date values expected:
|
Potentially a duplicate of #600? |
It may be related to 600. I did not get the error message described there. It simply wrote invalid values to Snowflake. |
Are all data written as |
The data set used is DCSKINPRODUCT.sas7bdat from SAS-Visual-Forecasting---sample-data-sets -- Query Snowflake to examine the invalid dates
select to_varchar("DATE") from dcskinproduct limit 20;
43113053-09-24 17:00:00.000
43132219-01-30 16:00:00.000
43151384-06-05 17:00:00.000
43170549-10-11 17:00:00.000
43189715-02-16 16:00:00.000
43208880-06-22 17:00:00.000
43228045-10-28 17:00:00.000
43247211-03-05 16:00:00.000
43266376-07-10 17:00:00.000
43285541-11-15 16:00:00.000
43304707-03-23 17:00:00.000
43323872-07-27 17:00:00.000
43343037-12-02 16:00:00.000
43362203-04-09 17:00:00.000
43381368-08-13 17:00:00.000
43400533-12-19 16:00:00.000
43419699-04-25 17:00:00.000
43438864-08-30 17:00:00.000
43458030-01-05 16:00:00.000
43477195-05-13 17:00:00.000 The data type datetime64[ns] is stored internally as an int64 nanoseconds timedelta as of 1970-01-01. Given the Year values Snowflake has here, my suspicion is that the Snowflake connector is evaluating as if the timedelta were in milliseconds. >>> df['DATE'][0], df['DATE'].astype('int64')[0]
(Timestamp('2013-02-10 00:00:00'), 1360454400000000000) select to_varchar(timestampadd('milliseconds', 1360454400000000000, '1970-01-01'::timestamp));
43113053-09-25 00:00:00.000
select to_varchar(timestampadd('nanoseconds', 1360454400000000000, '1970-01-01'::timestamp));
2013-02-10 00:00:00.000 |
For now, I have the following workaround in place, but this should not be required. # Map datetime columns to string (needed because of bug in write_pandas)
for column in df.columns.to_list():
if is_datetime64_dtype(df[column]):
df[column] = df[column].dt.strftime('%Y-%m-%d %H:%M:%S.%f') |
I see. IIRC there is an ongoing rolling out of a feature that could impact how timestamp in parquet is interpreted, and |
I confirmed with @sfc-gh-yuliu and it is related to the new logical type (nano second). There is an ongoing effort to rolling out this change and connector would need to support it with a new param ( |
Duplicate of snowflakedb/snowpark-python#991 |
This has been fixed - please see linked ticket |
Python version
Python 3.10.5
Operating system and processor architecture
Windows-10-10.0.19044-SP0
Installed packages
What did you do?
What did you expect to see?
Expected Pandas datetime column to be written to Snowflake as a TIMESTAMP_NTZ. Dates are not loading properly.
Can you set logging to DEBUG and collect the logs?
The text was updated successfully, but these errors were encountered: