Support microseconds precision during copy unload #492
base: master
Conversation
@@ -193,4 +193,88 @@ class ConversionsSuite extends FunSuite {
    assert(expect == result.toString())
  }

  test("Data with micro-seconds and nano-seconds precision should be correctly converted"){
This is a unit test. It proves that timestamps with micro/nano-second precision can be parsed.
Could you please add an integration test (AKA end-to-end test)?
https://github.com/snowflakedb/spark-snowflake/blob/master/src/it/scala/net/snowflake/spark/snowflake/SnowflakeResultSetRDDSuite.scala#L711 is the test case for the new Arrow format read (non-COPY-UNLOAD).
It is disabled for COPY UNLOAD. It would be great if you can make it work.
  // COPY UNLOAD can't be run because it only supports millisecond(0.001s).
  if (!params.useCopyUnload) {
    val result = sparkSession.sql("select * from test_table_timestamp")
This test case failed.
@Mingli-Rui Yea sorry, haven't finished the change yet, should've mentioned this. Will push another commit this week.
It would be great to add new test cases instead of changing the existing one, e.g. test("testTimestamp copy unload").
- With the new test, you can set the options below in the new test cases only (see the sketch after this comment):
thisConnectorOptionsNoTable += ("timestamp_ntz_output_format" -> "YYYY-MM-DD HH24:MI:SS.FF6")
thisConnectorOptionsNoTable += ("timestamp_ltz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6")
thisConnectorOptionsNoTable += ("timestamp_tz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6")
- With the new test case, you can verify that the new internal parameter (suggested in another comment) can enable/disable the fix.
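For illustration, such a dedicated test case could look roughly like the sketch below. It assumes the suite's existing `sparkSession` and `thisConnectorOptionsNoTable`; the table name, the asserted value, and the exact option wiring are assumptions, not the actual suite code.

```scala
// Sketch only, not the actual suite code: assumes a pre-created
// test_table_timestamp holding microsecond-precision values.
test("testTimestamp copy unload") {
  // Force COPY UNLOAD and microsecond output formats for this test only.
  val options = thisConnectorOptionsNoTable ++ Map(
    "use_copy_unload" -> "true",
    "timestamp_ntz_output_format" -> "YYYY-MM-DD HH24:MI:SS.FF6",
    "timestamp_ltz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6",
    "timestamp_tz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6"
  )

  val df = sparkSession.read
    .format("net.snowflake.spark.snowflake")
    .options(options)
    .option("dbtable", "test_table_timestamp")
    .load()

  // The microsecond part (e.g. ".191173") should survive the COPY UNLOAD round trip,
  // i.e. the parsed timestamp keeps sub-millisecond digits.
  val ts = df.collect().head.getTimestamp(0)
  assert(ts.getNanos % 1000000 != 0)
}
```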
@sfc-gh-mrui The test cases have been fixed, please review. Thanks!
@@ -193,7 +196,23 @@ private[snowflake] object Conversions {
   * Parse a string exported from a Snowflake TIMESTAMP column
   */
  private def parseTimestamp(s: String, isInternalRow: Boolean): Any = {
    // Need to handle the nanoseconds field separately
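As context for this change: one way to keep sub-millisecond digits is to parse the fractional seconds outside of SimpleDateFormat and set them via Timestamp.setNanos. The sketch below only illustrates that idea and is not the PR's actual implementation; it ignores time zones and the isInternalRow branch.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Illustration only: parse the whole seconds with the legacy formatter and set the
// full fractional part via Timestamp.setNanos, so micro/nanoseconds are not
// misread as milliseconds.
def parseTimestampWithMicros(s: String): Timestamp = {
  val dotIdx = s.indexOf('.')
  val (secondsPart, fractionPart) =
    if (dotIdx >= 0) (s.substring(0, dotIdx), s.substring(dotIdx + 1))
    else (s, "")

  val base = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(secondsPart)
  val ts = new Timestamp(base.getTime)

  if (fractionPart.nonEmpty) {
    // Right-pad or truncate the fraction to 9 digits (nanoseconds).
    val nanos = fractionPart.padTo(9, '0').take(9).toInt
    ts.setNanos(nanos)
  }
  ts
}
```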
Could you please add an internal parameter to enable/disable the change? It should be enabled by default.
If some users are broken by it, they can disable the fix as a workaround.
For example, https://github.com/snowflakedb/spark-snowflake/blob/master/src/main/scala/net/snowflake/spark/snowflake/Parameters.scala#LL163C14-L163C14
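For illustration, such a flag could be wired up roughly as below. The parameter name and helper are hypothetical and only follow the spirit of the existing boolean parameters, not the real Parameters.scala API.

```scala
// Hypothetical flag name and helper, only following the spirit of the existing
// internal parameters; not the real Parameters.scala API.
object MicrosecondUnloadParam {
  val PARAM_INTERNAL_SUPPORT_MICROSECONDS_DURING_UNLOAD: String =
    "internal_support_microseconds_during_unload"

  // Enabled by default; users hit by a regression can set it to "false" as a workaround.
  def isEnabled(params: Map[String, String]): Boolean =
    params
      .get(PARAM_INTERNAL_SUPPORT_MICROSECONDS_DURING_UNLOAD)
      .forall(_.toBoolean)
}

// Usage sketch inside the conversion path:
// if (MicrosecondUnloadParam.isEnabled(options)) parseTimestampWithMicros(s) else legacyParse(s)
```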
Hmm, do you mean to have a parameter such as internal_support_micro_second_during_unload?
IMO, we should always try to support microsecond-level precision, since direct JDBC supports this precision by default. It would make this part of the code confusing if we had two code paths and two sets of timestamp patterns.
I agree users need to use this fix. This is why we can set the internal parameter to true by default.
It is our internal policy to introduce a parameter to disable a fix where possible.
The internal parameter can be removed later, and the code will be cleaned up accordingly.
Thanks for the suggestion. Will push a new commit to add this parameter.
@arthurli1126 Let me know when you are ready for review. So far, the test case below failed.
@sfc-gh-mrui Yea, the tests pass locally and with Instacart's Snowflake account (based on AWS). I realized the test only failed with Azure and GCP copy unload, so I'm wondering if there's anything Snowflake does differently when unloading to GCP/Azure. However, I'm not able to verify it because my company only has an AWS deployment. Is this something you can help me with, or can you provide a temporary GCP deployment account for testing?
Looks like I can sign up for a trial account with Snowflake on GCP. Let me also try that.
…b#493) Spark connector: 2.11.2 JDBC: 2.13.28
…nowflakedb#502) * SNOW-763124 Support uploading files with down-scoped token for Snowflake Gcs accounts * Update JDBC and SC version
…nowflakedb#504) * SNOW-770051 Fix a potential wrong result issue for crossing schema join/union * Revise error message * Simplify canUseSameConnection() * Use toSet.size == 1 for duplication check.
* SNOW-824420 Update SC version to 2.12.0 * Update spark version in build_image.sh
…:arthurli1126/spark-snowflake into support-micro-seconds-during-copy-unload
Sorry about the long break, I finally got a chance to look into this again. For the workflows in different envs, do they need approval from your side to run? @sfc-gh-mrui
Hi @sfc-gh-mrui, I was wondering if there's a way to trigger the test again? Thanks 🙏
The Snowflake connector uses SimpleDateFormat (legacy) for date-time parsing during copy unload, and internally it only supports milliseconds. For instance, for the string
2023-03-01 07:54:56.191173
it treats the fractional part as 191173 milliseconds, so it adds 191000 / 1000 / 60 = 3 min 11 s to the time and puts the remaining 173 into the milliseconds field:
2023-03-01 07:58:07.173000.
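A minimal, self-contained repro of that legacy behavior (assuming SimpleDateFormat's default lenient parsing):

```scala
import java.text.SimpleDateFormat

// Lenient SimpleDateFormat parsing reads ".191173" as 191173 milliseconds
// (about 3 min 11.173 s) and rolls the overflow into the minutes/seconds fields.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
val parsed = fmt.parse("2023-03-01 07:54:56.191173")
println(fmt.format(parsed)) // prints 2023-03-01 07:58:07.173
```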