Support microseconds precision during copy unload #492
base: master
Conversation
@@ -193,4 +193,88 @@ class ConversionsSuite extends FunSuite {
    assert(expect == result.toString())
  }

  test("Data with micro-seconds and nano-seconds precision should be correctly converted"){
This is a unit test. It proves that timestamps with micro/nano-second precision can be parsed.
Could you please add an integration test (AKA end-to-end test)?
https://github.com/snowflakedb/spark-snowflake/blob/master/src/it/scala/net/snowflake/spark/snowflake/SnowflakeResultSetRDDSuite.scala#L711 is the test case for the new Arrow format read (non-COPY-UNLOAD).
It is disabled for COPY UNLOAD. It would be great if you can make it work.
  // COPY UNLOAD can't be run because it only supports millisecond(0.001s).
  if (!params.useCopyUnload) {
    val result = sparkSession.sql("select * from test_table_timestamp")
This test case failed.
@Mingli-Rui Yea sorry, haven't finished the change yet, should've mentioned this. Will push another commit this week.
It would be great to add new test cases instead of changing the existing one, e.g. test("testTimestamp copy unload").
- With the new test, you can set the options below in the new test cases only (see the sketch after this comment):
thisConnectorOptionsNoTable += ("timestamp_ntz_output_format" -> "YYYY-MM-DD HH24:MI:SS.FF6")
thisConnectorOptionsNoTable += ("timestamp_ltz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6")
thisConnectorOptionsNoTable += ("timestamp_tz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6")
- With the new test case, you can verify that the new internal parameter (suggested in another comment) can enable/disable the fix.
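For illustration, such a dedicated test case could look roughly like the sketch below. It assumes the suite's existing `sparkSession` and `thisConnectorOptionsNoTable`; the table name, the asserted value, and the exact option wiring are assumptions, not the actual suite code.

```scala
// Sketch only, not the actual suite code: assumes a pre-created
// test_table_timestamp holding microsecond-precision values.
test("testTimestamp copy unload") {
  // Force COPY UNLOAD and microsecond output formats for this test only.
  val options = thisConnectorOptionsNoTable ++ Map(
    "use_copy_unload" -> "true",
    "timestamp_ntz_output_format" -> "YYYY-MM-DD HH24:MI:SS.FF6",
    "timestamp_ltz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6",
    "timestamp_tz_output_format" -> "TZHTZM YYYY-MM-DD HH24:MI:SS.FF6"
  )

  val df = sparkSession.read
    .format("net.snowflake.spark.snowflake")
    .options(options)
    .option("dbtable", "test_table_timestamp")
    .load()

  // The microsecond part (e.g. ".191173") should survive the COPY UNLOAD round trip,
  // i.e. the parsed timestamp keeps sub-millisecond digits.
  val ts = df.collect().head.getTimestamp(0)
  assert(ts.getNanos % 1000000 != 0)
}
```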
@sfc-gh-mrui The test cases have been fixed, please review. Thanks!
@@ -193,7 +196,23 @@ private[snowflake] object Conversions {
   * Parse a string exported from a Snowflake TIMESTAMP column
   */
  private def parseTimestamp(s: String, isInternalRow: Boolean): Any = {
    // Need to handle the nanoseconds field separately
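As context for this change: one way to keep sub-millisecond digits is to parse the fractional seconds outside of SimpleDateFormat and set them via Timestamp.setNanos. The sketch below only illustrates that idea and is not the PR's actual implementation; it ignores time zones and the isInternalRow branch.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// Illustration only: parse the whole seconds with the legacy formatter and set the
// full fractional part via Timestamp.setNanos, so micro/nanoseconds are not
// misread as milliseconds.
def parseTimestampWithMicros(s: String): Timestamp = {
  val dotIdx = s.indexOf('.')
  val (secondsPart, fractionPart) =
    if (dotIdx >= 0) (s.substring(0, dotIdx), s.substring(dotIdx + 1))
    else (s, "")

  val base = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(secondsPart)
  val ts = new Timestamp(base.getTime)

  if (fractionPart.nonEmpty) {
    // Right-pad or truncate the fraction to 9 digits (nanoseconds).
    val nanos = fractionPart.padTo(9, '0').take(9).toInt
    ts.setNanos(nanos)
  }
  ts
}
```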
Could you please add an internal parameter to enable/disable the change? It should be enabled by default.
If some users are broken by it, they can disable the fix as a workaround.
For example, https://github.com/snowflakedb/spark-snowflake/blob/master/src/main/scala/net/snowflake/spark/snowflake/Parameters.scala#LL163C14-L163C14
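For illustration, such a flag could be wired up roughly as below. The parameter name and helper are hypothetical and only follow the spirit of the existing boolean parameters, not the real Parameters.scala API.

```scala
// Hypothetical flag name and helper, only following the spirit of the existing
// internal parameters; not the real Parameters.scala API.
object MicrosecondUnloadParam {
  val PARAM_INTERNAL_SUPPORT_MICROSECONDS_DURING_UNLOAD: String =
    "internal_support_microseconds_during_unload"

  // Enabled by default; users hit by a regression can set it to "false" as a workaround.
  def isEnabled(params: Map[String, String]): Boolean =
    params
      .get(PARAM_INTERNAL_SUPPORT_MICROSECONDS_DURING_UNLOAD)
      .forall(_.toBoolean)
}

// Usage sketch inside the conversion path:
// if (MicrosecondUnloadParam.isEnabled(options)) parseTimestampWithMicros(s) else legacyParse(s)
```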
Hmm, do you mean to have a parameter such as internal_support_micro_second_during_unload?
IMO, we should always try to support microsecond-level precision, since direct JDBC supports this precision by default. It would make this part of the code confusing if we had two code paths and two sets of timestamp patterns.
I agree users need to use this fix. This is why we can set the internal parameter to true by default.
It is our internal policy to introduce a parameter to disable a fix where possible.
The internal parameter can be removed later, and the code will be cleaned up accordingly.
Thanks for the suggestion. Will push a new commit to add this parameter.
@arthurli1126 Let me know when you are ready for review. So far, the test case below failed.
@sfc-gh-mrui Yea, the tests pass locally and with Instacart's Snowflake account (based on AWS). I realized the test only failed with Azure and GCP copy unload, so I'm wondering if there's anything Snowflake does differently when unloading to GCP/Azure. However, I'm not able to verify it because my company only has an AWS deployment. Is this something you can help me with, or can you provide a temporary GCP deployment account for testing?
Looks like I can sign up for a trial account with Snowflake on GCP. Let me also try that.
…b#493) Spark connector: 2.11.2 JDBC: 2.13.28
…nowflakedb#502) * SNOW-763124 Support uploading files with down-scoped token for Snowflake Gcs accounts * Update JDBC and SC version
…nowflakedb#504) * SNOW-770051 Fix a potential wrong result issue for crossing schema join/union * Revise error message * Simplify canUseSameConnection() * Use toSet.size == 1 for duplication check.
* SNOW-824420 Update SC version to 2.12.0 * Update spark version in build_image.sh
…:arthurli1126/spark-snowflake into support-micro-seconds-during-copy-unload
Sorry about the long break, I finally got a chance to look into this again. For the workflows in different envs, do they need approval from your side to run? @sfc-gh-mrui
Hi @sfc-gh-mrui, I was wondering if there's a way to trigger the test again? Thanks 🙏
The Snowflake connector uses SimpleDateFormat (legacy) for date-time parsing during copy unload, and internally it only supports milliseconds. For instance, for the string
2023-03-01 07:54:56.191173
it treats the fractional part as 191173 milliseconds, so it adds 191000 / 1000 / 60 = 3 min 11 s to the time and puts the remaining 173 into the milliseconds field:
2023-03-01 07:58:07.173000.
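A minimal, self-contained repro of that legacy behavior (assuming SimpleDateFormat's default lenient parsing):

```scala
import java.text.SimpleDateFormat

// Lenient SimpleDateFormat parsing reads ".191173" as 191173 milliseconds
// (about 3 min 11.173 s) and rolls the overflow into the minutes/seconds fields.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
val parsed = fmt.parse("2023-03-01 07:54:56.191173")
println(fmt.format(parsed)) // prints 2023-03-01 07:58:07.173
```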