feat: Implement Spark-compatible CAST from string to timestamp types #335
@vaibhawvipul You can run |
return Ok(None);
}

// Define regex patterns and corresponding parsing functions
The regex approach is a good way to quickly get support for all of the format variations and fix the correctness issue but could also be quite expensive. It would be good to add some criterion benchmarks so that we can understand what performance looks like (does not have to be part of this PR).
Thanks @andygrove, setting the timestamp to UTC worked! Any idea about this error?
|
It seems the test is failing in ANSI mode. I am debugging that.
I have enabled a "string to timestamp" cast test case that passes. We now support all the timestamp formats mentioned in the issue.
Just a note -
|
There is an improved version of the ANSI testing in #351.
Thanks @vaibhawvipul, this is looking great. I will review today.
I would be fine with handling this in a follow-on PR.
This is looking great @vaibhawvipul. I think this is close to being ready to merge, with follow-on issues for the remaining items. The one thing I would like to see before merging is that we can run the fuzz test without causing any panics, which just means replacing unwraps with error handling.
@andygrove All unwraps are removed; I am now handling errors in all of my functions.
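To illustrate the kind of change discussed here (replacing `unwrap` with error handling so that malformed input cannot panic), here is a minimal sketch. It is not the PR's code: `CastError`, `next_field`, and `parse_date` are hypothetical names, and only a plain `yyyy-mm-dd` date is handled.

```rust
/// Hypothetical error type; the PR's real error type is not shown in this thread.
#[derive(Debug, PartialEq)]
enum CastError {
    InvalidInput(String),
}

/// Take the next '-'-separated field and parse it as an unsigned integer.
/// A missing or non-numeric field becomes an `Err` instead of a panic.
fn next_field<'a>(
    parts: &mut impl Iterator<Item = &'a str>,
    whole: &str,
) -> Result<u32, CastError> {
    let text = parts
        .next()
        .ok_or_else(|| CastError::InvalidInput(whole.to_string()))?;
    text.parse::<u32>()
        .map_err(|_| CastError::InvalidInput(whole.to_string()))
}

/// Parse "yyyy-mm-dd" into (year, month, day) without any `unwrap`.
/// Range checks (for example month <= 12) are left out to keep the sketch short.
fn parse_date(value: &str) -> Result<(u32, u32, u32), CastError> {
    let trimmed = value.trim();
    let mut parts = trimmed.split('-');
    let year = next_field(&mut parts, trimmed)?;
    let month = next_field(&mut parts, trimmed)?;
    let day = next_field(&mut parts, trimmed)?;
    // Reject trailing fields such as "2020-01-02-03".
    if parts.next().is_some() {
        return Err(CastError::InvalidInput(trimmed.to_string()));
    }
    Ok((year, month, day))
}
```

With this shape, a fuzz test that feeds arbitrary strings to `parse_date` gets an `Err` back for bad input, and the caller decides whether that becomes a null or a reported error.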
Thank you @vaibhawvipul. I think this is at a good point to merge, and then we can file issues for follow-up items, such as:
- Add support for 3 and 5 digit years
- Testing with try_cast and ANSI
- Passing all fuzz tests
- Updating compatibility guide
As a follow-up, I would also suggest adding/verifying support for any timezone. (See this test, for instance: datafusion-comet/spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala, line 262 in 064cb47.)
Also see https://docs.rs/arrow-cast/50.0.0/arrow_cast/parse/fn.string_to_datetime.html
I filed #376 for the follow-on work.
…pache#335)
* casting str to timestamp
* fix format
* fixing failed tests, using char as pattern
* bug fixes
* hangling microsecond
* make format
* bug fixes and core refactor
* format code
* removing print statements
* clippy error
* enabling cast timestamp test case
* code refactor
* comet spark test case
* adding all the supported format in test
* fallback spark when timestamp not utc
* bug fix
* bug fix
* adding an explainer commit
* fix test case
* bug fix
* bug fix
* better error handling for unwrap in fn parse_str_to_time_only_timestamp
* remove unwrap from macro
* improving error handling
* adding tests for invalid inputs
* removed all unwraps from timestamp cast functions
* code format
Which issue does this PR close?
Closes #328.
Rationale for this change
Improve compatibility with Apache Spark
What changes are included in this PR?
Add a custom implementation of CAST from string to timestamp rather than delegating to DataFusion.
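For readers unfamiliar with what such a cast involves, here is a rough, standard-library-only sketch of turning a string into microseconds since the Unix epoch, assuming UTC. It is an illustration under stated assumptions, not the PR's implementation: all function names are hypothetical, it accepts only `yyyy-mm-dd` with an optional ` hh:mm:ss[.ffffff]` or `Thh:mm:ss[.ffffff]` suffix and a four-digit year, and it hand-parses the fields rather than using the regex approach discussed in the review.

```rust
/// Parse a non-empty run of ASCII digits (no sign, no whitespace).
fn parse_digits(text: &str) -> Option<i64> {
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse().ok()
}

/// Number of days in a month of the proleptic Gregorian calendar.
fn days_in_month(year: i64, month: i64) -> i64 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 if year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) => 29,
        2 => 28,
        _ => 31,
    }
}

/// Days since 1970-01-01 (Howard Hinnant's days-from-civil algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = (month + 9) % 12; // March = 0
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Parse "hh:mm:ss" with an optional fraction of 1 to 6 digits into microseconds.
fn parse_time_micros(time: &str) -> Option<i64> {
    let (hms, micros) = match time.split_once('.') {
        Some((hms, fraction)) => {
            if fraction.len() > 6 {
                return None;
            }
            // Right-pad the fraction: ".5" means 500_000 microseconds.
            let scale = 10_i64.pow(6 - fraction.len() as u32);
            (hms, parse_digits(fraction)? * scale)
        }
        None => (time, 0),
    };
    let mut parts = hms.split(':');
    let hour = parse_digits(parts.next()?)?;
    let minute = parse_digits(parts.next()?)?;
    let second = parse_digits(parts.next()?)?;
    if parts.next().is_some() || hour > 23 || minute > 59 || second > 59 {
        return None;
    }
    Some(((hour * 60 + minute) * 60 + second) * 1_000_000 + micros)
}

/// Returns microseconds since the epoch (UTC), or `None` for unsupported input.
fn parse_timestamp_micros(value: &str) -> Option<i64> {
    let value = value.trim();
    let (date, time) = match value.split_once(|c: char| c == ' ' || c == 'T') {
        Some((date, time)) => (date, Some(time)),
        None => (value, None),
    };
    let mut parts = date.split('-');
    let (year_text, month_text, day_text) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || year_text.len() != 4 {
        return None;
    }
    let year = parse_digits(year_text)?;
    let month = parse_digits(month_text)?;
    let day = parse_digits(day_text)?;
    if !(1..=12).contains(&month) || day < 1 || day > days_in_month(year, month) {
        return None;
    }
    let time_micros = match time {
        Some(text) => parse_time_micros(text)?,
        None => 0,
    };
    Some(days_from_civil(year, month, day) * 86_400_000_000 + time_micros)
}
```

For example, `parse_timestamp_micros("1970-01-01 00:00:00")` yields `Some(0)`, while an impossible date such as `"2020-02-30"` yields `None`. Returning `None` mirrors the `return Ok(None);` fragment quoted in the review. How that result maps to a null or an error under ANSI mode is outside this sketch.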
How are these changes tested?