Initcap behaves differently in Spark and in DataFusion (also Comet) #1052
Comments
Thanks for reporting this bug.
DataFusion's InitCap doesn't support non-ASCII characters.
Already fixed in the latest DF.
UPD: Spark returns
But according to the DF SQL policy (apache/datafusion#13706), DF keeps its builtin functions in sync with PG, which returns
How is that fixed? :) The diff I showed is exactly the same as the diff you mention below, no?
Please refer to the updated message above.
Looks like Unicode initcap is getting fixed in DF: apache/datafusion#13752
Describe the bug
DataFusion's initcap behaves differently from Spark's. Both "upper-case the first letter of each word and lowercase the others", but they disagree on what a word is: Spark treats anything separated by whitespace (' ') as a word, while DataFusion treats every character that is not ASCII alphanumeric as a separator. (DF's code would also fail to uppercase or lowercase non-ASCII characters, but that doesn't surface as a separate issue, because it already treats them as separators in the first place.)
#1051 shows the problem by adding two cases to the test, one using a dash and one using non-ascii letters (from Finnish).
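The difference described above can be sketched in a few lines of Python. This is only a model of the two behaviors as this issue describes them, not the actual Spark or DataFusion code; the function names `spark_like_initcap` and `datafusion_like_initcap` and the sample inputs are made up for illustration and are not the test cases from #1051.

```python
def spark_like_initcap(s: str) -> str:
    """Model of the Spark behavior described in this issue:
    only a space (' ') starts a new word."""
    out = []
    start_of_word = True
    for ch in s:
        if ch == " ":
            out.append(ch)
            start_of_word = True
        elif start_of_word:
            out.append(ch.upper())
            start_of_word = False
        else:
            out.append(ch.lower())
    return "".join(out)


def datafusion_like_initcap(s: str) -> str:
    """Model of the DataFusion behavior described in this issue:
    any character that is not ASCII alphanumeric is a separator
    and is passed through unchanged."""
    out = []
    start_of_word = True
    for ch in s:
        if ch.isascii() and ch.isalnum():
            out.append(ch.upper() if start_of_word else ch.lower())
            start_of_word = False
        else:
            out.append(ch)  # separator: copied as-is, next char starts a word
            start_of_word = True
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical inputs: one with a dash, one with a non-ASCII letter.
    for text in ["hello-world", "älä"]:
        print(repr(text), spark_like_initcap(text), datafusion_like_initcap(text))
```

With these models, the two functions agree on plain space-separated ASCII input and diverge as soon as the input contains a dash or a non-ASCII letter.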
Steps to reproduce
Call initcap with an input containing non-whitespace characters that are not ASCII alphanumeric, for example a dash or a non-ASCII letter.
Expected behavior
Match Spark
Additional context
No response