-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Add more cast tests and improve test framework #351
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments LGTM
} | ||
|
||
private def generateDecimals(): DataFrame = { | ||
Seq(BigDecimal("123456.789"), BigDecimal("-123456.789"), BigDecimal("0.0")).toDF("a") | ||
} | ||
|
||
private def generateString(r: Random, chars: String, maxLen: Int): String = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: String
is the only type that does not return DataFrame
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO for this
0.0f, | ||
-0.0f) ++ | ||
Range(0, dataSize).map(_ => r.nextFloat()) | ||
values.toDF("a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add nulls here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I have pushed an update to add nulls to all generators except for string. I will address this in a future PR when I change the string generator to also return a DataFrame.
castTest(generateBytes(), DataTypes.BooleanType) | ||
} | ||
|
||
ignore("cast ByteType to ByteType") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the same type cast needed? It seems that assertTestsExist ignores this kind of cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a check for redundant casts between same type and removed these tests. Spark should optimize redundant casts out.
ignore("cast long to short") { | ||
castTest(generateLongs, DataTypes.ShortType) | ||
test("all valid cast combinations covered") { | ||
val names = testNames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is awesome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. The nice thing is that this will tell us if Spark adds or removes any supported cast operations in the future.
} | ||
|
||
// try_cast() should always return null for invalid inputs | ||
val df2 = spark.sql(s"select try_cast(a as ${toType.sql}) from t") | ||
val df2 = | ||
spark.sql(s"select a, try_cast(a as ${toType.sql}) from t order by a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if a is null, the try_cast return null and will be treated as invalid input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try_cast should return null for null inputs and also for invalid inputs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I mean. If there is a test to cast an int to string and this is supported and valid, but I have a NULL in my dataset, so try_cast returns NULL as well. will that be treated as false negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input dataset has a mix of valid, invalid, and null values, we shoul expect to see try_cast produce null for the rows that are null or invalid. The test is checking that we produce the same results as Spark, so I think we have everything covered, but maybe I am missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to merge this one so that we can start rebasing the other pending CAST PRs, but let's continue the conversation.
Thanks for the review @kazuyukitanimura. I have addressed your feedback. Could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @andygrove LGTM
@@ -176,28 +697,38 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper { | |||
|
|||
// cast() should throw exception on invalid inputs when ansi mode is enabled | |||
val df = data.withColumn("converted", col("a").cast(toType)) | |||
val (expected, actual) = checkSparkThrows(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was improved to handle the case where no exceptions are thrown (some cast operations are infallible).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @andygrove
I think its great, especially with dynamic test discovery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. This is comprehensive. Thanks @andygrove
Which issue does this PR close?
Part of #286
Rationale for this change
Get a better idea of which casts are failing.
What changes are included in this PR?
How are these changes tested?
New tests