feat: Only allow incompatible cast expressions to run in comet if a config is enabled #362
Conversation
spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala
/**
 * Utility for generating markdown documentation from the configs.
 *
 * This is invoked when running `mvn clean package -DskipTests`.
 */
This code had to move to the spark module so that it can access the cast information
// TODO we need to file an issue for adding specific tests for casting
// between decimal types with different precision and scale
I filed #375
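To illustrate the kind of edge case the TODO above is about (casting between decimal types with different precision and scale), here is a minimal, self-contained Scala sketch using `java.math.BigDecimal`. The rounding mode and values are illustrative, not taken from the PR; Spark's own decimal cast rules involve additional overflow handling that is not modeled here.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Narrowing the scale of a decimal value, e.g. decimal(10,4) -> decimal(10,2),
// forces rounding of the dropped fractional digits.
val v = new JBigDecimal("123.4567")
val narrowed = v.setScale(2, RoundingMode.HALF_UP) // 123.46
```

A compatibility test would need to confirm that the accelerated cast rounds (and overflows) exactly the way Spark does for every precision/scale combination.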
docs/source/user-guide/configs.md
@@ -25,7 +25,7 @@ Comet provides the following configuration settings.
|--------|-------------|---------------|
| spark.comet.ansi.enabled | Comet does not respect ANSI mode in most cases and by default will not accelerate queries when ansi mode is enabled. Enable this setting to test Comet's experimental support for ANSI mode. This should not be used in production. | false |
| spark.comet.batchSize | The columnar batch size, i.e., the maximum number of rows that a batch can contain. | 8192 |
| spark.comet.cast.stringToTimestamp | Comet is not currently fully compatible with Spark when casting from String to Timestamp. | false |
| spark.comet.cast.allowIncompatible | Comet is not currently fully compatible with Spark for all cast operations. Set this config to true to allow them anyway. See compatibility guide for more information. | true |
Shouldn't this document that it defaults to false?
Thanks. Updated.
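The point of the review comment above is that a config's documented default must match its declared default. A simplified, self-contained sketch of that coupling (the real Comet code uses its own config-entry builder, which is not reproduced here; `BooleanConf` and `Confs` are illustrative names):

```scala
// Hypothetical model of a boolean config entry whose default participates
// both in value resolution and in generated documentation.
case class BooleanConf(key: String, doc: String, default: Boolean) {
  // Resolve the effective value from a settings map, falling back to the default.
  def get(settings: Map[String, String]): Boolean =
    settings.get(key).map(_.toBoolean).getOrElse(default)

  // The docs table is generated from the same entry, so the documented
  // default can never drift from the actual one.
  def markdownRow: String = s"| $key | $doc | $default |"
}

object Confs {
  val CastAllowIncompatible: BooleanConf = BooleanConf(
    key = "spark.comet.cast.allowIncompatible",
    doc = "Comet is not currently fully compatible with Spark for all cast " +
      "operations. Set this config to true to allow them anyway.",
    default = false)
}
```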
@@ -65,29 +67,12 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
        } else if (!testExists) {
          fail(s"Missing test: $expectedTestName")
        }
      } else if (testExists) {
        fail(s"Found test for cast that Spark does not support: $expectedTestName")
Was there a particular test that was incorrectly failing due to this check?
Not yet, but it would fail once we add tests for casting to/from TimestampNTZType because Spark 3.2 doesn't have that one. This change is unrelated to this PR though, so happy to remove it.
Actually, the list of supported types wouldn't include TimestampNTZType when running against Spark 3.2. I will revert this change.
// Filter rows whose name ends with 's'
val queryEndsWith = sql(s"select id from $table where name like '%s'")
checkAnswer(queryEndsWith, Row(3) :: Nil)
withSQLConf(CometConf.COMET_CAST_ALLOW_INCOMPATIBLE.key -> "true") {
I may need to remove some of these changes because I discovered that some of these tests were failing for me because of a "file already exists" error. I am looking into this now.
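The `withSQLConf` wrapper used in the hunk above follows a set-run-restore pattern: override a setting, run the test body, then put the previous value back even if the body throws. A minimal self-contained sketch of that idea, with `FakeConf` standing in for Spark's real `SQLConf` (purely illustrative):

```scala
// A toy mutable conf store; Spark's SQLConf plays this role in the suite.
object FakeConf {
  private var settings = Map.empty[String, String]
  def get(key: String): Option[String] = settings.get(key)
  def set(key: String, value: String): Unit = settings += (key -> value)
  def unset(key: String): Unit = settings -= key
}

// Temporarily apply key/value overrides while `body` runs, then restore.
def withConf[T](pairs: (String, String)*)(body: => T): T = {
  val previous = pairs.map { case (k, _) => k -> FakeConf.get(k) }
  pairs.foreach { case (k, v) => FakeConf.set(k, v) }
  try body
  finally previous.foreach {
    case (k, Some(old)) => FakeConf.set(k, old) // restore prior value
    case (k, None) => FakeConf.unset(k)         // key was unset before
  }
}
```

The `finally` block is what keeps one test's config override from leaking into the next test.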
def isSupported(
    fromType: DataType,
    toType: DataType,
    timeZoneId: Option[String],
    evalMode: String): SupportLevel = {
Thanks for adding this. One drawback of manually hardcoding whether a cast is supported is that the matrix could get out of sync with the actual behavior.
I think we could probably verify the results programmatically.
For example, arrow-rs runs sample data of the supported types through the matrix of supported casts, then executes the cast expression to verify that the compatibility info is correct.
But for now this is okay.
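The verification idea described above can be sketched in a few lines of self-contained Scala: run sample values through a reference cast (playing the role of Spark) and a candidate cast (playing the role of Comet), and check that every matrix entry claimed `Compatible` produces identical results. All names and the deliberate trimming discrepancy below are illustrative, not taken from the PR:

```scala
sealed trait SupportLevel
case object Compatible extends SupportLevel
case object Incompatible extends SupportLevel

// Toy reference cast: trims whitespace before parsing, like Spark's
// string-to-int cast does.
def referenceCast(toType: String, v: String): Option[String] = toType match {
  case "int" => v.trim.toIntOption.map(_.toString)
  case _ => None
}

// Toy candidate cast: identical except it does NOT trim, so it silently
// disagrees with the reference on padded input.
def candidateCast(toType: String, v: String): Option[String] = toType match {
  case "int" => v.toIntOption.map(_.toString)
  case _ => None
}

// Falsify a Compatible claim if any sample produces differing results.
def verify(toType: String, claimed: SupportLevel, samples: Seq[String]): Boolean = {
  val agrees = samples.forall(s => referenceCast(toType, s) == candidateCast(toType, s))
  claimed match {
    case Compatible => agrees
    case Incompatible => true // incompatibility claims are not checked here
  }
}
```

Running such a check over the whole support matrix in CI would catch hardcoded entries that drift from actual behavior.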
I agree. There is definitely more that we can do with this.
LGTM
It looks like #340 will take a little longer, so I will merge this now.
…onfig is enabled (apache#362)
Which issue does this PR close?
Part of #286
Rationale for this change
We have discovered numerous compatibility issues with the CAST expression, so we should fall back to Spark for any cast operations that we do not fully support to avoid any data corruption or incorrect results.
This PR adds a new `spark.comet.cast.allowIncompatible` config and implements the mechanism to only allow incompatible casts when this is enabled.
What changes are included in this PR?
- Moved cast logic out of `QueryPlanSerde` and into a new `CometCast` object
- Updated `CometCastSuite` to check that the plan ran in Comet
- Moved markdown generation from the `common` to the `spark` module and updated it to generate a table showing which cast operations are compatible, incompatible, or unsupported
- Enabled `spark.comet.cast.allowIncompatible` in some tests that depend on incompatible casts
How are these changes tested?
Existing tests in `CometCastSuite`