-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Missing ColumnarToRow when using CometSparkToColumnar #1092
Comments
I'm in the process of creating the first 0.4.0 release candidate and am uploading the jar files to a maven staging repository. It may be worth testing with this newer version. I'll post more details once the jars are available. |
@bmorck You can download 0.4.0-rc1 jar files from https://repository.apache.org/#nexus-search;quick~org.apache.datafusion |
I slightly updated the description to make the query plan readable. |
Removing CometSparkToColumnar + ColumnarToRow from the top of BatchScan (20) looks correct if the Filter (21) cannot be converted to CometFilter.
I'm not sure how is it related to the issue on Project (19)? |
@viirya It's not clear to me that the issue is related to the issue on Project (19) however, I noticed that the appropriate
@huaxingao I didn't port over the PR yet, that's my next step and perhaps that resolves the issue. Will report here if I find that to be the case @andygrove Thanks for providing the new jar! Will try it out. We also are using an internal fork of comet (current changes are only enabling Spark 3.3 on some of the version restricted operators) so I will also rebase to include latest changes |
The only place in Comet planner to remove ColumnarToRowExec is when there is a combination ColumnarToRowExec + CometSparkToColumnarExec. As the combination is a no-op actually, we can remove it from the query plan. And it is removed from the top of BatchScan (20), it is not even close to Project (19), so I'm wondering why it is related. 🤔 Do you have an example query we can reproduce this? |
Describe the bug
I'm working on some internal benchmarks using Comet 0.3.0 with Spark 3.3 and Iceberg. To support iceberg, we are including the config, which inserts the
CometSparkToColumnar
nodes following the BatchScansIn the following example, we see that there is a missing
ColumnarToRow
preceding theProject (19)
operator. This results in the query failing. After analyzing the query optimization, I've found that theEliminateRedundantTransitions
rule, removed aCometSparkToColumnar
and subsequentColumnarToRow
following theBatchScan (20)
operator, due to the subsequentFilter (21)
operator requiring row format.I've modified the filter to be able to be converted to native and we see the query inserts the appropriate
ColumnarToRow
transitions, as shown below. It's unclear if this is a bug in SparksApplyColumnarRulesAndInsertTransitions
rule or if this is unique to Comet, but it seems like incorrect behavior when using theCometSparkToColumnarNode
Steps to reproduce
No response
Expected behavior
All appropriate
ColumnarToRow
operators are inserted and query succeedsAdditional context
No response
The text was updated successfully, but these errors were encountered: