Missing ColumnarToRow when using CometSparkToColumnar #1092

bmorck · 2024-11-15T19:34:49Z

Describe the bug

I'm working on some internal benchmarks using Comet 0.3.0 with Spark 3.3 and Iceberg. To support iceberg, we are including the config, which inserts the CometSparkToColumnar nodes following the BatchScans

"spark.comet.sparkToColumnar.enabled" = "true"
"spark.comet.sparkToColumnar.supportedOperatorList" = "BatchScan"

In the following example, we see that there is a missing ColumnarToRow preceding the Project (19) operator. This results in the query failing. After analyzing the query optimization, I've found that the EliminateRedundantTransitions rule, removed a CometSparkToColumnar and subsequent ColumnarToRow following the BatchScan (20) operator, due to the subsequent Filter (21) operator requiring row format.

> CometSort (27)
>    +- CometColumnarExchange (26)
>       +- BroadcastNestedLoopJoin Cross BuildRight (25)
>          :- Project (19)
>          :  +- CometSortMergeJoin (18)
>          :     :- CometSort (6)
>          :     :  +- ShuffleQueryStage (5)
>          :     :     +- CometExchange (4)
>          :     :        +- CometFilter (3)
>          :     :           +- CometSparkToColumnar (2)
>          :     :              +- BatchScan (1)
>          :     +- CometSort (17)
>          :        +- CometExchange (16)
>          :           +- CometFilter (15)
>          :              +- CometHashAggregate (14)
>          :                 +- ShuffleQueryStage (13)
>          :                    +- CometExchange (12)
>          :                       +- CometHashAggregate (11)
>          :                          +- CometProject (10)
>          :                             +- CometFilter (9)
>          :                                +- CometSparkToColumnar (8)
>          :                                   +- BatchScan (7)
>          +- BroadcastQueryStage (24)
>             +- BroadcastExchange (23)
>                +- * Project (22)
>                   +- * Filter (21)
>                      +- BatchScan (20)
>

I've modified the filter to be able to be converted to native and we see the query inserts the appropriate ColumnarToRow transitions, as shown below. It's unclear if this is a bug in Sparks ApplyColumnarRulesAndInsertTransitions rule or if this is unique to Comet, but it seems like incorrect behavior when using the CometSparkToColumnarNode

>    ColumnarToRow (34)
>    +- CometSort (33)
>       +- AQEShuffleRead (32)
>          +- ShuffleQueryStage (31), Statistics(sizeInBytes=4.2 KiB, rowCount=36)
>             +- CometColumnarExchange (30)
>                +- BroadcastNestedLoopJoin Cross BuildRight (29)
>                   :- Project (21)
>                   :  +- ColumnarToRow (20)
>                   :     +- CometBroadcastHashJoin (19)
>                   :        :- BroadcastQueryStage (8), Statistics(sizeInBytes=319.7 KiB, rowCount=583)
>                   :        :  +- CometBroadcastExchange (7)
>                   :        :     +- AQEShuffleRead (6)
>                   :        :        +- ShuffleQueryStage (5), Statistics(sizeInBytes=409.3 KiB, rowCount=583)
>                   :        :           +- CometExchange (4)
>                   :        :              +- CometFilter (3)
>                   :        :                 +- CometSparkToColumnar (2)
>                   :        :                    +- BatchScan (1)
>                   :        +- CometFilter (18)
>                   :           +- CometHashAggregate (17)
>                   :              +- AQEShuffleRead (16)
>                   :                 +- ShuffleQueryStage (15), Statistics(sizeInBytes=2040.0 B, rowCount=10)
>                   :                    +- CometExchange (14)
>                   :                       +- CometHashAggregate (13)
>                   :                          +- CometProject (12)
>                   :                             +- CometFilter (11)
>                   :                                +- CometSparkToColumnar (10)
>                   :                                   +- BatchScan (9)
>                   +- BroadcastQueryStage (28), Statistics(sizeInBytes=864.0 B, rowCount=36)
>                      +- BroadcastExchange (27)
>                         +- ColumnarToRow (26)
>                            +- CometProject (25)
>                               +- CometFilter (24)
>                                  +- CometSparkToColumnar (23)
>                                     +- BatchScan (22)

Steps to reproduce

No response

Expected behavior

All appropriate ColumnarToRow operators are inserted and query succeeds

Additional context

No response

The text was updated successfully, but these errors were encountered:

andygrove · 2024-11-15T21:09:52Z

I'm in the process of creating the first 0.4.0 release candidate and am uploading the jar files to a maven staging repository. It may be worth testing with this newer version. I'll post more details once the jars are available.

andygrove · 2024-11-15T21:24:01Z

@bmorck You can download 0.4.0-rc1 jar files from https://repository.apache.org/#nexus-search;quick~org.apache.datafusion

huaxingao · 2024-11-15T21:28:53Z

@bmorck Thanks for reporting the bug. When you ran the benchmark, did you use the changes from the upstream Iceberg PR?

viirya · 2024-11-15T22:10:56Z

I slightly updated the description to make the query plan readable.

viirya · 2024-11-15T22:27:13Z

After analyzing the query optimization, I've found that the EliminateRedundantTransitions rule, removed a CometSparkToColumnar and subsequent ColumnarToRow following the BatchScan (20) operator, due to the subsequent Filter (21) operator requiring row format.

Removing CometSparkToColumnar + ColumnarToRow from the top of BatchScan (20) looks correct if the Filter (21) cannot be converted to CometFilter.

In the following example, we see that there is a missing ColumnarToRow preceding the Project (19) operator. This results in the query failing.

I'm not sure how is it related to the issue on Project (19)?

bmorck · 2024-11-15T23:11:47Z

@viirya It's not clear to me that the issue is related to the issue on Project (19) however, I noticed that the appropriate ColumnarToRow operators are injected when the filter is able to be converted to native and hence the EliminateRedundantTransitions rule doesn't remove any nodes. I've seen this occur other queries as well. Here is another example:

> +- AdaptiveSparkPlan (46)
>    +- == Current Plan ==
>       Sort (26)
>       +- Project (25)
>          +- Filter (24)
>             +- Window (23)
>                +- Sort (22)
>                   +- Exchange (21)
>                      +- Union (20)
>                         :- Project (10)
>                         :  +- Filter (9)
>                         :     +- Window (8)
>                         :        +- CometSort (7)
>                         :           +- ShuffleQueryStage (6)
>                         :              +- CometExchange (5)
>                         :                 +- CometProject (4)
>                         :                    +- CometFilter (3)
>                         :                       +- CometSparkToColumnar (2)
>                         :                          +- BatchScan (1)
>                         +- Project (19)
>                            +- Filter (18)
>                               +- Window (17)
>                                  +- Sort (16)
>                                     +- ShuffleQueryStage (15)
>                                        +- Exchange (14)
>                                           +- Project (13)
>                                              +- Filter (12)
>                                                 +- BatchScan (11)

@huaxingao I didn't port over the PR yet, that's my next step and perhaps that resolves the issue. Will report here if I find that to be the case

@andygrove Thanks for providing the new jar! Will try it out. We also are using an internal fork of comet (current changes are only enabling Spark 3.3 on some of the version restricted operators) so I will also rebase to include latest changes

viirya · 2024-11-15T23:26:25Z

The only place in Comet planner to remove ColumnarToRowExec is when there is a combination ColumnarToRowExec + CometSparkToColumnarExec. As the combination is a no-op actually, we can remove it from the query plan.

And it is removed from the top of BatchScan (20), it is not even close to Project (19), so I'm wondering why it is related. 🤔

Do you have an example query we can reproduce this?

bmorck added the bug Something isn't working label Nov 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing ColumnarToRow when using CometSparkToColumnar #1092

Missing ColumnarToRow when using CometSparkToColumnar #1092

bmorck commented Nov 15, 2024 •

edited by viirya

Loading

andygrove commented Nov 15, 2024

andygrove commented Nov 15, 2024

huaxingao commented Nov 15, 2024

viirya commented Nov 15, 2024

viirya commented Nov 15, 2024 •

edited

Loading

bmorck commented Nov 15, 2024 •

edited by viirya

Loading

viirya commented Nov 15, 2024

Missing ColumnarToRow when using CometSparkToColumnar #1092

Missing ColumnarToRow when using CometSparkToColumnar #1092

Comments

bmorck commented Nov 15, 2024 • edited by viirya Loading

Describe the bug

Steps to reproduce

Expected behavior

Additional context

andygrove commented Nov 15, 2024

andygrove commented Nov 15, 2024

huaxingao commented Nov 15, 2024

viirya commented Nov 15, 2024

viirya commented Nov 15, 2024 • edited Loading

bmorck commented Nov 15, 2024 • edited by viirya Loading

viirya commented Nov 15, 2024

bmorck commented Nov 15, 2024 •

edited by viirya

Loading

viirya commented Nov 15, 2024 •

edited

Loading

bmorck commented Nov 15, 2024 •

edited by viirya

Loading