Implement predicate pushdown for parquet reader #349
base: main
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@           Coverage Diff           @@
##             main     #349      +/-  ##
=========================================
+ Coverage   74.03%   74.94%   +0.91%
=========================================
  Files          43       43
  Lines        8137     8542     +405
  Branches     8137     8542     +405
=========================================
+ Hits         6024     6402     +378
- Misses       1733     1738       +5
- Partials      380      402      +22
Force-pushed from 3cb3b25 to 3a6dd9b.
took a quick look
@@ -112,7 +121,7 @@ impl FileOpener for ParquetOpener {
        // let projection = self.projection.clone();
        let table_schema = self.table_schema.clone();
        let limit = self.limit;

        let predicate = self.predicate.clone();
Maybe if `to_row_filter` above takes a ref, we don't need to clone?
@@ -30,6 +32,18 @@ use crate::schema::{DataType, PrimitiveType, SchemaRef};
use crate::{EngineData, ExpressionEvaluator, ExpressionHandler};

// TODO leverage scalars / Datum
//
pub fn expression_to_row_filter(predicate: Expression) -> RowFilter {
    let arrow_predicate = ArrowPredicateFn::new(ProjectionMask::all(), move |batch| {
Here I'm taking all columns. Is there an opportunity to pass in the projection mask too? Could this bring performance gains?
I think without a projection mask, this provides no benefit. From the docs:
"RowFilter applies predicates in order, after decoding only the columns required. As predicates eliminate rows, fewer rows from subsequent columns may be required, thus potentially reducing IO and decode."
So in the case we expect to use this, it will help because it will only decode the required columns and can then skip the rest without decoding, but we'll need to either:
1. Specify which columns the predicate applies to, or
2. Work it out from the expression.
Option 2 would be better, but if that proves too tricky we could require having it passed in. Possibly some of the code in `arrow_utils::get_requested_indices` could help with getting the projection mask, but that's written for schemas and we don't necessarily have those here. You could probably construct the needed pieces, though.
Okay, so we'd definitely like projection. I was initially only thinking about the row-group skipping optimization.
Re option 2: I could imagine a query `select * from table where value > 50`. Here I believe the filter would only have the `value` column, but we are looking for all columns. If that's the case, I think option 1 is the way.
Couple of things:
- For the actual "data path" we don't push the predicate down in the "all in one" scan (see here), although an engine could choose to do that since `read_parquet_files` does take a predicate. Filtering like this is not always a win for reading parquet, so we'll need to be a bit more careful about when we actually want to push things down. Our example code also doesn't push it down yet. Once this is working, we can maybe think about providing some guidance to engines about when to push down.
- My option 1 vs. 2 above was just about figuring out which columns to project out in order to evaluate the predicate. You definitely want to project columns out. So for your example, you could either require the caller to tell you that its expression requires the `value` column, or you could look at the expression itself and notice that that's the only column it references. Figuring it out from the expression is much nicer for users, but requires more work because you need to examine the expression and see which columns it references. Regardless, you then have to figure out the positions of those columns in the parquet file, which is very non-trivial (see `get_requested_indices`). You might want to limit this to only allow filtering on root columns.
Update on this: I'm now filtering columns, but only at the root level, since `StructType::project` only works at root level. I extract the columns by recursing down the expression structure. I've put up an issue to explore projections in nested columns: #353.
Force-pushed from 8d8d930 to f68ff08.
fn test_snapshot_read_metadata() {
    let path =
        std::fs::canonicalize(PathBuf::from("./tests/data/table-with-dv-small/")).unwrap();
fn get_snapshot_from_path(path: &str, version: Option<Version>) -> Snapshot {
@zachschuermann I moved setup code into this helper function. How's it look?
Fixed a bug in the protocol and metadata pushdown
Force-pushed from 9bf5787 to f2c15e3.
}

#[test]
fn test_replay_protocol_metadata_filtering_predicate() {
So currently this just checks that we don't break anything, right?
I think we should probably have a more specific check in the parquet reader that manually creates the expected expression, pushes it into a filter, and ensures it does what it says it will.
Cool, I'll go do that 👍
I put up some new tests in `default/parquet.rs`. I feel like `test_parquet_protocol_metadata_filter` is a little ugly, but I don't see any easy ways to simplify it or make it more reusable.
Force-pushed from 4987a5a to 7842372.
    parquet_schema,
    parquet_physical_schema,
)?;
builder = builder.with_row_filter(row_filter);
I'm not sure this will work as well as we wish it would...
In my experience, this kind of row-level pushdown doesn't consistently help performance -- even with the kind of lazy materialization arrow-rust brags about (it won't fetch column chunks until proving at least one row is needed).
The reason is: every row pays the cost of evaluating the filter, while any I/O reduction is only partial at best. We still have to fetch the columns the predicate touches, so any I/O savings come from not fetching payload columns. But that only works if the filter eliminates ALL rows from a row group. And if row groups can be skipped, stats-based row-group skipping can often do it much more cheaply (no extra I/O at all).
Meanwhile, in cases that don't see any I/O reduction, pushing down the filtering just shifts complexity from the query engine to the file scanner. And that's usually a net loss because the output of the scan is likely consumed in a pipelined single pass either way.
Note: There are absolutely cases where row-level filter pushdown is a performance win... but there are too many cases where it doesn't help or even hurts performance instead. And it's data-dependent, so hard to predict how any one query will be affected.
Thanks Ryan, planning on pausing this work for now :)
This PR implements predicate pushdown for the parquet reader used in both the sync and default engines.
Closes: #341