Case Study to design logical TableFunction operator #753

drin · 2024-12-06T21:23:04Z

I am interested in helping to design the spec for TableFunction in Substrait, and I am hoping to track my own development in a way that provides a reference for discussion. As I mentioned in #745 (comment), I will eventually try to implement 3 different types of table functions and explore what feels natural and what feels like an antipattern. If anyone already has strong opinions, feel free to share; I have very little experience.

The 3 types of table functions:

1. A TableFunction acting as a leaf operator, e.g. scan_arrow_ipc in DuckDB.
2. A TableFunction acting as a transformation of an input table to an output table. For this, I want to implement something a bit unusual: a function that maintains the cardinality of the table (output rows == input rows) but applies a function across the columns, thereby changing the schema (like a projection, e.g. [, , , ..., ] -> [, ]).
3. A TableFunction acting like a fused operator, e.g. "GroupJoin" as in Accelerating Queries with... Join by GroupJoin.
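To make type 3 concrete, here is a minimal sketch of what a fused "GroupJoin" computes: an equi-join followed by a GROUP BY on the join key, done in one pass without materializing the joined rows. It assumes the left join key is unique (so grouping by it equals grouping by the left row) and uses SUM/COUNT as the aggregates; `group_join` and its parameters are hypothetical names for illustration, not a Substrait or DuckDB API.

```python
from typing import Dict, List

Row = Dict[str, object]


def group_join(left: List[Row], right: List[Row], left_key: str,
               right_key: str, agg_col: str) -> List[Row]:
    """Hypothetical sketch of a fused join + GROUP BY ("GroupJoin").

    Assumes left_key is unique in `left`. Each matching left row is emitted
    once, extended with SUM and COUNT over its matching right rows.
    """
    # Build phase: aggregate the right input per join key.
    sums: Dict[object, int] = {}
    counts: Dict[object, int] = {}
    for row in right:
        k = row[right_key]
        sums[k] = sums.get(k, 0) + row[agg_col]
        counts[k] = counts.get(k, 0) + 1
    # Probe phase: one output row per left row with at least one match
    # (inner-join semantics); the joined rows are never materialized.
    return [
        {**row, "sum": sums[row[left_key]], "count": counts[row[left_key]]}
        for row in left
        if row[left_key] in counts
    ]
```

From a spec perspective, the interesting part is that this one operator consumes two input relations, which neither a leaf generator nor a single-input transform covers.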

I'll add more information to this issue as I prototype. If anyone has recommendations for a different way to track or reference the work, let me know and I can adjust.


jacques-n · 2024-12-10T01:53:27Z

I think of it more as two distinct behaviors rather than three.

A. Generator table function: takes in only constant arguments and produces 0..N records. Operates as a leaf in the tree.
B. Set-based table function: takes in a set of records and adds one or more additional columns to each.
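The two behaviors can be sketched in plain Python, treating rows as dicts. This is only an illustration of the semantics under that assumption; `generate_series` and `add_columns` are hypothetical names, not part of Substrait.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def generate_series(start: int, stop: int) -> Iterator[Row]:
    """Behavior A (hypothetical): a generator table function.

    Only constant arguments and no input relation; yields 0..N rows, so it
    can only appear as a leaf of the plan tree.
    """
    for i in range(start, stop):
        yield {"value": i}


def add_columns(rows: Iterable[Row], fn: Callable[[Row], Row]) -> List[Row]:
    """Behavior B (hypothetical): a set-based table function.

    Consumes a set of input rows and extends each one with the extra
    column(s) returned by fn.
    """
    return [{**row, **fn(row)} for row in rows]


rows = list(generate_series(0, 3))
# [{'value': 0}, {'value': 1}, {'value': 2}]
extended = add_columns(rows, lambda r: {"squared": r["value"] ** 2})
# [{'value': 0, 'squared': 0}, {'value': 1, 'squared': 1}, {'value': 2, 'squared': 4}]
```

The structural difference is the input: A has none (only literals), while B is parameterized by a relation.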

For your type 2, I think of that as a window function that possibly excludes certain (or all) input columns. I guess the one distinction is that you want a window function that returns multiple output values.
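One way to picture type 2 under this framing is a sketch where a function sees the partition and the current row index (like a window function), may return several values per row, and the output keeps only selected input columns. The names `window_transform`, `across_columns`, and `running_c1`, and the columns `c1`..`c3`, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def window_transform(rows: Sequence[Row],
                     fn: Callable[[Sequence[Row], int], Row],
                     keep: Sequence[str] = ()) -> List[Row]:
    """Emit exactly one output row per input row (cardinality preserved).

    fn sees the whole partition plus the current row index, like a window
    function, and may return several output values. Input columns not
    listed in `keep` are dropped from the output schema.
    """
    out = []
    for i, row in enumerate(rows):
        kept = {c: row[c] for c in keep}
        out.append({**kept, **fn(rows, i)})
    return out


def across_columns(rows: Sequence[Row], i: int) -> Row:
    # Type 2 as described: combine several columns of the current row
    # into two output values.
    vals = [rows[i][c] for c in ("c1", "c2", "c3")]
    return {"total": sum(vals), "largest": max(vals)}


def running_c1(rows: Sequence[Row], i: int) -> Row:
    # A frame wider than one row: running sum of c1 over rows 0..i.
    return {"running_c1": sum(r["c1"] for r in rows[: i + 1])}
```

`across_columns` only reads the current row, so it is the degenerate one-row frame; `running_c1` shows why the window framing is more general.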

Type A (your type 1) feels pretty simple, and we already have most of the low-level concepts needed to build it. I could see:

1. Add a new table function extension type.
2. Declare the input arguments for a table function as a collection of constant arguments.
3. Define the output to be a collection of output columns/fields rather than a single column.
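The three steps above can be sketched as a data structure. This is a hypothetical shape for such a declaration, not an existing Substrait extension schema; `TableFunctionDecl`, `Field`, `bind`, and the example `split_to_rows` function are all made up for illustration, and only a few Substrait type names are mapped.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Python types accepted for a few Substrait type names (illustrative subset).
_PY_TYPES = {"i64": int, "fp64": float, "string": str, "boolean": bool}


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # a Substrait type name such as "i64" or "string"


@dataclass(frozen=True)
class TableFunctionDecl:
    """Hypothetical table function extension declaration (step 1)."""
    name: str
    args: Tuple[Field, ...]    # step 2: constant (literal) arguments only
    output: Tuple[Field, ...]  # step 3: several output fields, not one type

    def bind(self, *values: object) -> Dict[str, object]:
        """Check literal arguments against the declaration and name them."""
        if len(values) != len(self.args):
            raise TypeError(f"{self.name} expects {len(self.args)} arguments")
        for arg, value in zip(self.args, values):
            if not isinstance(value, _PY_TYPES[arg.type]):
                raise TypeError(f"{arg.name} must be a {arg.type} literal")
        return {arg.name: value for arg, value in zip(self.args, values)}


# Hypothetical generator: split a string into (position, part) rows.
split_to_rows = TableFunctionDecl(
    name="split_to_rows",
    args=(Field("input", "string"), Field("sep", "string")),
    output=(Field("position", "i64"), Field("part", "string")),
)
```

Keeping the output as an ordered tuple of named fields makes the function's result look like a relation schema, which is what a leaf operator needs to expose to its parent.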

For Type B (which likely matches your type 3), we will probably need to introduce a lateral operator or something similar. Can you remind me how tools like Calcite represent this? The lateral operator would accept a table function extension type whose arguments could be constants or field references. The first use would probably be FLATTEN/UNNEST.
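A minimal sketch of the lateral semantics, assuming inner-join behavior (an input row whose table function produces no rows is dropped): for each input row, the arguments are evaluated either as constants or as references to that row's fields, and every produced row is joined back onto the input row. `lateral`, `unnest`, `field`, and `const` are hypothetical helpers, not a Substrait or Calcite API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
ArgExpr = Callable[[Row], object]


def lateral(rows: Iterable[Row],
            table_fn: Callable[..., Iterable[Row]],
            arg_exprs: List[ArgExpr]) -> Iterator[Row]:
    """Evaluate table_fn once per input row and join its output back on."""
    for row in rows:
        args = [expr(row) for expr in arg_exprs]
        for out in table_fn(*args):
            yield {**row, **out}


def unnest(values: Iterable[object]) -> Iterator[Row]:
    # The likely first use: one output row per element of a list value.
    for v in values:
        yield {"element": v}


def field(name: str) -> ArgExpr:
    # A field reference: reads a column of the current input row.
    return lambda row: row[name]


def const(value: object) -> ArgExpr:
    # A constant argument: ignores the current input row.
    return lambda row: value


data = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}, {"id": 3, "tags": ["c"]}]
flattened = list(lateral(data, unnest, [field("tags")]))
# rows for (id 1, "a"), (id 1, "b"), (id 3, "c"); id 2 produces nothing
```

Allowing both `field` and `const` arguments is what distinguishes this from the Type A generator, which only ever sees constants.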
