I am interested in helping to design the spec for TableFunction in Substrait and am hoping to track my own development in a way that provides a reference for discussion. As I mentioned in #745 (comment), I will eventually try to implement 3 different types of table functions and explore what feels natural and what feels like an antipattern (if anyone already has strong opinions, feel free to share; I have very little experience).
The 3 types of table functions:
A TableFunction acting as a leaf operator, e.g. scan_arrow_ipc in DuckDB.
A TableFunction acting as a transformation of an input table to an output table. For this, I want to implement something a bit unusual: a function that maintains the cardinality of the table (output rows == input rows) but applies a function across the columns, thereby changing the schema (like a projection, e.g. [, , , ..., ] -> [, ]).
A TableFunction acting as a fused operator, e.g. "GroupJoin" as in Accelerating Queries with... Join by GroupJoin.
I'll add more information to this issue as I prototype. If anyone has recommendations for a different way to track or reference the work, let me know and I can adjust.

I think about it more as two distinct behaviors rather than three.
A. Generator table function: takes only constant arguments and produces 0..N records. It operates as a leaf in the tree.
B. Set-based table function: takes a set of records and adds one or more columns to each.
I think of your type 2 as a window function that possibly excludes certain (or all) input columns. I guess the one distinction is that you want a window function that returns multiple output values...
Type A (your type 1) feels fairly simple, and we already have most of the low-level concepts to build against. I could see:
Add a new table function extension type.
Declare the table function's input arguments as a collection of constant arguments.
Define the output as a collection of output columns/fields rather than a single column.
Type B (likely matches your type 3) probably requires introducing a lateral operator or something similar. Can you remind me how tools like Calcite represent this? The lateral operator would accept a table function extension type whose arguments could be constants or field references. The first use would probably be FLATTEN/UNNEST.
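To make the difference between behaviors A and B concrete, here is a minimal Python sketch, assuming a hypothetical in-memory model rather than Substrait's actual protobuf messages or extension YAML. `TableFunctionDecl`, `Literal`, `FieldRef`, `generate`, and `lateral` are illustrative names, not part of the spec, and `generate_series`/`unnest` are just example functions. A generator (type A) accepts only constant arguments, declares its output as a list of fields rather than a single column, and acts as a leaf. A lateral-style operator (type B) evaluates the function once per input row, with arguments that may be constants or references to that row's fields, and appends the produced columns.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple, Union

Row = Tuple  # hypothetical: a record is a plain tuple of values


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # type names such as "i64" are illustrative only


@dataclass
class TableFunctionDecl:
    """Hypothetical table function extension declaration.

    Unlike a scalar function, its output is a collection of fields.
    """
    name: str
    arg_types: List[str]
    output: List[Field]
    impl: Callable[..., Iterable[Row]]


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class FieldRef:
    index: int  # position of the referenced field in the input row


Arg = Union[Literal, FieldRef]


def generate(fn: TableFunctionDecl, args: List[Arg]) -> Iterator[Row]:
    """Behavior A: a leaf operator taking only constant args, yielding 0..N rows."""
    if not all(isinstance(a, Literal) for a in args):
        raise ValueError("generator table functions accept constant arguments only")
    for row in fn.impl(*(a.value for a in args)):
        if len(row) != len(fn.output):
            raise ValueError("row width does not match the declared output fields")
        yield tuple(row)


def lateral(inp: Iterable[Row], fn: TableFunctionDecl, args: List[Arg]) -> Iterator[Row]:
    """Behavior B: a hypothetical lateral operator.

    For each input row, resolve field references against that row, call the
    table function, and emit the input row with the produced columns appended.
    """
    for row in inp:
        values = [row[a.index] if isinstance(a, FieldRef) else a.value for a in args]
        for out in fn.impl(*values):
            yield tuple(row) + tuple(out)


# Usage: a generator and an UNNEST-style function (both illustrative).
series = TableFunctionDecl(
    "generate_series", ["i64", "i64"], [Field("value", "i64")],
    lambda lo, hi: ((i,) for i in range(lo, hi + 1)),
)
unnest = TableFunctionDecl(
    "unnest", ["list<i64>"], [Field("elem", "i64")],
    lambda xs: ((x,) for x in xs),
)
orders = [(1, [10, 20]), (2, [])]

print(list(generate(series, [Literal(1), Literal(3)])))  # [(1,), (2,), (3,)]
print(list(lateral(orders, unnest, [FieldRef(1)])))  # [(1, [10, 20], 10), (1, [10, 20], 20)]
```

In this sketch an input row whose list is empty produces no output (inner lateral semantics). An outer variant that still emits the row padded with nulls would be an equally plausible choice for the spec to settle.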