How does Datafusion compare with Arrow Compute functions? #3079
-
Hi @mauropagano. Personally, I like to think of a SQL query engine as an interpreter (or compiler) and of the Arrow compute kernels as its target language. The two (a SQL query engine and an interpreter) really are very similar. At the highest level, you write a SQL string; the SQL parser tokenizes and parses it, and a Logical Plan is generated from the result (the Logical Plan is very similar to the AST in an interpreter). Datafusion can then optimize the Logical Plan (much like AST optimization) and generate the Physical Plan (a lower-level AST or IR), which can in turn be optimized to make it faster. The Physical Plan consists of many physical expressions, and these call the Arrow compute kernels. (You can think of the physical expressions as the "high-level" instructions of an assembly language, and of the Arrow kernels as the lowest-level instructions, such as Add, Mul, ... in x64.) Finally, the Physical Plan is executed to produce the result.
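To make the analogy concrete, here is a toy sketch of those stages (parse, logical plan, logical optimization, physical plan, kernel execution) in plain Python. It does not use Datafusion or Arrow at all: every name in it (`parse`, `optimize`, `to_physical`, `add_kernel`, `mul_kernel`) is hypothetical, the "kernels" are simple list operations standing in for Arrow compute kernels, and the "parser" only understands `SELECT x FROM t` or `SELECT x <op> y FROM t` with `+` and `*`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Batch = Dict[str, List[int]]  # toy "record batch": column name -> values

# "Kernels": lowest-level, column-at-a-time operations (stand-ins for Arrow compute)
def add_kernel(left: List[int], right: List[int]) -> List[int]:
    return [x + y for x, y in zip(left, right)]

def mul_kernel(left: List[int], right: List[int]) -> List[int]:
    return [x * y for x, y in zip(left, right)]

KERNELS = {"+": add_kernel, "*": mul_kernel}

# Logical plan: an AST-like tree
@dataclass
class Column:
    name: str

@dataclass
class Literal:
    value: int

@dataclass
class BinaryExpr:
    left: "Expr"
    op: str
    right: "Expr"

Expr = Union[Column, Literal, BinaryExpr]

@dataclass
class Projection:
    expr: Expr
    table: str

def parse_operand(tok: str) -> Expr:
    return Literal(int(tok)) if tok.isdigit() else Column(tok)

def parse(sql: str) -> Projection:
    """Tokenize and parse 'SELECT x FROM t' or 'SELECT x <op> y FROM t'."""
    tokens = sql.split()
    if len(tokens) < 4 or tokens[0].upper() != "SELECT" or tokens[-2].upper() != "FROM":
        raise ValueError("unsupported query")
    body = tokens[1:-2]
    if len(body) == 1:
        expr = parse_operand(body[0])
    elif len(body) == 3 and body[1] in KERNELS:
        expr = BinaryExpr(parse_operand(body[0]), body[1], parse_operand(body[2]))
    else:
        raise ValueError("unsupported expression")
    return Projection(expr, tokens[-1])

def optimize(plan: Projection) -> Projection:
    """Logical optimization: drop identity operations such as x * 1 and x + 0."""
    e = plan.expr
    if isinstance(e, BinaryExpr) and isinstance(e.right, Literal):
        if (e.op, e.right.value) in (("*", 1), ("+", 0)):
            return Projection(e.left, plan.table)
    return plan

# Physical plan: each expression becomes a callable that invokes kernels
PhysicalExpr = Callable[[Batch], List[int]]

def to_physical(expr: Expr) -> PhysicalExpr:
    if isinstance(expr, Column):
        return lambda batch: batch[expr.name]
    if isinstance(expr, Literal):
        # broadcast the literal to the batch length
        return lambda batch: [expr.value] * len(next(iter(batch.values())))
    left, right = to_physical(expr.left), to_physical(expr.right)
    kernel = KERNELS[expr.op]
    return lambda batch: kernel(left(batch), right(batch))

def run(sql: str, tables: Dict[str, Batch]) -> List[int]:
    """Parse -> optimize the logical plan -> build the physical plan -> execute."""
    plan = optimize(parse(sql))
    return to_physical(plan.expr)(tables[plan.table])
```

For example, with `tables = {"t": {"a": [1, 2, 3], "b": [10, 20, 30]}}`, `run("SELECT a + b FROM t", tables)` ends up calling `add_kernel` once on whole columns, while `SELECT a * 1 FROM t` is rewritten to plain `a` by the optimizer before any kernel runs. Writing the SQL is the "high-level language" route; calling the kernels directly is the "assembly" route.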
So the question is much like asking when to write in a high-level language (Rust, Python, ...) versus a low-level one (x64 or RISC-V assembly, ...).
These are just my personal opinions. Remzi
-
Hi,
Apologies if this is a trivial question, but I can't seem to answer it myself.
In the last 2-3 major releases, Arrow compute has grown quite a bit, including functionality that I thought it was Datafusion's job to provide (group by, joins, exec plans, etc.).
I understand the interface is different (e.g. SQL is an option in Datafusion), but in Python there seems to be some overlap.
I also understand the distributed nature of Ballista and the extensibility (e.g. UDFs) that Datafusion brings; that's a clear differentiator.
How should one reason about when to use Datafusion vs. plain Arrow, say, to aggregate data from a Parquet file?
Or are these core operations now provided by Arrow compute, while Datafusion focuses on higher-level operations?
Thanks,
Mauro