I think there are some bits here that are being handled manually that should be handled in a more automated fashion.
With the hand-written tuples recording which producer can correctly encode which query, we have a way to track regressions (e.g., DuckDB suddenly fails to run `logb`) but not improvements (e.g., Isthmus can now encode `logb`).
This is a highly non-trivial problem, because the outcomes of the producer tests are essentially the test fixtures for the consumers.
We've been using [`pytest-snapshot`](https://pypi.org/project/pytest-snapshot/) to test that Ibis produces "good" or "golden" SQL for various expressions, and I wonder if that would be of help here.
Testing producers would mean generating Substrait blobs, then comparing them to known-good snapshots of those blobs.
Testing consumers would consist of loading the snapshot blobs and attempting to execute them.
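To make the snapshot idea concrete, here's a minimal sketch of the compare-or-update pattern that `pytest-snapshot` implements (the helper name and file layout are hypothetical, not an existing API):

```python
from pathlib import Path


def assert_blob_matches_snapshot(blob: bytes, snapshot_path: Path, update: bool = False) -> None:
    """Compare a produced Substrait blob against a stored golden snapshot.

    With update=True (the moral equivalent of pytest-snapshot's
    --snapshot-update flag), the snapshot file is (re)written instead
    of checked, so intentional changes can be blessed in one run.
    """
    if update or not snapshot_path.exists():
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_bytes(blob)
        return
    expected = snapshot_path.read_bytes()
    assert blob == expected, f"blob differs from snapshot at {snapshot_path}"
```

A consumer test would then read the same snapshot file back and attempt to execute it, so the producer snapshots double as consumer fixtures.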
I know I'm not covering everything that needs covering in the test matrix here, but I think it would be a very good idea to start sketching out more sustainable patterns.
Having said all of ^^^^that^^^^, I don't think that should block this PR.
I do think that we should be attempting to run all producer tests on all SQL snippets, and not manually filtering them down pre-test. If `isthmus` is going to fail one of those tests because it uses a different SQL dialect, so be it -- we can get creative in the `xfail` markers and distinguish between "tests that fail that should pass in the future" and "tests that fail that will always fail".
Alternatively, we might make use of `sqlglot` to translate string SQL between dialects -- it's very good at that.
Originally posted by @gforsyth in #6 (review)