I think there are some bits here that are being handled manually that should be handled in a more automated fashion.
With the hand-written tuples recording which producer can correctly encode which query, we have a way to track regressions (e.g., DuckDB suddenly fails to run `logb`) but not improvements (e.g., Isthmus can now encode `logb`).
This is a highly non-trivial problem, because the outcomes of the producer tests are essentially the test fixtures for the consumers.
We've been using [`pytest-snapshot`](https://pypi.org/project/pytest-snapshot/) to test that Ibis produces "good" or "golden" SQL for various expressions, and I wonder if that would be of help here.
Testing producers would mean generating Substrait blobs, then comparing them to known-good snapshots of those blobs.
Testing consumers would consist of loading the snapshot blobs and attempting to execute them.
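To make the snapshot idea concrete, here's a minimal sketch of the compare-or-update pattern that `pytest-snapshot` implements (the helper name and file layout are hypothetical, not an existing API):

```python
from pathlib import Path


def assert_blob_matches_snapshot(blob: bytes, snapshot_path: Path, update: bool = False) -> None:
    """Compare a produced Substrait blob against a stored golden snapshot.

    With update=True (the moral equivalent of pytest-snapshot's
    --snapshot-update flag), the snapshot file is (re)written instead
    of checked, so intentional changes can be blessed in one run.
    """
    if update or not snapshot_path.exists():
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_bytes(blob)
        return
    expected = snapshot_path.read_bytes()
    assert blob == expected, f"blob differs from snapshot at {snapshot_path}"
```

A consumer test would then read the same snapshot file back and attempt to execute it, so the producer snapshots double as consumer fixtures.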
I know I'm not covering everything that needs covering in the test matrix here, but I think it would be a very good idea to start sketching out more sustainable patterns.
Having said all of ^^^^that^^^^, I don't think that should block this PR.
I do think that we should be attempting to run all producer tests on all SQL snippets, and not manually filtering them down pre-test. If `isthmus` is going to fail one of those tests because it uses a different SQL dialect, so be it -- we can get creative in the `xfail` markers and distinguish between "tests that fail that should pass in the future" and "tests that fail that will always fail".
Alternatively, we might make use of `sqlglot` to translate string SQL between dialects -- it's very good at that.
Originally posted by @gforsyth in #6 (review)