Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests #13936
base: main
Related: apache/datafusion-testing#2
THANK YOU @Omega359 -- this looks awesome
I think there are two things that we should fix prior to merge:
- The submodule issue (details below)
- "UnexpectedToken" issues (though I think it would also be fine to fix this in a follow-on PR)
Once we get this PR merged, I think the next obvious thing to do is to start running the suite in CI and actively tend the tickets for fixing issues found (the bugs listed on #13811, which will now be much easier to reproduce).
## "UnexpectedToken label-XXX" errors
When I ran this branch with
External error: task 27341 panicked with message "called Result::unwrap() on an Err value: ParseError { kind: UnexpectedToken("label-1"), loc: Location { file: "../../datafusion-testing/data/sqlite/random/select/slt_good_21.slt", line: 47, upper: None } }"
- This looks like something that came in with sqllogictest 0.25 yesterday: https://github.com/apache/datafusion/pull/13917
Perhaps we could downgrade / revert to 0.24 and file a ticket upstream 🤔
## progress reporting
This is pretty neat 🎉
```shell
cargo test --test sqllogictests
```
## `git submodule` issues
Initially I tried to run this locally and had some problems:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
Finished `release-nonlto` profile [optimized] target(s) in 0.43s
Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-19127caafe5284e5)
Error: Execution("Error reading directory \"../../datafusion-testing/data/\": Not a directory (os error 20)")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`
Caused by:
process didn't exit successfully: `/Users/andrewlamb/Software/datafusion2/target/release-nonlto/deps/sqllogictests-19127caafe5284e5` (exit status: 1)
This seems to be related to not having the `datafusion-testing` submodule checked out.
However, `git submodule init` didn't seem to work:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git submodule init
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch sqllogictest_with_sqlite
nothing to commit, working tree clean
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ ls datafusion-testing
datafusion-testing*
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ cat datafusion-testing
e2e320c9477a6d8ab09662eae255887733c0e304(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$
I found I could fix it by removing the file and re-adding the submodule with `--force`:
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git rm datafusion-testing
rm 'datafusion-testing'
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git submodule add --force https://github.com/apache/datafusion-testing.git
Reactivating local git directory for submodule 'datafusion-testing'
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ ls datafusion-testing/
LICENSE.txt NOTICE.txt README.md data/
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch sqllogictest_with_sqlite
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
typechange: datafusion-testing
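The `Not a directory (os error 20)` failure above happens because `datafusion-testing` is a plain file (the `cat` output shows it holds only a commit hash) rather than a checked-out submodule directory. A minimal sketch of a pre-check the runner could do to report this more helpfully, assuming a hypothetical helper `check_testing_dir` that is not part of this PR and is pointed at the submodule root rather than at `data/`:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical pre-check (not in the PR): explain why the test data location
/// is unusable instead of surfacing a bare "Not a directory (os error 20)".
fn check_testing_dir(dir: &Path) -> Result<(), String> {
    match fs::metadata(dir) {
        // A real directory: the submodule is checked out.
        Ok(meta) if meta.is_dir() => Ok(()),
        // A regular file: the submodule entry was materialized as a file
        // (for example one containing only a commit hash), not a checkout.
        Ok(_) => Err(format!(
            "{} is not a directory; re-add the datafusion-testing submodule",
            dir.display()
        )),
        // Nothing there at all: the submodule was never initialized.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Err(format!(
            "{} not found; run `git submodule update --init --remote --recursive`",
            dir.display()
        )),
        // Any other I/O problem is reported as-is.
        Err(e) => Err(format!("cannot read {}: {e}", dir.display())),
    }
}
```

Checking the submodule root matters: asking for `datafusion-testing/data` when `datafusion-testing` is a file yields the same OS-level "not a directory" error the runner printed, whereas checking the root lets the helper tell the "file instead of checkout" case apart from "not initialized".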
@@ -160,7 +161,7 @@ cargo test --test sqllogictests -- information
Test files that start with prefix `pg_compat_` verify compatibility
with Postgres by running the same script files both with DataFusion and with Postgres

- In order to run the sqllogictests running against a previously running Postgres instance, do:
+ In order to have the sqllogictest run against an existing running Postgres instance, do:
Thank you for cleaning up the word salad
## Running Tests: `sqlite`

Test files in `data/sqlite` directory of the datafusion-testing crate were
sourced from the sqlite test suite and have been cleansed and updated to
Could you also please add a link to what "sqlite test suite" means?
I think it means https://www.sqlite.org/sqllogictest/dir?ci=tip
@@ -239,6 +239,10 @@ pub fn cell_to_string(col: &ArrayRef, row: usize) -> Result<String> {
let key = dict.normalized_keys()[row];
Ok(cell_to_string(dict.values(), key)?)
}
// only added because of a bug in v 1.0.4 (is) of lexical-write-integer
Is this the same as
If so, I think we could remove the workaround now, as it has since been fixed.
use std::ffi::OsStr;
use std::fs;
#[cfg(feature = "postgres")]
Perhaps, as a follow-on PR, the code that manages the Postgres container could be moved into its own module (like `postgres_container.rs` or something) so that we only need one `#[cfg(feature = "postgres")]`.
I suspect this would also make the code a bit easier to reason about.
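A minimal sketch of the suggested layout, using hypothetical names (`postgres_container`, `connection_uri`, `postgres_target`) that do not appear in the PR. The module carries the item-level feature gate once, and a single always-compiled entry point is the only other place that knows about the feature, so individual imports and helpers no longer need their own `#[cfg]`:

```rust
// Hypothetical layout: everything that manages the Postgres container lives
// in one module, so the `postgres` feature gate sits on the module itself.
#[cfg(feature = "postgres")]
mod postgres_container {
    /// Placeholder: a real implementation would start (or reuse) a container.
    /// This sketch only reads `PG_URI`, which the PR description mentions,
    /// assuming it names an already-running instance.
    pub fn connection_uri() -> Option<String> {
        std::env::var("PG_URI").ok()
    }
}

/// Always-compiled entry point for the runner; its callers need no `#[cfg]`.
/// Without the feature it simply reports that there is no Postgres target.
fn postgres_target() -> Option<String> {
    #[cfg(feature = "postgres")]
    {
        postgres_container::connection_uri()
    }
    #[cfg(not(feature = "postgres"))]
    {
        None
    }
}
```

Keeping the boundary in one function means the rest of the runner compiles identically with or without the feature, which is the "easier to reason about" property the comment is after.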
Which issue does this PR close?
Closes #13812
Rationale for this change
Add most of the SQLite test suite to DataFusion sqllogictests. Note: THESE TESTS DO NOT CURRENTLY PASS! Any test results where DataFusion returns a result that matches neither SQLite nor PostgreSQL were left as-is.
What changes are included in this PR?
This PR includes a number of changes, many of which are part of the test files in the `datafusion-testing` repo (5,711,125 select statements, of which 78,437 fail outright in DataFusion). The list below includes both the changes in this PR and the process used to generate the files in `datafusion-testing/data/sqlite/`:
- … the `evidence` and the `index/delete` folders
- `control resultmode valuewise` added to the beginning to allow the sqllogictest runner to properly compare the results from DataFusion (and PostgreSQL) to the results in the `.slt` file
- … `skipif Datafusion` and/or `skipif postgres`. For example: …
- … `git submodule update --init --remote --recursive` to get it added to an existing checkout of DataFusion.
- … `PG_URI` is set.

Are these changes tested?
Indeed, yes. To run the tests locally, check out this branch, update the git submodules, then run `INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests`. Be aware that the tests can take quite a long time to run, especially if you do not use the `release` or `release-nonlto` profile.

Are there any user-facing changes?
No.