
Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests #13936

Open: wants to merge 9 commits into base: main


Omega359
Contributor

Which issue does this PR close?

Closes #13812

Rationale for this change

Add most of the sqlite test suite to Datafusion sqllogictests. Note: THESE TESTS DO NOT CURRENTLY PASS! Any test results where Datafusion returns a result that matches neither sqlite nor Postgresql were left as-is.

What changes are included in this PR?

This PR includes a number of changes, many of which are part of the test files in the datafusion-testing repo (5,711,125 select statements, of which 78,437 fail outright in Datafusion). The list below covers both the changes in this PR and the process used to generate the files in datafusion-testing/data/sqlite/

  • All the .test files in the sqlite test suite are included, except the contents of the evidence and index/delete folders
  • All .test files have had mysql and mssql specific tests removed. All other references to mysql and mssql have been removed.
  • All .test files were renamed to .slt
  • All files had `control resultmode valuewise` added at the beginning so that the sqllogictest runner can properly compare the results from Datafusion (and Postgresql) to the results in the .slt file
  • All queries have been run through both Datafusion and Postgres, and any queries that failed with an error have had comments added explaining the failure, along with a `skipif Datafusion` and/or `skipif postgres`. For example:
# Datafusion - DataFusion error: SQL error: RecursionLimitExceeded
skipif Datafusion
  • Data types have been updated to reflect the types used by Datafusion/Postgresql, as the sqlite data types are very limited. Comments reflecting the change have been added. For example:
# Datafusion - Types were automatically converted from:
# Datafusion - [Expected] [T]
# Datafusion - [Actual  ] [I]
  • Results have been updated where the Datafusion results differ from the sqlite results AND match the results from Postgresql. Some query results differ, especially floating point values (Real results in slt terms); floating point results were deemed equivalent between Datafusion and Postgresql if they were the same to 4 decimal places. Comments reflecting the change have been added. For example:
# Datafusion - Data was automatically updated based on comparison db results
# Datafusion - Previous results:
# Datafusion - 54
# Datafusion - 9
  • The sqllogictest harness and runners have been updated to display progress information
  • A datafusion-testing git submodule has been added. You may need to run `git submodule update --init --remote --recursive` to add it to an existing checkout of datafusion.
  • Added the ability to start a Postgres Docker container automatically if no `PG_URI` is set.
  • Readme updates
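A minimal sketch of two of the transformations listed above, assuming query results are already available as strings: flattening rows into the one-value-per-line layout that `control resultmode valuewise` implies, and treating two floating point results as equivalent when they agree to 4 decimal places. The function names are hypothetical and this is not the tooling that generated the files in datafusion-testing; rounding with `f64::round` (half away from zero) is one assumed reading of "the same to 4 decimal places".

```rust
/// Flattens row-oriented results into one value per line, the layout the
/// runner compares against once `control resultmode valuewise` is set.
fn to_valuewise(rows: &[Vec<String>]) -> Vec<String> {
    rows.iter().flat_map(|row| row.iter().cloned()).collect()
}

/// Returns true if two textual results are equivalent under an assumed
/// "same to 4 decimal places" rule for floating point (Real) values.
/// Non-finite or non-numeric values fall back to exact string comparison.
fn floats_equivalent(a: &str, b: &str) -> bool {
    match (a.trim().parse::<f64>(), b.trim().parse::<f64>()) {
        (Ok(x), Ok(y)) if x.is_finite() && y.is_finite() => {
            // Scale by 10^4 and round so both values are compared at 4 decimal places.
            (x * 10_000.0).round() == (y * 10_000.0).round()
        }
        _ => a == b,
    }
}

fn main() {
    let rows = vec![
        vec!["1".to_string(), "a".to_string()],
        vec!["2".to_string(), "b".to_string()],
    ];
    // Valuewise layout: one value per line, row after row.
    for value in to_valuewise(&rows) {
        println!("{value}");
    }
    println!("{}", floats_equivalent("3.14159", "3.14161"));
}
```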

Are these changes tested?

Indeed, yes. To run the tests locally, check out this branch, update the git submodules, then run `INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests`. Be aware that the tests can take a long time to run, especially if you do not use the `release` or `release-nonlto` profile.
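As a rough illustration of why `INCLUDE_SQLITE=true` is needed, the sketch below shows one way a runner could gate the sqlite directory on that environment variable. The `test_files/` root, the helper name, and the case-insensitive `"true"` check are assumptions; only the `../../datafusion-testing/data/sqlite/` path comes from the error output later in this thread.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the directories whose .slt files should run.
/// The sqlite directory is included only when INCLUDE_SQLITE is "true"
/// (case-insensitive here; the runner's exact parsing is an assumption).
fn test_roots(include_sqlite: Option<&str>) -> Vec<PathBuf> {
    // Assumed location of the regular DataFusion .slt files.
    let mut roots = vec![PathBuf::from("test_files/")];
    let enabled = include_sqlite
        .map(|v| v.trim().eq_ignore_ascii_case("true"))
        .unwrap_or(false);
    if enabled {
        roots.push(PathBuf::from("../../datafusion-testing/data/sqlite/"));
    }
    roots
}

fn main() {
    let flag = std::env::var("INCLUDE_SQLITE").ok();
    for root in test_roots(flag.as_deref()) {
        println!("would scan {}", root.display());
    }
}
```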

Are there any user-facing changes?

No.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Dec 28, 2024
@Omega359 Omega359 marked this pull request as ready for review December 28, 2024 22:56
@Omega359
Contributor Author

Related: apache/datafusion-testing#2

Contributor

@alamb alamb left a comment


THANK YOU @Omega359 -- this looks awesome

I think there are two things that we should fix prior to merge:

  1. The submodule issue (details below)
  2. "UnexpectedToken" issues (though I think it would also be fine to fix this in a follow-on PR)

Once we get this PR merged, I think the next obvious step is to start running the suite in CI and actively work through the tickets for the issues found (the bugs listed on #13811, which will now be much easier to reproduce)

"UnexpectedToken label-XXX" errors:

When I ran this branch with

External error: task 27341 panicked with message "called Result::unwrap() on an Err value: ParseError { kind: UnexpectedToken("label-1"), loc: Location { file: "../../datafusion-testing/data/sqlite/random/select/slt_good_21.slt", line: 47, upper: None } }"


- This looks like something that came in via sqllogictest 0.25 yesterday: https://github.com/apache/datafusion/pull/13917

Perhaps we could downgrade / revert to 0.24 and file a ticket upstream 🤔 

## progress reporting

This is pretty neat 🎉

```shell
cargo test --test sqllogictests
```

(Screenshot 2024-12-29 at 9 01 58 AM)
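A plain-`std` sketch of the kind of progress line discussed here, assuming the runner knows how many files it has completed out of a total. The helper name and output format are hypothetical; the PR's actual progress display may be built differently (for example on a progress-bar crate).

```rust
/// Hypothetical plain-std progress line; not the PR's implementation.
fn progress_line(done: u64, total: u64, width: usize, current: &str) -> String {
    // Treat an empty run as complete and never report more than 100%.
    let frac = if total == 0 {
        1.0
    } else {
        done.min(total) as f64 / total as f64
    };
    let filled = (frac * width as f64).round() as usize;
    format!(
        "[{}{}] {:>3}% ({}/{}) {}",
        "#".repeat(filled),
        "-".repeat(width - filled),
        (frac * 100.0).round() as u64,
        done,
        total,
        current
    )
}

fn main() {
    let files = ["select1.slt", "select2.slt", "select3.slt", "select4.slt"];
    for (i, name) in files.iter().enumerate() {
        println!("{}", progress_line(i as u64 + 1, files.len() as u64, 20, name));
    }
}
```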

git submodule issues

Initially I tried to run this locally and had some problems

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ INCLUDE_SQLITE=true cargo test --profile release-nonlto --test sqllogictests
    Finished `release-nonlto` profile [optimized] target(s) in 0.43s
     Running bin/sqllogictests.rs (target/release-nonlto/deps/sqllogictests-19127caafe5284e5)
Error: Execution("Error reading directory \"../../datafusion-testing/data/\": Not a directory (os error 20)")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

Caused by:
  process didn't exit successfully: `/Users/andrewlamb/Software/datafusion2/target/release-nonlto/deps/sqllogictests-19127caafe5284e5` (exit status: 1)

This seems to be related to not having the datafusion-testing submodule checked out
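A hypothetical pre-flight check along these lines could turn the opaque `Not a directory (os error 20)` into an actionable hint. The function name and message are illustrative only; the suggested command is the one from the PR description.

```rust
use std::path::Path;

/// Hypothetical pre-flight check: fail with an actionable hint when the
/// datafusion-testing data directory is missing or is not a directory
/// (as happens when the submodule has not been checked out).
fn check_testing_data(dir: &Path) -> Result<(), String> {
    if dir.is_dir() {
        Ok(())
    } else {
        Err(format!(
            "{} is not a directory; the datafusion-testing submodule may not be checked out. \
             Try `git submodule update --init --remote --recursive`",
            dir.display()
        ))
    }
}

fn main() {
    let dir = Path::new("../../datafusion-testing/data/");
    match check_testing_data(dir) {
        Ok(()) => println!("found {}", dir.display()),
        Err(msg) => eprintln!("{msg}"),
    }
}
```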

However, git submodule init didn't seem to work

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git submodule init
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch sqllogictest_with_sqlite
nothing to commit, working tree clean
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ ls datafusion-testing
datafusion-testing*
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ cat datafusion-testing
e2e320c9477a6d8ab09662eae255887733c0e304
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$

I found I could fix it by removing the entry and re-adding the submodule with --force:

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git rm datafusion-testing
rm 'datafusion-testing'
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git submodule add --force https://github.com/apache/datafusion-testing.git
Reactivating local git directory for submodule 'datafusion-testing'
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ ls datafusion-testing/
LICENSE.txt  NOTICE.txt   README.md    data/
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git status
On branch sqllogictest_with_sqlite
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	typechange: datafusion-testing

@@ -160,7 +161,7 @@ cargo test --test sqllogictests -- information
Test files that start with prefix `pg_compat_` verify compatibility
with Postgres by running the same script files both with DataFusion and with Postgres

- In order to run the sqllogictests running against a previously running Postgres instance, do:
+ In order to have the sqllogictest run against an existing running Postgres instance, do:
Contributor


Thank you for cleaning up the word salad

## Running Tests: `sqlite`

Test files in `data/sqlite` directory of the datafusion-testing crate were
sourced from the sqlite test suite and have been cleansed and updated to
Contributor


Could you also please add a link to what "sqlite test suite" means?

I think it means https://www.sqlite.org/sqllogictest/dir?ci=tip

@@ -239,6 +239,10 @@ pub fn cell_to_string(col: &ArrayRef, row: usize) -> Result<String> {
let key = dict.normalized_keys()[row];
Ok(cell_to_string(dict.values(), key)?)
}
// only added because of a bug in v 1.0.4 (is) of lexical-write-integer
Contributor


Is this the same as

If so, I think we could remove the workaround now, as it has since been fixed

use std::ffi::OsStr;
use std::fs;
#[cfg(feature = "postgres")]
Contributor


Perhaps, as a follow-on PR, the code that manages the postgres container could be moved into its own module (like postgres_container.rs or something) so we only need one #[cfg(feature = "postgres")]

I suspect this would also make the code a bit easier to reason about
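A minimal sketch of what such a module might look like, assuming the only decision it exposes is whether to use `PG_URI` or start a container. The module name follows the suggestion above; `PgTarget`, `resolve`, and treating a blank `PG_URI` as unset are hypothetical, and actually starting the container is left out. In the real crate the `mod` declaration would carry the single `#[cfg(feature = "postgres")]`; it is omitted here so the sketch compiles on its own.

```rust
// Hypothetical module; in the crate this declaration would be the one place
// gated with #[cfg(feature = "postgres")].
mod postgres_container {
    /// Where the Postgres-backed tests should connect.
    #[derive(Debug, PartialEq)]
    pub enum PgTarget {
        /// Use the existing instance named by PG_URI.
        Existing(String),
        /// No usable PG_URI: a container should be started (not implemented here).
        StartContainer,
    }

    /// Decides the target from the PG_URI value (blank counts as unset).
    pub fn resolve(pg_uri: Option<String>) -> PgTarget {
        match pg_uri {
            Some(uri) if !uri.trim().is_empty() => PgTarget::Existing(uri),
            _ => PgTarget::StartContainer,
        }
    }
}

use postgres_container::{resolve, PgTarget};

fn main() {
    match resolve(std::env::var("PG_URI").ok()) {
        PgTarget::Existing(uri) => println!("using existing Postgres at {uri}"),
        PgTarget::StartContainer => println!("PG_URI not set; would start a Postgres container"),
    }
}
```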

@alamb alamb changed the title Add sqlite test files into sqllogictests Add sqlite test files, progress bar, and automatic postgres container management into sqllogictests Dec 29, 2024
Labels
sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Complete / integrate sqlite sqllogictest test scripts integrattion
2 participants