Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Complete / integrate sqlite sqllogictest test scripts integrattion #13812

Open
alamb opened this issue Dec 17, 2024 · 13 comments · May be fixed by #13936
Open

Complete / integrate sqlite sqllogictest test scripts integrattion #13812

alamb opened this issue Dec 17, 2024 · 13 comments · May be fixed by #13936
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

Is your feature request related to a problem or challenge?

Part of #13811

@Omega359 has prototyped integration with the sqlite test suite ❤️

This ticket tracks the remaining work to get this runner merged into datafusion main.

Describe the solution you'd like

I would like:

  1. The runner merged into datafusion main
  2. Clear instructions on how to run the suite

Bonus points would be some way to regularly run these tests

Describe alternatives you've considered

From #13470 (comment)

I've updated my test branch with the latest changes including adding skipif for Postgresql queries that fail to run. What is left is at least 19 legitimate query result mismatches where tickets were filed.

I've submitted a PR to the sqllogictest-rs repo for my changes but so far they haven't responded. If they do not get around to merging it for whatever reason we have two choices:

  1. Rewrite the .slt files to be columnwise vs valuewise and change ordering to only be rowsort and remove the valuesort. I've actually written a lua script to handle the valuewise -> columnwise conversion, the second part will require some more work.
  2. Fork the repo and use that.

For now I'm going to slow down work on this.

Additional context

No response

@alamb
Copy link
Contributor Author

alamb commented Dec 17, 2024

@Omega359 do I have this work described correctly?

@Omega359
Copy link
Contributor

Omega359 commented Dec 17, 2024

Thanks for putting this together @alamb. Yes, the work is described correctly. CI builds currently are failing because of a clippy warning that I need to look into but otherwise it's good for others to look at.

I find that, at least on my machine, running the sqlite tests in 'normal' test mode takes ...forever. I typically run now with --release. Slower to build, faster overall.

cargo test --release --test sqllogictests -- --include-sqlite

if you do not run in release mode you may need to increase the stack size or it'll fail part way in:

export RUST_MIN_STACK=3048576

@Omega359
Copy link
Contributor

One issue that I haven't fixed is that some of the sqlite test files are LARGE. In total the files are 763M. I am unsure that we would want them in the main repo ... and I am pretty sure they will fail a CI check with some being > 1MB anyways.

Perhaps we need a new datafusion_testing repository where these files would go and be linked into df via .gitmodules

@alamb
Copy link
Contributor Author

alamb commented Dec 19, 2024

Perhaps we need a new datafusion_testing repository where these files would go and be linked into df via .gitmodules

I think this is a good idea -- or maybe a script that checks out the original sqllite files or something

I also think we need to find a way to run it regularly (but likely it can't be on all PRs), see

@wiedld from InfluxData expressed interest in helping move this project along. I will let her comment on what she next plans, etc

@Omega359
Copy link
Contributor

Yes, this seems like a perfect thing for a nightly ci run.

The thing that is holding me back atm is the sqllogictest-rs changes. Without those in the official repo means either redoing the output of the sqlite results (possible but not particularly easy) or using my fork. I'm hoping to get back to seeing if I can do the rewrite this weekend by comparing the sortmode results during a complete run (with my super hacked up version of sqllogictest-rs) and if they are the same write it out as columnmode. That would mean that the current runner would work.

I don't think attempting to pull in the original sqlite files will work as what I am producing is very different from what they have. datafusion_testing repo I think would be best.

@Omega359
Copy link
Contributor

Ok, I believe I'm ready to push a PR for the source code since the sqllogictest-rs folks pushed a release with my changes. I just need a decision on what to do with the actual sqlite .slt files. Do we:

  1. Check them in as is and just adjust the large_files.yml in github workflow to let them through
  2. git lfs
  3. separate repo and use git modules for attaching to df repo (my preferred solution - datafusion-testing)
  4. compress files in place and check in compressed files
  5. something else?

@alamb
Copy link
Contributor Author

alamb commented Dec 21, 2024

separate repo and use git modules for attaching to df repo (my preferred solution - datafusion-testing)

yes this is my preferred solution too as it follows an exisitng pre-set pattern

@Omega359
Copy link
Contributor

What are the steps required to have a datafusion-testingcreated? I went hunting quickly through the apache docs I could find but I couldn't find the steps required

@alamb
Copy link
Contributor Author

alamb commented Dec 22, 2024

I did some poking around and found on this JIRA: https://issues.apache.org/jira/browse/INFRA-26314?jql=project%20%3D%20INFRA%20AND%20summary%20~%20%22create%20repo%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

This can be done on a selfserve basis via https://gitbox.apache.org/boxer/. If you have any questions please feel free to email us at [email protected] or find us on slack in #asfinfra.

I submitted a request

It appears it is now available 🎉 : https://github.com/apache/datafusion-testing

@Omega359
Copy link
Contributor

Thank you very much @alamb. I'll work on the PR later today

@Omega359
Copy link
Contributor

Omega359 commented Dec 23, 2024

Apparently there isn't a way to fork an empty repository in github, or at least I can't seem to find a way so I'm unable to push a PR there for the slt files. Can someone with contributor permissions push a license file to that new repo so that the github UI will allow me to fork it?

@comphead @alamb @findepi ...

@jonahgao
Copy link
Member

Apparently there isn't a way to fork an empty repository in github, or at least I can't seem to find a way so I'm unable to push a PR there for the slt files. Can someone with contributor permissions push a license file to that new repo so that the github UI will allow me to fork it?

Done. @Omega359

@Omega359
Copy link
Contributor

PR for data files: apache/datafusion-testing#1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment