Suggestion: Include reshape benchmarks #3
Comments
Hi Grant, thank you for the suggestion! I currently don't have a lot of bandwidth to add a whole new solution to the benchmark, but if you want to open a PR that adds the necessary See repro.sh for steps to run the benchmark either locally or on an AWS instance. If no errors are thrown for the 0.5GB & 5GB datasets I'd be happy to merge your PR and re-run the benchmark to include results for collapse.
As for the reshaping benchmarks, I think it's a great idea! It would take a while to finally include those queries in the benchmark, however, as I would need to
I would like to do a re-work of the report generation code, as it was hard to track down bugs while re-running the benchmark. As mentioned in h2oai#175, however, I would be happy to review or collaborate on any PRs that help maintain and improve the benchmark!
collapse author here. Thanks @grantmcdermott and @vincentarelbundock for the initiative! I'm happy with adding collapse to the benchmarks, and also happy for any suggested code, but would like to wait for the pending v2.0 release (which includes implementations of table joins and reshaping). I will also ensure the benchmarking code is equivalent to other DBMS (collapse has some unfavorable defaults e.g.
Sounds good @SebKrantz. You may want to use my PR as a starting point since most of the setup and group-by stuff is close to done. FYI, the
Stoked to see this back up and running!
(As an aside, the relentless performance gains of DuckDB are truly impressive.)
Two suggestions:
Please consider the collapse R package (link). In my own set of benchmarks, collapse is typically at or near the top of various groupby operations for datasets on the order of 0.5-5 GB. (I haven't tested larger than that, and should also say that it doesn't support join operations yet.) I can add a PR if interested.
Closed via [WIP] New solution: r-collapse #33.
Thanks again for all the effort in resurrecting this.