feat: caching optd stats, 12x speedup on TPC-H SF1 (#132)
**Summary**: Now caching the stat objects used by `OptCostModel`, so after the first run we no longer need to load data into DataFusion to compute them.

**Demo**: 12x speedup on TPC-H SF1 compared to not caching stats.

Caching everything _except_ optd stats takes 45.6s total.
![Screenshot 2024-03-23 at 16 59 04](https://github.com/cmu-db/optd/assets/20631215/4c199374-e2df-43fb-9eba-f348ea1e275a)

Caching everything, _including_ optd stats, takes 3.9s total.
![Screenshot 2024-03-23 at 16 57 45](https://github.com/cmu-db/optd/assets/20631215/4ef01ae9-c5a9-4fcd-bad9-c52d9a73c147)

**Details**:
* This caching is **disabled by default** to avoid accidentally using stale stats. I added a CLI arg to enable it.
* The main challenge of this PR was making `PerTableStats` serializable with `serde`.
* The serializability refactor will also help later when we want to **put statistics in the catalog**, since that is fundamentally a serialization problem too. Keeping `Box<dyn ...>` fields would make putting stats in the catalog more difficult.
* This required a significant refactor of how the `MostCommonValues` and `Distribution` traits are handled in `OptCostModel`. Instead of `PerColumnStats` holding `Box<dyn ...>` values that can store any object implementing these traits, `PerColumnStats` is now generic over the concrete types it stores.
* The one downside of this refactor is that a database can no longer use _different_ data structures for `Distribution` across columns (e.g. a t-digest for one column and a histogram for another). I didn't see this as reason enough to skip the refactor, since it seems like a rare thing to do. If we really needed it, we could define an enum that wraps both types.
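The opt-in caching described above boils down to a load-or-compute step. Below is a minimal, dependency-free sketch of that flow, not optd's implementation. Assumptions: `TableStats`, `load_or_compute`, and the `row_cnt;nd1,nd2` text format are hypothetical; the PR serializes the real `PerTableStats` with `serde`, and the hand-rolled `encode`/`decode` here only stand in for it so the block compiles with `std` alone. Passing `None` as `cache_path` models the default (caching disabled); the actual CLI arg is not reproduced here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical, simplified stats standing in for optd's `PerTableStats`.
#[derive(Debug, Clone, PartialEq)]
pub struct TableStats {
    pub row_cnt: u64,
    pub ndistincts: Vec<u64>,
}

impl TableStats {
    /// Stand-in for serde (assumption): encodes as "row_cnt;nd1,nd2,...".
    pub fn encode(&self) -> String {
        let nds: Vec<String> = self.ndistincts.iter().map(|n| n.to_string()).collect();
        format!("{};{}", self.row_cnt, nds.join(","))
    }

    /// Inverse of `encode`; returns `None` on any malformed input.
    pub fn decode(s: &str) -> Option<Self> {
        let (row, rest) = s.trim().split_once(';')?;
        let row_cnt = row.parse().ok()?;
        let ndistincts = if rest.is_empty() {
            Vec::new()
        } else {
            rest.split(',')
                .map(|n| n.parse().ok())
                .collect::<Option<Vec<u64>>>()?
        };
        Some(TableStats { row_cnt, ndistincts })
    }
}

/// `cache_path == None` means caching is disabled (the default), so stats are
/// always recomputed. With a path, a readable cache file is reused as-is
/// (it may be stale); otherwise stats are computed and written for next time.
pub fn load_or_compute<F>(cache_path: Option<&Path>, compute: F) -> io::Result<TableStats>
where
    F: FnOnce() -> TableStats,
{
    match cache_path {
        None => Ok(compute()),
        Some(path) => {
            // Cache hit: reuse the stored stats if the file exists and parses.
            if let Ok(text) = fs::read_to_string(path) {
                if let Some(stats) = TableStats::decode(&text) {
                    return Ok(stats);
                }
            }
            // Cache miss: do the expensive work (in optd, loading data into
            // DataFusion), then persist the result.
            let stats = compute();
            fs::write(path, stats.encode())?;
            Ok(stats)
        }
    }
}
```

Because a cache hit skips `compute` entirely, nothing checks whether the underlying tables changed since the file was written, which is why keeping this behind an explicit opt-in is the safer default.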
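The move from `Box<dyn ...>` fields to generics, and the enum workaround for mixing distribution types, can be sketched as follows. This is an illustration under stated assumptions, not optd's code: the trait methods `freq` and `cdf`, the toy types `MockMcvs`, `UniformDistr`, `StepHistogram`, and `AnyDistr`, and the estimator `le_selectivity` are all hypothetical, and the `PerColumnStats`/`PerTableStats` field names are guesses that may not match the repository. With concrete type parameters, serde's `#[derive(Serialize, Deserialize)]` can be applied to these structs, which does not work for `Box<dyn Trait>` fields without extra machinery; the derive is omitted so the block has no dependencies.

```rust
// Hypothetical sketch: trait methods, toy types, and field names are
// illustrative assumptions, not optd's actual definitions.

/// Most-common-values summary of a column (hypothetical method).
pub trait MostCommonValues {
    /// Fraction of rows equal to `value`, if `value` is tracked.
    fn freq(&self, value: i64) -> Option<f64>;
}

/// Value distribution of a column (hypothetical method).
pub trait Distribution {
    /// Fraction of non-null rows with a value <= `value`.
    fn cdf(&self, value: i64) -> f64;
}

/// Generic over concrete `M` and `D` instead of holding `Box<dyn ...>`.
/// With serde, this is where `#[derive(Serialize, Deserialize)]` would go.
pub struct PerColumnStats<M, D> {
    pub mcvs: M,
    pub ndistinct: u64,
    pub null_frac: f64,
    pub distr: D,
}

/// One table's stats; every column shares the same `M` and `D` types.
pub struct PerTableStats<M, D> {
    pub row_cnt: usize,
    pub per_column_stats_vec: Vec<Option<PerColumnStats<M, D>>>,
}

/// Toy MCV list of (value, frequency) pairs.
pub struct MockMcvs(pub Vec<(i64, f64)>);

impl MostCommonValues for MockMcvs {
    fn freq(&self, value: i64) -> Option<f64> {
        self.0.iter().find(|(v, _)| *v == value).map(|(_, f)| *f)
    }
}

/// Toy uniform distribution over [lo, hi].
pub struct UniformDistr {
    pub lo: i64,
    pub hi: i64,
}

impl Distribution for UniformDistr {
    fn cdf(&self, value: i64) -> f64 {
        if value < self.lo {
            0.0
        } else if value >= self.hi {
            1.0
        } else {
            (value - self.lo) as f64 / (self.hi - self.lo) as f64
        }
    }
}

/// Toy histogram: sorted bucket upper bounds and the cumulative fraction at each.
pub struct StepHistogram {
    pub bounds: Vec<i64>,
    pub cum_fracs: Vec<f64>,
}

impl Distribution for StepHistogram {
    fn cdf(&self, value: i64) -> f64 {
        // Number of bucket bounds <= value (requires `bounds` to be sorted).
        let idx = self.bounds.partition_point(|b| *b <= value);
        if idx == 0 { 0.0 } else { self.cum_fracs[idx - 1] }
    }
}

/// Workaround for mixing distribution types across columns: one enum wrapping both.
pub enum AnyDistr {
    Uniform(UniformDistr),
    Histogram(StepHistogram),
}

impl Distribution for AnyDistr {
    fn cdf(&self, value: i64) -> f64 {
        match self {
            AnyDistr::Uniform(d) => d.cdf(value),
            AnyDistr::Histogram(d) => d.cdf(value),
        }
    }
}

/// Made-up estimator for `col <= value`, statically dispatched for each (M, D).
pub fn le_selectivity<M: MostCommonValues, D: Distribution>(
    stats: &PerColumnStats<M, D>,
    value: i64,
) -> f64 {
    (1.0 - stats.null_frac) * stats.distr.cdf(value)
}
```

A `PerTableStats<MockMcvs, AnyDistr>` fixes a single distribution type for the whole table, yet each column can still pick its own variant at runtime, at the cost of one `match` per call.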
1 parent: 3477898, commit: 204758e
Showing 18 changed files with 281 additions and 199 deletions.