-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify precomputation of aggregations behind a common API #16733
base: main
Are you sure you want to change the base?
Conversation
We've had a series of aggregation speedups that use the same strategy: instead of iterating through documents that match the query one-by-one, we can look at a Lucene segment and compute the aggregation directly (if some particular conditions are met). In every case, we've hooked that into custom logic hijacks the getLeafCollector method and throws CollectionTerminatedException. This creates the illusion that we're implementing a custom LeafCollector, when really we're not collecting at all (which is the whole point). With this refactoring, the mechanism (hijacking getLeafCollector) is moved into AggregatorBase. Aggregators that have a strategy to precompute their answer can override tryPrecomputeAggregationForLeaf, which is expected to return true if they managed to precompute. This should also make it easier to keep track of which aggregations have precomputation approaches (since they override this method). Signed-off-by: Michael Froh <[email protected]>
❌ Gradle check result for 4d5c32b: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
@@ -216,6 +220,10 @@ protected void preGetSubLeafCollectors(LeafReaderContext ctx) throws IOException | |||
*/ | |||
protected void doPreCollection() throws IOException {} | |||
|
|||
protected boolean tryPrecomputeAggregationForLeaf(LeafReaderContext ctx) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we planning to add support for subAggregations later ?
Don't we need something similar to LeafBucketCollector [ mainly the collect equivalent method ] - where the subAggregators can return the implementation and the parent class can invoke the subAggs implementation to collect each of the unique values/buckets.
[ This requirement is mainly for star tree aggregations ]
For example : for Terms aggs with sum sub-aggs,
Terms | sum
1 500 --> Bucket 1
2 6000 --> Bucket 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that it would be the responsibility of an Aggregator
to delegate to its subaggregators.
I think @sandeshkr419 managed to do that in #16674
Regarding implementation of this, I have one more alternative which I think is worth discussing. How about bringing this abstraction at ContextIndexSearcher itself.
Basically if we have pre computed aggregations already, we assign it as EarlyTerminationCollector. So, what I'm thinking about is cases with sub-aggregations that we can pre-compute, which is highly relevant in cases of star tree pre-computation. For eg.: #16674 and if a dedicated abstraction for star-tree preCompute in ComtextIndexSearcher wopuld make more sense or not. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16733 +/- ##
============================================
- Coverage 72.15% 72.05% -0.10%
- Complexity 65145 65187 +42
============================================
Files 5315 5318 +3
Lines 303573 303836 +263
Branches 43925 43957 +32
============================================
- Hits 219039 218937 -102
- Misses 66587 67010 +423
+ Partials 17947 17889 -58 ☔ View full report in Codecov by Sentry. |
Description
We've had a series of aggregation speedups that use the same strategy: instead of iterating through documents that match the query one-by-one, we can look at a Lucene segment and compute the aggregation directly (if some particular conditions are met).
In every case, we've hooked that into custom logic that hijacks the getLeafCollector method and throws CollectionTerminatedException. This creates the illusion that we're implementing a custom LeafCollector, when really we're not collecting at all (which is the whole point).
With this refactoring, the mechanism (hijacking getLeafCollector) is moved into AggregatorBase. Aggregators that have a strategy to precompute their answer can override
tryPrecomputeAggregationForLeaf
, which is expected to return true if they managed to precompute.This should also make it easier to keep track of which aggregations have precomputation approaches (since they override this method).
Related Issues
N/A
Check List
Functionality includes testing.API changes companion pull request created, if applicable.Public documentation issue/PR created, if applicable.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.