Add limit for max range query splits by interval #6458
base: master
Conversation
Signed-off-by: Ahmed Hassan <[email protected]>
staticIntervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
queryRangeMiddleware = append(queryRangeMiddleware, tripperware.InstrumentMiddleware("split_by_interval", metrics), SplitByIntervalMiddleware(staticIntervalFn, limits, prometheusCodec, registerer))
intervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
if cfg.SplitQueriesByIntervalMaxSplits != 0 {
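For orientation, here is a minimal sketch of what an interval function gated on this new config could look like. The name `dynamicIntervalFn` and the use of `GetStart()`/`GetEnd()` millisecond timestamps are assumptions for illustration, not necessarily the PR's actual implementation:

```go
// Sketch only: pick the smallest whole multiple of the base interval that
// keeps the number of range splits at or below the configured maximum.
dynamicIntervalFn := func(r tripperware.Request) time.Duration {
	baseInterval := cfg.SplitQueriesByInterval
	maxSplits := int64(cfg.SplitQueriesByIntervalMaxSplits)
	if maxSplits <= 0 {
		// Limit disabled: keep the static interval.
		return baseInterval
	}
	rangeMs := r.GetEnd() - r.GetStart() // assumed millisecond timestamps
	baseMs := baseInterval.Milliseconds()
	// Smallest n such that ceil(rangeMs / (n * baseMs)) <= maxSplits.
	n := (rangeMs + maxSplits*baseMs - 1) / (maxSplits * baseMs)
	if n < 1 {
		n = 1
	}
	return time.Duration(n) * baseInterval
}
```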
Shouldn't the limit be applied to both range splits and vertical splits?
func (s shardBy) Do(ctx context.Context, r Request) (Response, error) {
Technically this sets a limit on the combined number of range and vertical splits for a given query. The number of vertical shards is static, so the maximum number of splits for a given query becomes `split_queries_by_interval_max_splits` x `query_vertical_shard_size` (e.g. 30 x 3 = 90 total splits with 3 vertical shards). Because of this, adding a separate limit for vertical sharding would be redundant while the number of vertical shards is a static config, since it is already bounded by this limit.
Signed-off-by: Ahmed Hassan <[email protected]>
Instead of changing the split interval based on a max number of split queries, can we try to combine it with an estimate of the data to fetch? For example, a query like `up[30d]` is very expensive to split into 30 splits: each split query still fetches 30 days of data, so 30 splits end up fetching 900 days of data. Instead of having a limit on the total number of splits, should we use the total days of data fetched?
That's a good idea - I can add a new limit for the total hours of data fetched and adjust the interval so it isn't exceeded. We can still keep the max number of splits, since it gives more flexibility to limit the number of shards for queries that span a long day range even if they don't fetch a lot of days of data, unlike the example you mentioned.
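If it helps ground the idea, below is a rough, self-contained sketch of capping the estimated total data fetched rather than the split count. `intervalForFetchLimit`, `selectorRange`, and `maxFetched` are hypothetical names; in practice the selector range (the 30d in `up[30d]`) would have to be extracted from the parsed query:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// intervalForFetchLimit grows the split interval in whole multiples of the
// base interval until the estimated total data fetched across all splits
// stays under maxFetched (or only one split remains).
func intervalForFetchLimit(queryRange, baseInterval, selectorRange, maxFetched time.Duration) time.Duration {
	interval := baseInterval
	for {
		splits := int64(math.Ceil(float64(queryRange) / float64(interval)))
		// Each split fetches roughly its own interval plus the selector range.
		fetched := time.Duration(splits) * (interval + selectorRange)
		if fetched <= maxFetched || splits <= 1 {
			return interval
		}
		interval += baseInterval
	}
}

func main() {
	// up[30d] over a 30-day range: 30 one-day splits would fetch ~900 days of
	// data, so the interval is widened until the estimate fits the limit.
	fmt.Println(intervalForFetchLimit(30*24*time.Hour, 24*time.Hour, 30*24*time.Hour, 200*24*time.Hour)) // 144h0m0s
}
```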
What this PR does:
Cortex only supports a static interval for splitting range queries. This PR adds a new limit called `split_queries_by_interval_max_splits` that dynamically changes the split interval to a multiple of `split_queries_by_interval`, ensuring the total number of splits stays at or below the configured maximum.

Example:
- `split_queries_by_interval` = 24h
- `split_queries_by_interval_max_splits` = 30
- A 30-day range query is split into 30 queries using a 24h interval.
- A 40-day range query is split into 20 queries using a 48h interval.
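To make the example concrete, the arithmetic behind those numbers can be checked with a small snippet; the rounding rule (smallest whole multiple of the base interval that keeps the split count within the limit) is inferred from the description above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	base := 24 * time.Hour
	maxSplits := int64(30)

	for _, queryRange := range []time.Duration{30 * 24 * time.Hour, 40 * 24 * time.Hour} {
		rangeMs := queryRange.Milliseconds()
		baseMs := base.Milliseconds()
		multiple := (rangeMs + maxSplits*baseMs - 1) / (maxSplits * baseMs) // ceiling division
		interval := time.Duration(multiple) * base
		splits := (rangeMs + interval.Milliseconds() - 1) / interval.Milliseconds()
		fmt.Printf("%v range -> interval %v, %d splits\n", queryRange, interval, splits)
	}
	// Output:
	// 720h0m0s range -> interval 24h0m0s, 30 splits
	// 960h0m0s range -> interval 48h0m0s, 20 splits
}
```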
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`