Add limit for max range query splits by interval #6458

afhassan · 2024-12-24T18:43:11Z

What this PR does:
Cortex currently supports only a static interval for splitting range queries. This PR adds a new limit, split_queries_by_interval_max_splits, that dynamically changes the split interval to a multiple of split_queries_by_interval so that the total number of splits does not exceed the configured maximum.

Example:
split_queries_by_interval = 24h
split_queries_by_interval_max_splits = 30
A 30-day range query is split into 30 queries using a 24h interval
A 40-day range query is split into 20 queries using a 48h interval

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Ahmed Hassan <[email protected]>

harry671003 · 2024-12-27T03:46:46Z

pkg/querier/tripperware/queryrange/query_range_middlewares.go

-		staticIntervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
-		queryRangeMiddleware = append(queryRangeMiddleware, tripperware.InstrumentMiddleware("split_by_interval", metrics), SplitByIntervalMiddleware(staticIntervalFn, limits, prometheusCodec, registerer))
+		intervalFn := func(_ tripperware.Request) time.Duration { return cfg.SplitQueriesByInterval }
+		if cfg.SplitQueriesByIntervalMaxSplits != 0 {


Shouldn't the limit be applied to both range splits and vertical splits?

cortex/pkg/querier/tripperware/shard_by.go

Line 40 in 8a46d20

func (s shardBy) Do(ctx context.Context, r Request) (Response, error) {

Technically, this sets a limit on the total number of range and vertical splits for a given query. The number of vertical shards is static, so the maximum number of splits for a given query becomes split_queries_by_interval_max_splits x query_vertical_shard_size. A separate limit for vertical sharding would therefore be redundant while the number of vertical shards is a static config, because the total is already bounded.


pkg/querier/tripperware/queryrange/split_by_interval.go

yeya24 · 2024-12-31T19:55:30Z

Instead of changing the split interval based on the max number of split queries, can we try to combine it with an estimate of the data to fetch?

For example, a query like up[30d] is very expensive to split into 30 splits: each split query still fetches 30 days of data, so 30 splits end up fetching 900 days of data.

Instead of a limit on total splits, should we use the total days of data to fetch?

afhassan · 2024-12-31T23:59:47Z

> Instead of changing the split interval based on the max number of split queries, can we try to combine it with an estimate of the data to fetch?
>
> For example, a query like up[30d] is very expensive to split into 30 splits: each split query still fetches 30 days of data, so 30 splits end up fetching 900 days of data.
>
> Instead of a limit on total splits, should we use the total days of data to fetch?

That's a good idea - I can add a new limit for the total hours of data fetched and adjust the interval so that the limit is not exceeded.

We can still keep the max number of splits, since it gives more flexibility to limit the number of shards for queries over a long time range even when they don't fetch many days of data, unlike the example you mentioned.

add limit for range query max splits by interval

b874e4e

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added the size/S label Dec 24, 2024

harry671003 reviewed Dec 27, 2024


Change dynamic interval sharding to take into account vertical sharding

6106978

Signed-off-by: Ahmed Hassan <[email protected]>

pull-request-size bot added size/M and removed size/S labels Dec 31, 2024

afhassan commented Dec 31, 2024


pkg/querier/tripperware/queryrange/split_by_interval.go
