Cache the shard routings with no weight for faster access #12989
Signed-off-by: Prabhakar Sithanandam <[email protected]>
Compatibility status: checks if related components are compatible with change 547e3ab.
Incompatible components: none listed
Skipped components: none listed
Compatible components:
- https://github.com/opensearch-project/custom-codecs.git
- https://github.com/opensearch-project/asynchronous-search.git
- https://github.com/opensearch-project/neural-search.git
- https://github.com/opensearch-project/flow-framework.git
- https://github.com/opensearch-project/cross-cluster-replication.git
- https://github.com/opensearch-project/reporting.git
- https://github.com/opensearch-project/job-scheduler.git
- https://github.com/opensearch-project/security-analytics.git
- https://github.com/opensearch-project/geospatial.git
- https://github.com/opensearch-project/opensearch-oci-object-storage.git
- https://github.com/opensearch-project/notifications.git
- https://github.com/opensearch-project/common-utils.git
- https://github.com/opensearch-project/k-nn.git
- https://github.com/opensearch-project/observability.git
- https://github.com/opensearch-project/anomaly-detection.git
- https://github.com/opensearch-project/ml-commons.git
- https://github.com/opensearch-project/alerting.git
- https://github.com/opensearch-project/performance-analyzer-rca.git
- https://github.com/opensearch-project/index-management.git
- https://github.com/opensearch-project/sql.git
- https://github.com/opensearch-project/security.git
- https://github.com/opensearch-project/performance-analyzer.git
❌ Gradle check result for 396c0df: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
server/src/main/java/org/opensearch/cluster/routing/IndexShardRoutingTable.java
Minor comments, otherwise looks good
Looks good to me
Thanks, Prabhakar. Changes look good to me.
❌ Gradle check result for 396c0df: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #12989     +/-   ##
============================================
- Coverage     71.42%    71.37%    -0.05%
- Complexity    59978     60354     +376
============================================
  Files          4985      5025      +40
  Lines        282275    284399    +2124
  Branches      40946     41190     +244
============================================
+ Hits         201603    202998    +1395
- Misses        63999     64605     +606
- Partials      16673     16796     +123
❕ Gradle check result for 037aa62: UNSTABLE
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.
❌ Gradle check result for cc24120: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
❌ Gradle check result for 547e3ab: FAILURE. Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
* Cache the shard routings with no weight for faster access. Signed-off-by: Prabhakar Sithanandam <[email protected]> Co-authored-by: Prabhakar Sithanandam <[email protected]> (cherry picked from commit fb5d036) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…13050) * Cache the shard routings with no weight for faster access. Signed-off-by: Prabhakar Sithanandam <[email protected]> Co-authored-by: Prabhakar Sithanandam <[email protected]>
…-project#12989) * Cache the shard routings with no weight for faster access. Signed-off-by: Prabhakar Sithanandam <[email protected]> Co-authored-by: Prabhakar Sithanandam <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Description
The list of shards to query is determined for every request, and the weights of the nodes guide shard selection. Currently, IndexShardRoutingTable caches the shard routings with weights for faster access. However, when the fail-open option is enabled, shards with no weight are also returned, ordered after the shards with weights. They are used as a fallback if the shards with weights cannot be used because of an error.
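The ordering described above can be illustrated with a minimal sketch. It is not the OpenSearch implementation: `ShardCopy`, `FailOpenOrdering`, and the plain node-to-weight map are hypothetical names introduced here, a missing or zero weight is assumed to mean "no weight", and the weighted ordering among the weighted copies is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a shard copy; this is not OpenSearch's ShardRouting.
final class ShardCopy {
    final int shardId;
    final String nodeId;

    ShardCopy(int shardId, String nodeId) {
        this.shardId = shardId;
        this.nodeId = nodeId;
    }
}

final class FailOpenOrdering {
    private FailOpenOrdering() {}

    /**
     * Returns copies on weighted nodes first. When failOpen is true, copies on
     * nodes with no weight are appended so they can serve as a fallback.
     */
    static List<ShardCopy> order(List<ShardCopy> copies, Map<String, Double> nodeWeights, boolean failOpen) {
        List<ShardCopy> ordered = new ArrayList<>();
        List<ShardCopy> noWeight = new ArrayList<>();
        for (ShardCopy copy : copies) {
            // Assumption: a missing or zero weight means "no weight".
            double weight = nodeWeights.getOrDefault(copy.nodeId, 0.0);
            if (weight > 0.0) {
                ordered.add(copy);
            } else {
                noWeight.add(copy);
            }
        }
        if (failOpen) {
            ordered.addAll(noWeight); // fallback copies go last
        }
        return ordered;
    }
}
```

Without fail open, a caller sees only the weighted copies; with it, the same call also yields the no-weight copies at the end of the list.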
The shard routings with no weight are not cached, so they are computed with a full loop on every request. This increases search latency when the number of shards to query or the number of nodes in the cluster is high, and the impact is greatest when both are high.
This change introduces a cache for shard routings with no weight, similar to the existing cache for shard routings with weights.
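As a rough illustration of the idea, the sketch below memoizes the no-weight list per weight configuration so that the full scan runs once rather than on every request. It is an assumption-laden sketch, not the code in this pull request: `RoutingCopy`, `NoWeightShardCache`, `copiesWithNoWeight`, and `scanCount` are hypothetical names, the cache key is simply a copy of the node-to-weight map, and a missing or zero weight is assumed to mean "no weight".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shard copy; this is not OpenSearch's ShardRouting.
final class RoutingCopy {
    final int shardId;
    final String nodeId;

    RoutingCopy(int shardId, String nodeId) {
        this.shardId = shardId;
        this.nodeId = nodeId;
    }
}

// Hypothetical cache of the copies that sit on nodes with no weight.
final class NoWeightShardCache {
    private final List<RoutingCopy> copies;
    // Keyed by a snapshot of the node weights, so a new weight configuration gets a new entry.
    private final Map<Map<String, Double>, List<RoutingCopy>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger scans = new AtomicInteger();

    NoWeightShardCache(List<RoutingCopy> copies) {
        this.copies = Collections.unmodifiableList(new ArrayList<>(copies));
    }

    /** Returns the cached list for these weights, scanning all copies only on a miss. */
    List<RoutingCopy> copiesWithNoWeight(Map<String, Double> nodeWeights) {
        Map<String, Double> key = new HashMap<>(nodeWeights); // defensive copy, never mutated
        return cache.computeIfAbsent(key, this::scan);
    }

    /** Number of full scans performed so far; exposed only to show the cache effect. */
    int scanCount() {
        return scans.get();
    }

    private List<RoutingCopy> scan(Map<String, Double> nodeWeights) {
        scans.incrementAndGet();
        List<RoutingCopy> result = new ArrayList<>();
        for (RoutingCopy copy : copies) {
            // Assumption: a missing or zero weight means "no weight".
            if (nodeWeights.getOrDefault(copy.nodeId, 0.0) <= 0.0) {
                result.add(copy);
            }
        }
        return Collections.unmodifiableList(result);
    }
}
```

In this sketch, repeated requests with the same weights reuse one list instance, so the per-request cost no longer grows with the number of shard copies. A real implementation would also need to invalidate or rebuild the cache when the routing table changes, which the sketch avoids by treating the copy list as immutable.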
Check List
[ ] New functionality has been documented.
[ ] New functionality has javadoc added
[ ] Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
[ ] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
[ ] Public documentation issue/PR created
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.