Describe the bug
Port exhaustion is causing indexing failures and partial snapshots in OpenSearch clusters under high indexing load. The problem manifests in the following ways:
Periodic spikes in 5xx HTTP status codes during indexing operations.
Exceptions with the message "Cannot assign requested address" appearing in logs, particularly during stale segment deletion.
Failures in translog uploads due to the same "Cannot assign requested address" error.
Partial snapshots due to shard failures, with error messages indicating metadata files are not present for certain primary terms and generations.
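One way to check whether the symptoms above line up with port exhaustion is to count local TCP sockets by state on the affected node. The following is a minimal diagnostic sketch, not part of OpenSearch: it assumes a Linux host that exposes `/proc/net/tcp` and `/proc/net/tcp6`, and the function names `count_tcp_states` and `read_tcp_states` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of a /proc/net/tcp-style file."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # fields: slot, local address, remote address, state, ...
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def read_tcp_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Aggregate state counts over the IPv4 and IPv6 socket tables, if present."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total += count_tcp_states(f.read())
        except OSError:
            pass  # table not available on this host
    return total


if __name__ == "__main__":
    for state, n in read_tcp_states().most_common():
        print(f"{state:12s} {n}")
```

A TIME_WAIT count that climbs toward the size of the ephemeral port range around the time the 5xx spikes occur would be consistent with the "Cannot assign requested address" errors, though it does not by itself identify which client is opening the sockets.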
Root Cause:
The issue appears to stem from the synchronous S3 client creating new sockets for every request under high load, leading to port exhaustion. This primarily affects operations like stale segment deletion, which can involve a large number of files becoming eligible for deletion between events.
Impact:
Degraded indexing performance
Incomplete or failed snapshots
Related component
Storage:Snapshots
To Reproduce
The issue can be reproduced by:
Creating a single-node, remote-store-enabled OpenSearch domain with a large instance type.
Creating an index with a high number of primary shards (e.g., 200) and no replicas.
Initiating heavy indexing operations.
Creating snapshots periodically.
Expected behavior
Ports should not be exhausted, since a default setting limits the maximum number of connections to 500.
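A rough model may help explain why a cap on concurrent connections does not necessarily bound the number of local ports in use. If every request opens a fresh socket and the client side closes it, the closed socket can linger in TIME_WAIT and keep its local port reserved. The sketch below is a back-of-the-envelope estimate only. It assumes typical Linux defaults (an ephemeral port range of 32768 to 60999 and a TIME_WAIT duration of 60 seconds), a single destination address, and no connection reuse; none of these values come from the report, and the function names are hypothetical.

```python
def ports_in_use(requests_per_second, time_wait_seconds=60, concurrent_cap=500):
    """Estimate local ports held at steady state when no connection is reused.

    Assumption: each request opens a new socket that lingers in TIME_WAIT
    for time_wait_seconds after closing, on top of the active connections.
    """
    lingering = requests_per_second * time_wait_seconds
    return concurrent_cap + lingering


def max_sustainable_rate(port_low=32768, port_high=60999,
                         time_wait_seconds=60, concurrent_cap=500):
    """Estimate the request rate at which the ephemeral port range fills up.

    Assumption: port_low/port_high mirror a typical Linux
    net.ipv4.ip_local_port_range; check the actual value on the host.
    """
    available = port_high - port_low + 1
    return (available - concurrent_cap) / time_wait_seconds


if __name__ == "__main__":
    print("ports held at 1000 req/s:", ports_in_use(1000))
    print("approx. sustainable req/s:", round(max_sustainable_rate(), 1))
```

Under these assumptions, a burst of deletions well above a few hundred new connections per second to one destination could exhaust the range even while the number of simultaneously open connections stays under 500. Real behaviour depends on how many endpoint addresses are in play and on kernel settings, so the numbers are illustrative rather than predictive.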
Observed behaviour
Indexing rate dips and 5xx error spikes occur at regular intervals, coinciding with snapshot operations.
Snapshot status shows as PARTIAL with multiple shard failures.
Additional Details
No response