We've noticed that as the number of our secrets went up, some of them were no longer being replicated. In the log of our k8s cluster, we found the following message:
Forcing secrets watcher close due to unresponsiveness: key: "/secrets", labels: "", fields: "". len(c.input) = 10, len(c.result) = 10, graceful = false
We managed to reproduce this behaviour with a simplified version of SecretWatcher:
var config = KubernetesClientConfiguration.BuildDefaultConfig();
config.HttpClientTimeout = TimeSpan.FromMinutes(30);
using var k8s = new Kubernetes(config);
var doDelay = true;
while (true)
{
    using var watcher = k8s.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
    var watchList = watcher.WatchAsync<V1Secret, V1SecretList>();
    int count = 0;
    await foreach (var (type, item) in watchList)
    {
        if (doDelay) await Task.Delay(TimeSpan.FromMilliseconds(100)); // simulate slow per-item processing
        count++;
    }
    Console.WriteLine($"{doDelay}: {count}");
    doDelay = !doDelay;
}
If doDelay is true, only ~130 secrets are processed; with doDelay set to false, thousands of secrets are processed.
Our WatchBackgroundService does essentially the same thing:
using var watcher = OnGetWatcher(stoppingToken);
var watchList = watcher.WatchAsync<TResource, TResourceList>(cancellationToken: stoppingToken);
await foreach (var (type, item) in watchList.WithCancellation(stoppingToken))
    await Mediator.Publish(new WatcherEvent
    {
        Item = item,
        Type = type
    }, stoppingToken);
Mediator.Publish waits for all downstream NotificationHandlers to complete for each secret, one by one, which eventually causes the k8s API server to terminate the connection. From what we understood while exploring KubernetesClient, iterating over watchList reads the response from k8s line by line, keeping the connection open the whole time. Combined (synchronous processing + lazy response reading), this leads to k8s terminating the connection at some point, so some secrets are never processed at all.
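A minimal sketch of one possible mitigation, assuming the same KubernetesClient API as in the repro above: drain the watch stream into a System.Threading.Channels buffer as fast as events arrive, and run the slow per-secret work in a separate consumer task so the HTTP response is always being read. The channel, its unbounded capacity, and ProcessSecretAsync are illustrative assumptions, not library APIs.

using System;
using System.Threading.Channels;
using System.Threading.Tasks;
using k8s;
using k8s.Models;

var config = KubernetesClientConfiguration.BuildDefaultConfig();
using var client = new Kubernetes(config);

var events = Channel.CreateUnbounded<(WatchEventType Type, V1Secret Item)>();

// Consumer: drains the buffer at its own pace; slow handlers no longer stall the watch.
var consumer = Task.Run(async () =>
{
    await foreach (var (type, item) in events.Reader.ReadAllAsync())
        await ProcessSecretAsync(type, item); // stand-in for Mediator.Publish / replication work
});

// Producer: keeps the HTTP response drained so the API server never sees an unresponsive watcher.
var watcher = client.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
await foreach (var (type, item) in watcher.WatchAsync<V1Secret, V1SecretList>())
    await events.Writer.WriteAsync((type, item));

events.Writer.Complete();
await consumer;

// Hypothetical slow handler, simulated with the same 100 ms delay as the repro.
static Task ProcessSecretAsync(WatchEventType type, V1Secret secret) =>
    Task.Delay(TimeSpan.FromMilliseconds(100));

Note that an unbounded channel only trades the disconnect for memory growth if the consumer can never catch up; a bounded channel with backpressure, or an informer-style resync, may be a better fit depending on the workload.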
DmitrySenin changed the title from "K8s disconnects watcher due to unresponsiveness" to "Some secrets are not processed if there are too many of them" on Dec 2, 2024.
We see this behavior too on our larger clusters. There is a threshold of approximately 8,000 total secrets (not just those handled by reflector), depending on the CPU allocated to the operator. The behavior is the same on each reconciliation loop, i.e. the API server closes the connection due to unresponsiveness at about the same point every time.