Some secrets are not processed if there are too many of them #468

Open · DmitrySenin opened this issue Dec 2, 2024 · 1 comment

Comments

DmitrySenin commented Dec 2, 2024

We've noticed that as the number of our secrets grew, some of them were no longer replicated. In the logs of our k8s cluster, we found the following message:

Forcing secrets watcher close due to unresponsiveness: key: "/secrets", labels: "", fields: "". len(c.input) = 10, len(c.result) = 10, graceful = false

We managed to reproduce this behaviour with a simplified version of SecretWatcher:

using k8s;
using k8s.Models;

var config = KubernetesClientConfiguration.BuildDefaultConfig();
config.HttpClientTimeout = TimeSpan.FromMinutes(30);
using var k8s = new Kubernetes(config);

var doDelay = true;

while (true)
{
    // Open a watch on all secrets and iterate the event stream.
    using var watcher = k8s.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
    var watchList = watcher.WatchAsync<V1Secret, V1SecretList>();
    int count = 0;

    await foreach (var (type, item) in watchList)
    {
        // Simulate slow per-event processing.
        if (doDelay) await Task.Delay(TimeSpan.FromMilliseconds(100));
        count++;
    }

    Console.WriteLine($"{doDelay}: {count}");
    doDelay = !doDelay;
}

If doDelay is true, only ~130 secrets are processed before the stream ends; with doDelay set to false, thousands of secrets are processed.

Similarly, WatchBackgroundService does basically the same thing:

using var watcher = OnGetWatcher(stoppingToken);
var watchList = watcher.WatchAsync<TResource, TResourceList>(cancellationToken: stoppingToken);

await foreach (var (type, item) in watchList
                    .WithCancellation(stoppingToken))
    await Mediator.Publish(new WatcherEvent
    {
        Item = item,
        Type = type
    }, stoppingToken);

Mediator.Publish waits for all downstream NotificationHandlers to complete for each secret, one by one, which eventually causes the k8s API to terminate the connection.
From what we understood while exploring KubernetesClient, iterating over watchList reads the response from k8s line by line, keeping the connection open the whole time.
Combined, these two factors (synchronous processing + lazy response reading) mean that k8s eventually closes the connection as unresponsive, so some secrets are never processed.
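
One way to avoid the disconnect would be to decouple reading the watch stream from processing the events, so the response is drained promptly even when handlers are slow. Below is a minimal sketch using a System.Threading.Channels buffer; the channel setup, the variable names, and the simulated slow handler are illustrative assumptions, not Reflector's actual code:

using System.Threading.Channels;
using k8s;
using k8s.Models;

// Buffer between the watch read loop and the (slow) event processing.
// Unbounded for simplicity; a bounded channel would cap memory usage.
var channel = Channel.CreateUnbounded<(WatchEventType Type, V1Secret Secret)>();

var config = KubernetesClientConfiguration.BuildDefaultConfig();
using var client = new Kubernetes(config);

// Reader: drains the watch stream as fast as possible, only enqueuing events.
var reader = Task.Run(async () =>
{
    using var watcher = client.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
    await foreach (var (type, item) in watcher.WatchAsync<V1Secret, V1SecretList>())
        await channel.Writer.WriteAsync((type, item));
    channel.Writer.Complete();
});

// Processor: does the slow per-event work off the watch connection.
var processor = Task.Run(async () =>
{
    await foreach (var (type, secret) in channel.Reader.ReadAllAsync())
    {
        await Task.Delay(TimeSpan.FromMilliseconds(100)); // stand-in for slow downstream handling
        Console.WriteLine($"{type}: {secret.Metadata.Name}");
    }
});

await Task.WhenAll(reader, processor);

Whether the buffer should be bounded, and what to do on overflow, is a separate design choice; the point is only that the watch loop itself must not await slow handlers.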

DmitrySenin changed the title from "K8s disconnects watcher due to unresponsiveness" to "Some secrets are not processed if there are too many of them" on Dec 2, 2024
bukovjanmic commented

We see this behavior too on our larger clusters. There is a threshold of approx. 8000 total secrets (not just those handled by reflector), depending on the CPU allocated to the operator.

The behavior is the same on each reconciliation loop, i.e. the API server closes the connection due to unresponsiveness every time, at about the same point.
