Some secrets are not processed if there are too many of them #468

Open · DmitrySenin opened this issue Dec 2, 2024 · 1 comment

Comments

DmitrySenin commented Dec 2, 2024

We've noticed that as the number of our secrets grew, some of them were no longer replicated. In the logs of our k8s cluster, we found the following message:

Forcing secrets watcher close due to unresponsiveness: key: "/secrets", labels: "", fields: "". len(c.input) = 10, len(c.result) = 10, graceful = false

We managed to reproduce this behaviour with a simplified version of SecretWatcher:

using k8s;
using k8s.Models;

var config = KubernetesClientConfiguration.BuildDefaultConfig();
config.HttpClientTimeout = TimeSpan.FromMinutes(30);
using var k8s = new Kubernetes(config);

var doDelay = true;

while (true)
{
    // Open a watch on all secrets and iterate the event stream.
    using var watcher = k8s.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
    var watchList = watcher.WatchAsync<V1Secret, V1SecretList>();
    int count = 0;

    await foreach (var (type, item) in watchList)
    {
        // Simulate slow per-event processing.
        if (doDelay) await Task.Delay(TimeSpan.FromMilliseconds(100));
        count++;
    }

    Console.WriteLine($"{doDelay}: {count}");
    doDelay = !doDelay;
}

If doDelay is true, only ~130 secrets are processed before the stream ends; with doDelay set to false, thousands of secrets are processed.

Similarly, WatchBackgroundService does basically the same thing:

using var watcher = OnGetWatcher(stoppingToken);
var watchList = watcher.WatchAsync<TResource, TResourceList>(cancellationToken: stoppingToken);

await foreach (var (type, item) in watchList
                    .WithCancellation(stoppingToken))
    await Mediator.Publish(new WatcherEvent
    {
        Item = item,
        Type = type
    }, stoppingToken);

Mediator.Publish waits for all downstream NotificationHandlers to complete for each secret, one by one, which eventually causes the k8s API to terminate the connection.
From what we understood while exploring KubernetesClient, iterating over watchList reads the response from k8s line by line, keeping the connection open the whole time.
Combined, these two factors (synchronous processing + lazy response reading) mean that k8s eventually closes the connection as unresponsive, so some secrets are never processed.
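
One way to avoid the disconnect would be to decouple reading the watch stream from processing the events, so the response is drained promptly even when handlers are slow. Below is a minimal sketch using a System.Threading.Channels buffer; the channel setup, the variable names, and the simulated slow handler are illustrative assumptions, not Reflector's actual code:

using System.Threading.Channels;
using k8s;
using k8s.Models;

// Buffer between the watch read loop and the (slow) event processing.
// Unbounded for simplicity; a bounded channel would cap memory usage.
var channel = Channel.CreateUnbounded<(WatchEventType Type, V1Secret Secret)>();

var config = KubernetesClientConfiguration.BuildDefaultConfig();
using var client = new Kubernetes(config);

// Reader: drains the watch stream as fast as possible, only enqueuing events.
var reader = Task.Run(async () =>
{
    using var watcher = client.CoreV1.ListSecretForAllNamespacesWithHttpMessagesAsync(watch: true, timeoutSeconds: 1200);
    await foreach (var (type, item) in watcher.WatchAsync<V1Secret, V1SecretList>())
        await channel.Writer.WriteAsync((type, item));
    channel.Writer.Complete();
});

// Processor: does the slow per-event work off the watch connection.
var processor = Task.Run(async () =>
{
    await foreach (var (type, secret) in channel.Reader.ReadAllAsync())
    {
        await Task.Delay(TimeSpan.FromMilliseconds(100)); // stand-in for slow downstream handling
        Console.WriteLine($"{type}: {secret.Metadata.Name}");
    }
});

await Task.WhenAll(reader, processor);

Whether the buffer should be bounded, and what to do on overflow, is a separate design choice; the point is only that the watch loop itself must not await slow handlers.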

DmitrySenin changed the title from "K8s disconnects watcher due to unresponsiveness" to "Some secrets are not processed if there are too many of them" on Dec 2, 2024
bukovjanmic commented

We see this behavior too on our larger clusters. There is a threshold of approx. 8000 total secrets (not just those handled by reflector), depending on the CPU allocated to the operator.

The behavior is the same on each reconciliation loop, i.e. the API server closes the connection due to unresponsiveness every time, at about the same point.
