Panic in interface conversion: EndpointSlice informer DeleteEvent event handler func #7775

Open
JoelSpeed opened this issue Dec 9, 2024 · 0 comments · May be fixed by #7776
Labels
kind/bug Categorizes issue or PR as related to a bug.


@JoelSpeed
Contributor

What happened:

Observed a panic:

W1204 22:27:53.157224       1 reflector.go:484] k8s.io/client-go/informers/factory.go:160: watch of *v1.EndpointSlice ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1204 22:27:53.157252       1 reflector.go:484] k8s.io/client-go/informers/factory.go:160: watch of *v1.Service ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
W1204 22:27:53.157268       1 reflector.go:484] k8s.io/client-go/informers/factory.go:160: watch of *v1.Node ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
E1204 22:27:53.157315       1 controller.go:302] "Unhandled Error" err="error processing service e2e-loadbalancers-8022/svc-udp (retrying with exponential backoff): failed to remove load balancer cleanup finalizer: Patch \"https://api-int.ci-op-1i2ww9js-40356.ci2.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-loadbalancers-8022/services/svc-udp/status\": http2: client connection lost" logger="UnhandledError"
E1204 22:27:53.157315       1 leaderelection.go:429] Failed to update lock optimitically: Put "https://api-int.ci-op-1i2ww9js-40356.ci2.azure.devcluster.openshift.com:6443/apis/coordination.k8s.io/v1/namespaces/openshift-cloud-controller-manager/leases/cloud-controller-manager?timeout=53.5s": http2: client connection lost, falling back to slow path
I1204 22:27:53.157385       1 event.go:389] "Event occurred" object="e2e-loadbalancers-8022/svc-udp" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to remove load balancer cleanup finalizer: Patch \"https://api-int.ci-op-1i2ww9js-40356.ci2.azure.devcluster.openshift.com:6443/api/v1/namespaces/e2e-loadbalancers-8022/services/svc-udp/status\": http2: client connection lost"
I1204 22:27:53.280327       1 azure_loadbalancer_repo.go:72] LoadBalancerClient.List(ci-op-1i2ww9js-40356-qsrhd-rg) success
I1204 22:27:53.414084       1 controller.go:973] Removing finalizer from service e2e-loadbalancers-8022/svc-udp
I1204 22:27:53.443126       1 controller.go:999] Patching status for service e2e-loadbalancers-8022/svc-udp
I1204 22:27:53.443289       1 event.go:389] "Event occurred" object="e2e-loadbalancers-8022/svc-udp" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="DeletedLoadBalancer" message="Deleted load balancer"
I1204 22:27:54.206743       1 reflector.go:341] Listing and watching *v1.Service from k8s.io/client-go/informers/factory.go:160
I1204 22:27:54.213560       1 reflector.go:368] Caches populated for *v1.Service from k8s.io/client-go/informers/factory.go:160
I1204 22:27:54.553283       1 reflector.go:341] Listing and watching *v1.EndpointSlice from k8s.io/client-go/informers/factory.go:160
I1204 22:27:54.557402       1 reflector.go:368] Caches populated for *v1.EndpointSlice from k8s.io/client-go/informers/factory.go:160
E1204 22:27:54.558549       1 iface.go:262] "Observed a panic" panic="interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.EndpointSlice" panicGoValue="&runtime.TypeAssertionError{_interface:(*abi.Type)(0x291daa0), concrete:(*abi.Type)(0x2b73880), asserted:(*abi.Type)(0x2f5cc20), missingMethod:\"\"}" stacktrace=<
	goroutine 321 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x342b140, 0x4f2eb40}, {0x29df300, 0xc0005f3140})
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0xbc
	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x342b140, 0x4f2eb40}, {0x29df300, 0xc0005f3140}, {0x4f2eb40, 0x0, 0x43fb05?})
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:82 +0x5e
	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000d1bc00?})
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108
	panic({0x29df300?, 0xc0005f3140?})
		/usr/lib/golang/src/runtime/panic.go:770 +0x132
	sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).setUpEndpointSlicesInformer.func3({0x2b73880?, 0xc0000af240?})
		/go/src/github.com/openshift/cloud-provider-azure/pkg/provider/azure_local_services.go:360 +0xee
	k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnDelete(...)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/cache/controller.go:260
	k8s.io/client-go/tools/cache.(*processorListener).run.func1()
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/cache/shared_informer.go:983 +0x9f
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
	k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000b7bf70, {0x33e7be0, 0xc000b883f0}, 0x1, 0xc000a10360)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
	k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000509770, 0x3b9aca00, 0x0, 0x1, 0xc000a10360)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
	k8s.io/apimachinery/pkg/util/wait.Until(...)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
	k8s.io/client-go/tools/cache.(*processorListener).run(0xc000a9ef30)
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
	k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
	created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 364
		/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73

It was caused by the interface conversion in the EndpointSlice informer's delete handler (setUpEndpointSlicesInformer, pkg/provider/azure_local_services.go:360 in the stack trace above).

The actual type was cache.DeletedFinalStateUnknown, which the informer delivers when the delete event was missed because the watch stream was disconnected at the time the delete happened.

The DeletedFinalStateUnknown tombstone carries a potentially stale copy of the deleted object. Since the delete handler only uses the object to remove data from the cache, the stale copy should be sufficient to perform that deletion. Handling this case should avoid panics when there are network disconnections or missed events like this.
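
A minimal sketch of how the DeleteFunc could unwrap the tombstone before asserting the concrete type; removeEndpointSliceFromCache is a hypothetical stand-in for whatever cleanup the real handler performs:

```go
package main

import (
	discoveryv1 "k8s.io/api/discovery/v1"
	"k8s.io/client-go/tools/cache"
	"k8s.io/klog/v2"
)

// removeEndpointSliceFromCache is a hypothetical placeholder for the existing
// cache-cleanup logic in the real delete handler.
func removeEndpointSliceFromCache(es *discoveryv1.EndpointSlice) {}

func onEndpointSliceDelete(obj interface{}) {
	es, ok := obj.(*discoveryv1.EndpointSlice)
	if !ok {
		// If the watch stream missed the delete, the informer delivers a
		// DeletedFinalStateUnknown tombstone wrapping the last known copy.
		tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
		if !ok {
			klog.Errorf("unexpected object type in EndpointSlice delete handler: %T", obj)
			return
		}
		es, ok = tombstone.Obj.(*discoveryv1.EndpointSlice)
		if !ok {
			klog.Errorf("DeletedFinalStateUnknown contained unexpected object type: %T", tombstone.Obj)
			return
		}
	}
	removeEndpointSliceFromCache(es)
}

func main() {
	// Registered the same way as the existing handler.
	_ = cache.ResourceEventHandlerFuncs{DeleteFunc: onEndpointSliceDelete}
}
```

Unwrapping the tombstone like this is the usual pattern in client-go based controllers, so the handler keeps working across watch disconnections instead of panicking.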

What you expected to happen:

Code should not panic during normal operation.

How to reproduce it (as minimally and precisely as possible):

Not sure; it has been observed a handful of times while deleting services during E2E testing.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.31
  • Cloud provider or hardware configuration: Azure
  • OS (e.g: cat /etc/os-release): RHCOS
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others: