Health check failing on default installation #2

Open
IronhandedLayman opened this issue Apr 4, 2017 · 3 comments

IronhandedLayman commented Apr 4, 2017

After following the given documentation I am seeing the following issue:

Name:		keepalived-cloud-provider-1765620686-d9nkt
Namespace:	kube-system
Node:		<elided/>
Start Time:	Tue, 04 Apr 2017 16:25:17 -0400
Labels:		app=keepalived-cloud-provider
		pod-template-hash=1765620686
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"keepalived-cloud-provider-1765620686","uid":"ef9618e5-1973-1...
		scheduler.alpha.kubernetes.io/critical-pod=
		scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status:		Running
IP:		192.168.241.247
Controllers:	ReplicaSet/keepalived-cloud-provider-1765620686
Containers:
  keepalived-cloud-provider:
    Container ID:	docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c
    Image:		quay.io/munnerz/keepalived-cloud-provider
    Image ID:		docker-pullable://quay.io/munnerz/keepalived-cloud-provider@sha256:170351533b23126b8f4eeeeb4293ec417607b762b7ae07d5c018a9cb792d1032
    Port:		
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Error
      Exit Code:	2
      Started:		Mon, 01 Jan 0001 00:00:00 +0000
      Finished:		Tue, 04 Apr 2017 16:29:08 -0400
    Ready:		False
    Restart Count:	5
    Requests:
      cpu:	200m
    Liveness:	http-get http://127.0.0.1:10252/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:
      KEEPALIVED_NAMESPACE:	kube-system
      KEEPALIVED_CONFIG_MAP:	vip-configmap
      KEEPALIVED_SERVICE_CIDR:	<elided/>
    Mounts:
      /etc/ssl/certs from certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-jl9jx (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  certs:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/ssl/certs
  default-token-jl9jx:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-jl9jx
    Optional:	false
QoS Class:	Burstable
Node-Selectors:	<none>
Tolerations:	node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
		node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath					Type		Reason		Message
  ---------	--------	-----	----			-------------					--------	------		-------
<snipped/>
  41s	41s	1	kubelet, <elided/>	spec.containers{keepalived-cloud-provider}	Normal	Killing		Killing container with id docker://e3240e5d78a382155c902ee6b5cca8294b3be393959c5c3cfa1eba4d303bc66c:pod "keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)" container "keepalived-cloud-provider" is unhealthy, it will be killed and re-created.
  1m	1s	9	kubelet, <elided/>		spec.containers{keepalived-cloud-provider}	Warning	BackOff		Back-off restarting failed container
  41s	1s	5	kubelet, <elided/>							Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "keepalived-cloud-provider" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=keepalived-cloud-provider pod=keepalived-cloud-provider-1765620686-d9nkt_kube-system(d16932c2-1974-11e7-a2c7-0025905ca872)"

It appears to be failing its health check. Removing the health check stops the CrashLoopBackOff, but obviously that's not optimal. I can't find the health check on a first pass through the code, so I'm not entirely sure what's wrong.
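
For anyone else hitting this, a rough sketch of the workaround (it assumes the Deployment is named keepalived-cloud-provider in kube-system, which is what the ReplicaSet name above suggests, and that it has a single container; editing the manifest and re-applying works just as well):

# Remove the liveness probe so the pod stops being killed and restarted.
# This is only a stopgap: the container is no longer health-checked at all.
kubectl -n kube-system patch deployment keepalived-cloud-provider --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'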

antoineserrano commented Aug 4, 2017

Hi @IronhandedLayman, we're running into nearly the same issue here. Did you manage to find a workaround and/or make it work properly?

munnerz (Owner) commented Aug 4, 2017 via email

@tommyknows

Same problem for me.
I started a shell inside the container, and one thing I noticed is that the process is listening on port 10253 according to netstat:

/ # netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::10253                :::*                    LISTEN      1/keepalived-cloud-
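
Since the process listens on 10253 rather than 10252, the obvious next step is to point the probe at that port. For reference, roughly how that change can be applied (a sketch; kubectl patch against the keepalived-cloud-provider Deployment is just one way, editing the manifest directly works too):

# Point the liveness probe at the port the binary actually listens on.
kubectl -n kube-system patch deployment keepalived-cloud-provider --type=json \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/livenessProbe/httpGet/port", "value": 10253}]'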

However, even after changing the livenessProbe's port to 10253, it's still failing.
When testing inside the container:

/ # wget -q http://127.0.0.1:10253/healthz
/ # cat healthz 
ok/ # 

So the livenessProbe should be working... (but somehow it doesn't).
Events from kubectl describe pod:

Events:
  Type     Reason                 Age                From                  Message
  ----     ------                 ----               ----                  -------
  Normal   Scheduled              3m                 default-scheduler     Successfully assigned keepalived-cloud-provider-664464fc97-bvpbj to k8s-fed-wk1
  Normal   SuccessfulMountVolume  3m                 kubelet, k8s-fed-wk1  MountVolume.SetUp succeeded for volume "certs"
  Normal   SuccessfulMountVolume  3m                 kubelet, k8s-fed-wk1  MountVolume.SetUp succeeded for volume "keepalived-token-dl7qw"
  Normal   Pulled                 2m (x2 over 3m)    kubelet, k8s-fed-wk1  Container image "quay.io/munnerz/keepalived-cloud-provider:0.0.1" already present on machine
  Normal   Created                2m (x2 over 3m)    kubelet, k8s-fed-wk1  Created container
  Normal   Started                2m (x2 over 3m)    kubelet, k8s-fed-wk1  Started container
  Warning  Unhealthy              53s (x15 over 3m)  kubelet, k8s-fed-wk1  Liveness probe failed: Get http://127.0.0.1:10253/healthz: dial tcp 127.0.0.1:10253: getsockopt: connection refused
  Normal   Killing                53s (x2 over 2m)   kubelet, k8s-fed-wk1  Killing container with id docker://keepalived-cloud-provider:Container failed liveness probe.. Container will be killed and recreated.

Note: It is the exact same error as I get with port 10252...
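
One more thing worth checking: the probe URL in the describe output has an explicit host of 127.0.0.1, and if I understand probe semantics correctly the kubelet resolves that from the node rather than from inside the pod (unless the pod runs with hostNetwork), so an in-container wget succeeding doesn't necessarily prove the probe will. A rough sketch of how to double-check, using the pod and node names from the events above:

# Show the probe the kubelet is actually configured with.
kubectl -n kube-system get pod keepalived-cloud-provider-664464fc97-bvpbj \
  -o jsonpath='{.spec.containers[0].livenessProbe.httpGet}'

# Repeat the probe from the node's point of view (run this on k8s-fed-wk1, not inside the pod).
curl -v http://127.0.0.1:10253/healthz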
