Add health probes #2

george-echo · 2022-07-21T16:51:38Z

What does this PR do?

Add ability to register health functions in services and serve startz, readyz and alivez endpoints for K8s probe consumption

maximus1108

Love it!

Added a few comments :)

maximus1108 · 2022-07-22T09:25:10Z

run.go

@@ -22,25 +26,49 @@ func Run() error {
 		service.Registry.Register(service)
 	}

+	// Register probes for internal services
+	grpcStarted := RegisterHealthProbe("grpc", func(ctx context.Context) error {


There's a "best practice" approach to gRPC health checks; perhaps we should look into that. More info:

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe

https://github.com/grpc/grpc/blob/master/doc/health-checking.md

Is that available to us yet? The docs say it's beta in v1.24, and we're on v1.20 from the looks of it. It would be a good thing to use, as it means less resource usage.

maximus1108 · 2022-07-22T09:27:37Z

run.go

+	prometheusStarted := RegisterHealthProbe("prometheus", func(ctx context.Context) error {
+		return nil
+	}, func(ctx context.Context) error {
+		if !service.prometheusRunning {


I wonder whether a failing Prometheus server should cause a restart of our entire pod, and thus of the core application?

🤷 Yeah, I was thinking the same. I arrived at it being unlikely that both pods would have Prometheus fail at once, so it would be better to get the other one capturing data again instead of just leaving it. Maybe that's not correct though, as it's not a fundamental part of the application?

maximus1108 · 2022-07-22T09:41:23Z

run.go

+	if len(errMsgs) == 0 {
+		return
+	}
+	writer.WriteHeader(500)


Nit: Should this be 503 like the others?

For the alivez one I thought this made more sense: it's not just unavailable with a chance of future requests being accepted, but something which isn't supposed to happen has happened and we need to be restarted 🤔

markosamuli · 2022-07-22T10:03:14Z

run.go

 	service.PrometheusServer = &http.Server{Addr: service.PrometheusConfig.Address()}

 	http.Handle("/metrics", promhttp.Handler())
 	logrus.Infof("Prometheus metrics at http://%s/metrics", service.PrometheusConfig.Address())

+	service.prometheusRunning = true
+	started()
+
 	go func() {
 		if err := service.PrometheusServer.ListenAndServe(); err != nil {


Should the service be marked as running only after ListenAndServe() has been called successfully?

ListenAndServe is blocking from what I can tell. We could call Listen, which gets us our port, then call started(), then Serve()? It still wouldn't be 100% true, but close enough?

Split into Listen and Serve.

george-echo added 6 commits July 21, 2022 14:20

Add readyz and alivez probes for pod monitoring

28ed8eb

Change mod name

ce3a738

update to include startz probe

3d0c7be

add usage information

6fb885e

Add log message for probes starting

90e06e3

fix scope capture bug

04c6e08

george-echo requested a review from a team July 21, 2022 16:54

maximus1108 suggested changes Jul 22, 2022

markosamuli reviewed Jul 22, 2022

george-echo added 4 commits July 22, 2022 12:20

Improve gRPC & prometheus probe state

a2c988c

move to probe package and de-duplicate code

8fac6c9

remove dep cycle

9098233

add passthrough registration func

ed0f4a8
