-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add health probes #2
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it!
Added a few comments :)
run.go
Outdated
@@ -22,25 +26,49 @@ func Run() error { | |||
service.Registry.Register(service) | |||
} | |||
|
|||
// Register probes for internal services | |||
grpcStarted := RegisterHealthProbe("grpc", func(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a "best practice" method of approach gRPC health checks, perhaps we should look into that. More info:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's that available to us yet? Says v1.24 beta and we're v1.20 from the looks of it? Would be a good thing to use, less resource usage.
prometheusStarted := RegisterHealthProbe("prometheus", func(ctx context.Context) error { | ||
return nil | ||
}, func(ctx context.Context) error { | ||
if !service.prometheusRunning { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether a failing prometheus server should causes the reboot of our entire pod and thus core application?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 Yeah was thinking the same, I arrived at it being unlikely that both pods would have prometheus fail at once so it would be better to get the other capturing data again instead of just leaving it. Maybe that's not correct though as it's not a fundamental part of the application?
run.go
Outdated
if len(errMsgs) == 0 { | ||
return | ||
} | ||
writer.WriteHeader(500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should this be 503 like the others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the alivez one I thought this made more sense as it's not just unavailable with a chance of future requests being accepted but something which isn't suppose to happen has happened and we need to be restarted 🤔
run.go
Outdated
service.PrometheusServer = &http.Server{Addr: service.PrometheusConfig.Address()} | ||
|
||
http.Handle("/metrics", promhttp.Handler()) | ||
logrus.Infof("Prometheus metrics at http://%s/metrics", service.PrometheusConfig.Address()) | ||
|
||
service.prometheusRunning = true | ||
started() | ||
|
||
go func() { | ||
if err := service.PrometheusServer.ListenAndServe(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the service marked as running only after the ListenAndServe() is called successfully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ListenAndServe is blocking from what I can tell. Could call Listen which gets us our port then call started() then Serve()? Still wouldn't be 100% true but close enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split into listen and serve
What does this PR do?
Add ability to register health functions in services and serve startz, readyz and alivez endpoints for K8s probe consumption