
Add health probes #2

Open
wants to merge 10 commits into master

Conversation

george-echo

What does this PR do?

Adds the ability to register health functions in services and serves startz, readyz and alivez endpoints for K8s probe consumption.
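Roughly: each internal component registers its checks and gets back a callback it calls once startup completes. Below is a minimal sketch of that shape, inferred from the diff hunks further down (RegisterHealthProbe, the grpcStarted/prometheusStarted callbacks, started()); the argument order, package layout and internals here are assumptions, not this PR's actual code.

```go
package health

import (
	"context"
	"sync"
)

// probe holds one component's checks and its startup state.
type probe struct {
	name    string
	started bool
	ready   func(context.Context) error
	alive   func(context.Context) error
}

var (
	mu     sync.Mutex
	probes []*probe
)

// RegisterHealthProbe records a component's ready and alive checks and returns
// a callback the component invokes once it has finished starting. startz would
// report success only after every registered component has called its callback;
// readyz and alivez would run the ready/alive checks respectively.
func RegisterHealthProbe(name string, ready, alive func(context.Context) error) func() {
	p := &probe{name: name, ready: ready, alive: alive}
	mu.Lock()
	probes = append(probes, p)
	mu.Unlock()
	return func() {
		mu.Lock()
		p.started = true
		mu.Unlock()
	}
}
```

Usage then matches the hunks below: grpcStarted := RegisterHealthProbe("grpc", ...), with the returned callback invoked once the component's listener is up.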

george-echo requested a review from a team on July 21, 2022 at 16:54

maximus1108 left a comment


Love it!

Added a few comments :)

run.go Outdated
@@ -22,25 +26,49 @@ func Run() error {
service.Registry.Register(service)
}

// Register probes for internal services
grpcStarted := RegisterHealthProbe("grpc", func(ctx context.Context) error {


george-echo (Author):

Is that available to us yet? It says v1.24 beta and we're on v1.20 from the looks of it? It would be a good thing to use, less resource usage.

prometheusStarted := RegisterHealthProbe("prometheus", func(ctx context.Context) error {
return nil
}, func(ctx context.Context) error {
if !service.prometheusRunning {


I wonder whether a failing prometheus server should cause a reboot of our entire pod and thus the core application?

george-echo (Author):

🤷 Yeah, was thinking the same. I arrived at it being unlikely that both pods would have Prometheus fail at once, so it would be better to restart it and get it capturing data again instead of just leaving it. Maybe that's not correct though, as it's not a fundamental part of the application?

run.go Outdated (resolved)
run.go Outdated
if len(errMsgs) == 0 {
return
}
writer.WriteHeader(500)


Nit: Should this be 503 like the others?

george-echo (Author):

For the alivez one I thought this made more sense, as it's not just that we're unavailable with a chance of future requests being accepted, but that something which isn't supposed to happen has happened and we need to be restarted 🤔
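A small sketch of the distinction being argued here, using the errMsgs/writer names from the hunk above; the handler names, the check lists and the runChecks aggregation are illustrative assumptions, not this PR's code.

```go
package health

import (
	"context"
	"net/http"
	"strings"
)

// Hypothetical check lists; in the PR these would be fed by RegisterHealthProbe.
var readyChecks, aliveChecks []func(context.Context) error

// runChecks runs each check and collects the error messages of the failures.
func runChecks(ctx context.Context, checks []func(context.Context) error) []string {
	var errMsgs []string
	for _, check := range checks {
		if err := check(ctx); err != nil {
			errMsgs = append(errMsgs, err.Error())
		}
	}
	return errMsgs
}

// readyz: the pod is temporarily unable to serve; kubelet keeps it running but
// takes it out of Service endpoints, so 503 Service Unavailable fits.
func readyzHandler(writer http.ResponseWriter, req *http.Request) {
	errMsgs := runChecks(req.Context(), readyChecks)
	if len(errMsgs) == 0 {
		return // implicit 200 OK
	}
	writer.WriteHeader(http.StatusServiceUnavailable)
	writer.Write([]byte(strings.Join(errMsgs, "\n")))
}

// alivez: something that isn't supposed to happen has happened; a failing
// liveness probe makes kubelet restart the container, so a plain 500 signals
// the hard failure rather than temporary unavailability.
func alivezHandler(writer http.ResponseWriter, req *http.Request) {
	errMsgs := runChecks(req.Context(), aliveChecks)
	if len(errMsgs) == 0 {
		return
	}
	writer.WriteHeader(http.StatusInternalServerError)
	writer.Write([]byte(strings.Join(errMsgs, "\n")))
}
```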

run.go Outdated (resolved)
run.go Outdated
service.PrometheusServer = &http.Server{Addr: service.PrometheusConfig.Address()}

http.Handle("/metrics", promhttp.Handler())
logrus.Infof("Prometheus metrics at http://%s/metrics", service.PrometheusConfig.Address())

service.prometheusRunning = true
started()

go func() {
if err := service.PrometheusServer.ListenAndServe(); err != nil {


Should the service be marked as running only after ListenAndServe() has been called successfully?

george-echo (Author):

ListenAndServe is blocking from what I can tell. We could call Listen, which gets us our port, then call started(), then Serve()? Still wouldn't be 100% true, but close enough?

george-echo (Author):

split into listen and serve
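A minimal sketch of that listen/serve split, reusing the names from the hunk above (service.PrometheusServer, service.PrometheusConfig.Address(), service.prometheusRunning, started(), logrus). It is a fragment meant to slot into the same spot as the hunk (and needs the net import); the error handling details are assumptions about the approach, not the exact committed change.

```go
	// /metrics handler registration and address logging stay as in the hunk above.

	// Bind the port first so "started" isn't reported before the listener exists.
	listener, err := net.Listen("tcp", service.PrometheusConfig.Address())
	if err != nil {
		return err
	}

	service.prometheusRunning = true
	started()

	go func() {
		// Serve blocks until the listener closes; http.ErrServerClosed is the
		// normal shutdown path and shouldn't flip the probe to failing.
		if err := service.PrometheusServer.Serve(listener); err != nil && err != http.ErrServerClosed {
			service.prometheusRunning = false
			logrus.WithError(err).Error("prometheus server stopped")
		}
	}()
```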
