These are very interesting points, @smlx. I think this is a fantastic idea. Am I right in thinking that, out of the box, the idea would be to support the "standard" scaling parameters (memory and CPU usage), and that it would then be up to whoever implements this custom scaling functionality on their remote to determine what the scaling parameters should be? We could, for instance, provide some defaults/examples, like the 50-php-process example you gave, but in general the person(s) implementing a remote would be able to specify their own scaling parameters?
-
I've been thinking about what would need to happen for Lagoon to fully support accurate resource requests. The motivation for this is to reduce the impact of noisy neighbours and also to more efficiently use cluster resources. The noisy neighbour issue in particular can also have security impacts (DoS).
One approach that seems like it might work is to use the Vertical Pod Autoscaler (VPA) to automatically adjust pod resource requests. However, this has the limitation that it cannot be combined with Horizontal Pod Autoscalers that target CPU or memory.
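For reference, a minimal sketch of what a VPA object could look like for a Lagoon service (the target deployment name here is purely illustrative, and it assumes the VPA components are installed in the cluster):

```yaml
# Sketch only: VPA managing resource requests for a hypothetical "nginx-php" deployment.
# Requires the VPA recommender/updater/admission controller to be running in the cluster.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: nginx-php
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-php
  updatePolicy:
    updateMode: "Auto"            # let the VPA evict pods and apply updated requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["cpu", "memory"]
```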
Horizontal Pod Autoscalers targeting CPU or memory are not ideal for webserver workloads such as those hosted by Lagoon anyway, because those workloads are often network-bound or process-count-bound instead. For example, php-fpm in Lagoon is currently limited to 50 workers, and if all 50 are stuck in a slow database request, the app deployment won't hit CPU thresholds and scale up.
One way to address both of these issues would be to start using custom metrics for HPAs in Lagoon. To take the case of php-fpm, this would mean adding a php-fpm metrics exporter sidecar container to the Lagoon nginx/php pod.
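A rough sketch of what that sidecar could look like in the pod template (the exporter image, port, and status endpoint below are just one possible option, not a settled choice):

```yaml
# Sketch only: php-fpm metrics exporter sidecar alongside the existing nginx/php containers.
# Assumes php-fpm serves its status page on 127.0.0.1:9000 inside the pod.
containers:
  # ... existing nginx and php containers ...
  - name: php-fpm-exporter
    image: hipages/php-fpm_exporter:2          # one possible exporter image, illustrative only
    env:
      - name: PHP_FPM_SCRAPE_URI
        value: "tcp://127.0.0.1:9000/status"   # php-fpm status endpoint in the same pod
    ports:
      - name: metrics
        containerPort: 9253                    # the exporter's default metrics port
```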
These metrics would need to be scraped by a Prometheus instance. The easiest way to do that from an operational perspective would be to make Prometheus Operator an optional dependency of Lagoon. The Lagoon Remote Helm chart could then create a Prometheus object to provision a Prometheus instance, and Lagoon could create ServiceMonitor objects targeting the custom metrics during deploys.
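For example, the ServiceMonitor created at deploy time might look something like this (the labels and port name are assumptions for illustration):

```yaml
# Sketch only: ServiceMonitor pointing the Operator-managed Prometheus at the exporter sidecar.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-php-metrics
  labels:
    lagoon.sh/monitoring: "true"      # hypothetical label matched by the Prometheus object's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      lagoon.sh/service: nginx-php    # hypothetical label on the service exposing the "metrics" port
  endpoints:
    - port: metrics
      interval: 30s
```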
Finally, the cluster would also need the Prometheus Adapter for the Kubernetes Metrics APIs installed and configured to expose the metrics collected by Prometheus through the Kubernetes custom metrics API.
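To make the wiring concrete, here is a rough sketch of an adapter rule plus an HPA consuming the resulting metric. The metric name (phpfpm_active_processes), the label names, and the target value are illustrative only:

```yaml
# Sketch only: a rule in the prometheus-adapter configuration (e.g. its custom rules in
# the adapter's Helm values) exposing a per-pod php-fpm metric via the custom metrics API.
rules:
  - seriesQuery: 'phpfpm_active_processes{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "phpfpm_active_processes"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
---
# Sketch only: HPA scaling the deployment on that per-pod custom metric instead of CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-php
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-php
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: phpfpm_active_processes
        target:
          type: AverageValue
          averageValue: "40"           # e.g. scale out before hitting the 50-worker ceiling
```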
This is quite a lot of additional software stack, so I wanted to start a discussion about the issue. Assuming the above makes sense and the outcome is worth the extra complexity in some Lagoon Remote deployments, what's the best way to expose the interface for this? I'm thinking that ideally a simple boolean flag in the Lagoon Remote chart (e.g. enableHPAs) would be possible, and it would just need to be thoroughly documented (explaining the dependency on Prometheus Operator etc.). Any thoughts?
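In values.yaml terms, the interface might be as simple as something like this (the key names are illustrative, not existing chart options):

```yaml
# Sketch only: possible lagoon-remote chart values for opting in to custom-metric HPAs.
# When true, deploys would create ServiceMonitors and HPAs, and the chart would assume
# Prometheus Operator and prometheus-adapter are present in the cluster.
enableHPAs: false
```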