WordPress pods are getting restarted in a loop on one of our deployments, with liveness checks failing.
We are not sure of the root cause, but we have gathered some evidence of issues in the observability stack (Promtail and Prometheus).
The first seems to be Promtail failing on parse errors while (as far as I understand these log lines) scraping the Apache log files, which then seems to trigger Pebble to stop all running services:
2024-12-10T05:23:10.443Z [promtail] level=info ts=2024-12-10T05:23:10.434149788Z caller=filetarget.go:252 msg="watching new directory" directory=/var/log/apache2
2024-12-10T05:26:23.877Z [promtail] level=error ts=2024-12-10T05:26:23.873142101Z caller=logfmt.go:139 component=file_pipeline component=stage type=logfmt msg="failed to decode logfmt" err="logfmt syntax error at pos 412 on line 1: invalid quoted value"
2024-12-10T05:26:49.662Z [pebble] Exiting on terminated signal.
2024-12-10T05:26:49.685Z [pebble] Stopping all running services.
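The "invalid quoted value" error can be illustrated with a simplified logfmt decoder. The error text matches the format of the Go go-logfmt library, which, as far as we understand, only accepts JSON-style escapes inside quoted values. The sketch below is a hypothetical Python reimplementation, not Promtail's code, and it assumes the failing line contains key="value" pairs. Under that assumption, a quoted value carrying an Apache-style \xhh escape (Apache writes non-printable bytes in the request line and headers such as User-Agent this way) is enough to reject the line.

```python
import json
from typing import Dict, Optional


class LogfmtSyntaxError(ValueError):
    """Raised when a line cannot be decoded as logfmt."""


def decode_logfmt(line: str) -> Dict[str, Optional[str]]:
    """Decode one logfmt line (simplified sketch, not Promtail's decoder)."""
    pairs: Dict[str, Optional[str]] = {}
    pos, n = 0, len(line)
    while pos < n:
        # Skip the spaces separating key/value pairs.
        while pos < n and line[pos] == " ":
            pos += 1
        if pos >= n:
            break
        # A key runs until '=' or a space; a quote here is a syntax error.
        start = pos
        while pos < n and line[pos] not in '= "':
            pos += 1
        if pos < n and line[pos] == '"':
            raise LogfmtSyntaxError(f"unexpected '\"' at pos {pos}")
        key = line[start:pos]
        if pos >= n or line[pos] == " ":
            pairs[key] = None  # bare key without a value
            continue
        pos += 1  # skip '='
        if pos < n and line[pos] == '"':
            # Quoted value: find the closing quote, skipping escaped characters.
            start = pos
            pos += 1
            escaped = False
            while pos < n:
                c = line[pos]
                if escaped:
                    escaped = False
                elif c == "\\":
                    escaped = True
                elif c == '"':
                    break
                pos += 1
            else:
                raise LogfmtSyntaxError(f"unterminated quoted value at pos {start}")
            pos += 1  # move past the closing quote
            try:
                # Assumption: only JSON-style escapes (\" \\ \n \t \uXXXX ...) are valid.
                pairs[key] = json.loads(line[start:pos])
            except json.JSONDecodeError:
                raise LogfmtSyntaxError(f"invalid quoted value at pos {start}") from None
        else:
            # Unquoted value: runs until the next space.
            start = pos
            while pos < n and line[pos] != " ":
                pos += 1
            pairs[key] = line[start:pos]
    return pairs
```

For example, `decode_logfmt('msg="say \\"hi\\""')` decodes cleanly because `\"` is a JSON escape, while a value containing `\x01` raises `LogfmtSyntaxError("invalid quoted value ...")`.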
The second issue, related to Prometheus, seems to be a red herring (at least for the pod restarts), as it is followed by further events rather than a shutdown:
2024-12-10T06:04:35.188Z [container-agent] 2024-12-10 06:04:35 DEBUG juju-log ops 2.14.0 up and running.
2024-12-10T06:04:35.468Z [container-agent] 2024-12-10 06:04:35 DEBUG juju-log Invalid Prometheus alert rules folder at /var/lib/juju/agents/unit-wordpress-k8s-0/charm/src/prometheus_alert_rules: directory does not exist
2024-12-10T06:04:35.477Z [container-agent] 2024-12-10 06:04:35 DEBUG juju-log Emitting Juju event update_status.
Unfortunately, I was not able to access the log files before the pod was restarted.
We believe the issue could lie in the charm's monitoring configuration, maybe the Loki rules or similar?
Thank you
To Reproduce
We are not sure how to reproduce it; we suspect some log lines do not match the expected format.
Redeploying an application with the versions listed below and sending many different kinds of requests might trigger the bug.
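Following the idea of trying many access types, a script like the one below could generate varied traffic against a test deployment. It assumes, without verification, that unusual characters in the request line or User-Agent (quotes, backslashes, control bytes, which Apache escapes when writing its access log) are what produce lines Promtail cannot decode. BASE_URL is a hypothetical placeholder and the request mix is arbitrary.

```python
import urllib.error
import urllib.request
from typing import List, Union

# Hypothetical target: replace with the WordPress unit or ingress address.
BASE_URL = "http://wordpress.example.internal"

# Values chosen to exercise Apache's log escaping: quotes, backslashes, control bytes.
USER_AGENTS = ["Mozilla/5.0 (plain)", 'quote"inside', "back\\slash", "ctrl\x01char"]
PATHS = ["/", '/?s="quoted"', "/back\\slash", "/wp-login.php"]
METHODS = ["GET", "HEAD", "POST"]


def build_requests(base_url: str) -> List[urllib.request.Request]:
    """Build one request per (method, path, User-Agent) combination."""
    reqs = []
    for method in METHODS:
        for path in PATHS:
            for ua in USER_AGENTS:
                data = b"" if method == "POST" else None  # empty body for POST
                reqs.append(
                    urllib.request.Request(
                        base_url + path, data=data, method=method, headers={"User-Agent": ua}
                    )
                )
    return reqs


def send_all(reqs: List[urllib.request.Request], timeout: float = 5.0) -> List[Union[int, str]]:
    """Send each request and record its status code, or the error it raised."""
    results: List[Union[int, str]] = []
    for req in reqs:
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results.append(resp.status)
        except urllib.error.HTTPError as exc:
            results.append(exc.code)  # 4xx/5xx responses are still logged by Apache
        except OSError as exc:
            results.append(repr(exc))  # connection errors, timeouts, etc.
    return results
```

Running `send_all(build_requests(BASE_URL))` and then watching the Promtail output for "failed to decode logfmt" would show whether any of these requests reproduces the error.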
Environment
This charm runs on a Juju 2.9.49 controller with:
App                       Version  Status  Scale  Charm                     Channel        Rev  Address   Exposed  Message
nginx-ingress-integrator  25.3.0   active  1      nginx-ingress-integrator  latest/stable  81   REDACTED  no
wordpress-k8s             6.4.3    active  2      wordpress-k8s             latest/edge    114  REDACTED  no
Relevant log output
2024-12-10T05:23:10.443Z [promtail] level=info ts=2024-12-10T05:23:10.433939586Z caller=filetarget.go:252 msg="watching new directory" directory=/var/log/apache2
2024-12-10T05:23:10.443Z [promtail] level=info ts=2024-12-10T05:23:10.434149788Z caller=filetarget.go:252 msg="watching new directory" directory=/var/log/apache2
2024-12-10T05:26:23.877Z [promtail] level=error ts=2024-12-10T05:26:23.873142101Z caller=logfmt.go:139 component=file_pipeline component=stage type=logfmt msg="failed to decode logfmt" err="logfmt syntax error at pos 412 on line 1: invalid quoted value"
2024-12-10T05:26:49.662Z [pebble] Exiting on terminated signal.
2024-12-10T05:26:49.685Z [pebble] Stopping all running services.
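To see how far apart the Promtail error and Pebble's shutdown actually are, the timestamps in the excerpt can be compared directly. This sketch assumes every line has the form `<UTC timestamp> [<source>] <message>`, as in the output above; parse_line and gap_seconds are hypothetical helpers.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Lines in the excerpt look like: 2024-12-10T05:26:49.662Z [pebble] Exiting on terminated signal.
LINE_RE = re.compile(r"^(\S+Z) \[([^\]]+)\] (.*)$")


def parse_line(line: str) -> Optional[Tuple[datetime, str, str]]:
    """Split a log line into (timestamp, source, message), or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    return ts, match.group(2), match.group(3)


def gap_seconds(lines: Iterable[str], first_marker: str, second_marker: str) -> Optional[float]:
    """Seconds from the first line containing first_marker to the next line containing second_marker."""
    first_ts = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _source, message = parsed
        if first_ts is None and first_marker in message:
            first_ts = ts
        elif first_ts is not None and second_marker in message:
            return (ts - first_ts).total_seconds()
    return None
```

Applied to the lines above with the markers "failed to decode logfmt" and "Exiting on terminated signal", this gives about 25.8 seconds between the two events.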
Although the Promtail parsing failure is interesting, Promtail should not be part of the charm's readiness or liveness checks, so even if Promtail fails, it should not trigger a pod restart. I am also not sure that a parsing failure on a single line would cause Promtail to fail entirely. It is more likely that WordPress's own checks are failing. Could you try upgrading WordPress to revision 114 and increasing the charm's health_check_timeout_seconds configuration option to see whether that reduces the chances of restarts?
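The suggestion to raise the timeout can be reasoned about with a small model of a health check. This sketch assumes Pebble-style check semantics, where each probe has a timeout and the check only counts as failed after `threshold` consecutive failed probes. It also assumes, hypothetically, that health_check_timeout_seconds acts as the per-probe timeout; CheckConfig and check_goes_down are illustrative names, not the charm's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CheckConfig:
    timeout: float  # seconds a single probe may take before it counts as failed
    threshold: int  # consecutive failed probes before the check is considered down


def check_goes_down(latencies: Iterable[float], cfg: CheckConfig) -> bool:
    """Return True if the probes ever fail `threshold` times in a row."""
    consecutive_failures = 0
    for latency in latencies:
        if latency > cfg.timeout:
            consecutive_failures += 1
            if consecutive_failures >= cfg.threshold:
                return True
        else:
            consecutive_failures = 0  # any successful probe resets the count
    return False
```

With probe latencies of [1, 4, 5, 6, 2] seconds, a 3-second timeout and a threshold of 3 mark the check as failed, while a 10-second timeout does not. This is why a longer timeout can avoid restarts when WordPress is slow but still responding.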
Relevant log output
https://pastebin.canonical.com/p/m3pKR5kKd5/