Prometheus Metrics stop working after some time (with solution) #636

Open
QthePirate opened this issue Aug 15, 2024 · 2 comments

Comments

@QthePirate

Bug Description

After running the COS-Lite stack on MicroK8s for some time, I noticed that my dashboard (which at the time contained only Ceph and machine metrics from grafana-agent) would stop displaying data. It took me a while to work out what was going on, until I thought to check the Prometheus storage space.

According to the default config, maximum_retention_size is 80%.

This led me to look at the default size for the Prometheus PVC, which was 1G.
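For a rough sense of scale, the sketch below estimates how many bytes the 80% limit leaves on a 1Gi volume and how long that might last. It assumes the percentage applies to the PVC capacity and uses the 1 to 2 bytes per sample rule of thumb from the Prometheus storage docs. The function names and the ingestion rate of 1000 samples per second are made up for illustration, not measured on this deployment.

```python
GIB = 1024 ** 3  # bytes in one gibibyte (the "Gi" unit used by Kubernetes)


def retention_cap_bytes(pvc_bytes: int, max_retention_fraction: float = 0.8) -> int:
    """Bytes available to Prometheus before size-based retention kicks in.

    Assumes the retention percentage is applied to the PVC capacity.
    """
    return int(pvc_bytes * max_retention_fraction)


def days_until_cap(
    pvc_bytes: int,
    samples_per_second: float,
    bytes_per_sample: float = 2.0,  # assumed: upper end of the 1-2 byte rule of thumb
    max_retention_fraction: float = 0.8,
) -> float:
    """Rough number of days of ingestion that fit under the retention cap."""
    cap = retention_cap_bytes(pvc_bytes, max_retention_fraction)
    bytes_per_day = samples_per_second * bytes_per_sample * 86_400
    return cap / bytes_per_day


if __name__ == "__main__":
    # Hypothetical ingestion rate; substitute your own figure.
    print(retention_cap_bytes(GIB))
    print(round(days_until_cap(GIB, samples_per_second=1000), 2))
```

With those assumed numbers, a 1Gi volume reaches the limit in about five days.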

I worked around this recurring issue by manually increasing the Prometheus PVC size in K8s via kubectl -n cos edit pvc

This is just a workaround. I would recommend either:

  1. Increasing the default size on deployment

  2. Mentioning in the documentation that this size needs to be increased (including here: https://charmhub.io/topics/canonical-observability-stack/tutorials/install-microk8s)

Even in my small lab environment running this, 1G is absolutely not enough space for a continually running Prometheus instance.

To Reproduce

juju deploy cos-lite

juju integrate prometheus:metrics-endpoint

Environment

Prometheus-k8s Channel: latest/stable Rev: 189
MicroK8s v1.30.3 revision 7040
Juju 3.5.3 (Found on 3.5.2)

Relevant log output

I could not find any logs relevant to the issue.

Additional context

No response

@sed-i
Contributor

sed-i commented Aug 15, 2024

1Gi is a default value, and is as arbitrary as 20Gi. If the admin is not aware of this limit, they'd eventually hit the "storage full" problem anyway. We'd just be delaying the problem.

But you're making a good point about mentioning it in the doc in addition to the bit about the overlay.

@QthePirate
Author

@sed-i You're right, it is arbitrary.

The other thing that would help is useful log output. Nothing in the logs indicated that this was why my data stopped showing up. I had to guess.
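To illustrate the kind of signal that would have saved the guesswork, here is a minimal sketch of a check that produces a warning line as disk usage approaches the retention limit. This is not how the charm or Prometheus behaves today. The function names, the 5% margin, and the message wording are all hypothetical.

```python
import shutil
from typing import Optional


def usage_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def storage_warning(
    used_fraction: float,
    limit_fraction: float = 0.8,  # the 80% default mentioned in this issue
    margin: float = 0.05,         # hypothetical: start warning 5 points early
) -> Optional[str]:
    """Return a warning message when usage is near the limit, else None."""
    if used_fraction >= limit_fraction - margin:
        return (
            f"storage at {used_fraction:.0%} of capacity, "
            f"retention limit is {limit_fraction:.0%}"
        )
    return None


if __name__ == "__main__":
    # "." stands in for the Prometheus data directory, whose real path
    # depends on the deployment.
    message = storage_warning(usage_fraction("."))
    if message:
        print("WARNING:", message)
```

Even a single line like that in the charm or workload logs would point straight at the PVC size.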
