Metrics are still received after Grafana Agent stops #643

acsgn · 2024-10-18T12:31:24Z

Bug Description

Hi,
While doing some testing for a customer, I noticed that metrics from machine deployments continue to be received after the Grafana Agent has been stopped, or even after the machine itself is down. They only stop after 4 minutes, which for our use case makes the alerts far less useful.

To Reproduce

multipass launch --cpus 4 --memory 8G --disk 30G --name cos-test 22.04
multipass shell cos-test

HOST_IP=$(hostname -I | cut -d ' ' -f 1)

lxd init --auto
lxc network set lxdbr0 ipv6.address none

sudo snap install microk8s --channel 1.30-strict
sudo microk8s enable hostpath-storage
sudo microk8s enable metallb:$HOST_IP-$HOST_IP

sudo snap install juju
mkdir -p ~/.local/share
juju bootstrap localhost overlord
sudo microk8s config | juju add-k8s k8s --controller overlord

juju add-model zookeeper localhost
juju add-model cos k8s

juju deploy -m zookeeper zookeeper
juju deploy -m zookeeper grafana-agent
juju relate -m zookeeper zookeeper grafana-agent

juju deploy -m cos cos-lite --trust
juju offer cos.grafana:grafana-dashboard
juju offer cos.loki:logging
juju offer cos.prometheus:receive-remote-write

juju consume -m zookeeper cos.prometheus
juju consume -m zookeeper cos.loki
juju consume -m zookeeper cos.grafana
juju relate -m zookeeper grafana-agent grafana
juju relate -m zookeeper grafana-agent loki
juju relate -m zookeeper grafana-agent prometheus

juju run -m cos grafana/leader get-admin-password
## Collect metrics for a while and browse Grafana before proceeding

juju ssh -m zookeeper grafana-agent/leader
date && sudo snap stop grafana-agent
exit

## Go back to Grafana and observe that the metrics are still "received" for 4 minutes after the service stops
## You can also try stopping the LXC container; both cause the same ghost metrics
## You can use the Explore tab and the following metrics with Prometheus as the data source
## zookeeper_QuorumSize OR up{juju_application="zookeeper"}

exit
multipass stop cos-test
multipass delete --purge cos-test
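
An optional extra check, to be run before the teardown commands above: one way to tell whether fresh samples are really arriving after the agent stops, or whether the last received sample is simply still being returned by queries, is to ask Prometheus for the timestamp of the newest `up` sample and watch its age. This is only a sketch under assumptions: `PROM_URL` is a placeholder for wherever the COS Prometheus HTTP API is reachable in your setup, `jq` is assumed to be installed, and the helper names are hypothetical. For context, Prometheus by default looks back up to 5 minutes for the most recent sample when evaluating an instant selector, so a value can still show up in a query after ingestion has stopped.

```shell
#!/usr/bin/env bash
# Assumption: PROM_URL points at the Prometheus HTTP API (placeholder default).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build the instant-query endpoint URL from a base URL.
query_url() {
  printf '%s/api/v1/query' "$1"
}

# Age in whole seconds of a sample: sample_age <now_epoch> <sample_epoch>.
# The fractional part of the sample timestamp, if any, is dropped.
sample_age() {
  local now="$1" ts="${2%.*}"
  echo $(( now - ts ))
}

# Print the timestamp of the newest `up` sample for the zookeeper application.
# Requires curl and jq; not called automatically.
latest_sample_ts() {
  curl -sG "$(query_url "$PROM_URL")" \
    --data-urlencode 'query=timestamp(up{juju_application="zookeeper"})' \
    | jq -r '.data.result[0].value[1]'
}

# Usage (repeat every few seconds after `snap stop grafana-agent`):
#   ts=$(latest_sample_ts)
#   echo "newest sample is $(sample_age "$(date +%s)" "$ts")s old"
```

If the reported age keeps growing from the moment the agent was stopped while Grafana still shows a value, that would suggest the "ghost" points are the last sample being carried forward rather than new data. If the age stays small, samples are still being ingested from somewhere.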

Environment

The reproduction steps use multipass, but I encountered the same behavior on a local machine, GCP, and AWS. All snaps and charms use latest/stable.

Relevant log output

No logs are available

Additional context

No response

acsgn added Status: Triage Type: Bug labels Oct 18, 2024

Abuelodelanada added the good first issue label Nov 15, 2024
