
Helm chart with TLS enabled breaks Alloy clustering #2195

Open
jaxklag opened this issue Nov 29, 2024 · 0 comments
Labels
bug Something isn't working


jaxklag commented Nov 29, 2024

What's wrong?

Hello,

I deploy Alloy using the Helm chart.
The problem is with the Helm chart, not with a manual deployment.

When TLS is enabled for the Alloy web UI, Alloy clustering breaks because communication between cluster peers stays on HTTP instead of switching to HTTPS.

The solution would be for the Helm chart to provide keys for configuring clustering TLS (cert, key, CA) and the other parameters of the Alloy clustering configuration.

See the clustering parameters: https://grafana.com/docs/alloy/latest/reference/cli/run/#clustering

Clustering parameters not available in the Helm chart:

--cluster.enable-tls: Specifies whether TLS should be used for communication between peers (default false).
--cluster.tls-ca-path: Path to the CA certificate file used for peer communication over TLS.
--cluster.tls-cert-path: Path to the certificate file used for peer communication over TLS.
--cluster.tls-key-path: Path to the key file used for peer communication over TLS.
--cluster.tls-server-name: Server name used for peer communication over TLS.
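Until the chart exposes dedicated keys, one possible workaround is to pass these flags through the chart's generic hooks. The following is a minimal sketch of Helm values, assuming that the chart version in use provides "alloy.extraArgs", "alloy.mounts.extra" and "controller.volumes.extra" (check that chart version's values.yaml), and that the running Alloy version supports the TLS clustering flags listed above. The Secret name "alloy-cluster-tls", the mount path "/etc/alloy-cluster-tls" and the server name are hypothetical.

```yaml
alloy:
  # Assumption: extra CLI arguments appended to "alloy run" by the chart.
  extraArgs:
    - --cluster.enable-tls=true
    - --cluster.tls-ca-path=/etc/alloy-cluster-tls/ca.crt
    - --cluster.tls-cert-path=/etc/alloy-cluster-tls/tls.crt
    - --cluster.tls-key-path=/etc/alloy-cluster-tls/tls.key
    # Hypothetical name; it must match a SAN in the peer certificate.
    - --cluster.tls-server-name=alloy.tstt9.example
  mounts:
    # Assumption: extra volume mounts for the Alloy container.
    extra:
      - name: cluster-tls
        mountPath: /etc/alloy-cluster-tls
        readOnly: true

controller:
  volumes:
    # Assumption: extra volumes added to the Alloy pod.
    extra:
      - name: cluster-tls
        secret:
          # Hypothetical Secret holding ca.crt, tls.crt and tls.key.
          secretName: alloy-cluster-tls
```

Dedicated chart keys would still be preferable, since this relies on the user knowing the exact flag names and keeping the mount path and flag values in sync by hand.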

Another problem after enabling TLS is that the webhook URL of the "watch" container is not updated to HTTPS.

See the relevant line in the chart manifest:

args:
  - --volume-dir=/etc/alloy
  - --webhook-url=http://localhost:{{ $values.listenPort }}/-/reload

I think we could reuse the existing Helm key "listenScheme" to determine whether the "webhook-url" container parameter should use "http" or "https".
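A minimal sketch of how the template line could look, assuming "$values.listenScheme" is in scope at that point of the template (as "$values.listenPort" already is) and holds "HTTP" or "HTTPS". It uses the "lower" function that Helm templates provide through Sprig. This only changes the scheme; the watch container would presumably also need to trust the certificate Alloy presents on localhost, which this sketch does not address.

```yaml
args:
  - --volume-dir=/etc/alloy
  # Sketch: derive the scheme from the existing listenScheme value;
  # "lower" turns "HTTPS" into "https" and "HTTP" into "http".
  - --webhook-url={{ $values.listenScheme | lower }}://localhost:{{ $values.listenPort }}/-/reload
```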

Regards,

Steps to reproduce

Deploy Alloy via the Helm chart and enable TLS and clustering in both the chart values file and the Alloy config file (see the configuration below).

Alloy clustering breaks.

Remove HTTPS and upgrade the Helm deployment: clustering works fine again.

Here are my versions:

$ helm -n alloy list
NAME       NAMESPACE    REVISION    UPDATED                                    STATUS      CHART          APP VERSION
grafana    alloy        13          2024-11-29 10:11:57.437538995 +0100 CET    deployed    alloy-0.9.2    v1.4.3 

System information

Rocky Linux 9.3 / kernel 5.14.0-362.13.1.el9_3.x86_64 / Kubernetes v1.29.6

Software version

Grafana Alloy v1.4.3 / Helm chart v0.9.2 / Helm v3.15.4

Configuration

## Helm values

service:
  # -- Creates a Service for the controller's pods.
  enabled: true
  # -- Service type
  type: LoadBalancer

alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: false
    # -- Name of existing ConfigMap to use. Used when create is false.
    name: alloy-config
    # -- Key in ConfigMap to get config from.
    key: config.alloy

  clustering:
    # -- Deploy Alloy in a cluster to allow for load distribution.
    enabled: true

    # -- Name for the Alloy cluster. Used for differentiating between clusters.
    name: "tstt9"

    # -- Name for the port used for clustering, useful if running inside an Istio Mesh
    portName: https

  listenScheme: HTTPS

  # -- Enables sending Grafana Labs anonymous usage stats to help improve Grafana
  # Alloy.
  enableReporting: false

  envFrom:
    - secretRef:
        name: wildcard.tstt9.admin
    - secretRef:
        name: tstt9-alloy-supa2-mimir-client

  # for livedebugging in the Alloy web UI
  stabilityLevel: experimental

## Alloy config, stored in a k8s ConfigMap:

logging {
    level  = "info"
    format = "json"
}

// Experimental feature.
// Debug of data flowing into Alloy
livedebugging {
  enabled = true
}

// HTTPS configuration for Alloy
http {
  tls {
    cert_pem = sys.env("tls.crt")
    key_pem = sys.env("tls.key")
    min_version = "TLS12"
    client_auth_type = "NoClientCert"
  }
}

Logs

{"ts":"2024-11-29T09:53:35.621731564Z","level":"info","msg":"rejoining peers","service":"cluster","peers_count":3,"peers":"10.233.110.98:12345,10.233.123.225:12345,10.233.84.219:12345"}
{"ts":"2024-11-29T09:53:35.623128883Z","level":"error","msg":"failed to rejoin list of peers","service":"cluster","err":"failed to join memberlist: 3 errors occurred:\n\t* Failed to join 10.233.110.98:12345: Post \"http://10.233.110.98:12345/api/v1/ckit/transport/stream\": unexpected EOF\n\t* Failed to join 10.233.123.225:12345: Post \"http://10.233.123.225:12345/api/v1/ckit/transport/stream\": dial tcp 10.233.123.225:12345: connect: connection refused\n\t* Failed to join 10.233.84.219:12345: Post \"http://10.233.84.219:12345/api/v1/ckit/transport/stream\": unexpected EOF\n\n"}
jaxklag added the bug label on Nov 29, 2024