[Bug]: Load balancer initialisation fails #1581

Open
hvraven opened this issue Dec 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

hvraven (Contributor) commented Dec 4, 2024

Description

I had some issues with the load balancer after an upgrade, switched to the metal/Klipper LB, and am now trying to switch back.
This fails and gets stuck at Waiting for load-balancer to get an IP...

I dug around a bit and found some things, but am stuck now. tofu creates the expected load balancer with the configured name (k3s-nginx in my case). However, the hcloud cloud controller does not configure it; instead it complains about the name already being in use (log output below). When I check the cloud console it shows two load balancers: one with the correct name but unconfigured, and a second with a random name that is configured correctly (it points to all four currently configured workers). It appears the cloud controller creates and configures its own load balancer. That one is not known to tofu, which most likely explains the errors during setup.
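To double-check this outside the web console, something like the following should show the duplicate and which of the two tofu actually tracks (a rough sketch; the exact state addresses may differ in your setup):

# list all load balancers in the project; this should show both the
# unconfigured k3s-nginx and the extra one with the random name
hcloud load-balancer list

# list the load balancer resources OpenTofu manages; the randomly named
# one created by the cloud controller should not appear here
tofu state list | grep -i load_balancer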

There are some issues on the cloud controller repository describing similar behaviour, though I'm not sure which side is at fault: hetznercloud/hcloud-cloud-controller-manager#811 & hetznercloud/hcloud-cloud-controller-manager#812

(screenshot of the two load balancers in the Hetzner Cloud console)

Partial output of kubectl logs -f -n kube-system deployments/hcloud-cloud-controller-manager:

I1204 11:57:49.384227       1 load_balancers.go:127] "ensure Load Balancer" op="hcloud/loadBalancers.EnsureLoadBalancer" service="nginx-ingress-nginx-controller" nodes=["k3s-agent-arm-small-dgi","k3s-agent-arm-small-qry","k3s-agent-small-1-ywn","k3s-agent-small-igs"]
I1204 11:57:49.386310       1 event.go:389] "Event occurred" object="nginx/nginx-ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E1204 11:57:50.183708       1 controller.go:303] "Unhandled Error" err="error processing service nginx/nginx-ingress-nginx-controller (retrying with exponential backoff): failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLB: hcops/LoadBalancerOps.changeHCLBInfo: name is already used (uniqueness_error, c42eb92584201411)" logger="UnhandledError"
I1204 11:57:50.184612       1 event.go:389] "Event occurred" object="nginx/nginx-ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLB: hcops/LoadBalancerOps.changeHCLBInfo: name is already used (uniqueness_error, c42eb92584201411)"
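The name the cloud controller tries to use should come from the service annotations (load-balancer.hetzner.cloud/name, if I read the hcloud-cloud-controller-manager docs correctly), so roughly the following can be used to compare the annotated name against the load balancers in the console (a sketch; the annotation key is assumed from the CCM docs):

# print the hcloud load balancer annotations on the ingress service;
# the name annotation should match the load balancer created by tofu (k3s-nginx)
kubectl -n nginx get svc nginx-ingress-nginx-controller -o yaml | grep 'load-balancer.hetzner.cloud'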

Kube.tf file

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token


  source = "kube-hetzner/kube-hetzner/hcloud"

  version = "2.15.4"

  ssh_public_key = data.hcloud_ssh_key.admin_key.public_key
  ssh_private_key = null

  ssh_hcloud_key_label = "role=admin"

  ssh_max_auth_tries = 10

  hcloud_ssh_key_id = data.hcloud_ssh_key.admin_key.id


  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cx22",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    },
    {
      name        = "control-plane-nbg1",
      server_type = "cx22",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    },
    {
      name        = "control-plane-hel1",
      server_type = "cx22",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    }
  ]

  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cx22",
      location    = "fsn1",
      labels = [
        "node.longhorn.io/create-default-disk=config",
      ],
      taints = [],
      zram_size = "2G"
      nodes = {
        "0": {
        },
        "1": {
          server_type: "cx32",
          location = "nbg1",
        },
      }

    },

    {
      name        = "agent-arm-small",
      server_type = "cax21",
      location    = "fsn1",
      labels = [
        "node.longhorn.io/create-default-disk=config",
      ],
      zram_size = "2G"
      taints      = [],
      count       = 2,
    },
  ]

  enable_wireguard = true

  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"


  base_domain = "${var.subdomain}.${var.domain}"


  enable_csi_driver_smb = true


  enable_longhorn = true


  longhorn_namespace = "longhorn-system"

  longhorn_fstype = "ext4"

  longhorn_replica_count = 3

  ingress_controller = "nginx"


  system_upgrade_use_drain = true

  initial_k3s_channel = "stable"

  /* k3s_registries = <<-EOT
    mirrors:
      hub.my_registry.com:
        endpoint:
          - "hub.my_registry.com"
    configs:
      hub.my_registry.com:
        auth:
          username: username
          password: password
  EOT */

  additional_k3s_environment = {
    "CONTAINERD_HTTP_PROXY" : "http://localhost:1055",
    "CONTAINERD_HTTPS_PROXY" : "http://localhost:1055",
    "NO_PROXY" : "127.0.0.0/8,10.128.0.0/9,10.0.0.0/10,",
  }

  preinstall_exec = [
    "curl -vL https://registry.gitlab.com",
  ]

  k3s_exec_agent_args = "--kubelet-arg image-gc-high-threshold=50 --kubelet-arg=image-gc-low-threshold=45"


  extra_firewall_rules = [
  ]

  enable_cert_manager = true

  dns_servers = []

  lb_hostname = "${var.subdomain}.${var.domain}"

  extra_kustomize_parameters = {
    vpn_domain = var.vpn_domain,
  }

  create_kubeconfig = false

  create_kustomization = false

  longhorn_values = <<EOT
defaultSettings:
  createDefaultDiskLabeledNodes: true
  defaultDataPath: /var/longhorn
  node-down-pod-deletion-policy: delete-both-statefulset-and-deployment
persistence:
  defaultFsType: ext4
  defaultClassReplicaCount: 3
  defaultClass: true
  EOT

}

Screenshots

No response

Platform

Linux

hvraven added the bug label Dec 4, 2024