[Bug]: Load balancer initialisation fails #1581

Open
hvraven opened this issue Dec 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

hvraven (Contributor) commented Dec 4, 2024

Description

I had some issues with the load balancer after an upgrade, switched to the metal/Klipper LB, and am now trying to switch back.
This fails and gets stuck at Waiting for load-balancer to get an IP...

I dug around a bit and found some things, but am stuck now. tofu creates the expected load balancer with the configured name (k3s-nginx in my case). However, the hcloud cloud controller does not configure it; instead it complains about the name already being in use (log output below). When I check the cloud console it shows two load balancers: one with the correct name but unconfigured, and a second with a random name that is configured correctly (it points to all four currently configured workers). It appears the cloud controller creates and configures its own load balancer. That one is not known to tofu, which most likely explains the errors during setup.
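To double-check this outside the web console, something like the following should show the duplicate and which of the two tofu actually tracks (a rough sketch; the exact state addresses may differ in your setup):

# list all load balancers in the project; this should show both the
# unconfigured k3s-nginx and the extra one with the random name
hcloud load-balancer list

# list the load balancer resources OpenTofu manages; the randomly named
# one created by the cloud controller should not appear here
tofu state list | grep -i load_balancer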

There are some issues on the cloud controller repository describing similar behaviour, though I'm not sure which side is at fault: hetznercloud/hcloud-cloud-controller-manager#811 & hetznercloud/hcloud-cloud-controller-manager#812

(screenshot of the two load balancers in the Hetzner Cloud console)

Partial output of kubectl logs -f -n kube-system deployments/hcloud-cloud-controller-manager:

I1204 11:57:49.384227       1 load_balancers.go:127] "ensure Load Balancer" op="hcloud/loadBalancers.EnsureLoadBalancer" service="nginx-ingress-nginx-controller" nodes=["k3s-agent-arm-small-dgi","k3s-agent-arm-small-qry","k3s-agent-small-1-ywn","k3s-agent-small-igs"]
I1204 11:57:49.386310       1 event.go:389] "Event occurred" object="nginx/nginx-ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
E1204 11:57:50.183708       1 controller.go:303] "Unhandled Error" err="error processing service nginx/nginx-ingress-nginx-controller (retrying with exponential backoff): failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLB: hcops/LoadBalancerOps.changeHCLBInfo: name is already used (uniqueness_error, c42eb92584201411)" logger="UnhandledError"
I1204 11:57:50.184612       1 event.go:389] "Event occurred" object="nginx/nginx-ingress-nginx-controller" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: hcloud/loadBalancers.EnsureLoadBalancer: hcops/LoadBalancerOps.ReconcileHCLB: hcops/LoadBalancerOps.changeHCLBInfo: name is already used (uniqueness_error, c42eb92584201411)"
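The name the cloud controller tries to use should come from the service annotations (load-balancer.hetzner.cloud/name, if I read the hcloud-cloud-controller-manager docs correctly), so roughly the following can be used to compare the annotated name against the load balancers in the console (a sketch; the annotation key is assumed from the CCM docs):

# print the hcloud load balancer annotations on the ingress service;
# the name annotation should match the load balancer created by tofu (k3s-nginx)
kubectl -n nginx get svc nginx-ingress-nginx-controller -o yaml | grep 'load-balancer.hetzner.cloud'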

Kube.tf file

module "kube-hetzner" {
  providers = {
    hcloud = hcloud
  }
  hcloud_token = var.hcloud_token


  source = "kube-hetzner/kube-hetzner/hcloud"

  version = "2.15.4"

  ssh_public_key = data.hcloud_ssh_key.admin_key.public_key
  ssh_private_key = null

  ssh_hcloud_key_label = "role=admin"

  ssh_max_auth_tries = 10

  hcloud_ssh_key_id = data.hcloud_ssh_key.admin_key.id


  control_plane_nodepools = [
    {
      name        = "control-plane-fsn1",
      server_type = "cx22",
      location    = "fsn1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    },
    {
      name        = "control-plane-nbg1",
      server_type = "cx22",
      location    = "nbg1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    },
    {
      name        = "control-plane-hel1",
      server_type = "cx22",
      location    = "hel1",
      labels      = [],
      taints      = [],
      count       = 1
      zram_size   = "2G"
      kubelet_args = ["kube-reserved=cpu=250m,memory=1500Mi,ephemeral-storage=1Gi", "system-reserved=cpu=250m,memory=300Mi"]


    }
  ]

  agent_nodepools = [
    {
      name        = "agent-small",
      server_type = "cx22",
      location    = "fsn1",
      labels = [
        "node.longhorn.io/create-default-disk=config",
      ],
      taints = [],
      zram_size = "2G"
      nodes = {
        "0": {
        },
        "1": {
          server_type: "cx32",
          location = "nbg1",
        },
      }

    },

    {
      name        = "agent-arm-small",
      server_type = "cax21",
      location    = "fsn1",
      labels = [
        "node.longhorn.io/create-default-disk=config",
      ],
      zram_size = "2G"
      taints      = [],
      count       = 2,
    },
  ]

  enable_wireguard = true

  load_balancer_type     = "lb11"
  load_balancer_location = "fsn1"


  base_domain = "${var.subdomain}.${var.domain}"


  enable_csi_driver_smb = true


  enable_longhorn = true


  longhorn_namespace = "longhorn-system"

  longhorn_fstype = "ext4"

  longhorn_replica_count = 3

  ingress_controller = "nginx"


  system_upgrade_use_drain = true

  initial_k3s_channel = "stable"

  /* k3s_registries = <<-EOT
    mirrors:
      hub.my_registry.com:
        endpoint:
          - "hub.my_registry.com"
    configs:
      hub.my_registry.com:
        auth:
          username: username
          password: password
  EOT */

  additional_k3s_environment = {
    "CONTAINERD_HTTP_PROXY" : "http://localhost:1055",
    "CONTAINERD_HTTPS_PROXY" : "http://localhost:1055",
    "NO_PROXY" : "127.0.0.0/8,10.128.0.0/9,10.0.0.0/10,",
  }

  preinstall_exec = [
    "curl -vL https://registry.gitlab.com",
  ]

  k3s_exec_agent_args = "--kubelet-arg image-gc-high-threshold=50 --kubelet-arg=image-gc-low-threshold=45"


  extra_firewall_rules = [
  ]

  enable_cert_manager = true

  dns_servers = []

  lb_hostname = "${var.subdomain}.${var.domain}"

  extra_kustomize_parameters = {
    vpn_domain = var.vpn_domain,
  }

  create_kubeconfig = false

  create_kustomization = false

  longhorn_values = <<EOT
defaultSettings:
  createDefaultDiskLabeledNodes: true
  defaultDataPath: /var/longhorn
  node-down-pod-deletion-policy: delete-both-statefulset-and-deployment
persistence:
  defaultFsType: ext4
  defaultClassReplicaCount: 3
  defaultClass: true
  EOT

}

Screenshots

No response

Platform

Linux

hvraven added the bug label Dec 4, 2024