Replies: 4 comments
-
Linking #1004. I am currently testing whether the system-upgrade-controller even needs to perform a drain/cordon - I think an in-place update of the k3s-agent should suffice, without draining all pods.
-
Would be great if no drain is necessary.
-
For major updates you definitely want to drain all pods; for patch updates it is surely debatable. I guess a drain by default is therefore needed. I think #1338 should help here, when setting …
That should actually already happen - at least for me, only one node at a time is drained. If the pods can be relocated to other nodes, there shouldn't be an issue.
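For context, this is roughly what the drain-related knobs on a system-upgrade-controller Plan look like, following the upstream k3s/SUC examples. Treat it as a hedged sketch: kube-hetzner generates its own Plans, and the name, channel, and image below are placeholders rather than the exact manifest the module produces.

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent            # illustrative name; kube-hetzner generates its own Plans
  namespace: system-upgrade
spec:
  concurrency: 1             # upgrade (and therefore cordon/drain) only one node at a time
  channel: https://update.k3s.io/v1-release/channels/stable   # placeholder channel
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  # Either drain the node before upgrading ...
  drain:
    force: true
    skipWaitForDeleteTimeout: 60
  # ... or replace the `drain` block with `cordon: true` to only cordon the node,
  # which is what the "no drain, in-place update" variant discussed above would look like.
  upgrade:
    image: rancher/k3s-upgrade   # placeholder image
```

With `concurrency: 1` only one node is cordoned and drained at a time, which matches the "only one node at a time is drained" behaviour described above.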
-
@Talinx Please try @pat-s' tip above, if you feel confident enough, that is.
-
Description
When PodDisruptionBudgets apply to multiple pods, the system upgrade can fail because a node cannot be completely drained.
Here is what happens:
1. The system-upgrade-controller cordons a node and starts to drain it.
2. Evicting some pods would violate a PodDisruptionBudget, so those evictions are rejected.
3. The drain never completes, the node stays cordoned, and the upgrade cannot proceed.
The result is a cluster that is stuck wanting to upgrade, with nodes that don't allow scheduling pods and with as many pods as possible already evicted. This effectively results in downtime of the hosted application until manually resolved (unless every critical pod has a PodDisruptionBudget).
This process is a bit random. E.g. if there are 2 worker nodes and, after evicting pods until the disruption budget is reached, one node has no pods left, then that node can be upgraded.
(In this case the Elasticsearch Helm chart from Bitnami "caused" the problem. However, this can happen with anything that introduces enough PodDisruptionBudgets.)
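To make the failure mode concrete, here is a hypothetical PodDisruptionBudget of the kind such a chart can create; the name, labels, and numbers are made up for illustration. If `minAvailable` equals the number of ready replicas, the budget allows zero voluntary disruptions, so every eviction issued by the drain is rejected.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-master   # hypothetical name, not taken from the actual chart
spec:
  minAvailable: 2              # with only 2 ready replicas, zero disruptions are allowed,
                               # so the drain can never evict these pods
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
```

If the affected pods cannot first be rescheduled onto another node (only two worker nodes, anti-affinity rules, local storage, etc.), the budget can never be satisfied and the node stays in the cordoned, partially drained state described above.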
Kube.tf file
Screenshots
Platform
Linux