Replies: 4 comments
-
Linking #1004. I am currently testing whether the system-upgrade-controller even needs to perform a drain/cordon - I think an in-place update of the k3s-agent should suffice, without draining all pods.
-
Would be great if no drain is necessary.
-
For major updates you definitely want to drain all pods; for patch updates it is surely debatable. I guess a drain by default is therefore needed. I think #1338 should help here, when setting …
That should actually already happen - at least for me, only one node at a time is drained. If the pods can be relocated to other nodes, there shouldn't be an issue.
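For context, this is roughly what the drain-related knobs on a system-upgrade-controller Plan look like, following the upstream k3s/SUC examples. Treat it as a hedged sketch: kube-hetzner generates its own Plans, and the name, channel, and image below are placeholders rather than the exact manifest the module produces.

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3s-agent            # illustrative name; kube-hetzner generates its own Plans
  namespace: system-upgrade
spec:
  concurrency: 1             # upgrade (and therefore cordon/drain) only one node at a time
  channel: https://update.k3s.io/v1-release/channels/stable   # placeholder channel
  serviceAccountName: system-upgrade
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  # Either drain the node before upgrading ...
  drain:
    force: true
    skipWaitForDeleteTimeout: 60
  # ... or replace the `drain` block with `cordon: true` to only cordon the node,
  # which is what the "no drain, in-place update" variant discussed above would look like.
  upgrade:
    image: rancher/k3s-upgrade   # placeholder image
```

With `concurrency: 1` only one node is cordoned and drained at a time, which matches the "only one node at a time is drained" behaviour described above.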
-
@Talinx Please try @pat-s' tip above, if you feel confident enough, that is.
-
Description
When PodDisruptionBudgets apply to multiple pods, the system upgrade can fail because a node cannot be completely drained.
Here is what happens:
1. The system-upgrade-controller cordons a node and starts to drain it.
2. Evicting some pods would violate a PodDisruptionBudget, so those evictions are rejected.
3. The drain never completes, the node stays cordoned, and the upgrade cannot proceed.
The result is a cluster that is stuck wanting to upgrade, with nodes that don't allow scheduling pods and with as many pods as possible already evicted. This effectively results in downtime of the hosted application until manually resolved (unless every critical pod has a PodDisruptionBudget).
This process is a bit random. E.g. if there are 2 worker nodes and, after evicting pods until the disruption budget is reached, one node has no pods left, then that node can be upgraded.
(In this case the Elasticsearch Helm chart from Bitnami "caused" the problem. However, this can happen with anything that introduces enough PodDisruptionBudgets.)
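To make the failure mode concrete, here is a hypothetical PodDisruptionBudget of the kind such a chart can create; the name, labels, and numbers are made up for illustration. If `minAvailable` equals the number of ready replicas, the budget allows zero voluntary disruptions, so every eviction issued by the drain is rejected.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-master   # hypothetical name, not taken from the actual chart
spec:
  minAvailable: 2              # with only 2 ready replicas, zero disruptions are allowed,
                               # so the drain can never evict these pods
  selector:
    matchLabels:
      app.kubernetes.io/name: elasticsearch
```

If the affected pods cannot first be rescheduled onto another node (only two worker nodes, anti-affinity rules, local storage, etc.), the budget can never be satisfied and the node stays in the cordoned, partially drained state described above.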
Kube.tf file
Screenshots
Platform
Linux