Install systemd unit to run cleanup on shutdown or reboot #10431

Open
agracey opened this issue Jun 28, 2024 · 2 comments
@agracey

agracey commented Jun 28, 2024

Is your feature request related to a problem? Please describe.

When rebooting a node, the system shows

[FAILED] Failed unmounting /etc
[FAILED] Failed unmounting /var

for a few minutes before timing out and rebooting.

This is a problem for edge use-cases where you need to limit downtime of single-node clusters.

Describe the solution you'd like

I would like the k3s installer to include a systemd unit file that runs k3s-killall.sh when a shutdown or reboot is requested. This has been discussed as a workaround previously in: #7362 (comment)
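
For illustration, a minimal sketch of the kind of unit being asked for (the unit name and ordering are assumptions for this example, not something the installer ships today; k3s-killall.sh is referenced at its default install path):

[Unit]
Description=Run k3s-killall.sh on shutdown or reboot
# Ordered After=k3s.service, so at shutdown this unit is stopped (and its
# ExecStop runs) before k3s.service itself is taken down.
After=k3s.service

[Service]
Type=oneshot
# Stay "active" after boot so that ExecStop is invoked on shutdown/reboot.
RemainAfterExit=yes
ExecStart=/bin/true
ExecStop=/usr/local/bin/k3s-killall.sh

[Install]
WantedBy=multi-user.target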

Describe alternatives you've considered

  • Just waiting means an unpredictable outage
  • Running k3s-killall.sh manually means that any tooling that manages the system needs to know about it (which is error-prone)
  • Writing the service yourself means that some users won't see the documentation and will raise new issues
@pocketbroadcast

We experienced the same issue when rebooting our nodes, and we agree there should be an upstream solution (or at least a proposal) to fix this.

k3s-killall.sh (as its name suggests) seems problematic to me, since it SIGKILLs the container processes.
This could potentially lead to data loss or inconsistencies.

  1. We modified the k3s-killall.sh script to send SIGTERM first, wait for a grace period, and only SIGKILL afterwards, to increase the chances of a "clean" shutdown (see the sketch just below this list). While this simple approach works in our case, we feel it's not the intended Kubernetes way to take a node offline for planned maintenance.

  2. That's why we started experimenting with draining the node on shutdown and uncordoning it again after reboot.
    Unfortunately, that occasionally led to issues with the kube-scheduler when stopping k3s after draining the node.
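
For reference, a minimal sketch of the SIGTERM-then-SIGKILL change described in point 1 (the process pattern and grace period below are placeholders for this example; the real k3s-killall.sh identifies processes differently):

# Ask the container processes to terminate cleanly first.
GRACE_PERIOD=30
pkill -TERM -f containerd-shim || true

# Wait up to the grace period for them to exit on their own.
for _ in $(seq "$GRACE_PERIOD"); do
  pgrep -f containerd-shim > /dev/null || break
  sleep 1
done

# Force-kill anything that is still running.
pkill -KILL -f containerd-shim || true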

(screenshot of the occasional kube-scheduler errors mentioned in point 2 omitted)

[Unit]
Description=K3s Container Startup and Cleanup Handling
After=k3s.service

[Service]
Type=oneshot
# RemainAfterExit=yes keeps the unit "active" after boot, so its ExecStop
# command is run when the system shuts down or reboots.
RemainAfterExit=yes

ExecStart=/usr/local/bin/k3s-node-management.sh start
ExecStop=/usr/local/bin/k3s-node-management.sh stop

TimeoutStopSec=60

[Install]
WantedBy=multi-user.target

k3s-node-management.sh, stripped down to the essential parts (boilerplate that waits for the kube API to become ready is omitted):

...
# NODE_NAME and KUBECTL are expected to be set in the omitted setup above.
case "$1" in
  start)
    # Invoked via ExecStart after boot: put the node back into scheduling.
    echo "Uncordon node $NODE_NAME..."
    ${KUBECTL} uncordon "$NODE_NAME" && echo "Node $NODE_NAME uncordoned successfully."
    ;;

  stop)
    # Invoked via ExecStop at shutdown: evacuate pods first.
    # --disable-eviction deletes pods directly instead of going through the
    # eviction API, which bypasses PodDisruptionBudgets.
    echo "Draining node $NODE_NAME..."
    ${KUBECTL} drain "$NODE_NAME" --disable-eviction --ignore-daemonsets --delete-emptydir-data --force
    ;;
esac
...

Note: the code shared here should not be considered stable or tested!
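
For completeness, a unit like the one above would be installed and enabled in the usual systemd way (the unit file name here is an assumption, chosen to match the script name):

# Assuming the unit was saved as /etc/systemd/system/k3s-node-management.service
sudo systemctl daemon-reload
sudo systemctl enable --now k3s-node-management.service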

@brandond
Member

fwiw, that error isn't really an error. Some Kubernetes components react better than others to their context being cancelled by a shutdown signal, and will log odd errors while exiting. Usually you don't see these because when you're running them in a container (as other distros do), the container is exiting anyway.
