
[Release-1.31] - Secondary etcd-only nodes do not reconnect to apiserver after outage if joined against an etcd-only node #11320

Closed
brandond opened this issue Nov 14, 2024 · 1 comment

@brandond

Backport fix for Secondary etcd-only nodes do not reconnect to apiserver after outage if joined against an etcd-only node

@aganesh-suse

Validated on the release-1.31 branch with commit 53d4dd8.

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 servers / 1 agent (2 etcd + 1 control-plane + 1 agent),
or
3 etcd + 2 control-plane + 1 agent.

Note: all nodes join against the main/first etcd-only server.

Config.yaml:

etcd-only node config.yaml:

token: xxxx
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
node-taint:
- node-role.kubernetes.io/etcd:NoExecute
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server
debug: true

Control-plane-only node config.yaml:

$ cat /etc/rancher/k3s/config.yaml 
token: xxxx
server: https://1.1.1.1:6443
disable-etcd: true
node-taint:
- node-role.kubernetes.io/control-plane:NoSchedule
write-kubeconfig-mode: "0644"
node-external-ip: 2.2.2.2
node-label:
- k3s-upgrade=server
debug: true
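
The agent node's config.yaml is not shown in the report; a minimal sketch for this topology, assuming the agent shares the same token and joins against the first etcd-only server like the other nodes, might be:

token: xxxx                    # assumed: same shared cluster token as the servers
server: https://1.1.1.1:6443   # assumed: first etcd-only server, per the note above
node-label:
- k3s-upgrade=agent            # hypothetical label, mirroring the server configs
debug: true

The agent would then be installed with `sh -s - agent` instead of `sh -s - server` in step 2 below.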

Testing Steps

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='53d4dd85f57cb39062f221c326056e5d8948f3bb' sh -s - server
  3. Verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
  4. Restart the control plane node and wait for 5 minutes (see the sketch after this list).
  5. Verify cluster status again - all nodes should be in Ready state:
kubectl get nodes -o wide
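
A rough sketch of steps 4-5, assuming the install script set up the standard k3s systemd unit and that kubectl is run from a surviving server node (hostnames are placeholders):

# on the control plane node:
$ sudo systemctl restart k3s     # or take the node down entirely: sudo reboot

# after ~5 minutes, from any server node:
$ kubectl get nodes -o wide
$ kubectl wait --for=condition=Ready nodes --all --timeout=300s

Before the fix, the secondary etcd-only node stays NotReady after the outage (as in the replication results below); with the fix, all nodes return to Ready.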

Replication Results:

  • k3s version used for replication:
$ k3s -v
k3s version v1.31.2+k3s1 (6da20424)
go version go1.22.8
$ kubectl get nodes
time="2024-11-15T02:58:54Z" level=debug msg="Asset dir /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1"
time="2024-11-15T02:58:54Z" level=debug msg="Running /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1/bin/kubectl [kubectl get nodes]"
NAME               STATUS     ROLES                  AGE   VERSION
ip-172-31-0-37     NotReady   <none>                 14m   v1.31.2+k3s1
ip-172-31-12-163   Ready      etcd                   15m   v1.31.2+k3s1
ip-172-31-4-1      Ready      control-plane,master   15m   v1.31.2+k3s1
ip-172-31-9-130    NotReady   etcd                   15m   v1.31.2+k3s1

Validation Results:

  • k3s version used for validation:
$ k3s -v
k3s version v1.31.2+k3s-53d4dd85 (53d4dd85)
go version go1.22.8
$ sudo /usr/local/bin/kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes 
time="2024-11-15T03:24:11Z" level=debug msg="Asset dir /var/lib/rancher/k3s/data/5ace11c9404c91159fe874950a750d8dfd13b3709d156e3b40be31278d8787d4"
time="2024-11-15T03:24:11Z" level=debug msg="Running /var/lib/rancher/k3s/data/5ace11c9404c91159fe874950a750d8dfd13b3709d156e3b40be31278d8787d4/bin/kubectl [kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml get nodes]"
NAME               STATUS   ROLES                  AGE   VERSION
ip-172-31-14-14    Ready    <none>                 13m   v1.31.2+k3s-53d4dd85
ip-172-31-15-153   Ready    control-plane,master   14m   v1.31.2+k3s-53d4dd85
ip-172-31-15-186   Ready    etcd                   14m   v1.31.2+k3s-53d4dd85
ip-172-31-7-89     Ready    etcd                   14m   v1.31.2+k3s-53d4dd85
