
Dual-Stack node-ip broken in v1.31.4+k3s1 #11488

Closed
samip5 opened this issue Dec 20, 2024 · 13 comments

@samip5

samip5 commented Dec 20, 2024

Environmental Info:
k3s version v1.31.4+k3s1 (a562d09)
go version go1.22.9

Node(s) CPU architecture, OS, and Version:
Linux plex-server 6.1.0-28-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.119-1 (2024-11-22) x86_64 GNU/Linux

Cluster Configuration:
Single-Node cluster

Describe the bug:
Unable to get both node IPs to show up in the node status, despite node-ip: 192.168.2.129,fd9d:7a72:44eb:a:a236:9fff:fe18:55fb in the config.

Steps To Reproduce:

  • Installed K3s with dual-stack networking, setting node-ip to both addresses, with the following config:
cluster-init: true
cluster-cidr: 10.40.0.0/16,fd94:9bde:1ebb::/48
disable:
- flannel
- traefik
- servicelb
- metrics-server
- local-storage
- coredns
disable-cloud-controller: true
disable-kube-proxy: true
disable-network-policy: true
docker: false
etcd-expose-metrics: true
flannel-backend: none
kube-apiserver-arg:
- default-not-ready-toleration-seconds=20
- default-unreachable-toleration-seconds=20
kube-controller-manager-arg:
- bind-address=0.0.0.0
- node-monitor-period=4s
- node-monitor-grace-period=16s
kube-proxy-arg:
- metrics-bind-address=0.0.0.0
kube-scheduler-arg:
- bind-address=0.0.0.0
kubelet-arg:
- feature-gates=GracefulNodeShutdown=true
- node-status-update-frequency=4s
node-ip: 192.168.2.129,fd9d:7a72:44eb:a:a236:9fff:fe18:55fb
service-cidr: 10.41.0.0/16,2001:<snip>:16fd:9622::1:0/112
tls-san:
- 192.168.2.129
- 2001:<snip>:7424:8f00:a236:9fff:fe18:55fb
- fd9d:7a72:44eb:a:a236:9fff:fe18:55fb

Expected behavior:
I expected the node spec to include both addresses.
Actual behavior:
It only lists the IPv4 address, which breaks IPv6 with Cilium.

Additional context / logs:

I think the problem is related to kubernetes/enhancements#3705 & #8011

$ kubectl get nodes plex-server -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 192.168.2.129
Hostname: plex-server

@brandond
Member

Seems to work for me. Are you trying to change it after the fact or something?

root@systemd-node-1:/# cat /etc/rancher/k3s/config.yaml
node-ip: 172.17.0.4,fd7c:53a5:aef5::242:ac11:4

root@systemd-node-1:/# kubectl get node -o wide
NAME             STATUS   ROLES                  AGE    VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
systemd-node-1   Ready    control-plane,master   2m5s   v1.31.4+k3s1   172.17.0.4    <none>        openSUSE Leap 15.4   6.8.0-1016-aws   containerd://1.7.23-k3s2

root@systemd-node-1:/# kubectl get nodes systemd-node-1 -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 172.17.0.4
InternalIP: fd7c:53a5:aef5::242:ac11:4
Hostname: systemd-node-1

@samip5
Author

samip5 commented Dec 20, 2024

Seems to work for me. Are you trying to change it after the fact or something?

root@systemd-node-1:/# cat /etc/rancher/k3s/config.yaml
node-ip: 172.17.0.4,fd7c:53a5:aef5::242:ac11:4

root@systemd-node-1:/# kubectl get node -o wide
NAME             STATUS   ROLES                  AGE    VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
systemd-node-1   Ready    control-plane,master   2m5s   v1.31.4+k3s1   172.17.0.4    <none>        openSUSE Leap 15.4   6.8.0-1016-aws   containerd://1.7.23-k3s2

root@systemd-node-1:/# kubectl get nodes systemd-node-1 -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 172.17.0.4
InternalIP: fd7c:53a5:aef5::242:ac11:4
Hostname: systemd-node-1

The problem is that the IPv6 address has vanished from the node status at some point, and the prefix is potentially different from the original one, so it isn't being set at all, which breaks other things that rely on it being there.

I don't know why it isn't being set, as I don't see any errors or related information in the logs. The interface does have both global and ULA addresses.
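
For reference, I was checking the logs roughly like this (assuming the default k3s systemd unit name), and nothing relevant shows up:

$ journalctl -u k3s --no-pager | grep -iE 'node-ip|InternalIP'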

@brandond
Member

Are the IPs both on the same interface? I'm not sure how the kubelet handles it if the internal IPs from different families are on different interfaces. It may expect them all to be on the same interface?
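
A quick way to check, assuming iproute2 is available and that eth0 is the interface in question:

$ ip -br addr show dev eth0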

@samip5
Author

samip5 commented Dec 20, 2024

Are the IPs both on the same interface? I'm not sure how the kubelet handles it if the internal IPs from different families are on different interfaces. It may expect them all to be on the same interface?

On the same interface there are 3 IPs: two IPv6 addresses from /64 prefixes (one from a ULA prefix, one from a global prefix) and one IPv4 address.

@brandond
Member

You might try deleting the node with kubectl delete node XX and see if it comes back when the node is recreated?
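
Roughly like this, using the node name from this issue and assuming the default k3s systemd unit:

$ kubectl delete node plex-server
$ systemctl restart k3s   # the Kubernetes node object is recreated on startup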

@samip5
Author

samip5 commented Dec 20, 2024

You might try deleting the node with kubectl delete node XX and see if it comes back when the node is recreated?

On a single-node cluster? I don't know how that would work, since etcd won't work after you delete the node object.

@samip5
Author

samip5 commented Dec 20, 2024

I think I found the culprit: if the cloud controller manager is not enabled, it doesn't get set to both addresses.
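
The relevant line from my config above is:

disable-cloud-controller: true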

@brandond
Member

brandond commented Dec 20, 2024

On a single-node cluster? I don't know how that would work, since etcd won't work after you delete the node object.

It works fine. The etcd node doesn't get deleted, and the kubernetes node is recreated on startup.

if the cloud controller manager is not enabled, it doesn't get set to both addresses.

Did you disable the embedded stub cloud-controller-manager without deploying a replacement? Yeah, things will be broken if you do that. That is covered in the docs.

@samip5
Author

samip5 commented Dec 20, 2024

It works fine. The etcd node doesn't get deleted, and the kubernetes node is recreated on startup.

etcd went into a state where it couldn't recover, due to "not enough members, rejecting leave" or something like that, so I had to fully reset the cluster instead.

if the cloud controller manager is not enabled, it doesn't get set to both addresses.

Did you disable the embedded stub cloud-controller-manager without deploying a replacement? Yeah, things will be broken if you do that. That is covered in the docs.

I hadn't understood the point of it, and it's not covered in the docs. I couldn't find anything about the IP fields being related to the cloud controller.

samip5 closed this as completed Dec 21, 2024
github-project-automation bot moved this from New to Done Issue in K3s Development Dec 21, 2024
@brandond
Member

brandond commented Dec 21, 2024

https://docs.k3s.io/networking/networking-services#deploying-an-external-cloud-controller-manager

K3s provides an embedded Cloud Controller Manager (CCM) stub that does the following:

  • Sets node InternalIP and ExternalIP address fields based on the --node-ip and --node-external-ip flags.

...
If you disable the built-in CCM and do not deploy and properly configure an external substitute, nodes will remain tainted and unschedulable.
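
A node that is stuck waiting for an external CCM to initialize it normally carries the standard uninitialized taint, which you can check with (node name taken from this issue):

$ kubectl get node plex-server -o jsonpath='{.spec.taints}'
# expected to include node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule while uninitialized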

@samip5
Author

samip5 commented Dec 21, 2024

If you disable the built-in CCM and do not deploy and properly configure an external substitute, nodes will remain tainted and unschedulable.

The page says "Last updated on Dec 17, 2024", so it's not surprising that I hadn't understood it.

@brandond
Member

brandond commented Dec 21, 2024

Yes, we do update our documentation regularly. However, that particular bit of verbiage has been there since February 2023.
https://github.com/k3s-io/docs/blame/main/docs/networking/networking-services.md#L107

@samip5
Author

samip5 commented Dec 21, 2024

Which I hadn't looked at again prior to Dec 2024. :)
User error in any case, it seems.
