
Use containerd cri plugin from init #79

Open · wants to merge 4 commits into master
Conversation

detiber commented May 4, 2018

This PR attempts to address #74

The kernel, init, and runc image updates are a bit extraneous for this changeset, but the containerd update is required.

kubelet changes:

  • update to latest stable k8s release
  • update to latest cri tools
  • onboot hack to create /var/lib/cni/{bin,conf}
    • removing the cri-containerd image meant these directories no longer exist, and containerd appears to process the runtime mounts before the mkdir entries run.
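The onboot hack boils down to pre-creating the CNI directories before containerd processes its runtime mounts. A minimal sketch of the script such an onboot container would run (the exact linuxkit image wrapping and paths used in the PR are assumptions here):

```shell
#!/bin/sh
# Ensure the CNI plugin and config directories exist before containerd
# sets up runtime mounts; cri-containerd previously created these itself.
mkdir -p /var/lib/cni/bin /var/lib/cni/conf
```

In a linuxkit yml this would typically be an `onboot` entry, so it runs to completion before the containerd service starts.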

I also introduced the files necessary to deploy calico networking rather than weave, since weave currently does not work with these changes. The issue appears to be with the way the weave cni plugin uses nsenter.

detiber added 4 commits May 4, 2018 09:48
- update kernel, init, runc, and containerd images

Signed-off-by: Jason DeTiberus <[email protected]>
- rev k8s version to v1.10.2
- rev critools version

Signed-off-by: Jason DeTiberus <[email protected]>
- remove cri-containerd package
- kubelet container mounting hack
  - previously cri-containerd created the /var/lib/cni/{bin,conf}
    directories and containerd appears to process runtime mounts
    before mkdir entries

Signed-off-by: Jason DeTiberus <[email protected]>
Signed-off-by: Jason DeTiberus <[email protected]>
detiber commented May 4, 2018

Currently all of the cri rtf tests are failing due to flakiness starting the static pod manifests. There appears to be some flakiness in the interaction between the kubelet and containerd for pods that fail to come up and should be restarted; I'm seeing errors similar to:

I0504 14:43:30.680271     638 kuberuntime_manager.go:513] Container {Name:kube-scheduler Image:k8s.gcr.io/kube-scheduler-amd64:v1.10.2 Command:[kube-scheduler --address=127.0.0.1 --leader-elect=true --kubeconfig=/etc/kubernetes/scheduler.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[cpu:{i:{value:100 scale:-3} d:{Dec:<nil>} s:100m Format:DecimalSI}]} VolumeMounts:[{Name:kubeconfig ReadOnly:true MountPath:/etc/kubernetes/scheduler.conf SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/healthz,Port:10251,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:15,TimeoutSeconds:15,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:8,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
linuxkit-b2a001ae29c0:/# Connection to localhost closed by remote host.

I suspect this might be similar to the flakes mentioned in #64.

After a bit of digging, I suspect that containerd/cri#733 might be coming into play.
