
When migrating a pre-existing cluster to containerd, the AWS VPC CNI DaemonSet still mounts dockershim #11589

Closed
ari-becker opened this issue May 24, 2021 · 3 comments · Fixed by #11590

Comments

@ari-becker
Contributor

1. What kops version are you running? The command kops version will display this information.

1.20.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.19.10

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Take a long-running cluster that uses the AWS VPC CNI, define the cluster's spec.containerRuntime as containerd, and begin rolling the nodes.
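For reference, a minimal sketch of the relevant cluster-spec change (the cluster name is a placeholder; only containerRuntime changes):

# Excerpt of the kops Cluster spec, edited via `kops edit cluster`.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.example.com  # placeholder
spec:
  containerRuntime: containerd  # previously docker
  networking:
    amazonvpc: {}  # the AWS VPC CNI

followed by kops update cluster --yes and kops rolling-update cluster --yes.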

5. What happened after the commands executed?
The cluster refuses to validate because of aws-node pods in CrashLoopBackOff.
Examining an affected pod, the kubectl describe output shows (partial output, trimmed for clarity):

Containers:
  aws-node:
    Container ID:   containerd://5c55c39d43ba87a3e968c524e54dc3b6675e222d9289a6fe61e117a08d7140c0
    Image:          602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.7.10
    State:          Running
    Ready:          False
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /host/var/log/aws-routed-eni from log-dir (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/aws-node from run-dir (rw)
      /var/run/dockershim.sock from dockershim (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from aws-node-token-98jgt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  dockershim:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/dockershim.sock
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:
  log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/aws-routed-eni
    HostPathType:  DirectoryOrCreate
  run-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/aws-node
    HostPathType:  DirectoryOrCreate
  aws-node-token-98jgt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-node-token-98jgt
    Optional:    false

6. What did you expect to happen?
#10502 changed the mount to containerd.sock, which I presume is mandatory when the underlying node has been upgraded to containerd.

This is a tricky upgrade on a production cluster, as we don't want to update all of the DaemonSet's pods before the underlying nodes have been rolled. Ideally there would be two DaemonSets, one that schedules onto dockershim nodes and one that schedules onto containerd nodes, with the former DaemonSet cleaned up after the upgrade, or some other way of ensuring that each pod mounts the socket that actually exists on its node as the nodes are upgraded.
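To illustrate the two-DaemonSet idea, a rough sketch (this is not something kops does today; the kops.k8s.io/container-runtime node label is hypothetical and would have to be applied per instance group, e.g. via spec.nodeLabels):

# Hypothetical second DaemonSet covering already-migrated nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node-containerd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: aws-node-containerd
  template:
    metadata:
      labels:
        k8s-app: aws-node-containerd
    spec:
      nodeSelector:
        kops.k8s.io/container-runtime: containerd  # hypothetical label
      containers:
      - name: aws-node
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.7.10
        volumeMounts:
        - name: dockershim
          mountPath: /var/run/dockershim.sock  # in-container path stays the same
      volumes:
      - name: dockershim
        hostPath:
          path: /run/containerd/containerd.sock
# ...plus a mirror-image aws-node-dockershim DaemonSet that selects
# kops.k8s.io/container-runtime: docker and mounts /var/run/dockershim.sock,
# to be deleted once the roll completes.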

@johngmyers
Member

Sounds like the DaemonSet should have OnDelete updateStrategy.
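For reference, a minimal sketch of what that would look like on the aws-node DaemonSet (with OnDelete, the new pod template is only applied when an existing pod is deleted, i.e. as kops rolling-update replaces each node):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  updateStrategy:
    type: OnDelete  # rather than the current RollingUpdate
  # ...rest of the aws-node spec unchanged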

@ari-becker
Contributor Author

@johngmyers I concur. Indeed, when running kops update cluster I see:

 ManagedFile/mycluster-addons-networking.amazon-vpc-routed-eni-k8s-1.16
       Contents            
                               ...
                                         name: cni-net-dir
                                       - hostPath:
                               +           path: /run/containerd/containerd.sock
                               -           path: /var/run/dockershim.sock
                                         name: dockershim
                                       - hostPath:
                               ...

which means that containerd.sock should be mounted correctly going forward. However, the updateStrategy is RollingUpdate, not OnDelete.

@johngmyers
Member

Calico, Canal, and Weave are other CNIs that are still set to RollingUpdate.
