
When migrating a pre-existing cluster to containerd, the AWS VPC CNI DaemonSet still mounts dockershim #11589

Closed
ari-becker opened this issue May 24, 2021 · 3 comments · Fixed by #11590

Comments

@ari-becker
Contributor

1. What kops version are you running? The command kops version will display this information.

1.20.1

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.19.10

3. What cloud provider are you using?
AWS

4. What commands did you run? What is the simplest way to reproduce this issue?
Take a long-running cluster that uses the AWS VPC CNI, define the cluster's spec.containerRuntime as containerd, and begin rolling the nodes.
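For reference, a minimal sketch of the relevant cluster-spec change (the cluster name is a placeholder; only containerRuntime changes):

# Excerpt of the kops Cluster spec, edited via `kops edit cluster`.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: mycluster.example.com  # placeholder
spec:
  containerRuntime: containerd  # previously docker
  networking:
    amazonvpc: {}  # the AWS VPC CNI

followed by kops update cluster --yes and kops rolling-update cluster --yes.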

5. What happened after the commands executed?
The cluster refuses to validate because of aws-node pods in CrashLoopBackOff.
Examining an affected pod, the kubectl describe output shows (partial output, trimmed for clarity):

Containers:
  aws-node:
    Container ID:   containerd://5c55c39d43ba87a3e968c524e54dc3b6675e222d9289a6fe61e117a08d7140c0
    Image:          602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.7.10
    State:          Running
    Ready:          False
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /host/var/log/aws-routed-eni from log-dir (rw)
      /run/xtables.lock from xtables-lock (rw)
      /var/run/aws-node from run-dir (rw)
      /var/run/dockershim.sock from dockershim (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from aws-node-token-98jgt (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  dockershim:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/dockershim.sock
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:
  log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/aws-routed-eni
    HostPathType:  DirectoryOrCreate
  run-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/aws-node
    HostPathType:  DirectoryOrCreate
  aws-node-token-98jgt:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-node-token-98jgt
    Optional:    false

6. What did you expect to happen?
#10502 changed the mount to containerd.sock, which I presume is mandatory when the underlying node has been upgraded to containerd.

This is a tricky upgrade on a production cluster, as we don't want to update all of the DaemonSet's pods before the underlying nodes have been rolled. Ideally there would be two DaemonSets, one that schedules onto dockershim nodes and one that schedules onto containerd nodes, with the former DaemonSet cleaned up after the upgrade, or some other way of ensuring that each pod mounts the socket that actually exists on its node as the nodes are upgraded.
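To illustrate the two-DaemonSet idea, a rough sketch (this is not something kops does today; the kops.k8s.io/container-runtime node label is hypothetical and would have to be applied per instance group, e.g. via spec.nodeLabels):

# Hypothetical second DaemonSet covering already-migrated nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node-containerd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: aws-node-containerd
  template:
    metadata:
      labels:
        k8s-app: aws-node-containerd
    spec:
      nodeSelector:
        kops.k8s.io/container-runtime: containerd  # hypothetical label
      containers:
      - name: aws-node
        image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.7.10
        volumeMounts:
        - name: dockershim
          mountPath: /var/run/dockershim.sock  # in-container path stays the same
      volumes:
      - name: dockershim
        hostPath:
          path: /run/containerd/containerd.sock
# ...plus a mirror-image aws-node-dockershim DaemonSet that selects
# kops.k8s.io/container-runtime: docker and mounts /var/run/dockershim.sock,
# to be deleted once the roll completes.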

@johngmyers
Member

Sounds like the DaemonSet should have OnDelete updateStrategy.
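For reference, a minimal sketch of what that would look like on the aws-node DaemonSet (with OnDelete, the new pod template is only applied when an existing pod is deleted, i.e. as kops rolling-update replaces each node):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-node
  namespace: kube-system
spec:
  updateStrategy:
    type: OnDelete  # rather than the current RollingUpdate
  # ...rest of the aws-node spec unchanged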

@ari-becker
Contributor Author

@johngmyers I concur. Indeed, when running kops update cluster I see:

 ManagedFile/mycluster-addons-networking.amazon-vpc-routed-eni-k8s-1.16
       Contents            
                               ...
                                         name: cni-net-dir
                                       - hostPath:
                               +           path: /run/containerd/containerd.sock
                               -           path: /var/run/dockershim.sock
                                         name: dockershim
                                       - hostPath:
                               ...

which means that containerd.sock should be mounted correctly going forward. However, the updateStrategy is RollingUpdate, not OnDelete.

@johngmyers
Member

Calico, Canal, and Weave are other CNIs that are still set to RollingUpdate.
