Problem Statement:
The customer recently rotated the certificates for their vSphere datacenter, which were about to expire. This gave the datacenter a new thumbprint. They want to know how to update their clusters to use the new thumbprint in the vSphere datacenter config.
Issue with running eksctl anywhere upgrade command:
When we run the upgrade command on the management cluster with the new thumbprint, the eks-a controller updates the management cluster's vspheredatacenterconfig object with the new thumbprint, but the workload clusters' vspheredatacenterconfig objects still have the old thumbprint, so the vspheredatacenter reconciler throws a thumbprint mismatch error in the eks-a controller logs.
Solution:
From EKS-A v0.17.0 onwards, the thumbprint field of the vSphere datacenter config is mutable for upgrades, but KinD-less upgrades (which let the cluster controller reconcile such a change directly) were only introduced in v0.18.0. To update all existing clusters with the new thumbprint, follow the steps below that match your cluster's EKS-A version:
For all EKS-A versions:
Before starting the upgrade process, take a backup of your management cluster as well as all workload clusters by following the steps documented here.
For EKS-A v0.18.0 and above:
1. Pause the eks-a cluster controller for the workload clusters only (this example assumes a management cluster named mgmt and two workload clusters named w01 and w02):
export KUBECONFIG=mgmt/mgmt-eks-a-cluster.kubeconfig
kubectl annotate cluster w01 anywhere.eks.amazonaws.com/paused=true
kubectl annotate cluster w02 anywhere.eks.amazonaws.com/paused=true
2. Create a vSphereDatacenterConfig manifest file with the exact same contents as the existing vspheredatacenterconfig for the management cluster as well as each workload cluster, except that the thumbprint field should be set to the new thumbprint (a sketch follows):
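A minimal sketch of one such document, with placeholder values: export the real spec with kubectl get vspheredatacenterconfigs.anywhere.eks.amazonaws.com {name} -o yaml, keep every field as-is, and change only spec.thumbprint (repeat one document per cluster's datacenter config, separated by ---):
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: VSphereDatacenterConfig
metadata:
  name: mgmt                          # your existing datacenter config name
  namespace: default
spec:
  datacenter: "SDDC-Datacenter"       # placeholder; keep your existing value
  network: "/SDDC-Datacenter/network/sddc-cgw-network-1"
  server: "vcenter.example.com"
  insecure: false
  thumbprint: "AA:BB:CC:DD:EE:FF"     # the NEW thumbprint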
3. Apply the above manifest file to the management cluster:
kubectl apply -f {manifest-file-name}.yaml
4. Verify that the new machines are being rolled out only for the management cluster:
kubectl get machines -A -w
5. If any machine is stuck in the Provisioning phase, restart the capv controller manager pod (the pod name suffix will differ in your cluster):
kubectl delete --force -n capv-system pod capv-controller-manager-84bdf678db-kdvx8
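If you would rather not look up the exact pod name, deleting it by label should also work, assuming the CAPV controller carries the standard cluster-api control-plane=controller-manager label:
kubectl delete pod -n capv-system -l control-plane=controller-manager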
6. Once all the machines are updated, verify that the thumbprint field is updated in the config maps and objects for the management cluster (object names like mgmt-65cv5 below come from an example cluster named mgmt; substitute your own):
kubectl get cm -n kube-system vsphere-cloud-config -o yaml
kubectl get cm -n eksa-system mgmt-cpi-manifests -o yaml
kubectl get vspheredatacenterconfigs.anywhere.eks.amazonaws.com mgmt -o yaml
kubectl get vsphereclusters.infrastructure.cluster.x-k8s.io -n eksa-system mgmt -o yaml
kubectl get vspherevms.infrastructure.cluster.x-k8s.io -n eksa-system mgmt-65cv5 -o yaml
7. Verify that the vSphere machine templates and VMs are created with the new thumbprint for the management cluster:
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n eksa-system
kubectl get vspheremachines.infrastructure.cluster.x-k8s.io -n eksa-system
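To spot-check that only the new thumbprint remains, you can grep the templates' YAML for the thumbprint field (a quick helper, not part of the original runbook):
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n eksa-system -o yaml | grep thumbprint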
8. Unpause the eks-a cluster controller for each of the workload clusters, one by one:
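Removing the pause annotation unpauses a cluster; note the trailing dash, which is kubectl's syntax for deleting an annotation (shown for w01; unpause w02 the same way after finishing step 9):
kubectl annotate cluster w01 anywhere.eks.amazonaws.com/paused-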
9. Repeat steps 4, 5, 6, and 7 for w01.
10. Repeat steps 4, 5, 6, and 7 for w02.
For EKS-A v0.17.x:
1. Pause the eks-a cluster controller for the management cluster as well as all the workload clusters:
export KUBECONFIG=mgmt/mgmt-eks-a-cluster.kubeconfig
kubectl annotate cluster mgmt anywhere.eks.amazonaws.com/paused=true
kubectl annotate cluster w01 anywhere.eks.amazonaws.com/paused=true
kubectl annotate cluster w02 anywhere.eks.amazonaws.com/paused=true
2. Create a vSphereDatacenterConfig manifest file with the exact same contents as the existing vspheredatacenterconfig for the management cluster as well as all the workload clusters, except that the thumbprint should be the new thumbprint (as in step 2 of the v0.18.0 section), and apply it to the management cluster:
kubectl apply -f {manifest-file-name}.yaml
3. Create vspheremachinetemplate manifest files with the new thumbprint for the etcd, control plane, and worker machines, starting from the most recent templates:
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n eksa-system mgmt-etcd-template-{most-recent-template-number} -o yaml > etcd-template.yaml
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n eksa-system mgmt-control-plane-template-{most-recent-template-number} -o yaml > cp-template.yaml
kubectl get vspheremachinetemplates.infrastructure.cluster.x-k8s.io -n eksa-system mgmt-md-0-{most-recent-template-number} -o yaml > md-template.yaml
4. In each of the above manifest files, update the spec.template.spec.thumbprint field with the new thumbprint and update the metadata.name field with a new template number (machine templates are immutable, so every change requires a new name), then apply the files to the management cluster, as shown in the sketch below:
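A sketch of the edited control plane template, assuming the most recent template number was 1; only the name and thumbprint change, and the read-only metadata that kubectl exports (uid, resourceVersion, creationTimestamp, ownerReferences) should be stripped before applying:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: mgmt-control-plane-template-2   # bumped from mgmt-control-plane-template-1
  namespace: eksa-system
spec:
  template:
    spec:
      thumbprint: "AA:BB:CC:DD:EE:FF"   # the NEW thumbprint
      # ...all remaining fields copied unchanged from the exported template
Then apply all three files:
kubectl apply -f etcd-template.yaml -f cp-template.yaml -f md-template.yaml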
5. Modify the spec.machineTemplate.infrastructureRef.name field in the KubeadmControlPlane object to point to the new cp-template for the management cluster (see the sketch after this list); the etcd and worker references (in the EtcdadmCluster and MachineDeployment objects) need the same treatment with their new templates.
6. If any machine is stuck in the Provisioning phase, restart the capv controller manager pod (as in step 5 of the v0.18.0 section).
7. Wait until all the new machines are rolled out with the new thumbprint for w01.
8. Wait until all the new machines are rolled out with the new thumbprint for w02.
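One way to re-point the control plane in step 5, sketched with kubectl patch and assuming the KubeadmControlPlane is named mgmt in the eksa-system namespace and the new template from step 4 is mgmt-control-plane-template-2 (both hypothetical; use your own names):
kubectl patch kubeadmcontrolplane mgmt -n eksa-system --type merge -p '{"spec":{"machineTemplate":{"infrastructureRef":{"name":"mgmt-control-plane-template-2"}}}}'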