During an upgrade, the CLI attempts to force apply (`kubectl apply --force`) the Cluster spec at the "Applying eksa yaml resources to cluster" step. If there is a validation error in the cluster webhook at this point, the CLI stalls until it eventually times out. Normally, a force apply would delete the resource and recreate it, but instead the CLI gets stuck. This situation has primarily been observed when using Flux.
At this stage, the eksa-controller-manager outputs the following logs:
{"ts":1690488485167.4885,"caller":"controllers/cluster_controller.go:217","msg":"Reconciling cluster","v":0,"controller":"cluster","controllerGroup":"anywhere.eks.amazonaws.com","controllerKind":"Cluster","Cluster":{"name":"eksa-test-ae88bec","namespace":"default"},"namespace":"default","name":"eksa-test-ae88bec","reconcileID":"199cb160-1384-479f-b9f7-32dabcd6129e"}
{"ts":1690488485167.728,"caller":"controllers/cluster_controller.go:418","msg":"Updating cluster status","v":0,"controller":"cluster","controllerGroup":"anywhere.eks.amazonaws.com","controllerKind":"Cluster","Cluster":{"name":"eksa-test-ae88bec","namespace":"default"},"namespace":"default","name":"eksa-test-ae88bec","reconcileID":"199cb160-1384-479f-b9f7-32dabcd6129e"}
{"ts":1690488485168.4016,"caller":"controller/controller.go:329","msg":"Reconciler error","controller":"cluster","controllerGroup":"anywhere.eks.amazonaws.com","controllerKind":"Cluster","Cluster":{"name":"eksa-test-ae88bec","namespace":"default"},"namespace":"default","name":"eksa-test-ae88bec","reconcileID":"199cb160-1384-479f-b9f7-32dabcd6129e","err":"deleting self-managed clusters is not supported","errVerbose":"deleting self-managed clusters is not supported\ngithub.com/aws/eks-anywhere/controllers.(*ClusterReconciler).reconcileDelete\n\tgithub.com/aws/eks-anywhere/controllers/cluster_controller.go:444\ngithub.com/aws/eks-anywhere/controllers.(*ClusterReconciler).Reconcile\n\tgithub.com/aws/eks-anywhere/controllers/cluster_controller.go:262\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235\nruntime.goexit\n\truntime/asm_amd64.s:1598"}
Example Scenario
When upgrading from the latest minor release to v0.17.0, the CLI stalled while "Applying eksa yaml resources to cluster". In this case, both the newly added `eksaVersion` field and the `bundlesRef` field were populated in the spec submitted to the kube-apiserver. A webhook validation does not allow both to be set at the same time, so the kube-apiserver pod was logging an error. This issue was roughly patched by creating `ClusterSpecGenerate` so that we could generate a yaml where `bundlesRef` is explicitly set to null, so that the field would not be omitted when submitted to the kube-apiserver.
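A sketch of the kind of manifest the `ClusterSpecGenerate` patch aims to produce (the values here are illustrative, not taken from the actual cluster): the explicit `null` forces the field to appear in the serialized spec so the apply clears any existing server-side value, instead of the field being dropped and the old value surviving.

```yaml
# Illustrative fragment only; names and versions are placeholders.
apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: eksa-test-ae88bec
spec:
  eksaVersion: v0.17.0
  bundlesRef: null   # explicitly null, not omitted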
kube-apiserver pod log:
W0727 21:34:57.770016 1 dispatcher.go:216] rejected by webhook "validation.cluster.anywhere.amazonaws.com": &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"admission webhook \"validation.cluster.anywhere.amazonaws.com\" denied the request: Cluster.anywhere.eks.amazonaws.com \"eksa-test-ae88bec\" is invalid: spec: Invalid value: v1alpha1.ClusterSpec{KubernetesVersion:\"1.27\", ControlPlaneConfiguration:v1alpha1.ControlPlaneConfiguration{Count:1, Endpoint:(*v1alpha1.Endpoint)(nil), MachineGroupRef:(*v1alpha1.Ref)(nil), Taints:[]v1.Taint(nil), Labels:map[string]string(nil), UpgradeRolloutStrategy:(*v1alpha1.ControlPlaneUpgradeRolloutStrategy)(nil), SkipLoadBalancerDeployment:false}, WorkerNodeGroupConfigurations:[]v1alpha1.WorkerNodeGroupConfiguration{v1alpha1.WorkerNodeGroupConfiguration{Name:\"md-0\", Count:(*int)(0xc000f8ddc0), AutoScalingConfiguration:(*v1alpha1.AutoScalingConfiguration)(nil), MachineGroupRef:(*v1alpha1.Ref)(nil), Taints:[]v1.Taint(nil), Labels:map[string]string(nil), UpgradeRolloutStrategy:(*v1alpha1.WorkerNodesUpgradeRolloutStrategy)(nil), KubernetesVersion:(*v1alpha1.KubernetesVersion)(nil)}}, DatacenterRef:v1alpha1.Ref{Kind:\"DockerDatacenterConfig\", Name:\"eksa-test-ae88bec\"}, IdentityProviderRefs:[]v1alpha1.Ref(nil), GitOpsRef:(*v1alpha1.Ref)(0xc000a6e040), ClusterNetwork:v1alpha1.ClusterNetwork{Pods:v1alpha1.Pods{CidrBlocks:[]string{\"192.168.0.0/16\"}}, Services:v1alpha1.Services{CidrBlocks:[]string{\"10.96.0.0/12\"}}, CNI:\"\", CNIConfig:(*v1alpha1.CNIConfig)(0xc000fc7a20), DNS:v1alpha1.DNS{ResolvConf:(*v1alpha1.ResolvConf)(nil)}, Nodes:(*v1alpha1.Nodes)(nil)}, ExternalEtcdConfiguration:(*v1alpha1.ExternalEtcdConfiguration)(0xc000fc7a40), ProxyConfiguration:(*v1alpha1.ProxyConfiguration)(nil), RegistryMirrorConfiguration:(*v1alpha1.RegistryMirrorConfiguration)(nil), 
ManagementCluster:v1alpha1.ManagementCluster{Name:\"eksa-test-ae88bec\"}, PodIAMConfig:(*v1alpha1.PodIAMConfig)(nil), Packages:(*v1alpha1.PackageConfiguration)(nil), BundlesRef:(*v1alpha1.BundlesRef)(0xc000fce960), EksaVersion:(*v1alpha1.EksaVersion)(0xc000fc7a30)}: cannot pass both bundlesRef and eksaVersion. New clusters should use eksaVersion instead of bundlesRef", Reason:"Invalid", Details:(*v1.StatusDetails)(0xc016d9d260), Code:422}}
This is strange because, in the original submission of the Cluster resource yaml, bundlesRef was nil.