I have been trying to deploy EKS-A to an air-gapped vSphere environment with little success. I am following the guides, but have not been able to get past the etcd node build. It looks like kubeadm-bootstrap is failing with the following error:
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: -----END RSA PRIVATE KEY-----
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: } {Path:/run/cluster-api/placeholder Owner:root:root Permissions:0640 Content:This placeholder file is used to create the /run/cluster-api sub directory in a way that is compatible with both Linux and Windows (mkdir -p /run/cluster-api does not work with Windows)}] RunCmd:EtcdadmInit public.ecr.aws/eks-distro/etcd-io/etcd 3.5.15-eks-1-31-7 TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256}
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Using etcdadm support by CAPI
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Writing userdata write files
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Running etcdadm init phases
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Running etcdadm init install phase
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Phase command output:
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: --------
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: time="2024-12-02T16:45:14Z" level=info msg="[install] Removing existing data dir \"/var/lib/etcd/data\""
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: --------
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: Running etcdadm init certificates phase
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r systemd[1]: Stopping Host container: kubeadm-bootstrap...
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: time="2024-12-02T16:45:14Z" level=info msg="received signal: terminated"
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: time="2024-12-02T16:45:14Z" level=info msg="container task exited" code=143
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r host-containers@kubeadm-bootstrap[2774]: time="2024-12-02T16:45:14Z" level=fatal msg="Container kubeadm-bootstrap exited with non-zero status"
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r systemd[1]: [email protected]: Main process exited, code=exited, status=1/FAILURE
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r systemd[1]: [email protected]: Failed with result 'exit-code'.
Dec 02 16:45:14 xxx-mgmt-etcd-7q68r systemd[1]: Stopped Host container: kubeadm-bootstrap.
This appears to be the 'etcdadm init certificates' phase, with the container failing. This means none of the other etcd nodes can join (as they cannot find the cert server), and the cluster creation fails.
I have tried this on a number of different versions of eks-anywhere:
- v0.20.0 (bundle 68)
- v0.20.9 (bundle 78)
- v0.21.1 (bundle 83)
Sadly, I don't know enough about how these certs are generated or the internals of the bootstrap container, but each attempted creation appears to fail at the same stage.
What you expected to happen:
The initialisation completes and the etcd nodes are healthy.
How to reproduce it (as minimally and precisely as possible):
So I eventually managed to track down the issue. While grepping through the logs, I found the following errors:
Dec 10 16:07:36 localhost systemd-tmpfiles[1652]: Reading config file "/x86_64-bottlerocket-linux-gnu/sys-root/usr/lib/tmpfiles.d/release-ca-certificates.conf"…
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: Running etcdadm init certificates phase
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: time="2024-12-10T16:08:00Z" level=info msg="[certificates] creating PKI assets"
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: time="2024-12-10T16:08:00Z" level=fatal msg="[certificates] failed creating PKI assets: failure loading ca certificate: the certificate is not valid yet"
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: Error running bootstrapper cmd: error running etcdadm phase 'init certificates', out:
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: time="2024-12-10T16:08:00Z" level=info msg="[certificates] creating PKI assets"
Dec 10 16:08:00 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[1838]: time="2024-12-10T16:08:00Z" level=fatal msg="[certificates] failed creating PKI assets: failure loading ca certificate: the certificate is not valid yet"
Dec 10 16:08:57 xxx-mgmt-etcd-74dk8 host-containers@kubeadm-bootstrap[2474]: Running etcdadm init certificates phase
As this was a disconnected cluster, this led me to the fact that the time was out of sync: the admin machine I was using was 8 minutes ahead of the ESXi hosts, so the certificate was not yet valid from the hosts' perspective, and it was rejected, causing the crash.
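This condition is easy to check on any suspect node with `openssl`: print the certificate's validity window and compare notBefore against the local clock. The sketch below generates a throwaway self-signed cert to stand in for the etcd CA (the real CA path on a node depends on the etcdadm setup and is not shown here):

```shell
# Generate a throwaway self-signed cert to stand in for the etcd CA
# (hypothetical stand-in; on a real node, inspect the actual CA file instead).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/ca.key \
  -out /tmp/ca.crt -days 1 -subj "/CN=etcd-ca" 2>/dev/null

# Show the validity window that the bootstrap is checking against.
openssl x509 -in /tmp/ca.crt -noout -startdate -enddate

# If this host's clock is behind the machine that created the cert,
# notBefore lands in the future and the cert is "not valid yet".
not_before=$(openssl x509 -in /tmp/ca.crt -noout -startdate | cut -d= -f2)
if [ "$(date +%s)" -lt "$(date -d "$not_before" +%s)" ]; then
  echo "clock skew: certificate not valid yet"
else
  echo "certificate validity OK"
fi
```

Run against the cert actually written during bootstrap, a notBefore timestamp ahead of the host's clock confirms the skew before digging any further into etcdadm.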
It would be worth adding this to the airgapped requirements / troubleshooting documentation so that others can identify it in the future.