Ubuntu 20.04 - Nodes sometimes fail to come up due to package install issues? #9180
/assign
This is a slightly truncated version of the cluster spec:
Docker just added a new repository for Focal, so we can now use the official package instead of the Bionic version.
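For context, switching a node to Docker's official Focal repository looks roughly like the sketch below. This is a provisioning fragment, not the exact change kops made; check the current Docker install documentation before using it, and note that the key and repository URLs are the publicly documented ones.

```shell
# Sketch: enable Docker's official apt repository for Ubuntu 20.04 (focal).
# Assumes amd64 and root privileges; verify against Docker's install docs first.
set -euo pipefail

# Add Docker's signing key and the focal "stable" channel.
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
echo "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable" \
  > /etc/apt/sources.list.d/docker.list

apt-get update
apt-get install -y docker-ce docker-ce-cli containerd.io
```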
Thanks for the update and for all the effort on this @paalkr @andersosthus.
@hakman: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
1. What kops version are you running? The command `kops version` will display this information.
1.17-beta2
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
1.17.5
3. What cloud provider are you using?
AWS
Monday 25th of May we started noticing problems with new nodes added to our clusters. When adding nodes, some nodes come up just fine, while others don't come up at all.
When investigating the failed nodes, we see what appears to be the kops-configuration.service being restarted in the middle of an `apt-get install` command, resulting in a bad dpkg state that needs to be resolved with `dpkg --configure -a`.
The timeline on a nodeup looks like this:
kops-configuration.service: Main process exited, code=killed, status=15/TERM
Right now it seems random whether a node comes up or not.
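A rough sketch of how the broken state shows up and how it is repaired. The sample status lines below are illustrative (the real data lives in /var/lib/dpkg/status); the repair commands at the end are the standard dpkg/apt ones mentioned above.

```shell
# Sketch: spot packages left half-configured by an interrupted apt-get run.
# Sample entries in the format dpkg uses in /var/lib/dpkg/status (illustrative).
cat > /tmp/status-sample <<'EOF'
Package: docker-ce
Status: install ok half-configured
Package: containerd.io
Status: install ok installed
EOF

# Any state other than "install ok installed" indicates an interrupted run.
awk '/^Package:/{p=$2} /^Status:/ && $0 !~ /install ok installed/{print p}' \
  /tmp/status-sample
# → docker-ce

# On the affected node, the manual fix is then:
#   dpkg --configure -a   # finish configuring unpacked packages
#   apt-get install -f    # resolve any dependencies the interrupt left behind
```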
Excerpt of the kops-configuration.service log around the time the restart happens:
We've had some discussions on Slack with @hakman about this, see this thread: https://kubernetes.slack.com/archives/C3QUFP0QM/p1590443302247200
If needed we can provide full logs, and we can also test fixes on this cluster.
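For anyone reproducing this, the relevant excerpt can be pulled from the journal on an affected node with something like the fragment below. The time window is illustrative; adjust it to the incident.

```shell
# Sketch: collect the kops-configuration.service log from a failed node.
# The --since/--until window here is an example, not the exact incident window.
journalctl -u kops-configuration.service \
  --since "2020-05-25 00:00" --until "2020-05-26 00:00" \
  --no-pager > kops-configuration.log

# The restart that kills the in-flight apt-get appears as:
#   kops-configuration.service: Main process exited, code=killed, status=15/TERM
grep "code=killed" kops-configuration.log || true
```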