-
Notifications
You must be signed in to change notification settings - Fork 218
Wait for kube-system namespace before creating assets #368
Wait for kube-system namespace before creating assets #368
Conversation
Instead of trying to handle the NotFound error, check that the kube-system namespace exists before attempting to create assets.
Can one of the admins verify this patch? |
ok to test |
@pbx0 seems that rktbot is being a bit spammy |
The tests seem fine. Quay 404'd when running bootkube on the destruct test via rkt for whatever reason. I'll re-run them now. @aaron: I'm not sure how to solve the spam problem. We have 4 different jenkins jobs using the ghprb plugin, and even though only 1 automatically wants to run, all 4 want to spam the thread and I don't know how to turn it off without turning off those jobs yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment about error behavior - but otherwise this is looking good. Thank you for the fix!
pkg/bootkube/create.go
Outdated
UserOutput("For example, after resolving issues: kubectl create -f <failed-manifest>\n\n") | ||
} | ||
return true, nil | ||
return true, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We weren't returning an error here before because there had been situations where an asset wasn't created - but it really wasn't a failure case, per-se (or you could actually recover it normally). An example of this was the dns service failed to be created because another service was assigned the service IP it was expecting to use.
The question was, should we let the cluster continue to boot - then let an admin resolve the issue (e.g. change the service IP of the conflicting service, and re-create the DNS service) -- VS -- just outright failing.
My inclination would be to keep the behavior the same for now, unless we have a good reason to change it - so either log the error here and return nil, or log in caller, but don't bubble it up further. No strong feelings for either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - given that this is subtle behavior - it might make sense to add a comment here if we do keep this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the background! It wasn't my intent to change the behavior. I'll go back to returning nil and add a comment explaining why.
@aaronlevy I've update the error handling to go back to allowing boot to continue when asset creation fails, and left a comment explaining why we aren't bubbling up the error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test that is failing is a self-hosted etcd test which is known to be flaky at the moment.
Thank you for the fix!
Instead of trying to handle the NotFound error, check that the kube-system namespace exists before attempting to create assets.
This fixes #319.