test: add REBOOT_CONTROL_PLANE_NODES E2E config #3745
78c985a to c862d24
Codecov Report
@@           Coverage Diff           @@
##           master    #3745   +/-   ##
=======================================
  Coverage   73.20%   73.20%
=======================================
  Files         148      148
  Lines       25372    25372
=======================================
  Hits        18573    18573
  Misses       5663     5663
  Partials     1136     1136
Continue to review full report at Codecov.
Thanks for doing this. The minor change for --request-timeout is something I had forgotten to include originally. It can help address kubectl hanging when something goes wrong (for example, when another master is rebooting and it happens to be the one that kubectl is trying to talk to).
- bash
- -c
- >-
while [[ $(kubectl annotate namespace ${LOCK_NS} ${LOCK_NAME}=${NODE_ID} 2>&1) != *\(${NODE_ID}\)* ]];
Minor addition: all of the kubectl commands should also have --request-timeout 30s (or pick your number), since a reboot of another node may otherwise hang the kubectl command.
For example:
kubectl --request-timeout 30s annotate namespace ${LOCK_NS} ${LOCK_NAME}=${NODE_ID}
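As a rough illustration of the suggestion above, the lock loop could be wrapped in a small function that applies the timeout to every call. This is only a sketch under stated assumptions: the function name acquire_lock and the REQUEST_TIMEOUT and RETRY_DELAY variables are hypothetical and do not come from this PR. It relies on the same behavior as the original loop, namely that kubectl annotate without --overwrite reports the existing value in parentheses.

```shell
# Sketch only: acquire_lock, REQUEST_TIMEOUT and RETRY_DELAY are hypothetical
# names used for illustration; they are not part of this PR.

# Retry until the lock annotation on the namespace holds this node's ID.
# Without --overwrite, kubectl annotate refuses to replace an existing
# annotation and prints its current value in parentheses, so seeing
# "(NODE_ID)" in the output means this node already owns the lock.
acquire_lock() {
  local lock_ns="$1" lock_name="$2" node_id="$3"
  local timeout="${REQUEST_TIMEOUT:-30s}"
  local retry_delay="${RETRY_DELAY:-5}"
  local out
  while true; do
    # --request-timeout bounds each API call, so an API server that is
    # rebooting cannot hang the loop indefinitely.
    out="$(kubectl --request-timeout "${timeout}" annotate namespace \
      "${lock_ns}" "${lock_name}=${node_id}" 2>&1)" || true
    case "${out}" in
      *"(${node_id})"*) return 0 ;;
    esac
    sleep "${retry_delay}"
  done
}
```

A call such as `acquire_lock "${LOCK_NS}" "${LOCK_NAME}" "${NODE_ID}"` would then block until the node holds the lock, with each individual kubectl request limited to the chosen timeout.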
/lgtm
@Michael-Sinz: changing LGTM is restricted to collaborators.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: jackfrancis, Michael-Sinz
cff871e to 8938ff4
Reason for Change:
This PR adds a REBOOT_CONTROL_PLANE_NODES option, and a related daemonset, to randomly reboot control plane nodes during the cluster lifecycle, to ensure that the cluster operates without side effects in such scenarios.
Issue Fixed:
Requirements:
Notes: