Wait for apiserver to be updated or restart kubelet on timeout #3818
Hello gdemonet, my role is to assist you with the merge of this pull request. Status report is not available.
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Force-pushed from `4aed7e7` to `0f70568`.
Force-pushed from `0f70568` to `f2a93d5`.
When waiting for kube-apiserver to restart, we can't rely on the K8s API to give us a proper status (on a single-node cluster, it would not be reachable). However, we have observed that changing its manifest sometimes doesn't trigger a restart by kubelet, which leads to a broken state where nothing converges (again, on a single node). So we add simple utility methods to the `cri` module, and use them in `metalk8s.kubernetes.apiserver.installed` to ensure we at least attempt to restart kubelet if nothing has changed after a while.
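A minimal sketch of the flow described above, assuming two hypothetical callables: `wait_for_restart(timeout)`, which returns `True` once the kube-apiserver container has been replaced (checked through the container runtime rather than the K8s API), and `restart_kubelet()`. These names are illustrative only; they are not the actual `cri` module or `metalk8s.kubernetes.apiserver.installed` API.

```python
from typing import Callable


def ensure_apiserver_restarted(
    wait_for_restart: Callable[[float], bool],
    restart_kubelet: Callable[[], None],
    timeout: float = 60.0,
) -> bool:
    """Hypothetical helper: wait for kube-apiserver to restart, and restart
    kubelet once if nothing changed within `timeout` seconds.

    The K8s API is not used: on a single node, kube-apiserver is unreachable
    while it restarts, so `wait_for_restart` is assumed to query the CRI.
    """
    if wait_for_restart(timeout):
        return True
    # kubelet did not pick up the manifest change: kick it and wait once more
    restart_kubelet()
    return wait_for_restart(timeout)
```

Injecting the two callables keeps the retry policy separate from how the container state is actually read, so it can be exercised with fakes.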
Force-pushed from `f2a93d5` to `af995d4`.
/approve
The following options are set: approve
LGTM
```python
if current_id and current_id != last_id:
    return True
remaining = timeout + start_time - time.time()
if remaining < sleep:  # Don't sleep if we know it's going to time out
```
Clever, Bill!
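The excerpt above is part of a polling loop. A self-contained sketch of such a loop, assuming a hypothetical `get_container_id()` callable (for instance, one wrapping a container runtime query) and injectable `clock`/`sleeper` functions so it can be exercised without real waiting. The actual `cri` helper may be structured differently.

```python
import time
from typing import Callable, Optional


def wait_for_new_container_id(
    get_container_id: Callable[[], Optional[str]],
    last_id: Optional[str],
    timeout: float,
    sleep: float = 5.0,
    clock: Callable[[], float] = time.time,
    sleeper: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: return True as soon as the container ID differs
    from `last_id`, or False when less than `sleep` seconds remain."""
    start_time = clock()
    while True:
        current_id = get_container_id()
        # A new, non-empty ID means the container has been recreated
        if current_id and current_id != last_id:
            return True
        remaining = timeout + start_time - clock()
        if remaining < sleep:  # Don't sleep if we know it's going to time out
            return False
        sleeper(sleep)
```

Checking `remaining < sleep` before sleeping avoids a final sleep that would overshoot `timeout`.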
```diff
 raise CommandExecutionError(
-    f"Unable to stop pods with labels {selector}:\n"
-    f"IDS: {pod_ids}\nSTDERR: {out['stderr']}\nSTDOUT: {out['stdout']}"
+    f"Pod {name} was not {verb} after {(time.time() - start_time):.0f} seconds"
```
Any reason not to use the `timeout` variable directly?
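We may exit earlier than `timeout` if it is not a multiple of `sleep` (we don't sleep when we know it's going to time out), so the actual elapsed time is more accurate.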
😇
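To make the reply concrete: if each check is assumed to take no time, the loop keeps sleeping while at least `sleep` seconds remain, so it gives up after `sleep * floor(timeout / sleep)` seconds. That equals `timeout` only when `timeout` is a multiple of `sleep`. A small simulation, using a hypothetical `simulated_exit_time` helper:

```python
def simulated_exit_time(timeout: float, sleep: float) -> float:
    """Elapsed time at which a "don't sleep if we know it's going to time out"
    loop gives up, assuming each check itself takes no time."""
    elapsed = 0.0
    # Mirrors `if remaining < sleep: stop`, with remaining = timeout - elapsed
    while timeout - elapsed >= sleep:
        elapsed += sleep
    return elapsed


print(simulated_exit_time(10, 3))  # 9.0: the error reads "after 9 seconds", not 10
print(simulated_exit_time(9, 3))   # 9.0: timeout is a multiple of sleep
```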
In the queue
The changeset has received all authorizations and has been added to the queue.
The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once …
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a …
The following options are set: approve
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue None.
Goodbye gdemonet.
This ensures that if kubelet gets stuck and doesn't restart kube-apiserver after its manifest is updated, we can detect the situation and restart kubelet.