Wait for apiserver to be updated or restart kubelet on timeout #3818

Merged

Conversation

gdemonet
Contributor

@gdemonet gdemonet commented Jul 8, 2022

This ensures that if kubelet gets stuck and doesn't restart kube-apiserver after its manifest is updated, we can detect the situation and restart kubelet.

@bert-e
Contributor

bert-e commented Jul 8, 2022

Hello gdemonet,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Jul 8, 2022

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

@gdemonet gdemonet force-pushed the bugfix/wait-for-apiserver-or-restart-kubelet branch 2 times, most recently from 4aed7e7 to 0f70568 Compare July 12, 2022 08:10
@bert-e
Contributor

bert-e commented Jul 12, 2022

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

@gdemonet gdemonet force-pushed the bugfix/wait-for-apiserver-or-restart-kubelet branch from 0f70568 to f2a93d5 Compare July 12, 2022 14:27
When waiting for kube-apiserver to restart, we can't rely on the K8s API
to give us a proper status (on a single-node cluster, it would not be
reachable).

However, we have observed that changing its manifest sometimes doesn't
trigger a restart by kubelet, which leads to a broken state where
nothing converges (again, on a single-node cluster).

So we add simple utility methods to the `cri` module, and use them in
`metalk8s.kubernetes.apiserver.installed` to ensure we at least attempt
to restart kubelet if nothing has changed after a while.
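
The wait-then-restart approach described in this commit message could look roughly like the sketch below. This is not the code from the PR: `wait_for_new_id`, `ensure_restarted`, `get_id` and `restart_kubelet` are hypothetical names, the callables stand in for whatever looks up the container ID through the CRI and restarts the kubelet service, and the second wait after the restart is an assumption.

```python
import time
from typing import Callable, Optional


def wait_for_new_id(
    get_id: Callable[[], Optional[str]],
    last_id: Optional[str],
    timeout: float = 60.0,
    sleep: float = 5.0,
) -> bool:
    """Poll get_id() until it returns an ID different from last_id."""
    start_time = time.monotonic()
    while True:
        current_id = get_id()
        if current_id and current_id != last_id:
            return True
        remaining = timeout - (time.monotonic() - start_time)
        if remaining < sleep:  # Sleeping again would overshoot the deadline
            return False
        time.sleep(sleep)


def ensure_restarted(
    get_id: Callable[[], Optional[str]],
    last_id: Optional[str],
    restart_kubelet: Callable[[], None],
    timeout: float = 60.0,
    sleep: float = 5.0,
) -> str:
    """Wait for a new container ID; restart kubelet once if nothing changes."""
    if wait_for_new_id(get_id, last_id, timeout, sleep):
        return "restarted"
    # Hypothetical fallback: kubelet did not pick up the manifest change.
    restart_kubelet()
    if wait_for_new_id(get_id, last_id, timeout, sleep):
        return "restarted after kubelet restart"
    raise TimeoutError("container was not restarted, even after restarting kubelet")
```

Because the check only compares container IDs obtained locally, it does not need the K8s API to be reachable, which is the point made above for single-node clusters.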
@gdemonet gdemonet force-pushed the bugfix/wait-for-apiserver-or-restart-kubelet branch from f2a93d5 to af995d4 Compare July 13, 2022 07:16
@gdemonet gdemonet marked this pull request as ready for review July 13, 2022 07:17
@gdemonet gdemonet requested a review from a team as a code owner July 13, 2022 07:17
@gdemonet
Contributor Author

/approve

@bert-e
Contributor

bert-e commented Jul 13, 2022

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following options are set: approve

Collaborator

@TeddyAndrieux TeddyAndrieux left a comment


LGTM

    if current_id and current_id != last_id:
        return True
    remaining = timeout + start_time - time.time()
    if remaining < sleep:  # Don't sleep if we know it's going to time out
Collaborator


Clever trick!

raise CommandExecutionError(
f"Unable to stop pods with labels {selector}:\n"
f"IDS: {pod_ids}\nSTDERR: {out['stderr']}\nSTDOUT: {out['stdout']}"
f"Pod {name} was not {verb} after {(time.time() - start_time):.0f} seconds"
Collaborator


Any reason not to use the `timeout` variable directly?

Contributor Author


We may exit earlier if `timeout` is not a multiple of `sleep` 😇
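
To make the early exit concrete, here is a small simulation of the loop timing. It is only an illustration: `poll_times` is a hypothetical helper, it uses a simulated clock instead of `time.time()`, and it assumes the check never succeeds.

```python
from typing import List


def poll_times(timeout: float, sleep: float) -> List[float]:
    """Return the simulated times at which a never-succeeding check would run."""
    now = 0.0  # Simulated clock; real code would read time.time()
    times = []
    while True:
        times.append(now)
        remaining = timeout - now
        if remaining < sleep:  # Don't sleep if we know it's going to time out
            return times
        now += sleep


print(poll_times(12, 5))  # [0.0, 5.0, 10.0]
print(poll_times(10, 5))  # [0.0, 5.0, 10.0]
```

With `timeout=12` and `sleep=5`, the last check runs at 10 seconds and the loop gives up there, so reporting the measured elapsed time in the error message is more accurate than printing `timeout`.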

@bert-e
Contributor

bert-e commented Jul 13, 2022

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/123.0

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Jul 13, 2022

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/123.0

The following branches have NOT changed:

  • development/2.0
  • development/2.1
  • development/2.10
  • development/2.11
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye gdemonet.

@bert-e bert-e merged commit 59a8b2c into development/123.0 Jul 13, 2022
@bert-e bert-e deleted the bugfix/wait-for-apiserver-or-restart-kubelet branch July 13, 2022 12:41