Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd: throttle restart for availability #11677

Merged

Conversation

VannTen
Copy link
Contributor

@VannTen VannTen commented Oct 31, 2024

What type of PR is this?
/kind bug

What this PR does / why we need it:
During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.

Limit the concurrent restart so that the etcd cluster can keep quorum.

Which issue(s) this PR fixes:
Fixes #11645

Does this PR introduce a user-facing change?:

HA etcd cluster keeps quorum during upgrades.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Oct 31, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VannTen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 31, 2024
@VannTen
Copy link
Contributor Author

VannTen commented Oct 31, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 31, 2024
During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.

Limit the concurrent restart so that the etcd cluster can keep quorum.
@VannTen VannTen force-pushed the fix/etcd_upgrade_availability branch from 223bba6 to b113d91 Compare October 31, 2024 12:56
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 31, 2024
when: ('etcd' in group_names)
listen: Restart etcd
throttle: "{{ groups['etcd'] | length // 2 }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greate Idea 👍

@yankay
Copy link
Member

yankay commented Nov 5, 2024

Thanks @VannTen
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit 0f0e24b into kubernetes-sigs:master Nov 5, 2024
40 checks passed
kpoxo6op pushed a commit to kpoxo6op/kubespray that referenced this pull request Dec 27, 2024
* etcd: throttle restart for availability

During upgrade, etcd member are restarted all at once.
This can impact the availability of the etcd cluster and subsequently of
the Kubernetes cluster.

Limit the concurrent restart so that the etcd cluster can keep quorum.

* Simplify etcd handlers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Discussion: Etcd scaling up process
3 participants