etcd: throttle restart for availability (#11677)
* etcd: throttle restart for availability

During an upgrade, etcd members are restarted all at once.
This can impact the availability of the etcd cluster and, subsequently, of
the Kubernetes cluster.

Limit concurrent restarts so that the etcd cluster can keep quorum.

* Simplify etcd handlers
VannTen authored Nov 5, 2024
1 parent e62e6d2 commit f276bc8
Showing 1 changed file with 11 additions and 12 deletions.
23 changes: 11 additions & 12 deletions roles/etcd/handlers/main.yml
@@ -2,26 +2,25 @@
 - name: Backup etcd
   import_tasks: backup.yml

-- name: Etcd | reload systemd
+- name: Restart etcd
   systemd_service:
-    daemon_reload: true
-  listen:
-    - Restart etcd
-    - Restart etcd-events
-
-- name: Reload etcd
-  service:
     name: etcd
     state: restarted
+    daemon_reload: true
   when: ('etcd' in group_names)
-  listen: Restart etcd
+  throttle: "{{ groups['etcd'] | length // 2 }}"
+  # Etcd cluster MUST have an odd number of members
+  # Truncated integer division by 2 will always return (majority - 1) which
+  # means the cluster will keep quorum and stay available

-- name: Reload etcd-events
-  service:
+- name: Restart etcd-events
+  systemd_service:
     name: etcd-events
     state: restarted
+    daemon_reload: true
+  # TODO: this seems odd. etcd-events should be a different group possibly ?
   when: ('etcd' in group_names)
-  listen: Restart etcd-events
+  throttle: "{{ groups['etcd'] | length // 2 }}"

 - name: Wait for etcd up
   uri:
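
The quorum argument in the handler comment can be checked with a little arithmetic. The following is a minimal sketch in Python, not part of the commit: the function names `restart_throttle` and `quorum` are hypothetical, and the floor division is assumed to mirror the `length // 2` expression used in the `throttle` field. It only covers clusters with an odd number of members, as the comment requires.

```python
def restart_throttle(member_count: int) -> int:
    """Hypothetical helper: members allowed to restart at the same time.

    Mirrors the `groups['etcd'] | length // 2` expression from the diff.
    """
    return member_count // 2


def quorum(member_count: int) -> int:
    """Smallest number of members that forms a majority."""
    return member_count // 2 + 1


# For odd cluster sizes, the throttle equals (majority - 1), so the members
# that are not restarting still form a majority.
for n in (3, 5, 7):
    restarting = restart_throttle(n)
    still_up = n - restarting
    assert restarting == quorum(n) - 1
    assert still_up >= quorum(n)
    print(f"members={n} restarting={restarting} up={still_up} quorum={quorum(n)}")
```

With five members, for example, at most two restart together, leaving three running, which is exactly the majority needed.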