
[bug] update too many pods at the same time. #310

Closed
runkecheng opened this issue Nov 25, 2021 · 2 comments · Fixed by #314

Comments


runkecheng commented Nov 25, 2021

Describe the problem

Two nodes (of a 3-node cluster) are deleted and restarted when the configuration is updated, which leaves the cluster temporarily unavailable. The correct behavior is to update only one node at a time.

To Reproduce

The default PodDisruptionBudget is 50%, so the minimum number of available nodes in a 3-node cluster is 2. In practice, however, two nodes are updated at once when the configuration is changed, which violates the PDB.
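For context, the PDB described here could look roughly like the following. This is a minimal sketch using client-go types; the function name, policy API version, and label selector are assumptions for illustration, not the operator's actual code.

	import (
		policyv1 "k8s.io/api/policy/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/apimachinery/pkg/util/intstr"
	)

	// defaultPDB builds a PodDisruptionBudget with minAvailable 50%.
	// For a 3-node cluster, 50% rounds up to 2 available pods, so at
	// most one pod may be disrupted at a time.
	func defaultPDB(clusterName, namespace string) *policyv1.PodDisruptionBudget {
		minAvailable := intstr.FromString("50%")
		return &policyv1.PodDisruptionBudget{
			ObjectMeta: metav1.ObjectMeta{Name: clusterName, Namespace: namespace},
			Spec: policyv1.PodDisruptionBudgetSpec{
				MinAvailable: &minAvailable,
				Selector: &metav1.LabelSelector{
					// Assumed label; the operator's real selector may differ.
					MatchLabels: map[string]string{"app.kubernetes.io/instance": clusterName},
				},
			},
		}
	}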

Root cause:

The StatefulSetUpdateStrategy is OnDelete, so the operator deletes pods with the following logic:

	if pod.ObjectMeta.Labels["controller-revision-hash"] == s.sfs.Status.UpdateRevision {
		log.Info("pod is already updated", "pod name", pod.Name)
	} else {
		...
		if pod.DeletionTimestamp != nil {
			log.Info("pod is being deleted", "pod", pod.Name, "key", s.Unwrap())
		} else {
			if err := s.cli.Delete(ctx, pod); err != nil {
				return err
			}
		}
	}

After a pod is deleted, the retry loop exits early because the healthy label of the node being deleted is still yes. The correct logic is to wait until the deleted pod becomes ready again before updating the next pod.

	if pod.ObjectMeta.Labels["healthy"] == "yes" &&
		pod.ObjectMeta.Labels["controller-revision-hash"] != s.sfs.Status.UpdateRevision {
		return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
	}
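
A minimal sketch of the direction the eventual fix takes (per the commit referenced later in this thread, "syncer: Set the health label to `no` before updating pod."): flip the healthy label before deleting, so the check above can no longer mistake the terminating pod for an available one. The helper name and client wiring below are assumptions, not the operator's actual code.

	import (
		"context"

		corev1 "k8s.io/api/core/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// markUnhealthyAndDelete clears the healthy label before deleting the
	// pod, so a concurrent readiness check does not count the terminating
	// pod as available.
	func markUnhealthyAndDelete(ctx context.Context, cli client.Client, pod *corev1.Pod) error {
		if pod.ObjectMeta.Labels["healthy"] == "yes" {
			pod.ObjectMeta.Labels["healthy"] = "no"
			if err := cli.Update(ctx, pod); err != nil {
				return err
			}
		}
		return cli.Delete(ctx, pod)
	}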

Expected behavior

Environment:

  • RadonDB MySQL version:
@runkecheng runkecheng added the bug Something isn't working label Nov 25, 2021
@runkecheng runkecheng changed the title [bug] In some cases the PDB does not take effect when updating the configuration [bug] The number of nodes that are upgraded simultaneously are incorrect when scrolling updates Nov 25, 2021
@runkecheng runkecheng self-assigned this Nov 25, 2021
@runkecheng runkecheng added this to the Next milestone Nov 25, 2021
@runkecheng
Collaborator Author

[screenshot attachment: d9397238b3b35fae7f01ee8d4915742]

@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously are incorrect when scrolling updates [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates (incorrect) Nov 25, 2021
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates (incorrect) [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates. Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates. [bug] The number of nodes that are upgraded simultaneously incorrect when updating pods. Nov 29, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
andyli029 added a commit that referenced this issue Dec 1, 2021
syncer: Set the health label to `no` before updating pod. #310
@runkecheng runkecheng reopened this May 27, 2022
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when updating pods. [bug] update too many pods at the same time. May 27, 2022

runkecheng commented May 27, 2022

The pod obtained in retry() may not be the latest.

The DeletionTimestamp needs to be checked as well.

If the pod is being deleted, its health should be treated as no and the other checks should be skipped.

[screenshot attachment: f64ff95437357b06e85e36498fb144a]
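
A rough sketch of the extra check described in this comment (the function name, key handling, and return convention are assumptions mirroring the snippet earlier in the issue): re-fetch the pod so the retry loop does not act on a stale copy, and treat a terminating pod as unhealthy before running the other checks.

	import (
		"context"
		"fmt"

		corev1 "k8s.io/api/core/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// checkPodUpdated re-reads the pod and bails out while it is still
	// being deleted, instead of concluding from stale labels that it is ready.
	func checkPodUpdated(ctx context.Context, cli client.Client, key client.ObjectKey, updateRevision string) (bool, error) {
		pod := &corev1.Pod{}
		if err := cli.Get(ctx, key, pod); err != nil {
			return false, err
		}
		if pod.DeletionTimestamp != nil {
			// Pod is being deleted: treat its health as "no" and skip the other checks.
			return false, fmt.Errorf("pod %s is being deleted, wait next schedule", pod.Name)
		}
		if pod.ObjectMeta.Labels["healthy"] == "yes" &&
			pod.ObjectMeta.Labels["controller-revision-hash"] != updateRevision {
			return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
		}
		return true, nil
	}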

runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue May 27, 2022
The pod information obtained in retry() may be stale,
so add logic to retry(): check whether the pod is being deleted,
and if so, retry.

fix radondb#310
zhl003 pushed a commit to zhl003/radondb-mysql-kubernetes that referenced this issue Aug 17, 2022
zhl003 pushed a commit to zhl003/radondb-mysql-kubernetes that referenced this issue Aug 17, 2022
syncer: Set the health label to `no` before updating pod. radondb#310