
[bug] update too many pods at the same time. #310

Closed
runkecheng opened this issue Nov 25, 2021 · 2 comments · Fixed by #314

Comments


runkecheng commented Nov 25, 2021

Describe the problem

Two nodes (of a 3-node cluster) are deleted and restarted when the configuration is updated, which leaves the cluster temporarily unavailable. The correct behavior is to update only one node at a time.

To Reproduce

The default PodDisruptionBudget is 50%, so the minimum number of available nodes in a 3-node cluster is 2. In practice, however, two nodes are updated at once when the configuration is changed, which violates the PDB.
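For context, the PDB described here could look roughly like the following. This is a minimal sketch using client-go types; the function name, policy API version, and label selector are assumptions for illustration, not the operator's actual code.

	import (
		policyv1 "k8s.io/api/policy/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/apimachinery/pkg/util/intstr"
	)

	// defaultPDB builds a PodDisruptionBudget with minAvailable 50%.
	// For a 3-node cluster, 50% rounds up to 2 available pods, so at
	// most one pod may be disrupted at a time.
	func defaultPDB(clusterName, namespace string) *policyv1.PodDisruptionBudget {
		minAvailable := intstr.FromString("50%")
		return &policyv1.PodDisruptionBudget{
			ObjectMeta: metav1.ObjectMeta{Name: clusterName, Namespace: namespace},
			Spec: policyv1.PodDisruptionBudgetSpec{
				MinAvailable: &minAvailable,
				Selector: &metav1.LabelSelector{
					// Assumed label; the operator's real selector may differ.
					MatchLabels: map[string]string{"app.kubernetes.io/instance": clusterName},
				},
			},
		}
	}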

Root cause:

The StatefulSetUpdateStrategy is OnDelete, so the operator deletes pods with the following logic:

	if pod.ObjectMeta.Labels["controller-revision-hash"] == s.sfs.Status.UpdateRevision {
		log.Info("pod is already updated", "pod name", pod.Name)
	} else {
		...
		if pod.DeletionTimestamp != nil {
			log.Info("pod is being deleted", "pod", pod.Name, "key", s.Unwrap())
		} else {
			if err := s.cli.Delete(ctx, pod); err != nil {
				return err
			}
		}
	}

After a pod is deleted, the retry loop exits early because the healthy label of the node being deleted is still yes. The correct logic is to wait until the deleted pod becomes ready again before updating the next pod.

	if pod.ObjectMeta.Labels["healthy"] == "yes" &&
		pod.ObjectMeta.Labels["controller-revision-hash"] != s.sfs.Status.UpdateRevision {
		return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
	}
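
A minimal sketch of the direction the eventual fix takes (per the commit referenced later in this thread, "syncer: Set the health label to `no` before updating pod."): flip the healthy label before deleting, so the check above can no longer mistake the terminating pod for an available one. The helper name and client wiring below are assumptions, not the operator's actual code.

	import (
		"context"

		corev1 "k8s.io/api/core/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// markUnhealthyAndDelete clears the healthy label before deleting the
	// pod, so a concurrent readiness check does not count the terminating
	// pod as available.
	func markUnhealthyAndDelete(ctx context.Context, cli client.Client, pod *corev1.Pod) error {
		if pod.ObjectMeta.Labels["healthy"] == "yes" {
			pod.ObjectMeta.Labels["healthy"] = "no"
			if err := cli.Update(ctx, pod); err != nil {
				return err
			}
		}
		return cli.Delete(ctx, pod)
	}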

Expected behavior

Environment:

  • RadonDB MySQL version:
@runkecheng runkecheng added the bug Something isn't working label Nov 25, 2021
@runkecheng runkecheng changed the title [bug] In some cases the PDB does not take effect when updating the configuration [bug] The number of nodes that are upgraded simultaneously are incorrect when scrolling updates Nov 25, 2021
@runkecheng runkecheng self-assigned this Nov 25, 2021
@runkecheng runkecheng added this to the Next milestone Nov 25, 2021
@runkecheng
Collaborator Author

[screenshot attachment: d9397238b3b35fae7f01ee8d4915742]

@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously are incorrect when scrolling updates [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates (incorrect) Nov 25, 2021
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates (incorrect) [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates. Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 25, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when scrolling updates. [bug] The number of nodes that are upgraded simultaneously incorrect when updating pods. Nov 29, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue Nov 29, 2021
andyli029 added a commit that referenced this issue Dec 1, 2021
syncer: Set the health label to `no` before updating pod. #310
@runkecheng runkecheng reopened this May 27, 2022
@runkecheng runkecheng changed the title [bug] The number of nodes that are upgraded simultaneously incorrect when updating pods. [bug] update too many pods at the same time. May 27, 2022

runkecheng commented May 27, 2022

The pod obtained in retry() may not be the latest.

The DeletionTimestamp needs to be checked as well.

If the pod is being deleted, its health should be treated as no and the other checks should be skipped.

[screenshot attachment: f64ff95437357b06e85e36498fb144a]
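
A rough sketch of the extra check described in this comment (the function name, key handling, and return convention are assumptions mirroring the snippet earlier in the issue): re-fetch the pod so the retry loop does not act on a stale copy, and treat a terminating pod as unhealthy before running the other checks.

	import (
		"context"
		"fmt"

		corev1 "k8s.io/api/core/v1"
		"sigs.k8s.io/controller-runtime/pkg/client"
	)

	// checkPodUpdated re-reads the pod and bails out while it is still
	// being deleted, instead of concluding from stale labels that it is ready.
	func checkPodUpdated(ctx context.Context, cli client.Client, key client.ObjectKey, updateRevision string) (bool, error) {
		pod := &corev1.Pod{}
		if err := cli.Get(ctx, key, pod); err != nil {
			return false, err
		}
		if pod.DeletionTimestamp != nil {
			// Pod is being deleted: treat its health as "no" and skip the other checks.
			return false, fmt.Errorf("pod %s is being deleted, wait next schedule", pod.Name)
		}
		if pod.ObjectMeta.Labels["healthy"] == "yes" &&
			pod.ObjectMeta.Labels["controller-revision-hash"] != updateRevision {
			return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
		}
		return true, nil
	}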

runkecheng added a commit to runkecheng/radondb-mysql-kubernetes that referenced this issue May 27, 2022
The pod information obtained in retry() may be stale,
so add logic to retry(): check whether the pod is being deleted,
and if so, retry.

fix radondb#310
zhl003 pushed a commit to zhl003/radondb-mysql-kubernetes that referenced this issue Aug 17, 2022
zhl003 pushed a commit to zhl003/radondb-mysql-kubernetes that referenced this issue Aug 17, 2022
syncer: Set the health label to `no` before updating pod. radondb#310