@4n4nd, I can reproduce this exactly as you outlined above. This seems like a bug and something the operator should be able to recover from. Unfortunately, I don't have any free cycles to try to dig deeper into the issue at the moment. I'll try to make some time next week.
For further context, if you remove the sleep or increase it (I tried 30s), things come back as you'd expect. So there's probably a race condition at play here.
I believe what's happening is this: when the operator starts bringing up new pods, some old pods are still terminating. The operator has the new pods join the old cluster, but by the time the new pods are ready, the old pods have been deleted. So instead of initializing a whole new cluster, the operator tries to join the old one and fails. As a result, no hash slots are assigned to the new pods, and they get stuck in a not-ready state.
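To make the suspected race concrete, here is a minimal sketch of the decision the operator would need to make on each reconcile pass. It is not the operator's actual code: the `Pod` type, its fields, and `bootstrap_action` are hypothetical names, and the sketch assumes the operator can tell which pods are terminating and which nodes own hash slots.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    # Hypothetical, simplified view of a Redis node pod.
    name: str
    terminating: bool  # True once the pod has been marked for deletion
    has_slots: bool    # True if the Redis node owns any hash slots


def bootstrap_action(pods: List[Pod]) -> str:
    """Return 'wait', 'join', or 'init' for one reconcile pass."""
    if any(p.terminating for p in pods):
        # Old members are still going away. Joining them now is the
        # suspected race: they may be gone before the join completes.
        return "wait"
    if any(p.has_slots for p in pods):
        # A live member that owns slots exists, so joining it is safe.
        return "join"
    # No terminating pods and no slot owners: create a fresh cluster.
    return "init"


if __name__ == "__main__":
    mixed = [Pod("old-0", True, True), Pod("new-0", False, False)]
    print(bootstrap_action(mixed))
```

Under these assumptions, a new pod that comes up while an old pod is still terminating is held back instead of being joined to a cluster that is about to disappear, and a set of pods with no slot owners is initialized from scratch.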
The Redis Cluster is not able to recover after the Redis node pods are deleted sequentially. Example command:
After new pods are spawned, they fail the readiness probe: