Improve slot migration reliability #21

Closed
PingXie opened this issue Mar 25, 2024 · 1 comment · Fixed by #445

@PingXie
Member

PingXie commented Mar 25, 2024

Re-sharding in OSS Redis clusters is a risky operation, lacking both high availability and eventual consistency in the design. This is a pain point for many users. Mitigations were previously discussed (see pull request redis/redis#10517), and I believe it's important to resurrect this conversation within this fork. These mitigations could provide much-needed relief while we work towards a more robust long-term solution. I'd like to hear the community's thoughts on the feasibility of these mitigations and how they could benefit the users.

@zuiderkwast @soloestoy @madolson

@zuiderkwast
Contributor

Definitely. Another important point is that reading from replicas is broken. A replica doesn't know about ongoing migrations, so it can't return ASK redirects.

The only thing we didn't agree on is whether SETSLOT should be replicated in the replication stream or propagated over the cluster bus.

If it's done in the replication stream, SETSLOT and the commands executed before and after it all arrive in the right order. We can't achieve that with the cluster bus.
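
To illustrate the ordering argument, here is a minimal toy model in Python. It is not Valkey code: `ToyReplica`, the tuple-based operations, and the simplified `ASK <node>` string are all assumptions made up for this sketch. It compares a replica that receives the migration state in order within its stream against one where that state has not yet arrived by the time the key has been deleted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyReplica:
    """Hypothetical replica that owns a single slot (illustration only)."""
    data: dict = field(default_factory=dict)
    migrating_to: Optional[str] = None  # target node while the slot is migrating

    def apply(self, op: tuple) -> None:
        """Apply one operation, in the order it was received."""
        kind = op[0]
        if kind == "SET":
            self.data[op[1]] = op[2]
        elif kind == "DEL":
            # Models the source deleting a key once it has been migrated.
            self.data.pop(op[1], None)
        elif kind == "SETSLOT_MIGRATING":
            self.migrating_to = op[1]

    def read(self, key: str) -> Optional[str]:
        if key in self.data:
            return self.data[key]
        if self.migrating_to is not None:
            # Simplified redirect; the real reply also carries slot and address.
            return f"ASK {self.migrating_to}"
        return None  # indistinguishable from "key does not exist"


def read_after(ops, key):
    """Replay ops on a fresh replica, then read key."""
    replica = ToyReplica()
    for op in ops:
        replica.apply(op)
    return replica.read(key)


# Migration state travels in the same ordered stream as the data changes.
in_stream = [("SET", "k", "v"), ("SETSLOT_MIGRATING", "node-b"), ("DEL", "k")]
# Migration state travels separately and has not arrived yet.
late_on_bus = [("SET", "k", "v"), ("DEL", "k")]

print(read_after(in_stream, "k"))    # ASK node-b
print(read_after(late_on_bus, "k"))  # None
```

In the second case the replica reports the key as missing even though it now lives on another node, which is the failure that in-stream ordering avoids.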

PingXie self-assigned this Apr 27, 2024
This was referenced May 13, 2024
zuiderkwast added a commit that referenced this issue May 24, 2024
After READONLY, make a cluster replica behave as its primary regarding
returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key
doesn't exist or if it has already been migrated to another shard as
part of an ongoing slot migration. Therefore, without an ASK redirect in
this situation, offloading reads to cluster replicas isn't reliable.

Note: The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that new primary using CLUSTER SHARDS or
similar.

This is related to #21 and has been made possible by the introduction of
Replication of Slot Migration States in #445.

----

Release notes:

During cluster slot migration, replicas are able to return -ASK
redirects and -TRYAGAIN.

---------

Signed-off-by: Viktor Söderqvist <[email protected]>
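
The behaviour described in the commit message above can be sketched as a small decision function. This is a simplified illustration based on the generally documented cluster redirect rules for a slot that is being migrated away, not the code from the patch; the function name `redirect_for_read`, its parameters, and the returned strings are hypothetical.

```python
from typing import List, Optional


def redirect_for_read(present: List[bool], migrating_to: Optional[str]) -> str:
    """Decide how a node answers a read on a slot it serves.

    present      -- one flag per requested key: is the key still stored here?
    migrating_to -- target node if the slot is being migrated away, else None
    (Hypothetical helper; names and return values are illustrative only.)
    """
    if migrating_to is None or all(present):
        # Stable slot, or every key is still here: answer locally.
        return "SERVE"
    if any(present):
        # Multi-key read with keys split between source and target.
        return "TRYAGAIN"
    # Nothing is here any more: the keys may already be on the target.
    return f"ASK {migrating_to}"


print(redirect_for_read([False], None))            # SERVE (key truly absent)
print(redirect_for_read([False], "node-b"))        # ASK node-b
print(redirect_for_read([True, False], "node-b"))  # TRYAGAIN
```

With the patch, a replica in READONLY mode is described as making the same kind of decision as its primary, which is only possible once the replica knows the migration state of the slot.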