Improve slot migration reliability #21

PingXie · 2024-03-25T03:36:29Z

Re-sharding in OSS Redis clusters is a risky operation, lacking both high availability and eventual consistency in the design. This is a pain point for many users. Mitigations were previously discussed (see pull request redis/redis#10517), and I believe it's important to resurrect this conversation within this fork. These mitigations could provide much-needed relief while we work towards a more robust long-term solution. I'd like to hear the community's thoughts on the feasibility of these mitigations and how they could benefit the users.

@zuiderkwast @soloestoy @madolson

zuiderkwast · 2024-03-25T09:54:56Z

Definitely. Another important point is that reading from replicas is broken. A replica doesn't know about ongoing migrations, so it can't return ASK redirects.

The only thing we didn't agree on is whether SETSLOT should be replicated in the replication stream or propagated over the cluster bus.

If it's done in the replication stream, SETSLOT and the commands executed before and after it all arrive in the right order. We can't achieve that with the cluster bus.
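To illustrate the ordering argument, here is a toy model in Python. It is only a sketch, not Valkey or Redis code: `ToyReplica`, the event tuples, and `replay` are hypothetical names, and a single key `k` stands in for a whole slot. It assumes the migrated key reaches the replica as a deletion, and that the replica should answer ASK for a missing key once it knows the slot is migrating.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplica:
    """Toy replica state (hypothetical, for illustration only)."""
    keys: dict = field(default_factory=dict)
    migrating: bool = False  # True once the slot is known to be MIGRATING

    def apply(self, event):
        kind = event[0]
        if kind == "SET":
            self.keys[event[1]] = event[2]
        elif kind == "DEL":
            self.keys.pop(event[1], None)
        elif kind == "SETSLOT_MIGRATING":
            self.migrating = True

    def read(self, key):
        if key in self.keys:
            return self.keys[key]
        # Missing key: redirect only if the migration state is known.
        return "ASK" if self.migrating else None


def replay(events):
    """Apply events in order and record what a read of 'k' sees after each."""
    replica = ToyReplica()
    observed = []
    for event in events:
        replica.apply(event)
        observed.append(replica.read("k"))
    return observed


# Single ordered stream: the slot state change precedes the key removal.
in_stream = [("SET", "k", "v"), ("SETSLOT_MIGRATING",), ("DEL", "k")]
# Separate channel: the slot state change may arrive after the key removal.
out_of_band = [("SET", "k", "v"), ("DEL", "k"), ("SETSLOT_MIGRATING",)]

print(replay(in_stream))    # ['v', 'v', 'ASK']
print(replay(out_of_band))  # ['v', None, 'ASK']
```

In the second run there is a window where the read returns `None`, which a client cannot distinguish from "the key does not exist".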

After READONLY, make a cluster replica behave as its primary regarding returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key doesn't exist or if it has already been migrated to another shard as part of an ongoing slot migration. Therefore, without an ASK redirect in this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to continue reading from a replica after following a redirect, it needs to figure out the replicas of that new primary using CLUSTER SHARDS or similar.

This is related to #21 and has been made possible by the introduction of Replication of Slot Migration States in #445.

Release notes: During cluster slot migration, replicas are able to return -ASK redirects and -TRYAGAIN.

Signed-off-by: Viktor Söderqvist <[email protected]>
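From the client's side, handling these replies could look roughly like the sketch below. It assumes the usual cluster error shapes (`-ASK <slot> <host>:<port>`, `-MOVED <slot> <host>:<port>`, and `-TRYAGAIN ...`); `Redirect`, `parse_redirect`, and `next_step` are hypothetical helper names, not part of any client library.

```python
from typing import NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str             # "ASK", "MOVED" or "TRYAGAIN"
    slot: Optional[int]
    host: Optional[str]
    port: Optional[int]


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse a cluster error reply; return None if it is not a redirect."""
    parts = error.lstrip("-").split()
    if not parts:
        return None
    kind = parts[0]
    if kind == "TRYAGAIN":
        return Redirect(kind, None, None, None)
    if kind in ("ASK", "MOVED") and len(parts) == 3:
        # Split on the last ':' so hosts that contain ':' still parse.
        host, _, port = parts[2].rpartition(":")
        return Redirect(kind, int(parts[1]), host, int(port))
    return None


def next_step(error: str) -> str:
    """Describe what a client reading from a replica might do next."""
    r = parse_redirect(error)
    if r is None:
        return "raise the error to the caller"
    if r.kind == "TRYAGAIN":
        return "retry on the same node after a short delay"
    if r.kind == "ASK":
        # The redirect target is always a primary; ASKING precedes the command.
        return f"send ASKING, then the command, to {r.host}:{r.port}"
    return f"refresh the slot map, then send the command to {r.host}:{r.port}"


print(next_step("-ASK 3999 127.0.0.1:6381"))
```

To keep reading from replicas after following the redirect, the client would still need to look up the replicas of the new primary, for example with CLUSTER SHARDS, as the note above says.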

zuiderkwast added the cluster label Mar 25, 2024

PingXie mentioned this issue Apr 6, 2024

Slot migration improvement #245

Closed

PingXie self-assigned this Apr 27, 2024

PingXie mentioned this issue May 7, 2024

Slot migration improvement #445

Merged

PingXie closed this as completed in #445 May 7, 2024

This was referenced May 13, 2024

Make cluster replicas return ASK and TRYAGAIN #495

Merged

Wishlist #17

Open
