RFC: Sending semi-sync ACKs from Rdonly type tablets #13606

Closed
GuptaManan100 opened this issue Jul 25, 2023 · 3 comments · Fixed by #13698

Comments

@GuptaManan100
Member

GuptaManan100 commented Jul 25, 2023

Description

Until now, we have always configured RDONLY tablets not to send semi-sync ACKs to the primary tablet. Both built-in durability policies, semi_sync and cross_cell, configure the cluster this way.

It looks like the reason for this was that ERS previously didn't have a 2-step promotion process. If a RDONLY tablet was the one sending the semi-sync ACK and the primary then went down, that RDONLY tablet could have been the only tablet holding a committed write for which the user had received a Success. Since ERS doesn't allow promotion of RDONLY tablets, there was no way to carry that write over to the new primary.

This limitation no longer applies, since ERS now has a 2-step promotion process. Even if a RDONLY tablet is the most advanced, we now have a REPLICA tablet replicate from it first and then promote that REPLICA.
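
The 2-step promotion described above can be sketched roughly as follows. This is a minimal illustration, not the ERS implementation: the `Tablet` struct, the integer `Position` (standing in for a real replication position), and the `planPromotion` helper are all hypothetical names introduced for this example.

```go
package main

import "fmt"

// TabletType is a simplified stand-in for the tablet types discussed here.
type TabletType int

const (
	Replica TabletType = iota
	Rdonly
)

// Tablet is a hypothetical, minimal view of a tablet during a reparent.
type Tablet struct {
	Alias    string
	Type     TabletType
	Position int // assumption: larger means more advanced in replication
}

// promotable reports whether the tablet may become PRIMARY.
// RDONLY tablets are never promoted.
func promotable(t Tablet) bool { return t.Type == Replica }

// planPromotion picks the most advanced promotable tablet. If some
// non-promotable tablet is further ahead, it is returned as catchUpFrom so
// the chosen tablet can replicate from it before being promoted.
func planPromotion(tablets []Tablet) (promote Tablet, catchUpFrom *Tablet, ok bool) {
	var mostAdvanced, bestPromotable *Tablet
	for i := range tablets {
		t := &tablets[i]
		if mostAdvanced == nil || t.Position > mostAdvanced.Position {
			mostAdvanced = t
		}
		if promotable(*t) && (bestPromotable == nil || t.Position > bestPromotable.Position) {
			bestPromotable = t
		}
	}
	if bestPromotable == nil {
		return Tablet{}, nil, false // nothing can be promoted
	}
	if mostAdvanced.Position > bestPromotable.Position {
		return *bestPromotable, mostAdvanced, true
	}
	return *bestPromotable, nil, true
}

func main() {
	tablets := []Tablet{
		{Alias: "zone1-101", Type: Replica, Position: 41},
		{Alias: "zone1-102", Type: Rdonly, Position: 42},
	}
	promote, from, ok := planPromotion(tablets)
	if !ok {
		fmt.Println("no promotable tablet")
		return
	}
	if from != nil {
		fmt.Printf("catch up: %s replicates from %s\n", promote.Alias, from.Alias)
	}
	fmt.Printf("promote: %s\n", promote.Alias)
}
```

In this sketch the RDONLY tablet holds the most advanced position, so the REPLICA first catches up from it and is promoted afterwards; the RDONLY tablet itself is never a promotion candidate.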

A secondary reason was that this was the only way to get cross-cell durability with semi-sync. Now that we have a cross-cell durability policy, we no longer need to rely on "no ACKs from RDONLY" to get that.

Advantages

Allowing RDONLY tablets to send semi-sync ACKs will let users run semi-sync in a cluster that has only a PRIMARY, a REPLICA and a RDONLY tablet (like in our examples) and still tolerate a PRIMARY failure.

@timvaillancourt
Contributor

One benefit we see in production with the current behaviour is a reduced volume/overhead of semi-sync ACKs on the PRIMARY.

In our configuration (more than just 1 x PRIMARY, REPLICA, RDONLY) we might not see a benefit from RDONLY tablets sending ACKs. But I can see how this would help in some scenarios.

Another thought: in my experience RDONLY tablets are often used for heavy workloads that can cause them to struggle to replicate. In these situations their ACK is less likely to be useful to the PRIMARY.

@GuptaManan100
Member Author

@timvaillancourt We had a discussion and we realized that sending semi-sync ACKs from RDONLY tablets has both pros and cons:

  • If the RDONLY tablets have heavy traffic, then sending semi-sync ACKs will cause replication lag to increase further and may degrade performance.
  • On the other hand, if all your REPLICA tablets die, then having semi-sync ACKs from RDONLY tablets allows the shard to continue accepting writes until REPLICA tablets can come back up.
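
The availability argument in the second point can be illustrated with a small sketch. It assumes the primary needs one semi-sync ACK per write, which is an assumption made for this example; the `Tablet` struct and the `countAckers` and `canAcceptWrites` helpers are hypothetical and not part of Vitess.

```go
package main

import "fmt"

// TabletType is a simplified stand-in for the tablet types discussed here.
type TabletType int

const (
	Replica TabletType = iota
	Rdonly
)

// Tablet is a hypothetical, minimal view of a non-primary tablet in a shard.
type Tablet struct {
	Alias   string
	Type    TabletType
	Healthy bool
}

// countAckers counts the healthy tablets that would send semi-sync ACKs.
// REPLICA tablets always count; RDONLY tablets count only if rdonlyAcks is set.
func countAckers(tablets []Tablet, rdonlyAcks bool) int {
	n := 0
	for _, t := range tablets {
		if !t.Healthy {
			continue
		}
		if t.Type == Replica || (t.Type == Rdonly && rdonlyAcks) {
			n++
		}
	}
	return n
}

// canAcceptWrites reports whether enough ackers remain for the primary to
// receive requiredAcks acknowledgements for each write.
func canAcceptWrites(tablets []Tablet, requiredAcks int, rdonlyAcks bool) bool {
	return countAckers(tablets, rdonlyAcks) >= requiredAcks
}

func main() {
	// The only REPLICA is down; the RDONLY tablet is still healthy.
	shard := []Tablet{
		{Alias: "zone1-101", Type: Replica, Healthy: false},
		{Alias: "zone1-102", Type: Rdonly, Healthy: true},
	}
	fmt.Println("without RDONLY ACKs:", canAcceptWrites(shard, 1, false))
	fmt.Println("with RDONLY ACKs:", canAcceptWrites(shard, 1, true))
}
```

With the REPLICA down, the first call reports `false` and the second `true`, which is the trade-off the second point describes. The sketch deliberately ignores the replication-lag concern from the first point.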

For this reason, we propose not changing the current durability policies, but introducing 2 more. This gives users the flexibility to choose the configuration that fits their own use case and load distribution.

The new policies can be named semi_sync_with_rdonly_ack and cross_cell_with_rdonly_ack.
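
To make the difference between the four policy names concrete, here is a rough sketch of which tablets would send ACKs under each. The policy names come from this thread; everything else (the `policy` struct, the `sendsAck` helper, and the reading of cross_cell as "ACKs only from a cell other than the primary's") is an assumption for illustration and not the actual Vitess durability code.

```go
package main

import "fmt"

// TabletType is a simplified stand-in for the tablet types discussed here.
type TabletType int

const (
	Primary TabletType = iota
	Replica
	Rdonly
)

// Tablet is a hypothetical, minimal description of a tablet.
type Tablet struct {
	Cell string
	Type TabletType
}

// policy captures the two knobs that distinguish the four policies in this sketch.
type policy struct {
	crossCellOnly bool // assumption: ACKs must come from a cell other than the primary's
	rdonlyAcks    bool // RDONLY tablets may send ACKs
}

var policies = map[string]policy{
	"semi_sync":                  {},
	"cross_cell":                 {crossCellOnly: true},
	"semi_sync_with_rdonly_ack":  {rdonlyAcks: true},
	"cross_cell_with_rdonly_ack": {crossCellOnly: true, rdonlyAcks: true},
}

// sendsAck reports whether the given tablet would send semi-sync ACKs to the
// primary under the named policy.
func sendsAck(policyName string, primary, tablet Tablet) (bool, error) {
	p, ok := policies[policyName]
	if !ok {
		return false, fmt.Errorf("unknown durability policy %q", policyName)
	}
	eligibleType := tablet.Type == Replica || (tablet.Type == Rdonly && p.rdonlyAcks)
	if !eligibleType {
		return false, nil
	}
	if p.crossCellOnly && tablet.Cell == primary.Cell {
		return false, nil
	}
	return true, nil
}

func main() {
	primary := Tablet{Cell: "zone1", Type: Primary}
	rdonly := Tablet{Cell: "zone2", Type: Rdonly}
	names := []string{"semi_sync", "cross_cell", "semi_sync_with_rdonly_ack", "cross_cell_with_rdonly_ack"}
	for _, name := range names {
		acks, err := sendsAck(name, primary, rdonly)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: rdonly in %s sends ACKs = %v\n", name, rdonly.Cell, acks)
	}
}
```

Under these assumptions, the existing policies behave exactly as before, and only the two `_with_rdonly_ack` variants let a RDONLY tablet acknowledge writes, which keeps the change opt-in.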

We can then rework the example to use semi_sync_with_rdonly_ack.

What do you think about this?

@shlomi-noach
Contributor

My thoughts: we should allow semi-sync acks from RDONLY per the original comment.

Discussion: if the PRIMARY dies, then vtorc looks at all replicas and figures out which is the most up-to-date; it either promotes it, or runs a two-step promotion in case that replica is non-promotable (RDONLY). The important observation is: this happens whether the RDONLY tablet sends semi-sync acks or not. The promotion process does not care about semi-sync acks. It does care whether some replica is lagging too much, and it does care whether some replica is non-promotable.

> If the RDONLY tablets have heavy traffic, then sending semi-sync ACKs, will cause replication lag to increase further and may degrade performance.

The overhead for enabling semi-sync is IMHO negligible. It's two more bytes on each downstream event, plus a small ack upstream. The impact to the IO thread should be unnoticeable. And there is no impact to the SQL thread, so enabling semi-sync does not cause a replica to lag more, irrespective of whether the replica is overloaded or not.

> On the other hand, if all your replicas die, then having semi-sync ACKs from rdonly allows the shard to continue accepting writes, until REPLICA tablets can come back up.

Agreed and that is an important advantage on small shards.

> For this reason, we propose not changing the current durability policies, but to introduce 2 more.

With this suggestion there is no impact to existing users, it's configurable and opt-in, and I'm in favor.


3 participants