
Tracking issue: VReplication throttling #7362

Closed

shlomi-noach opened this issue Jan 24, 2021 · 7 comments

Comments

@shlomi-noach

We wish to introduce throttling in VReplication to avoid overwhelming the databases with reads/writes.

VReplication's current behavior is greedy: if it can read table data, it reads it and pushes it downstream. If it can read binary logs, it does the same. If it can pull either table data or binary logs from upstream, it will, and it writes them to the database.

In the current design, the source engine and the target engine throttle one another at the rate of their respective databases' capabilities. For example, assume the source is a replica tablet and the target is a primary tablet. The target tablet requests to pull data from the source. The source reads from the replica and pushes downstream; the target intercepts and writes to the backend primary MySQL. If the backend primary MySQL is too busy, that pushes back on the target tablet, which in turn stops processing events from upstream, which in turn throttles reading from the MySQL replica. Conversely, if the source replica is slow to respond, that dictates the pace of writes to the target primary.
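As an illustration only (not VReplication's actual code), here is a minimal Go sketch of this mutual backpressure, modeling the source-to-target stream as a bounded channel: the reader naturally stalls whenever the writer falls behind, with no explicit throttling logic involved.

```go
package main

import (
	"fmt"
	"time"
)

// event stands in for a row batch or binlog event flowing from source to target.
type event struct{ seq int }

func main() {
	// A small buffered channel models the stream between the source and
	// target engines. When the writer (target) is slow, the buffer fills
	// and the reader (source) blocks on send: the mutual backpressure
	// described above.
	stream := make(chan event, 4)

	// Source side: read as fast as upstream allows, push downstream.
	go func() {
		for i := 0; i < 20; i++ {
			stream <- event{seq: i} // blocks once the buffer is full
		}
		close(stream)
	}()

	// Target side: apply at the pace of the backend primary MySQL.
	for ev := range stream {
		time.Sleep(50 * time.Millisecond) // simulate a busy primary
		fmt.Println("applied event", ev.seq)
	}
}
```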

However, on both source and target sides, operations are aggressive on the MySQL servers. The engines will read as fast as the replica allows, or write as fast as the target primary allows. This, unfortunately, does not take into consideration the overall health of the source and target shards. We want to avoid overwhelming source/target shards so as to keep them healthy.

Specifically, we want to throttle writes on the target when those writes generate replication lag on the target shard, and throttle reads from the source replica when that replica itself is lagging. This not only keeps the shards in a healthy state, it also makes the cut-over safer and quicker, because at any point in time the lag between source and target is known to be small.

To that effect, we first need to be able to throttle on a lagging replica (the source side); our existing tablet throttling only throttles based on writes to a cluster's primary.
We will introduce a lag-based "check-self" throttle check on all tablets, sketched below.
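To make the idea concrete, here is a minimal, hypothetical sketch of what a lag-based self check could look like: an HTTP endpoint where the tablet compares its own replication lag against a threshold. The threshold value, the lag source, and the response shape are all placeholders here, not the actual Vitess implementation:

```go
package main

import (
	"fmt"
	"net/http"
)

// lagThreshold and replicationLagSeconds are hypothetical stand-ins; a real
// tablet would read its lag from its own health/replication status stream.
const lagThreshold = 5.0 // seconds

func replicationLagSeconds() float64 { return 3.5 }

// checkSelf answers whether this tablet, by its own measurement, is lagging:
// HTTP 200 means "proceed", 429 means "back off".
func checkSelf(w http.ResponseWriter, r *http.Request) {
	lag := replicationLagSeconds()
	if lag > lagThreshold {
		w.WriteHeader(http.StatusTooManyRequests)
	} else {
		w.WriteHeader(http.StatusOK)
	}
	fmt.Fprintf(w, `{"Value": %.1f, "Threshold": %.1f}`, lag, lagThreshold)
}

func main() {
	http.HandleFunc("/throttler/check-self", checkSelf)
	http.ListenAndServe(":8080", nil)
}
```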

Some work has begun, illustrated in the comments below.

cc @rohit-nayak-ps

@shlomi-noach

#7319 introduces a /throttler/check-self check, where each tablet (primary, replica, any) runs its own throttler mechanism to check its own lag. This also applies to primaries, as illustrated in the PR comment.
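For illustration, a sketch of how a client (say, the VReplication source) might consult such a check before doing more work; the address, retry interval, and treat-errors-as-throttle policy are assumptions, not the merged behavior:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// throttled reports whether the tablet at addr asks callers to back off.
// Treating any non-200 answer (or an error) as "throttle" errs on the safe
// side; addr and the retry interval here are illustrative.
func throttled(addr string) bool {
	resp, err := http.Get(addr + "/throttler/check-self")
	if err != nil {
		return true
	}
	defer resp.Body.Close()
	return resp.StatusCode != http.StatusOK
}

func main() {
	for throttled("http://127.0.0.1:8080") {
		time.Sleep(time.Second) // wait before attempting more reads/writes
	}
	fmt.Println("ok to proceed")
}
```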

@shlomi-noach

#7324 extends #7319 and adds source-side throttling based on /throttler/check-self.

@shlomi-noach

shlomi-noach commented Jan 27, 2021

Code-wise this is complete now that #7324 and #7364 are merged.

Documentation pending.

@shlomi-noach

Documentation: vitessio/website#689

@shlomi-noach

Code and documentation are merged. Functionality is complete for now, and there is no plan for further work at this time.

@mattlord

mattlord commented Sep 25, 2024

@shlomi-noach I'm going to close this as done for now. We can re-open it if needed. In that case, can you clarify what's left in relation to this issue? Thanks!

@shlomi-noach

This was definitely done.
