throttler: don't allocate any resources unless it is actually enabled #8643
Conversation
Signed-off-by: deepthi <[email protected]>
This is fine.
The other thing is that we can increase the garbage-collect interval on most of these caches without affecting logic.
Out-of-band comment from @shlomi-noach:
I will make this change as well before merging the PR.
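To make the garbage-collect interval concrete, here is a minimal sketch, not the actual Vitess code, assuming a go-cache-style API in which the constructor takes a per-entry default expiration and a separate cleanup (garbage-collect) interval; the variable name recentChecks and the durations are illustrative only.

```go
package main

import (
	"time"

	cache "github.com/patrickmn/go-cache"
)

func main() {
	// First argument: how long an entry stays fresh (per-entry expiration).
	// Second argument: how often the background sweep evicts expired entries.
	// Relaxing the sweep from 1s to 10s does not change when entries expire.
	recentChecks := cache.New(500*time.Millisecond, 10*time.Second)

	recentChecks.Set("some-app", true, cache.DefaultExpiration)

	// Reads still honor per-entry expiration regardless of the sweep interval.
	if _, found := recentChecks.Get("some-app"); !found {
		// the entry expired (or was never set); recompute it here
	}
}
```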
We have the following:
What are reasonable values for the cleanup intervals that are currently 1 second or less? Should I change those to 10 seconds? There are two of them:
Signed-off-by: deepthi <[email protected]>
Are we really OK with the increase to the timeouts/intervals in production systems? Vitess knows via its env when it's running in a local deployment, so maybe it would be wise to check that before increasing those time values, and only do so in local environments, with the purpose of reducing CPU usage. The increased accuracy is probably always worth it in production environments, and the increase in CPU usage there will be negligible.
Both are fine to change to
Yes, allow me to explain. It's in how this particular cache implementation works. The cache is a KV map, where each value is a combination of the actual data and an expiration timestamp.
Whenever you read a key, the cache checks that timestamp and treats an expired entry as a miss.
Thus far the logic is sound and complete. There is no need for garbage collection in terms of correctness of data. Of course, caches can blow up with data, and the garbage collection, which is the topic of this discussion, is how the cache is cleaned up: the garbage collector iterates all items, computes which entries have expired, and evicts them. This is useful in caches that have many items, where new keys are being introduced and are short-lived. However, in the lag-throttler caches, the number of items is very limited: lag computation cache size == number of servers in a shard (just a handful). Aggregated metrics == 2 (one for the lag metric, one for the primary metric). Cached check results size == number of app names that inquire for checks, which is also a handful. Therefore, it's the same keys again and again, and there is no fear of bloating the cache size; hence, garbage collection can be very relaxed. It can probably be disabled altogether without impact.
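For illustration only, here is a conceptual sketch of the mechanism described above: per-entry expiration checked lazily on read, plus an independent garbage-collection sweep. The type and method names are made up; this is not the throttler's actual cache implementation.

```go
package main

import (
	"sync"
	"time"
)

// item pairs a cached value with its own expiration timestamp.
type item struct {
	value      interface{}
	expiration time.Time
}

type kvCache struct {
	mu    sync.Mutex
	items map[string]item
}

func newKVCache(gcInterval time.Duration) *kvCache {
	c := &kvCache{items: map[string]item{}}
	go func() {
		// The sweep only bounds memory; it is not needed for correctness,
		// because Get already ignores expired entries.
		for range time.Tick(gcInterval) {
			c.deleteExpired()
		}
	}()
	return c
}

func (c *kvCache) Set(key string, value interface{}, ttl time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = item{value: value, expiration: time.Now().Add(ttl)}
}

// Get treats an expired entry as a miss, so results stay correct no matter
// how lazily (or whether) the garbage collector runs.
func (c *kvCache) Get(key string) (interface{}, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	it, ok := c.items[key]
	if !ok || time.Now().After(it.expiration) {
		return nil, false
	}
	return it.value, true
}

func (c *kvCache) deleteExpired() {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	for k, it := range c.items {
		if now.After(it.expiration) {
			delete(c.items, k)
		}
	}
}
```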
I'm convinced! 👍
Description
Even when -enable_lag_throttler is not set, various caches are created and checked at intervals. Specifically, nonLowPriorityAppRequestsThrottled is checked every 100ms. This is wasteful. All of this is now gated by the flag. In addition, I moved various other initializations (like SelfChecks) to also be gated by the flag.
Related Issue(s)
Checklist
Deployment Notes