Add rate limiter to the Vertical Pod autoscaler updater component #2326
Conversation
Welcome @guillaumebreton!
(force-pushed from 19844a2 to 4ae1ddd)
@guillaumebreton thanks! This looks OK. I have some minor comments, but first I wanted to clarify that this is what we want.
@@ -144,6 +149,11 @@ func (u *updater) RunOnce() {
		if !evictionLimiter.CanEvict(pod) {
			continue
		}
		err := u.evictionRateLimiter.Wait(ctx)
`Wait` always asks for one token, so it doesn't make much sense to set the burst rate to anything other than 1. If I'm correct, then we should remove the `eviction-rate-limit-burst` parameter.

Sorry, the `golang.org/x/time/rate` documentation is not really clear. Looking at the code, `burst` is also the maximum number of tokens you can accrue over time. This is important, as it means VPA can't suddenly evict lots of pods after not evicting anything for a long time.
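To illustrate that behaviour of `golang.org/x/time/rate` (a minimal standalone sketch, not part of the PR): even after the limiter has been idle for a while, at most `burst` tokens can have accumulated, so only `burst` events are allowed immediately.

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// 1 token per second, burst of 3: at most 3 tokens can accumulate,
	// no matter how long the limiter has been idle.
	limiter := rate.NewLimiter(rate.Every(time.Second), 3)

	time.Sleep(5 * time.Second) // idle long enough to "earn" 5 tokens

	allowed := 0
	for limiter.Allow() {
		allowed++
	}
	fmt.Println("allowed immediately after idling:", allowed) // prints 3, not 5
}
```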
Without knowing the autoscaler well, I'm a little bit worried that `Wait` will block here for a long time (e.g. minutes). The state of the cluster might change significantly during that time.
An idea for solving the Wait()/blocking problem (rough sketch after this list):
- Let's collect all pods to be evicted in a list.
- Randomize the list (to give all pods a fair chance to be evicted in a RunOnce run).
- Go through the list and evict a pod if `u.evictionRateLimiter.Allow()` returns true.
- If `u.evictionRateLimiter.Allow()` returns false, then return from the RunOnce function. Any remaining pods to be evicted will be retried in the next RunOnce run.
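A rough sketch of what that Allow()-based loop could look like. The function name, pod slice and evict callback below are illustrative assumptions, not the actual updater code:

```go
package updater

import (
	"math/rand"

	"golang.org/x/time/rate"
	apiv1 "k8s.io/api/core/v1"
)

// evictWithAllow sketches the proposed approach: shuffle the candidates,
// evict while the limiter still has tokens, and bail out as soon as it
// runs dry; leftover pods get retried in the next RunOnce run.
func evictWithAllow(limiter *rate.Limiter, pods []*apiv1.Pod, evict func(*apiv1.Pod) error) {
	rand.Shuffle(len(pods), func(i, j int) { pods[i], pods[j] = pods[j], pods[i] })
	for _, pod := range pods {
		if !limiter.Allow() {
			return // out of tokens for this run
		}
		_ = evict(pod) // error handling omitted in this sketch
	}
}
```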
Thanks for your input 🙂
I'm not really sure why the Wait is an issue here:
- A context with timeout is passed to the Wait function, so the Wait will be cancelled if the delay is larger than the tick interval.
- As far as I understand it, the state of the cluster is updated by the recommender, and the admission controller applies it. The updater is only responsible for eviction here, and when the pod restarts, the last values of the VPA are applied by the admission controller.

In your proposed solution, we would hit the rate limiter quickly and exit, so we would "waste" n seconds of the ticker loop doing nothing. Running on fresher data is an excellent idea, but the only option I see would be to refresh the pod list between every Wait, which doesn't seem ideal to me.
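For reference, a minimal standalone sketch (not the updater code) of the behaviour described in the first bullet above: passing a context with a timeout to `Wait` caps how long a single run can block, and `Wait` returns an error straight away if the required delay would exceed the deadline. The numbers are made up for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Illustrative numbers: one eviction every 30s with burst 1, and a
	// 10s deadline standing in for the updater's loop tick.
	limiter := rate.NewLimiter(rate.Every(30*time.Second), 1)

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for i := 0; i < 3; i++ {
		if err := limiter.Wait(ctx); err != nil {
			// Waiting for the next token would exceed the deadline,
			// so this run stops instead of blocking for minutes.
			fmt.Println("stopping this run:", err)
			return
		}
		fmt.Println("evicting pod", i)
	}
}
```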
The default tick for the updater is 1 minute, so as @guillaumebreton writes, the Wait will be canceled after that time at most (unless someone uses a custom tick). So we can basically look at info that is stale by up to 1 minute, which is also true if the updater has a lot of work to do. I'm inclined to agree this solution is acceptable.
As to randomizing the list of pods to evict: the updater has a priority mechanism to select which pods to evict first based on multiple factors (including how far away they are from their recommended resources). Shuffling the list wouldn't play well with that.
Yes, it does! 🙌 FYI @guillaumebreton and I are on the same team, so we've spoken about this offline 🙂
@milesbxf Thanks! In that case I'll get back to this early next week. Appreciate your patience, I am a bit short on time atm :)
@bskiba No worries 👍 Thank you for reviewing it.
func getRateLimiter(evictionRateLimit float64, evictionRateLimitBurst int) *rate.Limiter {
	var evictionRateLimiter *rate.Limiter
	if evictionRateLimit == -1 || evictionRateLimit == 0 {
		evictionRateLimiter = rate.NewLimiter(rate.Inf, evictionRateLimitBurst)
I haven't dived into the rate package docs. Is this equivalent to having no rate limiter at all, i.e. will the Wait() return immediately?
Also, is the Burst completely ignored?
Indeed, if we set the rate limit to `rate.Inf`, the burst rate is completely ignored and Wait will return immediately.
- Documentation about burst being ignored if the rate equals rate.Inf: https://github.com/golang/time/blob/master/rate/rate.go#L37
- Wait() returning immediately: https://github.com/golang/time/blob/master/rate/rate.go#L307
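A quick standalone check of that claim (again just a sketch, not PR code): with `rate.Inf` the burst value is irrelevant and `Wait` never blocks.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// rate.Inf: the limiter allows everything, so the burst argument (1 here)
	// has no effect and Wait returns immediately every time.
	unlimited := rate.NewLimiter(rate.Inf, 1)

	start := time.Now()
	for i := 0; i < 1000; i++ {
		if err := unlimited.Wait(context.Background()); err != nil {
			panic(err)
		}
	}
	fmt.Printf("1000 Waits took %v (effectively no rate limiting)\n", time.Since(start))
}
```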
}{
	{0.0, 1, rate.NewLimiter(rate.Inf, 1)},
	{-1.0, 2, rate.NewLimiter(rate.Inf, 2)},
	{10.0, 3, rate.NewLimiter(rate.Every(time.Duration(1.0/10*float64(time.Second))), 3)},
From @bandesz's comment above, looks like this can be simplified.
Thanks for your patience and apologies for the delay. I've added some comments. My main concern is to make sure that the current behavior stays the same (i.e. if we are not using the rate limiter). Other comments are mostly nits.
(force-pushed from 3e88a62 to e380765)
This looks good, only minor comments left. Can you address them and squash your commits into one?
Thanks for answering my questions and getting this done!
(force-pushed from 02cd198 to b60628f)
@bskiba What would be the next step with this PR? Should I ask @jbartosik and @mwielgus for a review?
I'll review tomorrow, sorry for the wait!
@bskiba No worries :) GitHub was showing you had already approved it, so I wasn't sure.
		`Number of pods that can be evicted per seconds. A rate limit set to 0 or -1 will disable
		the rate limiter.`)

	evictionRateBurst = flag.Int("eviction-rate-limit-burst", 1, `Burst of pods that can be evicted.`)
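Putting the two flags together, a hedged sketch of how they could feed into the limiter construction. The flag names match the diff above; the helper function below is illustrative and not the exact updater code.

```go
package main

import (
	"flag"
	"time"

	"golang.org/x/time/rate"
)

var (
	evictionRateLimit = flag.Float64("eviction-rate-limit", -1,
		`Number of pods that can be evicted per second. A rate limit set to 0 or -1 will disable the rate limiter.`)
	evictionRateBurst = flag.Int("eviction-rate-limit-burst", 1,
		`Burst of pods that can be evicted.`)
)

// newEvictionRateLimiter mirrors the getRateLimiter idea from the PR:
// 0 or -1 means "no limiting" (rate.Inf); otherwise the per-second rate
// is converted into an interval between tokens.
func newEvictionRateLimiter(limit float64, burst int) *rate.Limiter {
	if limit <= 0 {
		return rate.NewLimiter(rate.Inf, burst)
	}
	return rate.NewLimiter(rate.Every(time.Duration(float64(time.Second)/limit)), burst)
}

func main() {
	flag.Parse()
	_ = newEvictionRateLimiter(*evictionRateLimit, *evictionRateBurst)
}
```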
Can you also rename the flag to `eviction-rate-burst`?
One last small comment (sorry, I missed this in the previous review :( )
This patch adds a rate limiter to the vertical pod autoscaler updater. It adds two flags: eviction-rate-limit to control the number of pods that can be evicted every second, and eviction-rate-limit-burst to control the burst of pods that can be evicted immediately.
(force-pushed from d7b7727 to 59df00f)
@bskiba Thank you 👍 I missed it and I fixed it.
Thanks!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bskiba

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
This patch adds a rate limiter to the vertical pod autoscaler updater.

It adds two flags:
- eviction-rate-limit to control the number of pods that can be evicted every second.
- eviction-rate-limit-burst to control the burst of pods that can be evicted immediately.

Fixes #2178