-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow configuring nginx worker reload behaviour, to prevent multiple concurrent worker reloads which can lead to high resource usage and OOMKill #10884
Conversation
…tiple concurrent worker reloads Signed-off-by: Rafael da Fonseca <[email protected]>
Welcome @rsafonseca! |
Hi @rsafonseca. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
✅ Deploy Preview for kubernetes-ingress-nginx canceled.
|
Signed-off-by: Rafael da Fonseca <[email protected]>
/ok-to-test |
internal/ingress/controller/nginx.go
Outdated
@@ -668,6 +672,11 @@ Error: %v | |||
// | |||
//nolint:gocritic // the cfg shouldn't be changed, and shouldn't be mutated by other processes while being rendered. | |||
func (n *NGINXController) OnUpdate(ingressCfg ingress.Configuration) error { | |||
concurrentlyReloadWorkers := n.store.GetBackendConfiguration().ConcurrentlyReloadWorkers | |||
if !concurrentlyReloadWorkers && n.workersReloading { | |||
return errors.New("worker reload already in progress, requeuing reload") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be an error, or just a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of returning an error is to pick up on the existing behaviour on the queue, which will re-queue in case of error here
ingress-nginx/internal/task/queue.go
Line 130 in 8227888
if err := t.sync(key); err != nil { |
@rsafonseca first of all, thank you for taking care of this long standing problem, this is great! As a suggestion, I'm not sure if using a single variable is the right approach here, you may still hit some issue on multiple reloads or something that causes a different behavior to replace the variable. Instead, and just thinking loud (I know it may not be an easy approach) what if when this feature is enabled, define a buffered channel for the reload that controls, as a semaphore, the reloads? I'm still not sure if it does make sense, but some channel that accepts just one item, and a reload would add an item on the queue (and block), and the checker process of workers that removes this item from the channel, unblocking any other attempts to reload. You can then control if a reload was already requested checking the length of this channel. @tao12345666333 you may have some better view then me here, as I honestly sucks on concurrency problems :) |
/triage accepted |
Also, another suggestion: As this is feature flagged (thanks!!!) let's try using positive behavior flag. Maybe instead of a flag that defaults to true, disabling the feature, something like a flag as "EnableSerialReloads = false" as default, and then if someone wants to enable it, set to true. Booleans can become hard to define behavior (and this is why a bunch of APIs uses pointers, so you have true, false and undefined) so at least it is clear that if someone wants to Enable this feature, they use something like an "enable-something" flag. This is better also for helm charts, etc. Thanks! |
Signed-off-by: Rafael da Fonseca <[email protected]>
Hi @rikatz, thanks for reviewing :) Regarding concurrency and using a single variable, i don't think this would be a problem in the way things are designed overall, but let's see if anyone else has any concern here. I don't think it's worth the trouble to create a channel just for this, as the existing queue system already has a re-queue mechanism which is being employed here. Regarding using a positive behaviour flag, that is a good point! I've refactored it and used your name suggestion there and added in a helm chart option for it as well :) |
Signed-off-by: Rafael da Fonseca <[email protected]>
/assign Thanks, I will take a look this week. |
Any news @tao12345666333 ? I'm waiting to get this merged so I can move off my fork and upgrade ;) |
Thank you for your contributions! BTW this PR need rebase |
rebased @tao12345666333 :) looks like pull-ingress-nginx-codegen job is failing, but it's unrelated to this PR |
/retest |
@rsafonseca as we have bumped a bunch of libraries, please rebase over main to be sure codegen job will run using golang v1.22 and the latest k8s versions during tests :) tks |
I did rebase 5 days ago @rikatz and my branch is still up to date with main. could that job be pulling from the main in my fork instead? I handn't synced that one, since it wasn't the branch i was merging, but i synced it now |
…elm chart option Signed-off-by: Rafael da Fonseca <[email protected]>
Signed-off-by: Rafael da Fonseca <[email protected]>
/test pull-ingress-nginx-codegen |
I have created a PR #11344 |
#11344 has been merged, let's do a rebase and the codegen will be passed |
Looks good @tao12345666333 , thanks for fixing this in main :) |
Any blockers for merging this now? Can we get it shipped? :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rsafonseca, tao12345666333 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thank you for your contributions! @rsafonseca |
What this PR does / why we need it:
This PR introduces the ability to configure the concurrency behaviour of chained configuration updates that require a backend reload.
It does so by requeuing jobs that require a backend reload, whenever there's already a reload in progress, and verifies if the reload is finished by counting the number of workers, as per the configuration.
The default behaviour remains unchanged for now, and this behaviour can be configured via setting an option in the configmap.
The main purpose is to prevent several worker reloads to happen before the previous reload has finished, in order to avoid a steep and uncontrolled growth of resource usage, which can lead to OOMKill of the ingress-controller pods and/or CPU starvation. This is especially problematic in environments that use long lived connections (since the workers stay alive until the worker-shutdown-timeout is reached) and where a few configuration reloads might be triggered within that timeout.
There's a few issues regarding this problem such as #8166 but the solutions presented introduce other problems. The sync throttling suggested in the linked issue for example prevents service upstream endpoints from being updated on time, even if they don't require a reload, since all configuration changes currently share the same queue, hence the same throttling (for example, in the configuration proposed it could take up to ~2 minutes to remove terminating endpoints from a service, leading to HTTP 502)
This method addresses the root cause of the problem: multiple worker reloads which can happen spontaneously causing a huge amount of resources to be used, leading to OOMKill.
A few other related issues: #8336 #8362 #8357
Types of changes
How Has This Been Tested?
Running in a live cluster, with and without the option enabled.
Checklist: