Use a ring channel to avoid blocking write of events #2082
Conversation
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: aledbf
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these OWNERS files:
Approvers can indicate their approval by writing `/approve` in a comment.
internal/ingress/controller/nginx.go (Outdated)

```diff
@@ -106,7 +107,7 @@ func NewNGINXController(config *Configuration, fs file.Filesystem) *NGINXController
 	}),

 	stopCh:   make(chan struct{}),
-	updateCh: make(chan store.Event, 1024),
+	updateCh: channels.NewRingChannel(4096),
```
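For context on what the new type changes: the old `make(chan store.Event, 1024)` blocks the sender once its buffer is full, while a ring channel accepts every write and silently drops the oldest buffered item instead. Below is a minimal, self-contained sketch of that difference, assuming the `github.com/eapache/channels` API used in the diff (`NewRingChannel`, `In`, `Out`, `Close`) behaves as that library documents:

```go
package main

import (
	"fmt"

	"github.com/eapache/channels"
)

func main() {
	// Plain buffered channel, like the previous updateCh: once the buffer
	// is full, the next send blocks the writer.
	buffered := make(chan int, 2)
	buffered <- 1
	buffered <- 2
	select {
	case buffered <- 3:
	default:
		fmt.Println("buffered channel full: a plain send here would block the event writer")
	}

	// Ring channel, as in the diff (capacity 2 just to keep the example
	// small): sends never block; when the ring is full the oldest buffered
	// item is dropped instead.
	ring := channels.NewRingChannel(2)
	for i := 1; i <= 5; i++ {
		ring.In() <- i // never blocks, even though nothing is reading yet
	}
	ring.Close()

	for v := range ring.Out() {
		fmt.Println("kept:", v) // prints 4 and 5; 1, 2 and 3 were dropped
	}
}
```

Lowering the capacity, as discussed below, does not change this drop-oldest behaviour; it only shrinks the window of events that survive.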
Does this specify a limit of 4096? If so, the initial load does not appear to pop items off the ring until the cache has been initialized, which might still cause a problem. Can we lower it to 1024 to confirm?
Yes, let me lower that value and create a different Docker image.
Cool, it still works. My guess is the ring can expand beyond 1024.
@azweb76 please use
Correct me if I'm wrong: on startup all events are written to the ring, then later popped off it. Given this, only a maximum of 1024 events will be tracked in sync. If I understand correctly, the ring is initialized as 1024/4096/etc., and if the cluster exceeds this, the first N items are dropped before they are processed.
Seems like we need to start processing the ring channel before we append to it, so we can treat it like a buffer during initial startup.
Yes. That does not mean we are not handling the events. Keep in mind we use a work queue for the configuration update, and we discard all the events in a time window (all the events between when we start and when we finish the update).
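The time-window behaviour described here is easy to picture with a small sketch. The code below is not the controller's actual work queue; `coalescingSyncer` and its methods are hypothetical names, and the sketch only illustrates the coalescing idea: notifications that land while a sync is running are folded into a single follow-up sync.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// coalescingSyncer is a made-up illustration of the idea described above:
// events that arrive while a configuration sync is in flight do not each
// trigger their own sync; they only mark the state dirty, and one
// follow-up sync covers all of them.
type coalescingSyncer struct {
	dirty int32
	kick  chan struct{}
}

func newCoalescingSyncer() *coalescingSyncer {
	return &coalescingSyncer{kick: make(chan struct{}, 1)}
}

// Notify records that something changed; it never blocks the caller.
func (c *coalescingSyncer) Notify() {
	atomic.StoreInt32(&c.dirty, 1)
	select {
	case c.kick <- struct{}{}:
	default: // a sync is already pending, so this event is coalesced
	}
}

// Run performs one full sync per batch of notifications.
func (c *coalescingSyncer) Run(sync func()) {
	for range c.kick {
		atomic.StoreInt32(&c.dirty, 0)
		sync()
		if atomic.LoadInt32(&c.dirty) == 1 {
			c.Notify() // events arrived during the sync; schedule one more pass
		}
	}
}

func main() {
	s := newCoalescingSyncer()
	go s.Run(func() {
		fmt.Println("full configuration sync")
		time.Sleep(50 * time.Millisecond)
	})

	// A burst of notifications collapses into only a handful of syncs.
	for i := 0; i < 100; i++ {
		s.Notify()
	}
	time.Sleep(300 * time.Millisecond)
}
```

Under a scheme like this, losing individual events is tolerable, which is why the comment above notes that discarding events inside the sync window is acceptable.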
This is an example of the chicken-and-egg problem: without the store we don't have events, and without the syncQueue we cannot start processing updates, but the syncQueue depends on the store. Using the ring channel is the right solution because it allows us to discard some of the initial events.
Sure, after it has started. But this loop is what processes those items, and since the loop doesn't run until all the initial events are loaded, only up to the updateCh limit is processed. https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/nginx.go#L302
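To make the ordering concern in the last two comments concrete, here is a self-contained toy version of that startup sequence. The `Event` and `syncQueue` types below are hypothetical stand-ins, not the controller's real `store.Event` or sync queue; the loop is only a rough paraphrase of the shape of the linked code, not the actual implementation.

```go
package main

import (
	"fmt"

	"github.com/eapache/channels"
)

// Hypothetical stand-ins for the controller's store.Event and sync queue.
type Event struct{ Obj string }

type syncQueue struct{}

func (q *syncQueue) Enqueue(obj string) { fmt.Println("queued for sync:", obj) }

func main() {
	updateCh := channels.NewRingChannel(4) // tiny capacity to make the drop visible
	stopCh := make(chan struct{})
	queue := &syncQueue{}

	// Startup as described above: every existing object is written into
	// the ring *before* the loop below starts draining it.
	for i := 0; i < 10; i++ {
		updateCh.In() <- Event{Obj: fmt.Sprintf("object-%d", i)}
	}
	updateCh.Close() // close so this toy loop can terminate

	// Roughly the shape of the loop linked above: select over the ring's
	// Out() channel and a stop channel, handing events to the sync queue.
	for {
		select {
		case evt, ok := <-updateCh.Out():
			if !ok {
				return // ring drained and closed
			}
			if e, ok := evt.(Event); ok {
				queue.Enqueue(e.Obj)
			}
		case <-stopCh:
			return
		}
	}
}
```

Running this prints only object-6 through object-9: the six oldest writes were dropped before the loop ever saw them, which is exactly the behaviour being debated here.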
@azweb76 what are you proposing exactly?
I greatly appreciate the effort you're putting into this fix. I just want to make sure we're not introducing another bug. You certainly know more about this than I do, so I will be quiet now. Thanks again!
The update channel was introduced because of the store refactor. This is something we needed because it was almost impossible to have tests for the informers and all the logic behind the sync process.
The issue in #2022 is related to the number of events we receive at startup, which causes write contention in the channel (we exceed the buffer size). The change introduced here allows us to discard events when we exceed the defined size. As I mentioned previously, we don't need all the events (from the start), but at the same time we cannot run any filter at this point because of the lack of context. If you run the test image and increase the log level to 3 using the flag
Awesome. Looking forward to the release. 😄
Which issue this PR fixes:
fixes #2022
closes #2081