pkg/kates: fix bug where Accumulator was not coalescing changes mid-watch #4488
Quite a few comments, questions and suggestions. I will ping you in the morning to go over a few items to make sure I'm understanding them all correctly.
BTW...awesome to see the improved testing results :)
@ddymko, I think it is important for you to have your eyes on this so that you familiarize yourself with it a little bit.
TestBootStrapNoNotifyBeforeSync creates ConfigMaps during its tests that don't get cleaned up afterwards, potentially polluting other tests. Add t.Cleanup to the test to clean up those ConfigMaps after the test finishes.
Signed-off-by: Hamzah Qudsi <[email protected]>
overall looks good based on our discussion. Just a quick question and I agree with David's feedback.
lgtm
@haq204 I think you have to rerun. Otherwise it looks good!
The Accumulator struct attempts to coalesce changes into a single snapshot update as a way to do graceful load shedding. However, while this was the behavior on bootstrap, it didn't always happen mid-watch: each event that was received turned into its own snapshot update, so the requirement was not really satisfied. We add a new option to batch changes for a specified window interval before sending a snapshot update. The batching behavior is as follows:
- The Accumulator will receive raw changes up until the window period, at which point it will send a change even if new updates are still coming in. This prevents a scenario where a change is never sent because the cluster is extremely volatile. While there may be a way to be more dynamic about how long to wait before sending this change, this approach is simpler and more predictable.
- If an isolated update comes in (e.g. the last change was submitted an hour ago but the window period is set to 10s), it does not necessarily wait until the window period elapses; it can be sent immediately.
- The default interval is set to 1s to be in line with current change velocity.
- A snapshot update won't be sent until all resources are fully bootstrapped, regardless of what interval is set. This is to ensure that the other requirements for the Accumulator are still satisfied.
For testing, we add new test cases.
Signed-off-by: Hamzah Qudsi <[email protected]>
AMBASSADOR_RECONFIG_MAX_DELAY controls the interval to wait before sending snapshot updates when listening for K8s resources, especially when many resources are updated in quick succession.
Signed-off-by: Hamzah Qudsi <[email protected]>
Signed-off-by: Hamzah Qudsi <[email protected]>
looks good
[v2.4] Port commits from PR #4488
Description
The Accumulator struct attempts to coalesce changes into a single snapshot update as a way to do graceful load shedding.
However, while this was the behavior on bootstrap, it didn't always happen mid-watch: each event that was received turned into its own snapshot update, so the requirement was not really satisfied.
We add a new option to batch changes for a specified window interval before sending a snapshot update.
The batching behavior is as follows:
- The Accumulator will receive raw changes up until the window period, at which point it will send a change even if new updates are still coming in.
  This prevents a scenario where a change is never sent because the cluster is extremely volatile.
  While there may be a way to be more dynamic about how long to wait before sending this change, this approach is simpler and more predictable.
- If an isolated update comes in (e.g. the last change was submitted an hour ago but the window period is set to 10s), it does not necessarily wait until the window period elapses; it can be sent immediately.
- The default interval is set to 1s to be in line with current change velocity.
- A snapshot update won't be sent until all resources are fully bootstrapped, regardless of what interval is set.
  This is to ensure that the other requirements for the Accumulator are still satisfied.
AMBASSADOR_RECONFIG_MAX_DELAY controls the interval to wait before sending snapshot updates when listening for K8s resources, especially when many resources are updated in quick succession.
Related Issues
Issue is not in this repo.
Testing
Add new test cases.
For performance testing, we ran a test that concurrently applies namespaces, with each namespace applying 35 each of Deployments, Hosts, and Mappings. Each Deployment has 2 replicas. We track the number of snapshot versions pushed. Previously, applying 1 namespace created ~118 snapshot versions; with a 10s interval, that dropped to 16. Previously it would start OOMing at 6 concurrent namespaces; with a 10s interval it handles more than 15. Peak memory usage was 600MB, so with a higher memory limit it can likely support many more.
Checklist
- I made sure to update CHANGELOG.md.
- This is unlikely to impact how Ambassador performs at scale; load testing shows xxxx.
- My change is adequately tested.
- I updated DEVELOPING.md with any special dev tricks I had to use to work on this code efficiently - N/A.
- The changes in this PR have been reviewed for security concerns and adherence to security best practices.