internal/contour: HoldoffMaxDelay timer should increase if there are pending updates #2275
I'd like to take a stab at this. The goal of
Updates projectcontour#2275

The holdoff logic is subtle and under-tested. To address both of these issues, I'm going to make the fix for projectcontour#2275 in several stages so we have a chance to git bisect.

This first change removes the explicit force code path when holdoffmaxdelay is exceeded. The logic remains the same: if holdoffmaxdelay is exceeded, a delayed update will occur. However, after this change, rather than there being a specific code path for the forced update, we set the timer delay to zero, which will make the <-pending channel ready shortly afterwards.

Strictly speaking, a stream of <-op updates can trump a <-pending, which can add further delay to rebuilding the DAG, but in practice this is unlikely to occur indefinitely:

1. When more than one channel is selectable, the runtime chooses between them pseudo-randomly.
2. If <-op updates occur continually, it is in the user's interest that the DAG rebuild is delayed; this is the point of 2275.

A future PR will refactor the delay calculation so that we can test it in isolation.

Signed-off-by: Dave Cheney <[email protected]>
Updates projectcontour#2275

Continue to chip away at the holdoff logic's hooks into the EventHandler's state. This change renames e.last to a local lastDAGUpdate variable and cleans up some incorrect comments.

Signed-off-by: Dave Cheney <[email protected]>
Hello, I've noodled with this for a few days and landed some cleanup PRs for 1.3; however, I have decided not to proceed any further with this work. This is for several reasons.
On balance, I think this is worth doing, but I can't justify it now with the current issue load.
The Contour project currently lacks enough contributors to adequately respond to all Issues. This bot triages Issues according to the following rules:
You can:
Please send feedback to the #contour channel in the Kubernetes Slack.
Currently, in handler.go, the holdoff timer has a maximum delay, after which it will fire a forced update.

As part of an investigation into the large amount of memory used by Envoy at startup, we came up with the theory that, if it takes longer than HoldoffMaxDelay to process all the objects in a cluster, we could end up firing the forced update multiple times, possibly many times.
This would create a lot of separate Envoy configs very quickly, which would all also drain quickly (since there would be very few connections actually using each one).
One way to mitigate this may be to manage the holdoffMaxDelay dynamically, increasing it for the next cycle if updates are still pending when a max-delay update occurs. It should reset to its original value once the pending updates have drained.
This would produce a backoff-style behaviour in Envoy updates when there are a lot of updates pending (such as at startup), which should help avoid generating as many separate configs.