
internal/contour: only write status updates if we're the leader #1745

Merged: 1 commit merged into projectcontour:master from issue/1425 on Oct 21, 2019

Conversation

davecheney
Contributor

Fixes #1425
Fixes #1385
Updates #499

This PR threads the leader elected signal through to
contour.EventHandler, allowing it to skip writing status back to the API
unless it is currently the leader.

This should fix #1425 by removing the condition where several Contours
would fight to update status. It also updates #499 by continuing to reduce
the number of updates that Contour generates, and therefore must process.

This PR does create a condition where, during startup, no Contour may be
the leader and the xDS tables may reach steady state before anyone is
elected. This would mean the status of an object would be stale until
the next update from the API server after leadership was established.
To address this, a mechanism to force a rebuild of the DAG is added to
the EventHandler and wired to election success.

Signed-off-by: Dave Cheney [email protected]
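
For context, here is a minimal sketch of the approach this description outlines, assuming the leader elected signal is delivered as a method call and a buffered channel is used to force a rebuild. The identifiers below (leaderOK, BecomeLeader, maybeWriteStatus) are illustrative, not the actual names used in internal/contour.

```go
// Sketch only: gates status write-back on leadership and forces a DAG
// rebuild on election success so stale status is refreshed.
package contour

import "sync/atomic"

// EventHandler reacts to informer updates and, when it is the leader,
// writes status back to the API server.
type EventHandler struct {
	leaderOK atomic.Bool   // set once the leader elected signal fires
	rebuild  chan struct{} // capacity 1; a value forces a full DAG rebuild
}

// BecomeLeader is wired to leader election success. It records leadership
// and forces a DAG rebuild so that status skipped while this instance was
// not the leader is written on the next pass.
func (e *EventHandler) BecomeLeader() {
	e.leaderOK.Store(true)
	select {
	case e.rebuild <- struct{}{}:
	default: // a forced rebuild is already queued
	}
}

// maybeWriteStatus performs the status write only on the elected leader;
// every other instance drops it, so Contours no longer fight over status.
func (e *EventHandler) maybeWriteStatus(write func() error) error {
	if !e.leaderOK.Load() {
		return nil
	}
	return write()
}
```

In this shape, non-leaders still watch the API and serve xDS; only the status write-back is gated on leadership, and once a leader is elected the forced rebuild refreshes any status that went stale during the election window.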

davecheney added this to the 1.0.0-rc.2 milestone on Oct 21, 2019
youngnick (Member) left a comment:

LGTM. Nice.

davecheney merged commit 3a7ca08 into projectcontour:master on Oct 21, 2019
davecheney deleted the issue/1425 branch on October 21, 2019 at 03:22
davecheney added a commit to davecheney/contour that referenced this pull request Oct 21, 2019
This PR addresses incorrect comments that I noticed during projectcontour#1745.
This PR also moves the holdoff exceeded calculation into its own function,
cleaning up the caller slightly.

Something I noticed while doing this is that the holdoff logic works as follows.

1. When the EventHandler is created, e.last is set to time.Now. This
causes the stream of updates from the informers to be held off for up to
e.HoldoffDelay + e.HoldoffMaxDelay. This process may happen several
times until all updates from the API server are processed.
2. Once the cluster settles down, the _next_ update from the API server
will likely arrive more than e.HoldoffMaxDelay after e.last and will be
processed _immediately_.
3. If another update arrives after step 2 but before another
e.HoldoffMaxDelay has elapsed, a pending timer will be started and the
update processed after e.HoldoffDelay.

The long and the short of the current logic is that infrequent, O(seconds),
updates will be processed immediately, not after e.HoldoffDelay. This
was surprising to me, as it was not the logic I thought I had written
when I last went through this function. That is not to say it is
wrong; however, it does open the possibility that a large stream of
updates (k apply ...) arriving after the cluster is stable will cause at
least two updates, with the second being at least e.HoldoffDelay after
the first.

Signed-off-by: Dave Cheney <[email protected]>
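
Below is a rough sketch of the holdoff behaviour that commit message describes. Only the field names last, HoldoffDelay and HoldoffMaxDelay come from the message; the notifier type, the constructor and the rebuild hook are hypothetical stand-ins, not the real code in internal/contour.

```go
// Sketch only: infrequent updates rebuild immediately, bursts are coalesced.
package contour

import (
	"sync"
	"time"
)

type holdoffNotifier struct {
	mu              sync.Mutex
	HoldoffDelay    time.Duration // delay used to coalesce bursts of updates
	HoldoffMaxDelay time.Duration // bound on how long a burst is held off
	last            time.Time     // when the DAG was last rebuilt
	generation      int           // bumped per update; stale timers do nothing
	rebuild         func()        // hook that rebuilds the DAG
}

// newHoldoffNotifier mirrors step 1: last starts at time.Now, so the initial
// flood of informer updates is held off rather than processed one by one.
func newHoldoffNotifier(delay, maxDelay time.Duration, rebuild func()) *holdoffNotifier {
	return &holdoffNotifier{
		HoldoffDelay:    delay,
		HoldoffMaxDelay: maxDelay,
		last:            time.Now(),
		rebuild:         rebuild,
	}
}

// onUpdate illustrates the three steps above: an update arriving more than
// HoldoffMaxDelay after the last rebuild is processed immediately (step 2),
// while anything sooner is held off for HoldoffDelay so that bursts such as
// a kubectl apply of many objects coalesce into few rebuilds (steps 1 and 3).
func (h *holdoffNotifier) onUpdate() {
	h.mu.Lock()
	defer h.mu.Unlock()

	if time.Since(h.last) > h.HoldoffMaxDelay {
		h.generation++ // invalidate any queued delayed rebuild
		h.last = time.Now()
		go h.rebuild() // run the rebuild outside the lock
		return
	}

	// Hold the update off for HoldoffDelay; each update bumps the
	// generation so only the most recent timer actually rebuilds.
	h.generation++
	gen := h.generation
	time.AfterFunc(h.HoldoffDelay, func() {
		h.mu.Lock()
		stale := gen != h.generation
		if !stale {
			h.last = time.Now()
		}
		h.mu.Unlock()
		if !stale {
			h.rebuild()
		}
	})
}
```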
davecheney added a commit that referenced this pull request Oct 21, 2019
Development

Successfully merging this pull request may close these issues.

Rework how statuses are written during dag rebuild
Leader election should be multi-read, single write style
2 participants