
internal/contour: only write status updates if we're the leader #1745

Merged: 1 commit merged into projectcontour:master from issue/1425 on Oct 21, 2019

Conversation

davecheney
Contributor

Fixes #1425
Fixes #1385
Updates #499

This PR threads the leader elected signal through to
contour.EventHandler, allowing it to skip writing status back to the API
unless it is currently the leader.

This should fix #1425 by removing the condition where several Contours
would fight to update status. It also updates #499 by continuing to reduce
the number of updates that Contour generates, and therefore must process.

This PR does create a condition where, during startup, no Contour may be
the leader and the xDS tables may reach steady state before anyone is
elected. This would mean the status of an object would be stale until
the next update from the API server after leadership was established.
To address this, a mechanism to force a rebuild of the DAG is added to
the EventHandler and wired to election success.

Signed-off-by: Dave Cheney [email protected]
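
For context, here is a minimal sketch of the approach this description outlines, assuming the leader elected signal is delivered as a method call and a buffered channel is used to force a rebuild. The identifiers below (leaderOK, BecomeLeader, maybeWriteStatus) are illustrative, not the actual names used in internal/contour.

```go
// Sketch only: gates status write-back on leadership and forces a DAG
// rebuild on election success so stale status is refreshed.
package contour

import "sync/atomic"

// EventHandler reacts to informer updates and, when it is the leader,
// writes status back to the API server.
type EventHandler struct {
	leaderOK atomic.Bool   // set once the leader elected signal fires
	rebuild  chan struct{} // capacity 1; a value forces a full DAG rebuild
}

// BecomeLeader is wired to leader election success. It records leadership
// and forces a DAG rebuild so that status skipped while this instance was
// not the leader is written on the next pass.
func (e *EventHandler) BecomeLeader() {
	e.leaderOK.Store(true)
	select {
	case e.rebuild <- struct{}{}:
	default: // a forced rebuild is already queued
	}
}

// maybeWriteStatus performs the status write only on the elected leader;
// every other instance drops it, so Contours no longer fight over status.
func (e *EventHandler) maybeWriteStatus(write func() error) error {
	if !e.leaderOK.Load() {
		return nil
	}
	return write()
}
```

In this shape, non-leaders still watch the API and serve xDS; only the status write-back is gated on leadership, and once a leader is elected the forced rebuild refreshes any status that went stale during the election window.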

davecheney added this to the 1.0.0-rc.2 milestone on Oct 21, 2019
youngnick (Member) left a comment:

LGTM. Nice.

davecheney merged commit 3a7ca08 into projectcontour:master on Oct 21, 2019
davecheney deleted the issue/1425 branch on October 21, 2019 at 03:22
davecheney added a commit to davecheney/contour that referenced this pull request Oct 21, 2019
This PR addresses incorrect comments that I noticed during projectcontour#1745.
This PR also moves the holdoff exceeded calculation into its own function,
cleaning up the caller slightly.

Something I noticed while doing this is that the holdoff logic works as follows.

1. When the EventHandler is created, e.last is set to time.Now. This
causes the stream of updates from the informers to be held off for up to
e.HoldoffDelay + e.HoldoffMaxDelay. This process may happen several
times until all updates from the API server are processed.
2. Once the cluster settles down, the _next_ update from the API server
will likely arrive more than e.HoldoffMaxDelay after e.last and will be
processed _immediately_.
3. If another update arrives after step 2 but before another
e.HoldoffMaxDelay has elapsed, a pending timer will be started and the
update processed after e.HoldoffDelay.

The long and the short of the current logic is that infrequent, O(seconds),
updates will be processed immediately, not after e.HoldoffDelay. This
was surprising to me, as it was not the logic I thought I had written
when I last went through this function. That is not to say it is
wrong; however, it does open the possibility that a large stream of
updates (k apply ...) arriving after the cluster is stable will cause at
least two updates, with the second being at least e.HoldoffDelay after
the first.

Signed-off-by: Dave Cheney <[email protected]>
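
Below is a rough sketch of the holdoff behaviour that commit message describes. Only the field names last, HoldoffDelay and HoldoffMaxDelay come from the message; the notifier type, the constructor and the rebuild hook are hypothetical stand-ins, not the real code in internal/contour.

```go
// Sketch only: infrequent updates rebuild immediately, bursts are coalesced.
package contour

import (
	"sync"
	"time"
)

type holdoffNotifier struct {
	mu              sync.Mutex
	HoldoffDelay    time.Duration // delay used to coalesce bursts of updates
	HoldoffMaxDelay time.Duration // bound on how long a burst is held off
	last            time.Time     // when the DAG was last rebuilt
	generation      int           // bumped per update; stale timers do nothing
	rebuild         func()        // hook that rebuilds the DAG
}

// newHoldoffNotifier mirrors step 1: last starts at time.Now, so the initial
// flood of informer updates is held off rather than processed one by one.
func newHoldoffNotifier(delay, maxDelay time.Duration, rebuild func()) *holdoffNotifier {
	return &holdoffNotifier{
		HoldoffDelay:    delay,
		HoldoffMaxDelay: maxDelay,
		last:            time.Now(),
		rebuild:         rebuild,
	}
}

// onUpdate illustrates the three steps above: an update arriving more than
// HoldoffMaxDelay after the last rebuild is processed immediately (step 2),
// while anything sooner is held off for HoldoffDelay so that bursts such as
// a kubectl apply of many objects coalesce into few rebuilds (steps 1 and 3).
func (h *holdoffNotifier) onUpdate() {
	h.mu.Lock()
	defer h.mu.Unlock()

	if time.Since(h.last) > h.HoldoffMaxDelay {
		h.generation++ // invalidate any queued delayed rebuild
		h.last = time.Now()
		go h.rebuild() // run the rebuild outside the lock
		return
	}

	// Hold the update off for HoldoffDelay; each update bumps the
	// generation so only the most recent timer actually rebuilds.
	h.generation++
	gen := h.generation
	time.AfterFunc(h.HoldoffDelay, func() {
		h.mu.Lock()
		stale := gen != h.generation
		if !stale {
			h.last = time.Now()
		}
		h.mu.Unlock()
		if !stale {
			h.rebuild()
		}
	})
}
```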
davecheney added a commit that referenced this pull request Oct 21, 2019
Development

Successfully merging this pull request may close these issues.

Rework how statuses are written during dag rebuild
Leader election should be multi-read, single write style
2 participants