-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-37668: [release-4.15] operator: LeaderElectionReleaseOnCancel #974
Conversation
/jira cherrypick OCPBUGS-23795 |
@zeeke: Jira Issue OCPBUGS-23795 has been cloned as Jira Issue OCPBUGS-37668. Will retitle bug to link to clone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@zeeke: This pull request references Jira Issue OCPBUGS-37668, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
When manually restarting the operator, the leader election may take 5+ minutest to acquire the lease on startup: ``` I1205 16:06:02.101302 1 leaderelection.go:245] attempting to acquire leader lease openshift-sriov-network-operator/a56def2a.openshift.io... ... I1205 16:08:40.133558 1 leaderelection.go:255] successfully acquired lease openshift-sriov-network-operator/a56def2a.openshift.io ``` The manager's option `LeaderElectionReleaseOnCancel` would solve this problem, but it's not safe as the shutdown cleanup procedures (inhibiting webhooks and removing finalizers) would run without any leader guard. This commit moves the LeaderElection mechanism from the namespaced manager to a dedicated, no-op controller manager. This approach has been preferred to directly dealing with the LeaderElection API as: - It leverages library code that has been proved to be stable - It includes recording k8s Events about the Lease process - The election process must come after setting up the health probe. Doing it manually would involve handling the healthz endpoint as well. Signed-off-by: Andrea Panattoni <[email protected]>
Signed-off-by: Andrea Panattoni <[email protected]>
Add CoordinationV1 to `test/util/clients.go` to make assertions on `coordination.k8s.io/Lease` objects. Add `OPERATOR_LEADER_ELECTION_ENABLE` environment variable to `deploy/operator.yaml` to let user enable leader election on the operator. Signed-off-by: Andrea Panattoni <[email protected]>
/jira backport release-4.14 |
@zeeke: The following backport issues have been created:
Queuing cherrypicks to the requested branches to be created after this PR merges: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of release-4.14 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@zeeke: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -41,6 +41,8 @@ spec: | |||
image: $SRIOV_NETWORK_OPERATOR_IMAGE | |||
command: | |||
- sriov-network-operator | |||
args: | |||
- --leader-elect=$OPERATOR_LEADER_ELECTION_ENABLE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this in the bundle also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Openshift bundle already has --leader-elect
option set by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great thanks!
btw I am almost sure I already asked that :P
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: SchSeba, zeeke The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold wait for the fix in: |
@zeeke: This pull request references Jira Issue OCPBUGS-37668. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
backport of:
This ensure the lease is released by the operator as soon as it gracefully stops.