-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-35343: make shutdown-delay-duration configurable #1685
base: master
Are you sure you want to change the base?
OCPBUGS-35343: make shutdown-delay-duration configurable #1685
Conversation
@tkashem: This pull request references Jira Issue OCPBUGS-30860, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cc @p0lyn0mial @vrutkovs |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tkashem The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest-required |
@tkashem: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -97,6 +106,7 @@ func (r *renderOpts) AddFlags(fs *pflag.FlagSet) { | |||
fs.StringVar(&r.clusterConfigFile, "cluster-config-file", r.clusterConfigFile, "Openshift Cluster API Config file.") | |||
fs.StringVar(&r.clusterAuthFile, "cluster-auth-file", r.clusterAuthFile, "Openshift Cluster Authentication API Config file.") | |||
fs.StringVar(&r.infraConfigFile, "infra-config-file", "", "File containing infrastructure.config.openshift.io manifest.") | |||
fs.DurationVar(&r.shutdownDelayDuration, "shutdown-delay-duration", r.shutdownDelayDuration, "shutdown-delay-duration argument for the bootstrap kube-apiserver.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not to add it directly to the bootstrap-api-server's manifest ? - https://github.com/openshift/cluster-kube-apiserver-operator/blob/master/bindata/bootkube/bootstrap-manifests/kube-apiserver-pod.yaml#L46
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bootkube already has an override -
cluster-kube-apiserver-operator/bindata/bootkube/config/bootstrap-config-overrides.yaml
Lines 79 to 81 in 04eba64
{{- if .ShutdownDelayDuration}} | |
shutdown-delay-duration: | |
- {{ .ShutdownDelayDuration }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, cool!
should we also set shutdown-send-retry-after
?
@@ -192,7 +206,7 @@ func (r *renderOpts) Run() error { | |||
BindAddress: "0.0.0.0:6443", | |||
BindNetwork: "tcp4", | |||
TerminationGracePeriodSeconds: 135, // bit more than 70s (minimal termination period) + 60s (apiserver graceful termination) | |||
ShutdownDelayDuration: "", // do not override | |||
ShutdownDelayDuration: r.shutdownDelayDuration.String(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: Do we want to expose it as a command line arg ? If yes then we have to coordinate updating it with terminationGracePeriodSeconds
. Otherwise kubelet
might terminate kas
process before graceful termination completes.
// by default, we are giving the load balancer 20s to remove the | ||
// bootstrap kube-apiserver from its pool after TERM signal is | ||
// sent to the kube-apiserver on the bootstrap node. | ||
shutdownDelayDuration: 20 * time.Second, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the default value for shutdownDelayDuration
?
It looks like it is 0
, does it mean that the bootstrap api server didn't give any time to LBs to react ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc its ) for SNO and 129 for AWS:
Lines 37 to 46 in 37df1b1
case infra.Status.ControlPlaneTopology == configv1.SingleReplicaTopologyMode: | |
// reduce the shutdown delay to 0 to reach the maximum downtime for SNO | |
observedShutdownDelayDuration = "0s" | |
case infra.Spec.PlatformSpec.Type == configv1.AWSPlatformType: | |
// AWS has a known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1943804 | |
// We need to extend the shutdown-delay-duration so that an NLB has a chance to notice and remove unhealthy instance. | |
// Once the mentioned issue is resolved this code must be removed and default values applied | |
// | |
// Note this is the official number we got from AWS | |
observedShutdownDelayDuration = "129s" |
/retitle OCPBUGS-35343: make shutdown-delay-duration configurable |
@tkashem: This pull request references Jira Issue OCPBUGS-35343, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
No description provided.