Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overriding ctrl manager graceful shutdown timeout #570

Merged
merged 1 commit into from
Dec 9, 2022

Conversation

relu
Copy link
Member

@relu relu commented Nov 25, 2022

Overriding the default GracefulShutdownTimeout option given to the controller manager with a default of 0 (no timeout) since the helm operations are sensitive to interruption and can lead to leaving the HelmRelease in a bad state.

This will also allow users to override the option via a CLI flag -graceful-shutdown-timeout how much time to wait before forcibly exiting.

Overriding the default GracefulShutdownTimeout option given to the
controller manager with a default of 0 (no timeout) since the helm
operations are sensitive to interruption and can lead to leaving the
HelmRelease in a bad state.

This will also allow users to override the option via a cli flag
`-graceful-shutdown-timeout` how much time to wait before forcibly
exiting.

Related to #569

Signed-off-by: Aurel Canciu <[email protected]>
@hiddeco hiddeco added the enhancement New feature or request label Nov 25, 2022
@relu
Copy link
Member Author

relu commented Nov 28, 2022

I've added a commit to remove the readinessProbe as well since without this change graceful shutdown won't work anyway.
The alternative would be to adjust its failureThreshold so it allows the container to run for a longer period of time (needing to match the terminationGracePeriodSeconds which we have set to 600 by default) before being terminated, but I don't think this is quite the right solution.
All in all, I don't really think there's much value in having the readinessProbe configured because the helm-controller does not expose a service endpoint and thus readiness doesn't quite make sense anyway in its runtime lifecycle context.

@relu relu requested a review from hiddeco November 28, 2022 13:40
@pjbgf pjbgf mentioned this pull request Dec 8, 2022
22 tasks
@hiddeco
Copy link
Member

hiddeco commented Dec 8, 2022

Sorry for the late reply, was floored by COVID the past couple of days.

I am concerned about the readiness probe being removed for scenarios in which someone has configured multiple replicas, as this would report the non-elected Pod as being Ready while it is not actually handling reconciliation requests. Which does touch the runtime lifecycle context without it exposing a Service endpoint.

@stefanprodan
Copy link
Member

stefanprodan commented Dec 8, 2022

I don't think we should remove readiness in this PR. If we decided to do this, it should be done across all controllers which don't have ClusterIP services like KC, IRC, IAC.

@relu
Copy link
Member Author

relu commented Dec 8, 2022

Sorry for the late reply, was floored by COVID the past couple of days.

No worries, I'm sorry to hear that, hope you're feeling better now!

Thanks for the feedback. I agree with both of your assessments. I think ideally this problem should be addressed somehow at the level of the controller-runtime, having the behavior changed so the probes are not shut down during graceful termination.

Going to remove the second commit.

@relu relu force-pushed the fix-graceful-shutdown branch from 20176bb to e242bb0 Compare December 8, 2022 15:09
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @relu

PS. Please add the new flag to the controller options documentation here: https://github.com/fluxcd/website/blob/main/content/en/flux/components/helm/options.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants