feat: Allow to specify grace period for pod GC #5033
cc @stefansedich, who's looking for this feature as well.
Love your work! One question, though: will this do what we want? If a pod shuts down right away, the grace period won't help, right? I believe it only covers the time between pod stop and force kill.
A pod is added to the podCleanupQueue when it meets the podGCStrategy. This grace period is the time to wait before the pod in the queue gets deleted.
config/config.go
Outdated
@@ -92,6 +92,10 @@ type Config struct {
 	// PodSpecLogStrategy enables the logging of podspec on controller log.
 	PodSpecLogStrategy PodSpecLogStrategy `json:"podSpecLogStrategy,omitempty"`

+	// PodGCGracePeriodSeconds specifies the duration in seconds before the pods in the GC queue get deleted.
+	// Value must be non-negative integer. Defaults to zero, which indicates delete immediately.
+	PodGCGracePeriodSeconds int64 `json:"podGCGracePeriodSeconds,omitempty"`
uint64 allowed?
Updated this to be *int64 to be consistent with the type of DeleteOptions.GracePeriodSeconds.
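One practical effect of the pointer type is that an omitted field can be told apart from an explicit zero. The sketch below illustrates this with a cut-down struct that carries only the field under review; the `config` type and the `describe` helper are hypothetical, and the non-negativity check simply mirrors the field's doc comment rather than any validation known to exist in the controller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config mirrors only the field discussed in this review; the real Config
// struct has many more fields.
type config struct {
	PodGCGracePeriodSeconds *int64 `json:"podGCGracePeriodSeconds,omitempty"`
}

// describe is a hypothetical helper that reports how a JSON config would be
// interpreted: "unset" when the field is absent, otherwise the value in seconds.
func describe(raw string) (string, error) {
	var c config
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return "", err
	}
	if c.PodGCGracePeriodSeconds == nil {
		return "unset", nil
	}
	if *c.PodGCGracePeriodSeconds < 0 {
		return "", fmt.Errorf("podGCGracePeriodSeconds must be non-negative, got %d", *c.PodGCGracePeriodSeconds)
	}
	return fmt.Sprintf("%ds", *c.PodGCGracePeriodSeconds), nil
}

func main() {
	inputs := []string{
		`{}`,
		`{"podGCGracePeriodSeconds": 0}`,
		`{"podGCGracePeriodSeconds": 60}`,
		`{"podGCGracePeriodSeconds": -5}`,
	}
	for _, raw := range inputs {
		s, err := describe(raw)
		fmt.Printf("%-35s -> %q, err=%v\n", raw, s, err)
	}
}
```

A nil pointer can be handed to `DeleteOptions.GracePeriodSeconds` unchanged, in which case the API server falls back to the pod's own default.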
workflow/controller/controller.go
Outdated
-	err := pods.Delete(ctx, podName, metav1.DeleteOptions{PropagationPolicy: &propagation})
+	err := pods.Delete(ctx, podName, metav1.DeleteOptions{
+		PropagationPolicy:  &propagation,
+		GracePeriodSeconds: pointer.Int64Ptr(wfc.Config.PodGCGracePeriodSeconds)})
this is 30s by default, so presumably, you'll make this longer?
Yes, it's necessary to make this longer for certain scenarios.
@terrytangyuan @alexec I am still trying to understand how this change waits before deleting pods. This call deletes the pod with grace-period-seconds set, and as far as I understand, this is what will happen:
- SIGTERM is sent to container
- SIGKILL is sent if container does not gracefully shutdown within the grace-period
If my container exits immediately after the SIGTERM, or in this case is not even running because it has already completed, how does the grace period help to delay its deletion?
For posterity, the above was resolved and implemented in #6168
Signed-off-by: terrytangyuan <[email protected]>
Use case: we need a grace period so that other services can finish collecting pod information (e.g. log and DB persistence), especially under high load, when those services run with some delay.
Checklist: