-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore reconcile errors that occur because a pod is being terminated #1233
Ignore reconcile errors that occur because a pod is being terminated #1233
Conversation
Signed-off-by: Kevin Earls <[email protected]>
Signed-off-by: Kevin Earls <[email protected]>
@@ -168,9 +169,12 @@ func (r *OpenTelemetryCollectorReconciler) Reconcile(ctx context.Context, req ct | |||
func (r *OpenTelemetryCollectorReconciler) RunTasks(ctx context.Context, params reconcile.Params) error { | |||
for _, task := range r.tasks { | |||
if err := task.Do(ctx, params); err != nil { | |||
r.log.Error(err, fmt.Sprintf("failed to reconcile %s", task.Name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How often does this logline occur when a pod is terminated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many. :-). Like for everything the reconciler touches. To be honest, it's not that big a deal, and probably something that will only be seen when running kuttl tests. But I found it very annoying, so I submitted the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to end the reconciling with by returing nil
if the reason is a terminating pod? That way the user gets informed that something did not work and we would get rid of all the unnecessary log lines?
Signed-off-by: Kevin Earls <[email protected]>
@@ -168,6 +169,10 @@ func (r *OpenTelemetryCollectorReconciler) Reconcile(ctx context.Context, req ct | |||
func (r *OpenTelemetryCollectorReconciler) RunTasks(ctx context.Context, params reconcile.Params) error { | |||
for _, task := range r.tasks { | |||
if err := task.Do(ctx, params); err != nil { | |||
// If we get an error that occur because a pod is being terminated then exit this loop | |||
if apierrors.IsForbidden(err) && strings.Contains(err.Error(), "because it is being terminated") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one last thing, should we print the termination logline before? And maybe an extra hint, that we cancel the reconciliation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frzifus what log level should I print that at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought maybe:
r.log.Error(err, fmt.Sprintf("failed to reconcile %s", task.Name))
And maybe an additional line that says that the reconciliation will be canceled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit skeptical about checking the string because it is being terminated"
. This could easily change in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to check for v1.NamespaceTerminatingCause
in the error: https://github.com/kubernetes/apiserver/blob/v0.26.0-rc.0/pkg/admission/plugin/namespace/lifecycle/admission.go#L177
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@damemi Thanks!
Signed-off-by: Kevin Earls <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to the others
@pavolloffay Can you merge this? |
Signed-off-by: Kevin Earls <[email protected]>
Signed-off-by: Kevin Earls <[email protected]>
@@ -168,6 +168,11 @@ func (r *OpenTelemetryCollectorReconciler) Reconcile(ctx context.Context, req ct | |||
func (r *OpenTelemetryCollectorReconciler) RunTasks(ctx context.Context, params reconcile.Params) error { | |||
for _, task := range r.tasks { | |||
if err := task.Do(ctx, params); err != nil { | |||
// If we get an error that occurs because a pod is being terminated, then exit this loop | |||
if apierrors.IsForbidden(err) && apierrors.HasStatusCause(err, corev1.NamespaceTerminatingCause) { | |||
r.log.V(2).Info("Exiting reconcile loop because namespace", params.Instance.Namespace, "is being terminated") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.log.V(2).Info("Exiting reconcile loop because namespace", params.Instance.Namespace, "is being terminated") | |
r.log.V(2).Info("Exiting reconcile loop because namespace is being terminated", "namespace", params.Instance.Namespace) |
(see https://pkg.go.dev/github.com/go-logr/logr#Logger.Info)
Signed-off-by: Kevin Earls <[email protected]>
Signed-off-by: Kevin Earls <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking issue #1291
…pen-telemetry#1233) * Ignore reconcile errors that occur because a pod is being terminated Signed-off-by: Kevin Earls <[email protected]> * Appease the all powerfull linter Signed-off-by: Kevin Earls <[email protected]> * Change behavior to end reconcile loop if pod has been terminated Signed-off-by: Kevin Earls <[email protected]> * Print a log message if we exit the reconciler loop Signed-off-by: Kevin Earls <[email protected]> * Look for NamespaceTerminatingCause Signed-off-by: Kevin Earls <[email protected]> * Appease the almighty linter Signed-off-by: Kevin Earls <[email protected]> * Fix log message Signed-off-by: Kevin Earls <[email protected]> * Skip flaky test Signed-off-by: Kevin Earls <[email protected]> Signed-off-by: Kevin Earls <[email protected]> Co-authored-by: Ben B <[email protected]>
…pen-telemetry#1233) * Ignore reconcile errors that occur because a pod is being terminated Signed-off-by: Kevin Earls <[email protected]> * Appease the all powerfull linter Signed-off-by: Kevin Earls <[email protected]> * Change behavior to end reconcile loop if pod has been terminated Signed-off-by: Kevin Earls <[email protected]> * Print a log message if we exit the reconciler loop Signed-off-by: Kevin Earls <[email protected]> * Look for NamespaceTerminatingCause Signed-off-by: Kevin Earls <[email protected]> * Appease the almighty linter Signed-off-by: Kevin Earls <[email protected]> * Fix log message Signed-off-by: Kevin Earls <[email protected]> * Skip flaky test Signed-off-by: Kevin Earls <[email protected]> Signed-off-by: Kevin Earls <[email protected]> Co-authored-by: Ben B <[email protected]>
Signed-off-by: Kevin Earls [email protected]
This fixes #1219 by not logging errors that occur because we are trying to reconcile something in a namespace that is being terminated.