Throttle reconciliation in case of error #203
We use the controller-runtime exponential backoff; the retry interval is slowly increased.
In this case I see no point in that; these kinds of errors don't go away until the next source sync. Backoff only fixes temporary issues.
Well, a kubectl apply will fail if your master nodes are restarting or if there are temporary connection errors between the pods and the Kubernetes API service.
Well, the reconciliation with exponential backoff is not really the problem, but in combination with Slack alerts it is quite unusable: you end up with way too many alerts. Is there a workaround for the notification controller? If not, maybe the solution is better placed there; I've only seen
I think this can be a feature request for notification-controller: if we make it a StatefulSet, then it can store events in a database and prevent spurious events from being sent.
@raffis with this bug fixed, there will be no more Slack spam on kubectl apply errors; once the reconciliation status can be persisted in etcd, the controller will retry at the configured interval.
If there is an error, it looks like the configured interval does not get considered. I have a 10m interval configured and one error in a manifest, and I see a reconciliation every ~10s, which also leads to a Slack alert for each one. (Also, the Slack alerts are useless because of #190: I don't see the actual error in the notification, nor in the log because of #202.)
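For reference, the 10m interval described above would be set on the Kustomization object roughly like this — a hedged sketch, assuming the `kustomize.toolkit.fluxcd.io` API group and the field names current at the time (the metadata names and source reference are placeholders):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: example        # placeholder name
  namespace: flux-system
spec:
  interval: 10m        # desired reconciliation interval
  path: ./manifests    # placeholder path
  prune: true
  sourceRef:
    kind: GitRepository
    name: example      # placeholder source
```

The report above is that on error the controller requeues with backoff (every ~10s) instead of honoring `spec.interval`.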