Using TemporaryError, but handler is not being called again. #358

Closed · kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels: archive, bug
kopf-archiver bot commented Aug 18, 2020

An issue by GrahamDumpleton at 2020-05-05 08:30:40+00:00
Original URL: zalando-incubator/kopf#358
 

Long story short

When raising TemporaryError, the custom resource's create handler is not called again after the delay to retry.
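
A minimal sketch of the pattern in question (the resource coordinates and the lookup_workshop() helper are illustrative assumptions, not the actual eduk8s code):

    import kopf

    @kopf.on.create('training.eduk8s.io', 'v1alpha1', 'trainingportals')
    def eduk8s(spec, **_):
        # lookup_workshop() is a hypothetical stand-in for the real check.
        if lookup_workshop(spec.get('workshop')) is None:
            # Expected behaviour: Kopf records the failure in its progress
            # state and re-invokes this handler after the delay.
            raise kopf.TemporaryError(
                f"Workshop {spec.get('workshop')} is not available.",
                delay=30)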

Description

Logs show:

eduk8s-operator [2020-05-05 08:12:57,051] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Invoking handler 'eduk8s'.
eduk8s-operator [2020-05-05 08:12:57,058] kopf.objects [ERROR   ] [None/eduk8s-training-portal] Handler 'eduk8s' failed temporarily: Workshop lab-k8s-fundamentals is not available.
eduk8s-operator [2020-05-05 08:12:57,058] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Patching with: {'status': {'kopf': {'progress': {'eduk8s': {'started': '2020-05-05T08:12:57.050903', 'stopped': None, 'delayed': '2020-05-05T08:13:27.058567', 'retries': 1, 'success': False, 'failure': False, 'message': 'Workshop lab-k8s-fundamentals is not available.'}}}}}
eduk8s-operator [2020-05-05 08:12:57,065] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Sleeping was skipped because of the patch, 29.999904 seconds left.

but it is never retried.

What is odd is that it works fine on Minikube, and previously worked fine on a separate production-grade cluster, but now the production-grade cluster no longer works. It also works when running kopf on macOS connected to Minikube, in place of the operator running in the cluster.

I'm going to keep trying to debug it, but suggestions on how to do that would be welcome.

Environment

  • Kopf version:

kopf==0.26
kubernetes==11.0.0

  • Kubernetes version:

1.17

  • Python version:

3.7.7 (in container)
3.7.3 (on macOS)

  • OS/platform:

Using official python3.7 docker image from Docker Hub. Presumably Debian or Ubuntu based.

Python packages installed
aiohttp==3.6.2
aiojobs==0.2.2
async-timeout==3.0.1
attrs==19.3.0
bcrypt==3.1.7
cachetools==4.1.0
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
click==7.1.2
google-auth==1.14.1
idna==2.9
iso8601==0.1.12
kopf==0.26
kubernetes==11.0.0
multidict==4.7.5
oauthlib==3.1.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pykube-ng==20.4.1
python-dateutil==2.8.1
PyYAML==5.3.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
six==1.14.0
typing-extensions==3.7.4.2
urllib3==1.25.9
websocket-client==0.57.0
wrapt==1.12.1
yarl==1.4.2

Commented by GrahamDumpleton at 2020-05-05 09:21:55+00:00
 

So this turns out to be the problem with using v1 of CRDs: the schema does not allow for the kopf object in the status, as the examples in #321 show. IOW, the status schema needs either:

            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true

or:

            status:
              type: object
              properties:
                result:
                  type: string
                  description: The result from the handler
                  enum:
                  - created
                  - updated
                # Other structured subresource schema data...
                kopf:
                  type: object
                  x-kubernetes-preserve-unknown-fields: true

I can't see that the need for using x-kubernetes-preserve-unknown-fields: true is mentioned in the documentation, so perhaps something at least needs to be added there, in a section relating specifically to v1 CRDs. This of course assumes I am not blind and simply couldn't find it.

I was only getting this issue on the one cluster because the person who installed it there used the v1 versions of the CRDs and not the v1beta1 versions.


Commented by nolar at 2020-05-12 13:00:16+00:00
 

GrahamDumpleton Thanks for reporting this.

Indeed, we also had a few little issues on our side with these "structural schemas" of Kubernetes 1.16+ and the v1beta1->v1 transition.

I've now added a few special notes to the docs (#364).

The whole problem is also addressed in #331 by switching from status to annotations by default (docs). This should keep Kopf running out of the box for all CRs & builtins.
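
As a sketch, assuming the persistence settings API that shipped with the #331 changes, the annotation-based storages can also be pinned explicitly:

    import kopf

    @kopf.on.startup()
    def configure(settings: kopf.OperatorSettings, **_):
        # Keep Kopf's progress and diff-base in annotations, so structural
        # schemas without x-kubernetes-preserve-unknown-fields cannot
        # silently drop the status patches.
        settings.persistence.progress_storage = kopf.AnnotationsProgressStorage()
        settings.persistence.diffbase_storage = kopf.AnnotationsDiffBaseStorage()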

There is also a PR #339 to warn if the resulting object does not match the patching intentions, but it is slightly more complicated to implement; we'll try to add it to Kopf 0.28 (with a general topic: stability & recoverability) a bit later. Otherwise, Kubernetes silently ignores the patches and returns HTTP 200 OK even though the patch is not applied, which causes this and similar issues in Kopf.


Commented by GrahamDumpleton at 2020-05-13 02:00:53+00:00
 

I've got another thing for you to think about (not creating a separate issue for it as yet), although it goes a little counter to the idea of using annotations to track operator state. But then, I'm not seeing this as something you would rely on; it would be informational only.

I would like to see one kopf status field that tracks the overall processing state, but persists and doesn't go away when the processing of the custom resource is complete (which is usually when the kopf state is removed). Whether anyone relies on it would be optional.

The issue is that when running 'kubectl get crdname', it is hard to have a 'STATUS' field as a printer column which reflects the overall processing state.

I would like to be able to say:

  additionalPrinterColumns:
  - name: Status
    type: string
    priority: 0
    description: Overall processing state of custom resource.
    JSONPath: .status.kopf.processingState

When the CR has just been created and nothing has been done yet, it would show an empty value in the 'kubectl get' output.

Once kopf/the operator starts processing it, the value could change based on what is happening, e.g., processing, retrying, failed, error.

Importantly, when the operator has successfully finished processing and all is good, 'status.kopf.processingState' wouldn't be removed. Instead, it would be updated with a value of success or complete.

This would make it much easier when using 'kubectl get' to see what is going on without delving into the YAML or using 'describe'.

The only other option is to have two state columns: one for the kopf side, which tracks something from its state (although that will stop working now if you use annotations), and another displaying something from the user's status output by the operator.

Hopefully you understand what I'm suggesting. If you want to compare it to something, imagine the 'STATUS' field of a pod.


Commented by nolar at 2020-05-13 08:12:17+00:00
 

First, to note: both annotations AND status are now used. And in general, I would prefer the status stanza, because it is the status of the resource. Annotations are there only because of the "structural schemas" enabled by default in K8s 1.16+, and the silent loss of the payload in PATCHes of status without x-kubernetes-preserve-unknown-fields: true (see #364), for which I have no good solution yet except persisting in annotations.

Second: this is actually a good feature request! I.e., making the handling progress/state exposable via "printer columns", both individually per-handler and aggregated. This includes persisting the progress after the handling is done, but also implies a few more columns for better presentation.

One way, which works right now, would be to return values from handlers; these are persisted in the status under the handlers' names without cleanups (see: results delivery). But there is no aggregated result for the whole group of handlers.
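
A sketch of that mechanism (the resource coordinates are illustrative assumptions):

    import kopf

    @kopf.on.create('example.com', 'v1', 'kopfexamples')
    def my_handler(**_):
        # The returned value is persisted as status.my_handler and is not
        # cleaned up when the handling cycle finishes.
        return {'state': 'created'}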

This is what we do now in our apps: report from child pods and other child resources to status.children.{podname}. Then, in @kopf.on.field(..., field='status.children'), we aggregate, summarise, and calculate the overall status as status.message. This works, but adds a lot of boilerplate code (a sketch follows below). It would be nicer if that were supported by the framework (for children, there is #58 (comment); but for handlers in general, there is no issue yet).
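
A sketch of that aggregation boilerplate, with the resource coordinates and the state values as illustrative assumptions:

    import kopf

    @kopf.on.field('example.com', 'v1', 'parents', field='status.children')
    def aggregate(new, patch, **_):
        # Summarise the per-child states into one overall status.message.
        states = {child.get('state') for child in (new or {}).values()}
        if 'failed' in states:
            message = 'failed'
        elif states and states == {'succeeded'}:
            message = 'succeeded'
        else:
            message = 'processing'
        patch.setdefault('status', {})['message'] = message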

I never thought of reporting the handlers' statuses or the overall processing state in the printer columns.

I'll definitely think on this now!

kopf-archiver bot closed this as completed Aug 18, 2020
kopf-archiver bot changed the title from '[archival placeholder]' to 'Using TemporaryError, but handler is not being called again.' Aug 19, 2020
kopf-archiver bot added the 'bug' label Aug 19, 2020