Using TemporaryError, but handler is not being called again. #358

Closed · kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels: archive, bug
kopf-archiver bot commented Aug 18, 2020

An issue by GrahamDumpleton at 2020-05-05 08:30:40+00:00
Original URL: zalando-incubator/kopf#358
 

Long story short

When raising TemporaryError, the custom resource's create handler is not called again after the delay to retry.
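
A minimal sketch of the pattern in question (the resource coordinates and the lookup_workshop() helper are illustrative assumptions, not the actual eduk8s code):

    import kopf

    @kopf.on.create('training.eduk8s.io', 'v1alpha1', 'trainingportals')
    def eduk8s(spec, **_):
        # lookup_workshop() is a hypothetical stand-in for the real check.
        if lookup_workshop(spec.get('workshop')) is None:
            # Expected behaviour: Kopf records the failure in its progress
            # state and re-invokes this handler after the delay.
            raise kopf.TemporaryError(
                f"Workshop {spec.get('workshop')} is not available.",
                delay=30)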

Description

Logs show:

eduk8s-operator [2020-05-05 08:12:57,051] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Invoking handler 'eduk8s'.
eduk8s-operator [2020-05-05 08:12:57,058] kopf.objects [ERROR   ] [None/eduk8s-training-portal] Handler 'eduk8s' failed temporarily: Workshop lab-k8s-fundamentals is not available.
eduk8s-operator [2020-05-05 08:12:57,058] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Patching with: {'status': {'kopf': {'progress': {'eduk8s': {'started': '2020-05-05T08:12:57.050903', 'stopped': None, 'delayed': '2020-05-05T08:13:27.058567', 'retries': 1, 'success': False, 'failure': False, 'message': 'Workshop lab-k8s-fundamentals is not available.'}}}}}
eduk8s-operator [2020-05-05 08:12:57,065] kopf.objects [DEBUG   ] [None/eduk8s-training-portal] Sleeping was skipped because of the patch, 29.999904 seconds left.

but it is never retried.

What is odd is that it works fine on Minikube, and previously worked fine on a separate production-grade cluster, but now the production-grade cluster no longer works. It also works when running kopf on macOS connected to Minikube, in place of the operator running in the cluster.

I'm going to keep trying to debug it, but suggestions on how to do that would be welcome.

Environment

  • Kopf version:

kopf==0.26
kubernetes==11.0.0

  • Kubernetes version:

1.17

  • Python version:

3.7.7 (in container)
3.7.3 (on macOS)

  • OS/platform:

Using official python3.7 docker image from Docker Hub. Presumably Debian or Ubuntu based.

Python packages installed
aiohttp==3.6.2
aiojobs==0.2.2
async-timeout==3.0.1
attrs==19.3.0
bcrypt==3.1.7
cachetools==4.1.0
certifi==2020.4.5.1
cffi==1.14.0
chardet==3.0.4
click==7.1.2
google-auth==1.14.1
idna==2.9
iso8601==0.1.12
kopf==0.26
kubernetes==11.0.0
multidict==4.7.5
oauthlib==3.1.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
pykube-ng==20.4.1
python-dateutil==2.8.1
PyYAML==5.3.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
six==1.14.0
typing-extensions==3.7.4.2
urllib3==1.25.9
websocket-client==0.57.0
wrapt==1.12.1
yarl==1.4.2

Commented by GrahamDumpleton at 2020-05-05 09:21:55+00:00
 

So this turns out to be the problem with using v1 of CRDs: the schema does not allow for the kopf object in the status, as the examples in #321 show. IOW, the status schema needs either:

            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true

or:

            status:
              type: object
              properties:
                result:
                  type: string
                  description: The result from the handler
                  enum:
                  - created
                  - updated
                # Other structured subresource schema data...
                kopf:
                  type: object
                  x-kubernetes-preserve-unknown-fields: true

I can't see that the need for using x-kubernetes-preserve-unknown-fields: true is mentioned in the documentation, so perhaps something at least needs to be added there, in a section relating specifically to v1 CRDs. This of course assumes I am not blind and simply couldn't find it.

I was only getting this issue on the one cluster because the person who installed it there used the v1 versions of the CRDs and not the v1beta1 versions.


Commented by nolar at 2020-05-12 13:00:16+00:00
 

GrahamDumpleton Thanks for reporting this.

Indeed, we also had a few little issues on our side with these "structural schemas" of Kubernetes 1.16+ and the v1beta1->v1 transition.

I've now added a few special notes to the docs (#364).

The whole problem is also addressed in #331 by switching from status to annotations by default (docs). This should keep Kopf running out of the box for all CRs & builtins.
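
As a sketch, assuming the persistence settings API that shipped with the #331 changes, the annotation-based storages can also be pinned explicitly:

    import kopf

    @kopf.on.startup()
    def configure(settings: kopf.OperatorSettings, **_):
        # Keep Kopf's progress and diff-base in annotations, so structural
        # schemas without x-kubernetes-preserve-unknown-fields cannot
        # silently drop the status patches.
        settings.persistence.progress_storage = kopf.AnnotationsProgressStorage()
        settings.persistence.diffbase_storage = kopf.AnnotationsDiffBaseStorage()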

There is also a PR #339 to warn if the resulting object does not match the patching intentions, but it is slightly more complicated to implement; we'll try to add it to Kopf 0.28 (with a general topic: stability & recoverability) a bit later. Otherwise, Kubernetes silently ignores the patches and returns HTTP 200 OK even though the patch is not applied, which causes this and similar issues in Kopf.


Commented by GrahamDumpleton at 2020-05-13 02:00:53+00:00
 

I've got another thing for you to think about (not creating a separate issue for it as yet), although it goes a little counter to the idea of using annotations to track operator state. But then, I'm not seeing this as something you would rely on; it would be informational only.

I would like to see one kopf status field that tracks the overall processing state, but persists and doesn't go away when the processing of the custom resource is complete (which is usually when the kopf state is removed). Whether anyone relies on it would be optional.

The issue is that when running 'kubectl get crdname', it is hard to have a 'STATUS' field as a printer column which reflects the overall processing state.

I would like to be able to say:

  additionalPrinterColumns:
  - name: Status
    type: string
    priority: 0
    description: Overall processing state of custom resource.
    JSONPath: .status.kopf.processingState

When the CR has just been created and nothing has been done yet, it would show an empty value in the 'kubectl get' output.

Once kopf/the operator starts processing it, the value could change based on what is happening, e.g., processing, retrying, failed, error.

Importantly, when the operator has successfully finished processing and all is good, 'status.kopf.processingState' wouldn't be removed. Instead, it would be updated with a value of success or complete.

This would make it much easier when using 'kubectl get' to see what is going on without delving into the YAML or using 'describe'.

The only other option is to have two state columns: one for the kopf side, which tracks something from its state (although that will stop working now if you use annotations), and another displaying something from the user's status output by the operator.

Hopefully you understand what I'm suggesting. If you want to compare it to something, imagine the 'STATUS' field of a pod.


Commented by nolar at 2020-05-13 08:12:17+00:00
 

First, to note: both annotations AND status are now used. And in general, I would prefer the status stanza, because it is the status of the resource. Annotations are there only because of the "structural schemas" enabled by default in K8s 1.16+, and the silent loss of the payload in PATCHes of status without x-kubernetes-preserve-unknown-fields: true (see #364), for which I have no good solution yet except persisting in annotations.

Second: this is actually a good feature request! I.e., making the handling progress/state exposable via "printer columns", both individually per-handler and aggregated. This includes persisting the progress after the handling is done, but also implies a few more columns for better presentation.

One way, which works right now, would be to return values from handlers; these are persisted in the status under the handlers' names without cleanups (see: results delivery). But there is no aggregated result for the whole group of handlers.
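
A sketch of that mechanism (the resource coordinates are illustrative assumptions):

    import kopf

    @kopf.on.create('example.com', 'v1', 'kopfexamples')
    def my_handler(**_):
        # The returned value is persisted as status.my_handler and is not
        # cleaned up when the handling cycle finishes.
        return {'state': 'created'}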

This is what we do now in our apps: report from child pods and other child resources to status.children.{podname}. Then, in @kopf.on.field(..., field='status.children'), we aggregate, summarise, and calculate the overall status as status.message. This works, but adds a lot of boilerplate code (a sketch follows below). It would be nicer if that were supported by the framework (for children, there is #58 (comment); but for handlers in general, there is no issue yet).
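
A sketch of that aggregation boilerplate, with the resource coordinates and the state values as illustrative assumptions:

    import kopf

    @kopf.on.field('example.com', 'v1', 'parents', field='status.children')
    def aggregate(new, patch, **_):
        # Summarise the per-child states into one overall status.message.
        states = {child.get('state') for child in (new or {}).values()}
        if 'failed' in states:
            message = 'failed'
        elif states and states == {'succeeded'}:
            message = 'succeeded'
        else:
            message = 'processing'
        patch.setdefault('status', {})['message'] = message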

I never thought of reporting the handlers' statuses or the overall processing state in the printer columns.

I'll definitely think on this now!

kopf-archiver bot closed this as completed Aug 18, 2020
kopf-archiver bot changed the title from '[archival placeholder]' to 'Using TemporaryError, but handler is not being called again.' Aug 19, 2020
kopf-archiver bot added the 'bug' label Aug 19, 2020