[PR] Filter by arbitrary callback function #258

Closed
kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels
archive enhancement New feature or request

Comments


kopf-archiver bot commented Aug 18, 2020

A pull request by Jc2k at 2019-11-27 21:30:34+00:00
Original URL: zalando-incubator/kopf#258
Merged by nolar at 2020-01-17 16:32:02+00:00

Filter by arbitrary callback function

Issue: #98

Description

In some cases, only a subset of resources should be handled. One example is implementing the cert-manager external-issuer process: cert-manager creates a CertificateRequest CRD and waits for it to be acted on and marked as complete, and each issuer should filter on spec.issuerRef to find the requests it should service.

This PR enables that use case by allowing you to specify a when callback that is evaluated before a handler is invoked, so that handlers that do not apply to the object are filtered out.
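
For illustration, a minimal sketch of such a when filter for the cert-manager-style use case described above (the resource coordinates and the issuer name are hypothetical; only the when= keyword comes from this PR):

import kopf

def is_ours(spec, **_):
    # Only handle requests addressed to this particular issuer.
    return spec.get('issuerRef', {}).get('name') == 'my-issuer'

@kopf.on.create('cert-manager.io', 'v1', 'certificaterequests', when=is_ours)
def sign_request(spec, logger, **kwargs):
    logger.info("Handling a certificate request addressed to our issuer.")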

Types of Changes

What types of changes does your code introduce? Keep the ones that apply:

  • New feature (non-breaking change which adds functionality)

Review

List of tasks the reviewer must do to review the PR

  • Tests
  • Documentation

Commented by Jc2k at 2019-11-27 22:35:55+00:00
 

I couldn't use reactor.invocation.invoke as suggested in #98 because it's async and match is not. This would mean making things like iter_cause_handlers async and felt like a bridge too far for my first PR. I don't think we should be doing I/O during the filter step anyway?

So I can see 2 ways forward:

  • The current approach in my PR: only pass body to the when callback
  • Make a helper in reactor.invocation that does the args + kwargs setup and use it for the existing invoke method. Add an invoke_blocking and use that with my when function.

What do you think?


Commented by nolar at 2019-11-28 11:44:57+00:00
 

True, it is async. The main purpose of this is the async handlers, which are executed directly in the stack of the operator, without thread executors. Both async and sync-threaded approaches are needed for remote I/O operations.

You are right: Unlike the handlers, the callbacks should not contain any I/O, so there is no reason for them to be async or thread-pool'ed.

I suggest going the 2nd way: extract the args/kwargs generation from the current invoke() as a regular sync function, with invoke() just calling it and then doing whatever it does now for the function invocation. Then you can call that callback with the kwargs-generation function and a direct call; there is no need to use invoke() then (assuming it is always synchronous and non-blocking, as the callbacks should be).

This solution seems clean enough, and can be reused in some other places with sync-only callbacks. And it provides the same conventions on the kwargs as any other invocables/callables in the framework (incl. the per-object logger kwarg, etc.).
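
A rough, hypothetical sketch of that split (the names build_kwargs, invoke, and check_when are illustrative, not the framework's actual API): the kwargs assembly becomes a plain sync function, the async invoke() reuses it, and the sync when callback is called directly, without invoke():

import asyncio

def build_kwargs(cause):
    # Purely synchronous, no I/O: the same kwargs conventions as for handlers.
    return dict(
        body=cause.body,
        spec=cause.body.get('spec', {}),
        meta=cause.body.get('metadata', {}),
        status=cause.body.get('status', {}),
        logger=cause.logger,
    )

async def invoke(fn, cause, **extra):
    # The existing async invocation keeps its behaviour, just reusing the helper.
    kwargs = build_kwargs(cause)
    kwargs.update(extra)
    result = fn(**kwargs)
    if asyncio.iscoroutine(result):
        result = await result
    return result

def check_when(when, cause):
    # The filtering callback is assumed sync and non-blocking, so call it directly.
    return when is None or bool(when(**build_kwargs(cause)))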


Commented by Jc2k at 2019-12-10 17:37:50+00:00
 

I've hit a couple of snags while working on this.

The first makes me feel like we are violating our layering somewhat:

  • The invoke function renders kwargs based on a cause object
  • To get a cause we need to set requires_finalizer
  • To set requires_finalizer we need to call match() to see if any DELETE handlers apply to this resource
  • To call match we want to apply the new when filters
  • We want the new when filters to get the same kwargs as the handler would
  • That means it needs a cause object.

One option is to generate a dummy cause where requires_finalizer is False and, once we have called our match functions, make the real one... but that seems like a hack.

I'm leaning towards making some of the kwargs available, but just the ones I can get from the body alone. In particular, for requires_finalizer it's weird to make whether or not there is a finalizer dependent on cause.reason.


Commented by nolar at 2019-12-11 13:10:41+00:00
 

Quite an interesting and non-trivial case of the chicken & the egg problem.

Let me write down a few thoughts on how it can be solved (though I didn't check them in action; there can be other problems in each option):


Option A:

To break the circle, I suggest that we think of some use-cases. What information would be needed by these callbacks? Body/spec/meta/status: certainly. Logger: most likely. Memo: maybe. diff/old/new: highly unlikely, since in the context of deletion it is always a diff from whatever to None.

Perhaps, we can always create and use a ResourceWatchingCause object, and also pass it to the callback invocation, while passing ResourceChangingCause for the handlers. Con: it requires passing two causes to the methods. It can be worked around by converting ResourceChangingCause to contain a ResourceWatchingCause rather than being a sibling class. Con: the structure of causes as contextual containers becomes complicated (hierarchical composition instead of a flat field list).

Or we can use the base class ResourceCause directly:

async def resource_handler(...):
    ...

    resource_base_cause = causation.ResourceCause(
        logger=logger,
        resource=resource,
        patch=patch,
        body=event['object'],
        memo=memory.user_data,
    )

    if registry.has_resource_watching_handlers(resource=resource):
        resource_watching_cause = causation.detect_resource_watching_cause(
            ...
        )
        await handle_resource_watching_cause(cause=resource_watching_cause, ...)

    if registry.has_resource_changing_handlers(resource=resource):
        resource_changing_cause = causation.detect_resource_changing_cause(
            ...,
            requires_finalizer=registry.requires_finalizer(cause=resource_base_cause),  # <<<<< HERE
        )
        delay = await handle_resource_changing_cause(cause=resource_changing_cause, ...)

As a follow-up, the watching/changing cause detection can use the base cause as a source of fields instead of using all fields explicitly, but this can be left for future refactorings.

In the future, if the base class is declared in the involved function signatures, we will be able to replace the objects with the actual watching/changing causes without breaking the typing system, in case we find some other way to solve this problem.

This will allow the deletion handlers to be filtered using the existing invoke() function, though it will limit the kwargs available: there will be no old/new/diff, initial, or raw event, but those are probably useless for filtering purposes anyway.


Option B:

The semantically weak link in this chain of dependencies seems to be the requires_finalizer flag. It is needed to decide whether the finalizer should be added on the object's first appearance (the ACQUIRE/RELEASE pseudo-reasons), or whether we should go directly to the creation/deletion handlers (the CREATE/UPDATE/DELETE real reasons). And this depends on the existence of matching mandatory deletion handlers. As such, it should not affect the filtering callback, in theory.

Can we redesign the solution so that the handler selection happens that way?

Can we detect a cause without passing requires_finalizer to detect_resource_changing_cause()? We can, but it comes at a cost: the following two conditions should be moved from causation.detect_resource_changing_cause() to handling.handle_resource_changing_cause():

    if requires_finalizer and not finalizers.has_finalizers(body):
        return ResourceChangingCause(reason=Reason.ACQUIRE, **kwargs)
    if not requires_finalizer and finalizers.has_finalizers(body):
        return ResourceChangingCause(reason=Reason.RELEASE, **kwargs)

Not only should their execution be there (as it is now already), but also their detection, which should happen strictly before the actually detected cause's reason is handled (i.e. before if cause.reason in causation.HANDLER_REASONS).

Con: we will "detect" the resource-changing causes, but will not use them in the ACQUIRE/RELEASE situations. But this does not produce any significant extra CPU overhead; it happens once or twice per resource lifetime.

So, it will be something like this.

As is:

async def handle_resource_changing_cause(...):
    if cause.reason in causation.HANDLER_REASONS:
        ...

    if cause.reason == causation.Reason.ACQUIRE:
        logger.debug("Adding the finalizer, thus preventing the actual deletion.")
        finalizers.append_finalizers(body=body, patch=patch)

    if cause.reason == causation.Reason.RELEASE:
        logger.debug("Removing the finalizer, as there are no handlers requiring it.")
        finalizers.remove_finalizers(body=body, patch=patch)

To be:

async def handle_resource_changing_cause(...):

    requires_finalizer = registry.requires_finalizer(resource=cause.resource, body=cause.body)
    has_finalizer = finalizers.has_finalizers(body=cause.body)

    if requires_finalizer and not has_finalizer:
        logger.debug("Adding the finalizer, thus preventing the actual deletion.")
        finalizers.append_finalizers(body=body, patch=patch)
        return

    if not requires_finalizer and has_finalizer:
        logger.debug("Removing the finalizer, as there are no handlers requiring it.")
        finalizers.remove_finalizers(body=body, patch=patch)
        return

    if cause.reason in causation.HANDLER_REASONS:
        ...

In that case, the cause detection will not require registry.requires_finalizer(), and so the circular dependency will be broken, and so the callbacks will not be called during the cause detection, and so the solution becomes clean.

Am I missing any complications in that approach?


Both of the solutions seem relatively clean to me (from the code's semantics point of view).

I'm not sure which one is better. Are there any future hypothetical use-cases that we can consider to make one option preferable over the other? E.g. different finalizers for different operators on the same resource kind (as in your case), so that they do not collide with each other?


Commented by Jc2k at 2020-01-06 12:50:25+00:00
 

I implemented option B here. I will rebase on top of this branch and see if it unblocks me.


Commented by Jc2k at 2020-01-06 14:00:26+00:00
 

OK, I have rebased this on top of #288 and all the tests are passing. Sorry it took me so long!


Commented by Jc2k at 2020-01-13 11:51:38+00:00
 

Saw that you merged #288, so I have rebased to try and make it a bit easier to review.


Commented by Jc2k at 2020-01-13 12:04:53+00:00
 

@nolar I think this is ready for review again now.


Commented by nolar at 2020-01-15 15:47:38+00:00
 

Awesome! I've played with this filter in different edge cases. Works fine!

import kopf

def flt(spec, **_):
    return spec.get('field') == 'valueZ'

@kopf.on.update('zalando.org', 'v1', 'kopfexamples', when=flt)
def update_it(logger, **kwargs):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples', when=flt)
def delete_it(logger, **kwargs):
    pass

With the deletion handlers, there is a fancy behaviour with addition/removal of a finalizer: if the filter function uses the spec/status fields of the object itself, and the object is edited/patched to satisfy the criterion, the finalizer is added; when edited/patched to miss it, the finalizer is removed — before executing the relevant handlers.

This can lead to some fancy effects if the filter functions sporadically fluctuate for the same object based on some external circumstances (e.g. random.randint(0, 1)), and there are no other mandatory deletion handlers that would keep the finalizer persistent.

But let it be a problem of developers who design the app this way.

This behaviour is fully correct (logically).
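
For illustration, a minimal sketch of that fluctuating-filter corner case (reusing the example resource from the snippet above; the randomness is purely illustrative):

import random
import kopf

def flaky(**_):
    # The outcome changes between events, so the finalizer may be added and
    # removed sporadically if no other mandatory deletion handler keeps it.
    return bool(random.randint(0, 1))

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples', when=flaky)
def delete_it(logger, **kwargs):
    pass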
