Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reaction cycle _sometimes_ just stops for no reason #284

Closed
kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Closed

Reaction cycle _sometimes_ just stops for no reason #284

kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels
archive bug Something isn't working

Comments

@kopf-archiver
Copy link

kopf-archiver bot commented Aug 18, 2020

An issue by nolar at 2020-01-03 13:00:05+00:00
Original URL: zalando-incubator/kopf#284
 

Long story short

Kopf-based operator sometimes stops the reaction cycle and handler invocations in the middle of the process, and does nothing until a new change is applied or restarted.

Description

An investigation has shown that it is caused by datetime.datetime.utcnow() being called in multiple places during one handling cycle. As a result, the "now" timestamp is inconsistent, and different decisions are made. A difference of less than 1ms is sufficient to stop the reaction cycle this way — due to the code overhead.

The "now" timestamp is taken in 2 places:

  • Handler.awakened — to compare the delayed timestamp with "now", and decide should the handler sleep or run.
  • State.delay — to compare the soonest delayed timestamp with "now", and decide how long should we sleep.

Specifically, the operator neither selected the handlers to be executed (because it was "1ms too early" for the handler.awakened), nor did it sleep (because it was "1ms too late" in the state.delay property) — with the same value of delayed field, just the code being executed exactly during this awakening time.

Due to how zero-time delay is treated (i.e. same as None — no sleep at all), the dummy patch was not applied to trigger the next steps of the reaction cycle.

I was able to reproduce this with the following test:

The code snippet to reproduce the issue
import asyncio
import datetime

import freezegun

import kopf
from kopf.reactor.handling import resource_handler
from kopf.structs.containers import ResourceMemories


async def test_1ms_awakening(clear_default_registry, resource, mocker):

    # Simulate that the object is scheduled to be awakened between the watch-event and sleep.
    ts0 = datetime.datetime(2019, 12, 30, 10, 56, 43)
    tsA_triggered = "2019-12-30T10:56:42.999999"
    ts0_scheduled = "2019-12-30T10:56:43.000000"
    tsB_delivered = "2019-12-30T10:56:43.000001"

    # A dummy handler: it will not be selected for execution anyway, we just need to have it.
    @kopf.on.create(resource.group, resource.version, resource.plural, id='some-id')
    def handler_fn(**_):
        pass

    # Simulate the ticking of time, so that it goes beyond the scheduled awakening time.
    # Any hook point between handler selection and delay calculation is fine,
    # but State.store() also prevents other status-fields from being added and the patch populated.
    def move_to_tsB(*_, **__):
        frozen_dt.move_to(tsB_delivered)

    state_store = mocker.patch('kopf.reactor.states.State.store', side_effect=move_to_tsB)
    body = {'status': {'kopf': {'progress': {'some-id': {'delayed': ts0_scheduled}}}}}

    # Prevent the actual API calls and sleeps.
    patch_obj = mocker.patch('kopf.clients.patching.patch_obj')
    asyncio_sleep = mocker.patch('asyncio.sleep')

    # Simulate the call as if the event has just arrived on the watch-stream.
    # Another way (same effect): handle_resource_changing_cause() and its result.
    with freezegun.freeze_time(tsA_triggered) as frozen_dt:
        assert datetime.datetime.utcnow() < ts0  # extra precaution
        await resource_handler(
            lifecycle=kopf.lifecycles.all_at_once,
            registry=clear_default_registry,
            resource=resource,
            memories=ResourceMemories(),
            event={'type': 'ADDED', 'object': body},
            replenished=asyncio.Event(),
            event_queue=asyncio.Queue(),
        )
        assert datetime.datetime.utcnow() > ts0  # extra precaution

    assert state_store.called

    # Without "now"-time consistency, neither sleep() would be called, nor a patch applied.
    # Verify that the patch was actually applied, so that the reaction cycle continues.
    assert not asyncio_sleep.called
    assert patch_obj.called  # <<< MUST BE CALLED, BUT IS NOT CALLED
The exact command to reproduce the issue
kopf run handlers.py --verbose

Environment

  • Kopf version: 0.24
  • Kubernetes version:
  • Python version: 3.7
  • OS/platform: Mac OS

Commented by nolar at 2020-01-29 13:56:53+00:00
 

Fixed in kopf==0.25.

@kopf-archiver kopf-archiver bot closed this as completed Aug 18, 2020
@kopf-archiver kopf-archiver bot changed the title [archival placeholder] Reaction cycle _sometimes_ just stops for no reason Aug 19, 2020
@kopf-archiver kopf-archiver bot added the bug Something isn't working label Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
archive bug Something isn't working
Projects
None yet
Development

No branches or pull requests

0 participants