Skip to content
This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Reaction cycle _sometimes_ just stops for no reason #284

Closed
nolar opened this issue Jan 3, 2020 · 1 comment · Fixed by #285
Closed

Reaction cycle _sometimes_ just stops for no reason #284

nolar opened this issue Jan 3, 2020 · 1 comment · Fixed by #285
Assignees
Labels
bug Something isn't working

Comments

@nolar
Copy link
Contributor

nolar commented Jan 3, 2020

Long story short

Kopf-based operator sometimes stops the reaction cycle and handler invocations in the middle of the process, and does nothing until a new change is applied or restarted.

Description

An investigation has shown that it is caused by datetime.datetime.utcnow() being called in multiple places during one handling cycle. As a result, the "now" timestamp is inconsistent, and different decisions are made. A difference of less than 1ms is sufficient to stop the reaction cycle this way — due to the code overhead.

The "now" timestamp is taken in 2 places:

  • Handler.awakened — to compare the delayed timestamp with "now", and decide should the handler sleep or run.
  • State.delay — to compare the soonest delayed timestamp with "now", and decide how long should we sleep.

Specifically, the operator neither selected the handlers to be executed (because it was "1ms too early" for the handler.awakened), nor did it sleep (because it was "1ms too late" in the state.delay property) — with the same value of delayed field, just the code being executed exactly during this awakening time.

Due to how zero-time delay is treated (i.e. same as None — no sleep at all), the dummy patch was not applied to trigger the next steps of the reaction cycle.

I was able to reproduce this with the following test:

The code snippet to reproduce the issue
import asyncio
import datetime

import freezegun

import kopf
from kopf.reactor.handling import resource_handler
from kopf.structs.containers import ResourceMemories


async def test_1ms_awakening(clear_default_registry, resource, mocker):

    # Simulate that the object is scheduled to be awakened between the watch-event and sleep.
    ts0 = datetime.datetime(2019, 12, 30, 10, 56, 43)
    tsA_triggered = "2019-12-30T10:56:42.999999"
    ts0_scheduled = "2019-12-30T10:56:43.000000"
    tsB_delivered = "2019-12-30T10:56:43.000001"

    # A dummy handler: it will not be selected for execution anyway, we just need to have it.
    @kopf.on.create(resource.group, resource.version, resource.plural, id='some-id')
    def handler_fn(**_):
        pass

    # Simulate the ticking of time, so that it goes beyond the scheduled awakening time.
    # Any hook point between handler selection and delay calculation is fine,
    # but State.store() also prevents other status-fields from being added and the patch populated.
    def move_to_tsB(*_, **__):
        frozen_dt.move_to(tsB_delivered)

    state_store = mocker.patch('kopf.reactor.states.State.store', side_effect=move_to_tsB)
    body = {'status': {'kopf': {'progress': {'some-id': {'delayed': ts0_scheduled}}}}}

    # Prevent the actual API calls and sleeps.
    patch_obj = mocker.patch('kopf.clients.patching.patch_obj')
    asyncio_sleep = mocker.patch('asyncio.sleep')

    # Simulate the call as if the event has just arrived on the watch-stream.
    # Another way (same effect): handle_resource_changing_cause() and its result.
    with freezegun.freeze_time(tsA_triggered) as frozen_dt:
        assert datetime.datetime.utcnow() < ts0  # extra precaution
        await resource_handler(
            lifecycle=kopf.lifecycles.all_at_once,
            registry=clear_default_registry,
            resource=resource,
            memories=ResourceMemories(),
            event={'type': 'ADDED', 'object': body},
            replenished=asyncio.Event(),
            event_queue=asyncio.Queue(),
        )
        assert datetime.datetime.utcnow() > ts0  # extra precaution

    assert state_store.called

    # Without "now"-time consistency, neither sleep() would be called, nor a patch applied.
    # Verify that the patch was actually applied, so that the reaction cycle continues.
    assert not asyncio_sleep.called
    assert patch_obj.called  # <<< MUST BE CALLED, BUT IS NOT CALLED
The exact command to reproduce the issue
kopf run handlers.py --verbose

Environment

  • Kopf version: 0.24
  • Kubernetes version:
  • Python version: 3.7
  • OS/platform: Mac OS
@nolar nolar added the bug Something isn't working label Jan 3, 2020
@nolar nolar self-assigned this Jan 3, 2020
@nolar nolar closed this as completed in #285 Jan 8, 2020
@nolar
Copy link
Contributor Author

nolar commented Jan 29, 2020

Fixed in kopf==0.25.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
1 participant