You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Sep 14, 2020. It is now read-only.
Kopf-based operator sometimes stops the reaction cycle and handler invocations in the middle of the process, and does nothing until a new change is applied or restarted.
Description
An investigation has shown that it is caused by datetime.datetime.utcnow() being called in multiple places during one handling cycle. As a result, the "now" timestamp is inconsistent, and different decisions are made. A difference of less than 1ms is sufficient to stop the reaction cycle this way — due to the code overhead.
The "now" timestamp is taken in 2 places:
Handler.awakened — to compare the delayed timestamp with "now", and decide should the handler sleep or run.
State.delay — to compare the soonest delayed timestamp with "now", and decide how long should we sleep.
Specifically, the operator neither selected the handlers to be executed (because it was "1ms too early" for the handler.awakened), nor did it sleep (because it was "1ms too late" in the state.delay property) — with the same value of delayed field, just the code being executed exactly during this awakening time.
Due to how zero-time delay is treated (i.e. same as None — no sleep at all), the dummy patch was not applied to trigger the next steps of the reaction cycle.
I was able to reproduce this with the following test:
The code snippet to reproduce the issue
importasyncioimportdatetimeimportfreezegunimportkopffromkopf.reactor.handlingimportresource_handlerfromkopf.structs.containersimportResourceMemoriesasyncdeftest_1ms_awakening(clear_default_registry, resource, mocker):
# Simulate that the object is scheduled to be awakened between the watch-event and sleep.ts0=datetime.datetime(2019, 12, 30, 10, 56, 43)
tsA_triggered="2019-12-30T10:56:42.999999"ts0_scheduled="2019-12-30T10:56:43.000000"tsB_delivered="2019-12-30T10:56:43.000001"# A dummy handler: it will not be selected for execution anyway, we just need to have it.@kopf.on.create(resource.group, resource.version, resource.plural, id='some-id')defhandler_fn(**_):
pass# Simulate the ticking of time, so that it goes beyond the scheduled awakening time.# Any hook point between handler selection and delay calculation is fine,# but State.store() also prevents other status-fields from being added and the patch populated.defmove_to_tsB(*_, **__):
frozen_dt.move_to(tsB_delivered)
state_store=mocker.patch('kopf.reactor.states.State.store', side_effect=move_to_tsB)
body= {'status': {'kopf': {'progress': {'some-id': {'delayed': ts0_scheduled}}}}}
# Prevent the actual API calls and sleeps.patch_obj=mocker.patch('kopf.clients.patching.patch_obj')
asyncio_sleep=mocker.patch('asyncio.sleep')
# Simulate the call as if the event has just arrived on the watch-stream.# Another way (same effect): handle_resource_changing_cause() and its result.withfreezegun.freeze_time(tsA_triggered) asfrozen_dt:
assertdatetime.datetime.utcnow() <ts0# extra precautionawaitresource_handler(
lifecycle=kopf.lifecycles.all_at_once,
registry=clear_default_registry,
resource=resource,
memories=ResourceMemories(),
event={'type': 'ADDED', 'object': body},
replenished=asyncio.Event(),
event_queue=asyncio.Queue(),
)
assertdatetime.datetime.utcnow() >ts0# extra precautionassertstate_store.called# Without "now"-time consistency, neither sleep() would be called, nor a patch applied.# Verify that the patch was actually applied, so that the reaction cycle continues.assertnotasyncio_sleep.calledassertpatch_obj.called# <<< MUST BE CALLED, BUT IS NOT CALLED
The exact command to reproduce the issue
kopf run handlers.py --verbose
Environment
Kopf version: 0.24
Kubernetes version:
Python version: 3.7
OS/platform: Mac OS
The text was updated successfully, but these errors were encountered:
Long story short
Kopf-based operator sometimes stops the reaction cycle and handler invocations in the middle of the process, and does nothing until a new change is applied or restarted.
Description
An investigation has shown that it is caused by
datetime.datetime.utcnow()
being called in multiple places during one handling cycle. As a result, the "now" timestamp is inconsistent, and different decisions are made. A difference of less than 1ms is sufficient to stop the reaction cycle this way — due to the code overhead.The "now" timestamp is taken in 2 places:
Handler.awakened
— to compare thedelayed
timestamp with "now", and decide should the handler sleep or run.State.delay
— to compare the soonestdelayed
timestamp with "now", and decide how long should we sleep.Specifically, the operator neither selected the handlers to be executed (because it was "1ms too early" for the
handler.awakened
), nor did it sleep (because it was "1ms too late" in thestate.delay
property) — with the same value ofdelayed
field, just the code being executed exactly during this awakening time.Due to how zero-time delay is treated (i.e. same as
None
— no sleep at all), the dummy patch was not applied to trigger the next steps of the reaction cycle.I was able to reproduce this with the following test:
The code snippet to reproduce the issue
The exact command to reproduce the issue
Environment
The text was updated successfully, but these errors were encountered: