When defining a very special edge case configuration having
max_batch_size=max_entries, the queue can fail with an assertion error when
removing the frontmost element. This happens especially when the
callback repeatedly fails (e.g. an unavailable backend system receiving
data).
What happens:
1. we add max_batch_size elements, all of which "post" resources
2. the batch queue consumes all of those resources in `process_once` by `wait()`ing for them, but gets stuck processing/sending the batch
3. as `process_once` is stuck until `max_retry_time` has passed, the function does not run `delete_frontmost_entry()` and thus never actually moves the `front` reference
4. when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left
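The four steps above can be replayed with a toy Python model of the queue's resource accounting. All names here (`resources`, `front`, `take_batch`) are hypothetical stand-ins for the real Lua implementation in queue.lua, chosen only to make the sequence runnable:

```python
class ToyQueue:
    """Toy model of the batch queue's resource accounting (hypothetical
    names; the real implementation lives in queue.lua)."""

    def __init__(self, max_entries, max_batch_size):
        self.max_entries = max_entries
        self.max_batch_size = max_batch_size
        self.entries = []      # all entries ever enqueued
        self.front = 0         # index of the oldest (frontmost) entry
        self.resources = 0     # "posted" resources, one per queued entry

    def enqueue(self, item):
        if len(self.entries) - self.front >= self.max_entries:
            # queue full: drop the oldest entry -- this models the spot
            # where the queue.lua assertion fires once no resources remain
            assert self.resources > 0, "queue should not be empty"
            self.resources -= 1
            self.front += 1
        self.entries.append(item)
        self.resources += 1    # each entry "posts" one resource

    def take_batch(self):
        # models process_once: wait() consumes up to max_batch_size
        # resources *before* the batch is delivered; with a failing
        # callback, delete_frontmost_entry() (front += n) never runs
        n = min(self.max_batch_size, self.resources)
        self.resources -= n
        return self.entries[self.front:self.front + n]


q = ToyQueue(max_entries=3, max_batch_size=3)
for i in range(3):
    q.enqueue(i)             # step 1: max_batch_size entries, 3 resources

batch = q.take_batch()       # steps 2-3: all resources consumed, front unmoved

try:
    q.enqueue(99)            # step 4: queue full, tries to drop the oldest
    failed = False
except AssertionError:
    failed = True            # resources == 0, but the queue is not empty

print(failed)  # True
```

With `max_batch_size < max_entries`, `take_batch` can never drain every resource while the queue is full, which is why the problem only surfaces in the equal-values configuration.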
This commit fixes #11377 by removing currently processed elements from
the race condition window.
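The direction of the fix can be sketched with the same toy model: if the batch is taken *out* of the queue (the `front` reference advances) at the moment its resources are consumed, rather than only after a successful delivery, a stuck callback can no longer leave `front` pointing at entries whose resources are already gone. This is a hypothetical Python sketch of that idea, not the actual Lua change:

```python
class FixedToyQueue:
    """Toy model of the fix idea: front moves together with the resource
    consumption, closing the race condition window (hypothetical names)."""

    def __init__(self, max_entries, max_batch_size):
        self.max_entries = max_entries
        self.max_batch_size = max_batch_size
        self.entries = []
        self.front = 0
        self.resources = 0

    def enqueue(self, item):
        if len(self.entries) - self.front >= self.max_entries:
            assert self.resources > 0, "queue should not be empty"
            self.resources -= 1
            self.front += 1
        self.entries.append(item)
        self.resources += 1

    def take_batch(self):
        n = min(self.max_batch_size, self.resources)
        batch = self.entries[self.front:self.front + n]
        self.resources -= n
        self.front += n   # batch removed from the queue before delivery
        return batch


q = FixedToyQueue(max_entries=3, max_batch_size=3)
for i in range(3):
    q.enqueue(i)
batch = q.take_batch()   # batch now lives outside the queue; front == 3
q.enqueue(99)            # queue no longer full: no drop, no assertion
print(len(batch), q.entries[q.front:])  # 3 [99]
```

Even if delivery of `batch` keeps failing until `max_retry_time`, enqueue operations no longer race against the stuck callback for the frontmost entry.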
Is there an existing issue for this?
Kong version (`$ kong version`)

Kong 3.3.1
Current Behavior
When defining a very special edge case configuration having
`max_batch_size=max_entries`, the queue can fail with an assertion error
when removing the frontmost element. This happens especially when the
callback repeatedly fails (e.g. an unavailable backend system receiving
data).

What happens:

1. we add max_batch_size elements, all of which "post" resources
2. the batch queue consumes all of those resources in `process_once` by `wait()`ing for them, but gets stuck processing/sending the batch
3. as `process_once` is stuck until `max_retry_time` has passed, the function does not run `delete_frontmost_entry()` and thus never actually moves the `front` reference
4. when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left

Potential thoughts for fixing the issue:

- enforce `max_batch_size<max_entries`; setting both to the same value is not really a reasonable configuration anyway
- increase `max_entries` by 1 when both configuration values are the same (our current workaround)

Kudos to @27ascii for discovering the edge case configuration.
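The workaround described above can be sketched as a small configuration guard. `safe_queue_config` is a hypothetical helper, not a Kong API; it merely encodes the "bump `max_entries` above `max_batch_size`" idea:

```python
def safe_queue_config(max_batch_size, max_entries):
    """Hypothetical workaround sketch: keep max_entries strictly above
    max_batch_size so the queue never hits the edge case where a stuck
    batch consumes every posted resource while the queue is full."""
    if max_batch_size >= max_entries:
        max_entries = max_batch_size + 1
    return {"max_batch_size": max_batch_size, "max_entries": max_entries}


print(safe_queue_config(10, 10))  # {'max_batch_size': 10, 'max_entries': 11}
print(safe_queue_config(5, 20))   # {'max_batch_size': 5, 'max_entries': 20}
```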
Expected Behavior

The configuration is either not allowed, or the assertion does not fail the worker.
Steps To Reproduce
Pull request containing unit test reproducing the issue: #11378
Anything else?
No response
Jens Erat <jens.erat@mercedes-benz.com>, Mercedes-Benz Tech Innovation GmbH, imprint