New queue fails in edge case max_batch_size=max_entries with assertion error #11377

JensErat · 2023-08-09T13:41:57Z

Is there an existing issue for this?

I have searched the existing issues

Kong version (`$ kong version`)

Kong 3.3.1

Current Behavior

When defining a very special edge case configuration having max_batch_size=max_entries, the queue can fail with an assertion error when removing the frontmost element. This happens especially when the callback repeatedly fails (eg. an unavailable backend system receiving data).

What happens:

we add max_batch_size elements, all of which "post" resources
the batch queue consumes all of those resources in process_once by wait()ing for them, but gets stuck processing/sending the batch
as process_once is stuck until max_retry_time passed, the function does not run delete_frontmost_entry() and thus actually moves the front reference
when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left:
```
 /kong-plugin/kong/tools/queue.lua:215: assertion failed!
```

Potential thoughts for fixing the issue:

enforce max_batch_size<max_entries, which is not really a reasonable configuration anyway
increase max_entries by 1 when both configuration values are the same (our current workaround)
further refactor the queue to remove the race condition

Kudos to @27ascii for discovering the edge case configuration.

Expected Behavior

Configuration either not allowed or assertion does not fail worker

Steps To Reproduce

Pull request containing unit test reproducing the issue: #11378

Anything else?

No response

^{Jens Erat <jens.erat@mercedes-benz.com>, Mercedes-Benz Tech Innovation GmbH, imprint}

The text was updated successfully, but these errors were encountered:

When defining a very special edge case configuration having max_batch_size=max_entries, the queue can fail with an assertion error when removing the frontmost element. This happens especially when the callback repeatedly fails (eg. an unavailable backend system receiving data). What happens: 1. we add max_batch_size elements, all of which "post" resources 2. the batch queue consumes all of those resources in `process_once` by `wait()`ing for them, but gets stuck processing/sending the batch 3. as `process_once` is stuck until `max_retry_time` passed, the function does not run `delete_frontmost_entry()` and thus actually moves the `front` reference 4. when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left This commit fixes #11377 by removing currently processed elements out of the race condition window.

JensErat mentioned this issue Aug 9, 2023

add failing test for edge case max_batch_size=max_entries #11378

Closed

3 tasks

samugi linked a pull request Aug 11, 2023 that will close this issue

add failing test for edge case max_batch_size=max_entries #11378

Closed

3 tasks

samugi added the bug label Aug 11, 2023

hanshuebner mentioned this issue Aug 21, 2023

fix(queue): add failing test for edge case max_batch_size=max_entries #11431

Merged

3 tasks

kikito closed this as completed in #11431 Aug 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

JensErat commented Aug 9, 2023 •

edited

Loading

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

Comments

JensErat commented Aug 9, 2023 • edited Loading

Is there an existing issue for this?

Kong version ($ kong version)

Current Behavior

Expected Behavior

Steps To Reproduce

Anything else?

JensErat commented Aug 9, 2023 •

edited

Loading

Kong version (`$ kong version`)