Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

Closed
1 task done
JensErat opened this issue Aug 9, 2023 · 0 comments · Fixed by #11431
Closed
1 task done

New queue fails in edge case max_batch_size=max_entries with assertion error #11377

JensErat opened this issue Aug 9, 2023 · 0 comments · Fixed by #11431
Labels

Comments

@JensErat
Copy link
Contributor

JensErat commented Aug 9, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

Kong 3.3.1

Current Behavior

When defining a very special edge case configuration having max_batch_size=max_entries, the queue can fail with an assertion error when removing the frontmost element. This happens especially when the callback repeatedly fails (eg. an unavailable backend system receiving data).

What happens:

  1. we add max_batch_size elements, all of which "post" resources

  2. the batch queue consumes all of those resources in process_once by wait()ing for them, but gets stuck processing/sending the batch

  3. as process_once is stuck until max_retry_time passed, the function does not run delete_frontmost_entry() and thus actually moves the front reference

  4. when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left:

     /kong-plugin/kong/tools/queue.lua:215: assertion failed!
    

Potential thoughts for fixing the issue:

  • enforce max_batch_size<max_entries, which is not really a reasonable configuration anyway
  • increase max_entries by 1 when both configuration values are the same (our current workaround)
  • further refactor the queue to remove the race condition

Kudos to @27ascii for discovering the edge case configuration.

Expected Behavior

Configuration either not allowed or assertion does not fail worker

Steps To Reproduce

Pull request containing unit test reproducing the issue: #11378

Anything else?

No response

Jens Erat <jens.erat@mercedes-benz.com>, Mercedes-Benz Tech Innovation GmbH, imprint

@samugi samugi linked a pull request Aug 11, 2023 that will close this issue
3 tasks
@samugi samugi added the bug label Aug 11, 2023
hanshuebner pushed a commit that referenced this issue Aug 21, 2023
When defining a very special edge case configuration having
max_batch_size=max_entries, the queue can fail with an assertion error when
removing the frontmost element. This happens especially when the
callback repeatedly fails (eg. an unavailable backend system receiving
data).

What happens:

1. we add max_batch_size elements, all of which "post" resources
2. the batch queue consumes all of those resources in `process_once` by `wait()`ing for them, but gets stuck processing/sending the batch
3. as `process_once` is stuck until `max_retry_time` passed, the function does not run `delete_frontmost_entry()` and thus actually moves the `front` reference
4. when enqueuing the next item, it tries to drop the oldest entry, but triggers the assertion in queue.lua as no resources are left

This commit fixes #11377 by removing currently processed elements out
of the race condition window.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants