input: send resume signal to the input thread event loop if plugin is threaded #7812

Merged (1 commit) on Aug 17, 2023

@danlenar (Contributor) commented Aug 9, 2023

Fixes #7071

I was getting the following error.

[2023/08/09 06:48:50] [ warn] [input] tail.0 paused (mem buf overlimit)
[2023/08/09 06:48:50] [ info] [input] tail.0 resume (mem buf overlimit)
[2023/08/09 06:48:50] [ warn] [input] tail.0 paused (mem buf overlimit)
[2023/08/09 06:48:50] [error] [input] cannot resume collector tail.0:2, already running

Log processing would grind to a standstill once the "cannot resume collector" line was logged.
After looking at the code, I found a race condition between pause and resume.

flb_input_pause sends a signal to the input thread's event loop:
https://github.com/fluent/fluent-bit/blob/v2.1.8/src/flb_input.c#L1673

flb_input_resume, by contrast, runs directly on the main thread instead of sending a signal to the input thread's event loop:
https://github.com/fluent/fluent-bit/blob/v2.1.8/src/flb_input.c#L1695

The fix is to have flb_input_resume also send a signal to the input thread, so that pause and resume are handled in order on the same thread and cannot stomp on each other.
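
To illustrate the idea (this is not the actual patch), here is a minimal, single-process sketch of routing both pause and resume through one ordered channel. Everything in it is hypothetical: the `demo_` names, the signal values, and the pipe-based channel are assumptions made for illustration and are not Fluent Bit's API or constants. The caller only enqueues signals; the state is changed solely by the code that drains the channel, so the transitions are applied in the order they were requested.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical signal values; not Fluent Bit's actual constants. */
enum demo_signal { DEMO_SIGNAL_PAUSE = 1, DEMO_SIGNAL_RESUME = 2 };

/* Hypothetical collector state, modified only by the draining side. */
struct demo_input {
    int channel[2]; /* pipe: [0] is read by the input side, [1] is written by the caller */
    int running;    /* 1 = collector running, 0 = paused */
    int errors;     /* redundant transitions, e.g. resume while already running */
};

/* Create the channel; the collector starts out running. Returns 0 on success. */
int demo_input_init(struct demo_input *in)
{
    in->running = 1;
    in->errors = 0;
    return pipe(in->channel);
}

/* Caller side (the "main thread"): enqueue only, never touch in->running. */
int demo_send_signal(struct demo_input *in, enum demo_signal sig)
{
    uint8_t byte = (uint8_t) sig;
    return write(in->channel[1], &byte, 1) == 1 ? 0 : -1;
}

/* Input side: apply exactly `count` queued signals in FIFO order. */
int demo_process_signals(struct demo_input *in, int count)
{
    uint8_t byte;
    int i;

    for (i = 0; i < count; i++) {
        if (read(in->channel[0], &byte, 1) != 1) {
            return -1;
        }
        if (byte == DEMO_SIGNAL_PAUSE) {
            if (!in->running) {
                in->errors++; /* pause requested while already paused */
            }
            in->running = 0;
        }
        else if (byte == DEMO_SIGNAL_RESUME) {
            if (in->running) {
                in->errors++; /* resume requested while already running */
            }
            in->running = 1;
        }
    }
    return 0;
}

/* Release both ends of the channel. */
void demo_input_destroy(struct demo_input *in)
{
    close(in->channel[0]);
    close(in->channel[1]);
}
```

Sending pause, resume, pause (the sequence in the log above) with `demo_send_signal` and then calling `demo_process_signals(&in, 3)` leaves the collector paused with no redundant transitions recorded, because every transition goes through the same queue. In a real threaded plugin the draining side would be the input thread's event loop rather than a direct function call.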


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@leonardo-albertovich (Collaborator) commented

@edsiper, this PR looks good to me and I think we should prioritize it.

Thanks a lot for taking the time to address this issue @danlenar!

Successfully merging this pull request may close these issues.

could not enqueue records into the ring buffer
4 participants