Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: fix race between flb_start and flb_destroy #3853

Merged
merged 1 commit into from
Jul 25, 2021
Merged

lib: fix race between flb_start and flb_destroy #3853

merged 1 commit into from
Jul 25, 2021

Conversation

rittneje
Copy link
Contributor

Signed-off-by: Jesse Rittner [email protected]

Fixes a race condition between flb_start and flb_destroy that was caused by flb_start leaking the worker thread in the failure case.

Fixes #3852.

@nokute78 Can you test this branch with the thread sanitizer? I am not able to run it on my end.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • [N/A] Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

There was no more output from Valgrind with this fix.

Documentation

  • [N/A] Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@nokute78
Copy link
Collaborator

@rittneje We can build cmake .. -DSANITIZE_THREAD=yes && make to enable thread sanitizer.
Can you check it ?

@rittneje
Copy link
Contributor Author

rittneje commented Jul 25, 2021

@nokute78 It reports a race in both "Test regex" and " Test exclude", but both of those races are already present in master. It does not report any races in "Test invalid" anymore.

Test regex:

8: ==================
8: WARNING: ThreadSanitizer: data race (pid=24016)
8: Write of size 4 at 0x7b4c00002580 by main thread:
8: #0 flb_input_pause_all /home/jrittner/fluent-bit/src/flb_input.c:882 (flb-rt-filter_grep+0x68073)
8: #1 flb_engine_exit /home/jrittner/fluent-bit/src/flb_engine.c:779 (flb-rt-filter_grep+0x84990)
8: #2 flb_stop /home/jrittner/fluent-bit/src/flb_lib.c:722 (flb-rt-filter_grep+0x62c35)
8: #3 flb_test_filter_grep_regex /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:55 (flb-rt-filter_grep+0x5f980)
8: #4 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #5 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #6 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Previous read of size 4 at 0x7b4c00002580 by thread T1:
8: #0 flb_input_buf_paused /home/jrittner/fluent-bit/include/fluent-bit/flb_input.h:484 (flb-rt-filter_grep+0xbfa0c)
8: #1 flb_input_chunk_append_raw /home/jrittner/fluent-bit/src/flb_input_chunk.c:888 (flb-rt-filter_grep+0xc25a3)
8: #2 in_lib_collect /home/jrittner/fluent-bit/plugins/in_lib/in_lib.c:93 (flb-rt-filter_grep+0x14c2e5)
8: #3 flb_input_collector_fd /home/jrittner/fluent-bit/src/flb_input.c:1075 (flb-rt-filter_grep+0x6905c)
8: #4 flb_engine_handle_event /home/jrittner/fluent-bit/src/flb_engine.c:377 (flb-rt-filter_grep+0x84231)
8: #5 flb_engine_start /home/jrittner/fluent-bit/src/flb_engine.c:639 (flb-rt-filter_grep+0x84231)
8: #6 flb_lib_worker /home/jrittner/fluent-bit/src/flb_lib.c:628 (flb-rt-filter_grep+0x62663)
8: #7 (libtsan.so.0+0x296ad)
8:
8: As if synchronized via sleep:
8: #0 nanosleep (libtsan.so.0+0x4dac0)
8: #1 flb_time_msleep /home/jrittner/fluent-bit/src/flb_time.c:83 (flb-rt-filter_grep+0x9287d)
8: #2 flb_test_filter_grep_regex /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:53 (flb-rt-filter_grep+0x5f971)
8: #3 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #4 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #5 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Location is heap block of size 392 at 0x7b4c000024c0 allocated by main thread:
8: #0 calloc (libtsan.so.0+0x2afc3)
8: #1 flb_calloc /home/jrittner/fluent-bit/include/fluent-bit/flb_mem.h:78 (flb-rt-filter_grep+0x6512e)
8: #2 flb_input_new /home/jrittner/fluent-bit/src/flb_input.c:153 (flb-rt-filter_grep+0x65b3f)
8: #3 flb_input /home/jrittner/fluent-bit/src/flb_lib.c:255 (flb-rt-filter_grep+0x610a7)
8: #4 flb_test_filter_grep_regex /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:28 (flb-rt-filter_grep+0x5f62c)
8: #5 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #6 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #7 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Thread T1 'flb-pipeline' (tid=24018, running) created by main thread at:
8: #0 pthread_create (libtsan.so.0+0x2bcee)
8: #1 mk_utils_worker_spawn /home/jrittner/fluent-bit/lib/monkey/mk_core/mk_utils.c:284 (flb-rt-filter_grep+0x4bc857)
8: #2 flb_test_filter_grep_regex /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:43 (flb-rt-filter_grep+0x5f849)
8: #3 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #4 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #5 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: SUMMARY: ThreadSanitizer: data race /home/jrittner/fluent-bit/src/flb_input.c:882 in flb_input_pause_all
8: ==================

Test exclude:

8: ==================
8: WARNING: ThreadSanitizer: data race (pid=24020)
8: Write of size 4 at 0x7b4c00002580 by main thread:
8: #0 flb_input_pause_all /home/jrittner/fluent-bit/src/flb_input.c:882 (flb-rt-filter_grep+0x68073)
8: #1 flb_engine_exit /home/jrittner/fluent-bit/src/flb_engine.c:779 (flb-rt-filter_grep+0x84990)
8: #2 flb_stop /home/jrittner/fluent-bit/src/flb_lib.c:722 (flb-rt-filter_grep+0x62c35)
8: #3 flb_test_filter_grep_exclude /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:99 (flb-rt-filter_grep+0x5fd5a)
8: #4 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #5 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #6 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Previous read of size 4 at 0x7b4c00002580 by thread T1:
8: #0 flb_input_buf_paused /home/jrittner/fluent-bit/include/fluent-bit/flb_input.h:484 (flb-rt-filter_grep+0xbfa0c)
8: #1 flb_input_chunk_append_raw /home/jrittner/fluent-bit/src/flb_input_chunk.c:888 (flb-rt-filter_grep+0xc25a3)
8: #2 in_lib_collect /home/jrittner/fluent-bit/plugins/in_lib/in_lib.c:93 (flb-rt-filter_grep+0x14c2e5)
8: #3 flb_input_collector_fd /home/jrittner/fluent-bit/src/flb_input.c:1075 (flb-rt-filter_grep+0x6905c)
8: #4 flb_engine_handle_event /home/jrittner/fluent-bit/src/flb_engine.c:377 (flb-rt-filter_grep+0x84231)
8: #5 flb_engine_start /home/jrittner/fluent-bit/src/flb_engine.c:639 (flb-rt-filter_grep+0x84231)
8: #6 flb_lib_worker /home/jrittner/fluent-bit/src/flb_lib.c:628 (flb-rt-filter_grep+0x62663)
8: #7 (libtsan.so.0+0x296ad)
8:
8: As if synchronized via sleep:
8: #0 nanosleep (libtsan.so.0+0x4dac0)
8: #1 flb_time_msleep /home/jrittner/fluent-bit/src/flb_time.c:83 (flb-rt-filter_grep+0x9287d)
8: #2 flb_test_filter_grep_exclude /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:97 (flb-rt-filter_grep+0x5fd4b)
8: #3 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #4 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #5 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Location is heap block of size 392 at 0x7b4c000024c0 allocated by main thread:
8: #0 calloc (libtsan.so.0+0x2afc3)
8: #1 flb_calloc /home/jrittner/fluent-bit/include/fluent-bit/flb_mem.h:78 (flb-rt-filter_grep+0x6512e)
8: #2 flb_input_new /home/jrittner/fluent-bit/src/flb_input.c:153 (flb-rt-filter_grep+0x65b3f)
8: #3 flb_input /home/jrittner/fluent-bit/src/flb_lib.c:255 (flb-rt-filter_grep+0x610a7)
8: #4 flb_test_filter_grep_exclude /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:72 (flb-rt-filter_grep+0x5fa06)
8: #5 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #6 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #7 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: Thread T1 'flb-pipeline' (tid=24022, running) created by main thread at:
8: #0 pthread_create (libtsan.so.0+0x2bcee)
8: #1 mk_utils_worker_spawn /home/jrittner/fluent-bit/lib/monkey/mk_core/mk_utils.c:284 (flb-rt-filter_grep+0x4bc857)
8: #2 flb_test_filter_grep_exclude /home/jrittner/fluent-bit/tests/runtime/filter_grep.c:87 (flb-rt-filter_grep+0x5fc23)
8: #3 test_do_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1007 (flb-rt-filter_grep+0x5d12e)
8: #4 test_run_ /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1103 (flb-rt-filter_grep+0x5d4ae)
8: #5 main /home/jrittner/fluent-bit/tests/runtime/../lib/acutest/acutest.h:1700 (flb-rt-filter_grep+0x5f06a)
8:
8: SUMMARY: ThreadSanitizer: data race /home/jrittner/fluent-bit/src/flb_input.c:882 in flb_input_pause_all
8: ==================

Can we investigate those other two races separately? They are not responsible for the sporadic test failures.

@rittneje
Copy link
Contributor Author

@nokute78 Filed #3854 for those other race conditions. In short, they are not related to this and will require a different fix.

Copy link
Collaborator

@nokute78 nokute78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a simple question.

src/flb_lib.c Show resolved Hide resolved
Copy link
Collaborator

@nokute78 nokute78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, LGTM.

@nokute78 nokute78 merged commit 3174392 into fluent:master Jul 25, 2021
@rittneje rittneje deleted the lib-flb-start-fix-race branch July 25, 2021 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

filter_grep: sporadic unit test failures
2 participants