-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition in "select" test #707
Comments
Discovered another race condition in here - this uses the same TCP port number as the "network-api-test" does. This means when running ctest with a "-j8" option (parallel) that it might run at the same time as network-api-test, which causes it to fail. Still not clear on why this wasn't part of the network-api-test as I originally suggested - which would have avoided this - but nonetheless it has been implemented separately so this becomes a race condition issue. Should use a different port number for this test so it doesn't interfere with the other test. |
It was not put into the network api test because not all the select tests use network features. As for selecting a new port number, is there a list of all the ports that are already being used or do I just have to search through the tests to find them. |
change port numbers to be different from network test
change sem flush to solve race condition change port numbers to be different from network test
Fix #707 change sem flush to solve race condition
Changes Message Key from uint16 to uint32 to avoid rollover
Fix nasa#707, Resolve highest MsgID of 0xFFFF bug
Describe the bug
Running the OSAL select test, I ran into a deadlock situation where the "multi" test got stuck and never finished.
To Reproduce
Hit or miss... Run test repeatedly on a system with other loads (e.g. parallel builds)
Expected behavior
Test should complete
Code snips
Checking the test status/backtrace it looks like two tasks (main + "Server_Fn") are waiting on the binary sem. In particular the Server_Fn is stuck here:
osal/src/tests/select-test/select-test.c
Line 162 in d698a4d
While the main task is waiting in the teardown code (the
TestSelectMultipleRead
has completed, and it has invokedTeardown_Multi
which in turn invokesTeardown_Single
here):osal/src/tests/select-test/select-test.c
Line 273 in d698a4d
System observed on:
Ubuntu 20.04
Additional context
This is likely related to the use of
OS_BinSemFlush
. We should probably deprecate this function, as I cannot see how this can ever be used safely without it being a race condition. VxWorks offers it which (I think) is why OSAL also offers it, but its a fundamentally broken concept.I can confirm that looking at the traceback in gdb, the flush_count is indeed already 1 - meaning the flush had already happened by the time the Server_Fn entered the bin sem take routine.
Reporter Info
Joseph Hickey, Vantage Systems, Inc.
The text was updated successfully, but these errors were encountered: