FIX: Race conditions in `test_cli.py` when spawning test server processes #4178

reneme · 2024-07-04T07:57:23Z

Pull Request Dependencies

FIX: potential use-after-free in ./botan tls_proxy #4177

Description

The test_cli.py script orchestrates a few long-running processes (namely ./botan tls_server, ./botan tls_client, ./botan tls_proxy). For those tests to work properly, we have to ensure that the server processes are ready before trying to interact with them, especially on CI, where runtime behavior is notoriously unpredictable. Until now, this was done by invoking time.sleep(1) after launching a server process. This worked most of the time, but introduced unnecessary waiting time while still being potentially racy.

This PR now uses Python's asyncio module to handle such long-running processes and interact with them. Each CLI server now prints a "Listening for new connections..." string to stdout once it is ready to receive connections. The asyncio-based wrapper waits for this string and thus ensures that the server is ready before the test continues. The wrapper also applies a timeout to each of those waits (hard-coded to 15 seconds) and kills the process if necessary.
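
For illustration, the waiting logic described above could be sketched roughly as follows. This is not the code from this PR: `start_server` and `_read_until_marker` are hypothetical names, and the stand-in "server" is a Python one-liner that prints the readiness string, so the sketch runs without a ./botan binary.

```python
import asyncio
import contextlib
import sys

READY_MARKER = b"Listening for new connections..."  # string named in the description
STARTUP_TIMEOUT = 15.0  # seconds, mirroring the hard-coded timeout mentioned above


async def _read_until_marker(stream, marker):
    """Consume the stream line by line until a line contains the marker."""
    while True:
        line = await stream.readline()
        if not line:  # EOF: stdout was closed before readiness was reported
            raise RuntimeError("server exited before reporting readiness")
        if marker in line:
            return


async def start_server(cmd, marker=READY_MARKER, timeout=STARTUP_TIMEOUT):
    """Spawn cmd and return the process once it has printed the marker.

    Hypothetical helper: on timeout or early exit the process is killed
    and the exception is re-raised.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE)
    try:
        await asyncio.wait_for(_read_until_marker(proc.stdout, marker), timeout)
    except Exception:
        # kill() raises ProcessLookupError if the process is already gone
        with contextlib.suppress(ProcessLookupError):
            proc.kill()
        await proc.wait()
        raise
    return proc


async def main():
    # Stand-in for a CLI server (assumption for this sketch only)
    fake_server = [
        sys.executable, "-u", "-c",
        "import time; print('Listening for new connections...'); time.sleep(30)",
    ]
    proc = await start_server(fake_server)
    # ... a test would interact with the server here ...
    proc.kill()
    await proc.wait()


if __name__ == "__main__":
    asyncio.run(main())
```

Compared to a fixed time.sleep(1), the wait ends as soon as the marker appears, and a server that never becomes ready fails the test with a timeout instead of a confusing connection error.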

Result

Such tests can now rely on their server processes being in a defined state, without sleeping some arbitrary time and hoping for the best. As a side effect, running ./test_cli.py --threads=4 now takes 2.5 seconds on my laptop, instead of the more than 20 seconds it used to take due to the time.sleep() workarounds.

Fixes #4112

coveralls · 2024-07-04T08:25:28Z

coverage: 91.729% (-0.002%) from 91.731%
when pulling 3294602 on Rohde-Schwarz:fix/race_conditions_in_test_cli
into f4c26e4 on randombit:master.

randombit

Very nice work!

The pylint failures in CI look relevant. If there is anything it's warning about that isn't sensible to change, feel free to locally disable the warning.

reneme · 2024-07-04T10:13:20Z

In the first run, the macOS 14 job (on Apple Silicon) failed obscurely:

     [...]
     INFO: Ran cli_zfec_tests in 0.06 sec
     INFO: Ran cli_trust_root_tests in 0.54 sec
     INFO: Ran cli_tls_proxy_tests in 5.14 sec
  Ran 234 tests with 0 failures in 6.40 seconds
  Command 'python3 ./src/scripts/test_cli.py --threads=3 ./botan' failed with error code -11

... it looks like the interpreter crashed after the script had finished. 😨
This didn't show up in a second CI run, but it is certainly worrisome. Naturally, I cannot reproduce it on my MacBook Air...
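
For reference, Python's subprocess module reports a negative return code when a child was terminated by a signal, so -11 would correspond to signal 11 (SIGSEGV on Linux and macOS). This assumes the CI script passes the subprocess return code through unchanged. A small sketch, where `describe_exit` is a hypothetical helper and not part of test_cli.py:

```python
import signal


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # Negative values mean "terminated by signal -returncode" on POSIX
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


print(describe_exit(-11))
```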

I'd suggest we just merge and see whether it comes back. Perhaps we'll end up having to update the Python interpreter in the build job setup or disable those tests on macOS 14 for some time. 😒

coveralls · 2024-07-04T10:23:45Z

coverage: 91.723% (+0.001%) from 91.722%
when pulling 934f3fe on Rohde-Schwarz:fix/race_conditions_in_test_cli
into a75bb25 on randombit:master.

randombit · 2024-07-04T10:27:25Z

I think the macOS failure is #3991 so unrelated to this change.

reneme · 2024-07-04T10:55:42Z

Let's hope so. In it goes!

randombit · 2024-07-04T11:22:14Z

A couple of Windows builds in the merge are failing. Maybe we need more than 15 seconds in practice on CI?

This was disabled in #3845 due to flakiness, then thought possibly fixed and enabled again in #4178. However, with #4178 merged the test still occasionally fails. Disable it again pending diagnosis.

reneme added the enhancement label Jul 4, 2024

reneme added this to the Botan 3.5.0 milestone Jul 4, 2024

reneme self-assigned this Jul 4, 2024

Ensure that CLI servers report for duty

c376a00

randombit approved these changes Jul 4, 2024

Centralize handling of long-running processes in test_cli.py

934f3fe

This uses Python's async process handling based on the asyncio module as a replacement for the low-level use of subprocess.Popen along with arbitrary (and inherently racy) time.sleep() calls.

reneme force-pushed the fix/race_conditions_in_test_cli branch from 3294602 to 934f3fe July 4, 2024 09:54

reneme merged commit 3e65302 into randombit:master Jul 4, 2024
39 checks passed

reneme deleted the fix/race_conditions_in_test_cli branch July 4, 2024 10:55

randombit mentioned this pull request Jul 4, 2024

Disable tls_proxy cli test on Windows again #4181

Merged

reneme mentioned this pull request Jul 4, 2024

Attempt to fix use-after-free in tls_proxy CLI test #4148

Closed

reneme mentioned this pull request Aug 12, 2024

tls_proxy null reference #4262

Open
