Fix race condition of timeout thread interrupt to stabilize multi level build tests #4254

lihaoyi · 2025-01-06T04:49:45Z

The basic problem was that `thread.interrupt()` ran before `interrupt = false`. Since `thread.interrupt()` wakes the timeout thread early, there was a chance the thread's `if (interrupt) {` conditional would still be true. The timeout thread would then time out the open socket immediately, and the Mill server process would shut down.

The solution is to move `interrupt = false` so that it runs before `thread.interrupt()`.
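A minimal Java sketch of the pattern described above, assuming a watchdog thread that sleeps for the timeout and then checks an `interrupt` flag before closing a resource (for example a socket) to unblock the caller. The names `TimeoutGuard`, `interruptWithTimeout` and `close` are hypothetical and this is not Mill's actual code; the point is only the ordering in the `finally` block.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical helper illustrating the fix; not Mill's implementation.
final class TimeoutGuard {
  private TimeoutGuard() {}

  /**
   * Runs body on the current thread. If it has not finished within
   * timeoutMillis, a watchdog thread calls close (e.g. closing a socket)
   * to unblock it, and the result is discarded.
   */
  static <T> Optional<T> interruptWithTimeout(
      long timeoutMillis, Runnable close, Supplier<T> body) {
    // true while the watchdog is still allowed to fire (volatile semantics)
    AtomicBoolean interrupt = new AtomicBoolean(true);
    // set by the watchdog if it actually fired
    AtomicBoolean interrupted = new AtomicBoolean(false);

    Thread watchdog = new Thread(() -> {
      try {
        Thread.sleep(timeoutMillis);
      } catch (InterruptedException e) {
        // Woken early by the caller: fall through to the flag check below.
      }
      if (interrupt.get()) {
        interrupted.set(true);
        close.run();
      }
    }, "TimeoutWatchdog");
    watchdog.setDaemon(true); // never keeps the JVM alive on its own
    watchdog.start();

    try {
      T res = body.get();
      // If the watchdog fired, treat the result as unreliable.
      return interrupted.get() ? Optional.empty() : Optional.ofNullable(res);
    } catch (RuntimeException e) {
      // e.g. the blocked call failed because close() ran
      return Optional.empty();
    } finally {
      // Order matters: disarm the watchdog BEFORE waking it. If these two
      // lines were swapped, the woken watchdog could still read
      // interrupt == true and call close() after the body had finished.
      interrupt.set(false);
      watchdog.interrupt();
    }
  }
}
```

Because the volatile write to `interrupt` happens before `watchdog.interrupt()`, and an interrupt synchronizes with the interrupted thread noticing it, the woken watchdog is guaranteed to observe `false` in the sketch above.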

This should hopefully fix the flakiness in the multi-level-build tests, where the server process shutting down would cause the next command to spawn a new process. That re-created all the classloaders, violating our assertions.

Also improved the test error checking so that, the next time something similar happens, we get the more precise report "an unwanted process restart occurred" rather than just "classloader invalidation was unexpected".

Tested manually with `while ./mill 'integration.invalidation[multi-level-editing].server.test'; do :; done`. Without this fix, this seems to reproduce the problem on my laptop within a few tens of minutes; with the fix, I haven't managed to make it appear.

lihaoyi · 2025-01-06T12:10:05Z

I'm hoping this also fixes the flakiness in `example.fundamentals.tasks[6-workers].local.test` (e.g. https://github.com/com-lihaoyi/mill/actions/runs/12579809859/job/35060755776) and `integration.failure[fatal-error].local.test` (e.g. https://github.com/com-lihaoyi/mill/actions/runs/12591901838/job/35095817421). Both could be explained by the Mill process from the previous invocation exiting unexpectedly before the next invocation runs.

This might have been the cause of a lot of flakiness that seems to have gone away with #4254: the server exiting caused the `runBackground` processes to exit, taking the HTTP servers down with them so they failed to pick up requests. This might in turn have been caused by com-lihaoyi/os-lib#324, which made `destroyOnExit` the default for spawned subprocesses. This PR explicitly disables `destroyOnExit` for subprocesses where `background = true`. It is covered by a new `integration.invalidation` test that runs under both `server` and `fork`, and that previously failed when run under `fork`.
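To illustrate why a destroy-on-exit default takes background processes down with the server, here is a plain-JDK analogue, assuming destroy-on-exit is modeled as a JVM shutdown hook that kills tracked children. This is not os-lib's implementation or API; `ChildProcesses`, `spawn` and `destroyTracked` are hypothetical names, and the `background` flag mirrors the idea of opting background processes out.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical analogue of "destroy on exit" semantics; not os-lib's code.
final class ChildProcesses {
  private ChildProcesses() {}

  // Children that should die together with this JVM.
  private static final List<Process> destroyOnExit = new CopyOnWriteArrayList<>();

  static {
    // When the JVM shuts down (e.g. the server exits), kill tracked children.
    Runtime.getRuntime().addShutdownHook(
        new Thread(ChildProcesses::destroyTracked, "DestroyChildrenOnExit"));
  }

  /** Spawns cmd; background processes are deliberately NOT tracked. */
  static Process spawn(boolean background, String... cmd) throws IOException {
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    if (!background) {
      destroyOnExit.add(p);
    }
    return p;
  }

  /** What the shutdown hook runs; exposed so it can be exercised directly. */
  static void destroyTracked() {
    for (Process p : destroyOnExit) {
      p.destroy();
    }
  }
}
```

In this sketch, if every child were tracked, a background HTTP server started this way would be killed as soon as the parent JVM exited; skipping registration for `background` children lets them outlive it.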

lihaoyi added 13 commits January 2, 2025 23:05 (all titled "." except the merge commit): 382d4d6, d538c6e, e469f25, cdcc77c, 875fbf9, 6ec5044, 70bfec6, 376ca84, 4a7ce91 (Merge branch 'main' into stabilize-multi-level-build), 6441015, 37bc7cb, ac2a233, 3602eee

lihaoyi marked this pull request as ready for review January 6, 2025 07:50

lihaoyi merged commit f0aa010 into com-lihaoyi:main Jan 6, 2025
26 checks passed

lefou added this to the 0.12.6 milestone Jan 6, 2025

lihaoyi mentioned this pull request Jan 6, 2025 in WIP fix runBackground with -i/--no-server #4259 (Merged)
