-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRASH from race between detach and a new thread #2600
Comments
Can we use this counter on Windows? |
Adds a counter uninit_thread_count which is incremented in UNIX just prior to a new thread being created. Synchall will keep looping while this counter is positive, ensuring we avoid races with threads created but not yet initialized during a detach synchall which then use DR resources after detach, causing crashes. Leaves the counter at 0 on Windows where we can't distinguish external vs internal thread creation, but Windows does have the init_apc_go_native flag which eliminates some races. Adds a test. To more reliably hit the race (about half the time) I locally added a 50ms sleep at the top of dynamo_thread_init(). Fixes #2600
Adds a counter uninit_thread_count which is incremented in UNIX just prior to a new thread being created. Synchall will keep looping while this counter is positive, ensuring we avoid races with threads created but not yet initialized during a detach synchall which then use DR resources after detach, causing crashes. Leaves the counter at 0 on Windows where we can't distinguish external vs internal thread creation, but Windows does have the init_apc_go_native flag which eliminates some races. Adds a test. To more reliably hit the race (about half the time) I locally added a 50ms sleep at the top of dynamo_thread_init(). Fixes #2600
To handle a thread exiting on attach, adds a timeout to wait_event() and its corresponding implementations: os_wait_event() on Windows and ksynch_wait() on UNIX. Uses the timeout to check whether a thread exited, and if so, to move on. Augments the test from #2600 to test this race. Abandon the api.detach_spawn test on Windows:i#2611 covers fixing the tricky problems on Windows. Leaves in place some fixes toward #2611: + Fixes a race where we put interception_code hooks in place before marking them +x + Increases MAX_THREADS_WAITING_FOR_DR_INIT Fixes clang 32-bit missing __moddi3 by adding it to third_party/libgcc and linking that into x86 and arm. To enable adding race checks, moves doing_detach inside the synchall and adds started_detach for the few checks that need pre-synch querying. Removes the dynamo_thread_init_during_process_exit flag that was added in 45dd931 for #2075, as the UNIX uninit_thread_count solution from #2600 solves that problem on its own. Fixes #2601
To handle a thread exiting on attach, adds a timeout to wait_event() and its corresponding implementations: os_wait_event() on Windows and ksynch_wait() on UNIX. Uses the timeout to check whether a thread exited, and if so, to move on. Augments the test from #2600 to test this race. Abandon the api.detach_spawn test on Windows:i#2611 covers fixing the tricky problems on Windows. Leaves in place some fixes toward #2611: + Fixes a race where we put interception_code hooks in place before marking them +x + Increases MAX_THREADS_WAITING_FOR_DR_INIT Fixes clang 32-bit missing __moddi3 by adding it to third_party/libgcc and linking that into x86 and arm. To enable adding race checks, moves doing_detach inside the synchall and adds started_detach for the few checks that need pre-synch querying. Removes the dynamo_thread_init_during_process_exit flag that was added in 45dd931 for #2075, as the UNIX uninit_thread_count solution from #2600 solves that problem on its own. Fixes #2601
Detach on UNIX has a race where a new thread can have been created but not yet been added to the DR thread list at the point where detach does its synchall. The new thread will wait on thread_initexit_lock while detach proceeds and finishes. The new thread is on the dstack, and will proceed to use other resources, after detach has freed them.
Windows has a partial solution to this via its init_apc_go_native flags but technically still has a race window.
Trying to solve this once the child is created is complex and seems to require extra synch steps. I'm proposing a solution that is folded into the synchall process using a count parallel to the existing exiting_thread_count.
The text was updated successfully, but these errors were encountered: