Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRASH from race between detach and a new thread #2600

Closed
derekbruening opened this issue Aug 8, 2017 · 1 comment
Closed

CRASH from race between detach and a new thread #2600

derekbruening opened this issue Aug 8, 2017 · 1 comment
Assignees

Comments

@derekbruening
Copy link
Contributor

Detach on UNIX has a race where a new thread can have been created but not yet been added to the DR thread list at the point where detach does its synchall. The new thread will wait on thread_initexit_lock while detach proceeds and finishes. The new thread is on the dstack, and will proceed to use other resources, after detach has freed them.

Windows has a partial solution to this via its init_apc_go_native flags but technically still has a race window.

Trying to solve this once the child is created is complex and seems to require extra synch steps. I'm proposing a solution that is folded into the synchall process using a count parallel to the existing exiting_thread_count.

@derekbruening
Copy link
Contributor Author

Can we use this counter on Windows?
I don't see how it can cover everything: threads can be created from outside.
So there will always be a race window.
We can reduce it by incrementing the counter in syscall handlers that create
threads, but we won't know whether to decrement in dynamo_thread_init: we can't tell
whether the thread was internally or externally created.
I may leave the new counter solution as a UNIX-only thing for now.

derekbruening added a commit that referenced this issue Aug 8, 2017
Adds a counter uninit_thread_count which is incremented in UNIX just prior
to a new thread being created.  Synchall will keep looping while this
counter is positive, ensuring we avoid races with threads created but not
yet initialized during a detach synchall which then use DR resources after
detach, causing crashes.

Leaves the counter at 0 on Windows where we can't distinguish external vs
internal thread creation, but Windows does have the init_apc_go_native flag
which eliminates some races.

Adds a test.  To more reliably hit the race (about half the time) I locally
added a 50ms sleep at the top of dynamo_thread_init().

Fixes #2600
derekbruening added a commit that referenced this issue Aug 8, 2017
Adds a counter uninit_thread_count which is incremented in UNIX just prior
to a new thread being created.  Synchall will keep looping while this
counter is positive, ensuring we avoid races with threads created but not
yet initialized during a detach synchall which then use DR resources after
detach, causing crashes.

Leaves the counter at 0 on Windows where we can't distinguish external vs
internal thread creation, but Windows does have the init_apc_go_native flag
which eliminates some races.

Adds a test.  To more reliably hit the race (about half the time) I locally
added a 50ms sleep at the top of dynamo_thread_init().

Fixes #2600
derekbruening added a commit that referenced this issue Aug 9, 2017
To handle a thread exiting on attach, adds a timeout to wait_event() and
its corresponding implementations: os_wait_event() on Windows and
ksynch_wait() on UNIX.  Uses the timeout to check whether a thread exited,
and if so, to move on.

Augments the test from #2600 to test this race.

Fixes #2061
derekbruening added a commit that referenced this issue Nov 2, 2017
started_detach for the few checks that need pre-synch querying.

Reminder to change to #2601 and to add a xref to #2075: we've now removed
the dynamo_thread_init_during_process_exit flag that was added by 45dd931
as #2600's counter fixed that race.
derekbruening added a commit that referenced this issue Nov 6, 2017
To handle a thread exiting on attach, adds a timeout to wait_event() and
its corresponding implementations: os_wait_event() on Windows and
ksynch_wait() on UNIX.  Uses the timeout to check whether a thread exited,
and if so, to move on.

Augments the test from #2600 to test this race.

Abandon the api.detach_spawn test on Windows:i#2611 covers fixing the
tricky problems on Windows.  Leaves in place some fixes toward #2611:
+ Fixes a race where we put interception_code hooks in place before marking
  them +x
+ Increases MAX_THREADS_WAITING_FOR_DR_INIT

Fixes clang 32-bit missing __moddi3 by adding it to third_party/libgcc and linking
that into x86 and arm.

To enable adding race checks, moves doing_detach inside the synchall and adds
started_detach for the few checks that need pre-synch querying.

Removes the dynamo_thread_init_during_process_exit flag that was added in
45dd931 for #2075, as the UNIX uninit_thread_count solution from #2600 solves
that problem on its own.

Fixes #2601
fhahn pushed a commit that referenced this issue Dec 4, 2017
To handle a thread exiting on attach, adds a timeout to wait_event() and
its corresponding implementations: os_wait_event() on Windows and
ksynch_wait() on UNIX.  Uses the timeout to check whether a thread exited,
and if so, to move on.

Augments the test from #2600 to test this race.

Abandon the api.detach_spawn test on Windows:i#2611 covers fixing the
tricky problems on Windows.  Leaves in place some fixes toward #2611:
+ Fixes a race where we put interception_code hooks in place before marking
  them +x
+ Increases MAX_THREADS_WAITING_FOR_DR_INIT

Fixes clang 32-bit missing __moddi3 by adding it to third_party/libgcc and linking
that into x86 and arm.

To enable adding race checks, moves doing_detach inside the synchall and adds
started_detach for the few checks that need pre-synch querying.

Removes the dynamo_thread_init_during_process_exit flag that was added in
45dd931 for #2075, as the UNIX uninit_thread_count solution from #2600 solves
that problem on its own.

Fixes #2601
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant