-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data race (TSan report) on worker thread startup on macOS in iree_thread_request_affinity
#15499
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find! Tricky bugs typically can be fixed with oneliner. :)
I need to look at this - I don't think this is right - 1 sec. |
Yeah, this is a change in behavior - iree_thread_request_affinity sets the affinity of the |
Oh , right sorry, i didn't pay attention to the fact that this takes a |
I think your fix is good, just may need two methods - one that sets on self (so it doesn't need the queried handle) and one that sets on another thread that doesn't have to worry. |
Ah, I see. Sorry missed that.. |
was thinking exactly that --- i'll send something. |
No worries - the sad part is that Apple's schedulers "know better" and often take affinity as just a suggestion, so this would have just made scheduling slightly more non-deterministic :P |
…e_thread_request_affinity` (#15534) The thread having its affinity set is not the thread setting it; `mach_task_self()` != `thread->mach_port`. We'll want to fix the race while still having the same behavior when called from https://github.com/openxla/iree/blob/5b72c3908c8c216f3eee034ca74d61b8dbf37a12/runtime/src/iree/base/internal/threading_darwin.c#L141-L143 Reverts #15499 and rolls forward with fix.
…e_thread_request_affinity` (iree-org#15499) TSan report: ``` WARNING: ThreadSanitizer: data race (pid=45817) Read of size 4 at 0x0001084004e0 by thread T2: #0 iree_thread_request_affinity threading_darwin.c:230 (local-task_vmvx_semaphore_submission_test:arm64+0x100078f40) #1 iree_task_worker_main worker.c:385 (local-task_vmvx_semaphore_submission_test:arm64+0x100071594) #2 iree_thread_start_routine threading_darwin.c:72 (local-task_vmvx_semaphore_submission_test:arm64+0x100078e3c) Previous write of size 4 at 0x0001084004e0 by main thread: #0 iree_thread_create threading_darwin.c:140 (local-task_vmvx_semaphore_submission_test:arm64+0x100078ca4) #1 iree_task_worker_initialize worker.c:66 (local-task_vmvx_semaphore_submission_test:arm64+0x1000714f8) #2 iree_task_executor_create executor.c:161 (local-task_vmvx_semaphore_submission_test:arm64+0x10006b2b0) ``` The read of `thread->mach_port` at https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L230 is not ordered relatively to the write of that variable in the parent thread after `pthread_mach_thread_np` returns: https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L140 The proposed fix is that the worker thread shouldn't need to access its own `thread->mach_port`, it can equivalently call `mach_task_self()`.
…e_thread_request_affinity` (iree-org#15534) The thread having its affinity set is not the thread setting it; `mach_task_self()` != `thread->mach_port`. We'll want to fix the race while still having the same behavior when called from https://github.com/openxla/iree/blob/5b72c3908c8c216f3eee034ca74d61b8dbf37a12/runtime/src/iree/base/internal/threading_darwin.c#L141-L143 Reverts iree-org#15499 and rolls forward with fix.
TSan report:
The read of
thread->mach_port
at https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L230is not ordered relatively to the write of that variable in the parent thread after
pthread_mach_thread_np
returns: https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L140The proposed fix is that the worker thread shouldn't need to access its own
thread->mach_port
, it can equivalently callmach_task_self()
.