Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race (TSan report) on worker thread startup on macOS in iree_thread_request_affinity #15499

Merged
merged 1 commit into from
Nov 10, 2023

Conversation

bjacob
Copy link
Contributor

@bjacob bjacob commented Nov 9, 2023

TSan report:

WARNING: ThreadSanitizer: data race (pid=45817)
  Read of size 4 at 0x0001084004e0 by thread T2:
    #0 iree_thread_request_affinity threading_darwin.c:230 (local-task_vmvx_semaphore_submission_test:arm64+0x100078f40)
    #1 iree_task_worker_main worker.c:385 (local-task_vmvx_semaphore_submission_test:arm64+0x100071594)
    #2 iree_thread_start_routine threading_darwin.c:72 (local-task_vmvx_semaphore_submission_test:arm64+0x100078e3c)

  Previous write of size 4 at 0x0001084004e0 by main thread:
    #0 iree_thread_create threading_darwin.c:140 (local-task_vmvx_semaphore_submission_test:arm64+0x100078ca4)
    #1 iree_task_worker_initialize worker.c:66 (local-task_vmvx_semaphore_submission_test:arm64+0x1000714f8)
    #2 iree_task_executor_create executor.c:161 (local-task_vmvx_semaphore_submission_test:arm64+0x10006b2b0)

The read of thread->mach_port at https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L230

is not ordered relatively to the write of that variable in the parent thread after pthread_mach_thread_np returns: https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L140

The proposed fix is that the worker thread shouldn't need to access its own thread->mach_port, it can equivalently call mach_task_self().

@bjacob bjacob marked this pull request as ready for review November 9, 2023 03:03
@bjacob bjacob requested a review from benvanik as a code owner November 9, 2023 03:03
Copy link
Contributor

@antiagainst antiagainst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find! Tricky bugs typically can be fixed with oneliner. :)

@bjacob bjacob merged commit 8633629 into iree-org:main Nov 10, 2023
benvanik added a commit that referenced this pull request Nov 10, 2023
… in `iree_thread_request_affinity` (#15499)"

This reverts commit 8633629.
@benvanik
Copy link
Collaborator

I need to look at this - I don't think this is right - 1 sec.

@benvanik
Copy link
Collaborator

Yeah, this is a change in behavior - iree_thread_request_affinity sets the affinity of the thread provided, but this change makes it so it always sets it on the thread calling iree_thread_request_affinity. Will revert so we can figure out how to fix while preserving behavior. Sorry!

@bjacob
Copy link
Contributor Author

bjacob commented Nov 10, 2023

Oh , right sorry, i didn't pay attention to the fact that this takes a thread.

@benvanik
Copy link
Collaborator

I think your fix is good, just may need two methods - one that sets on self (so it doesn't need the queried handle) and one that sets on another thread that doesn't have to worry.

@antiagainst
Copy link
Contributor

Ah, I see. Sorry missed that..

@bjacob
Copy link
Contributor Author

bjacob commented Nov 10, 2023

I think your fix is good, just may need two methods - one that sets on self (so it doesn't need the queried handle) and one that sets on another thread that doesn't have to worry.

was thinking exactly that --- i'll send something.

@benvanik
Copy link
Collaborator

No worries - the sad part is that Apple's schedulers "know better" and often take affinity as just a suggestion, so this would have just made scheduling slightly more non-deterministic :P

benvanik added a commit that referenced this pull request Nov 10, 2023
…e_thread_request_affinity` (#15534)

The thread having its affinity set is not the thread setting it;
`mach_task_self()` != `thread->mach_port`. We'll want to fix the race
while still having the same behavior when called from
https://github.com/openxla/iree/blob/5b72c3908c8c216f3eee034ca74d61b8dbf37a12/runtime/src/iree/base/internal/threading_darwin.c#L141-L143

Reverts #15499 and rolls forward with fix.
ramiro050 pushed a commit to ramiro050/iree that referenced this pull request Dec 19, 2023
…e_thread_request_affinity` (iree-org#15499)

TSan report:

```
WARNING: ThreadSanitizer: data race (pid=45817)
  Read of size 4 at 0x0001084004e0 by thread T2:
    #0 iree_thread_request_affinity threading_darwin.c:230 (local-task_vmvx_semaphore_submission_test:arm64+0x100078f40)
    #1 iree_task_worker_main worker.c:385 (local-task_vmvx_semaphore_submission_test:arm64+0x100071594)
    #2 iree_thread_start_routine threading_darwin.c:72 (local-task_vmvx_semaphore_submission_test:arm64+0x100078e3c)

  Previous write of size 4 at 0x0001084004e0 by main thread:
    #0 iree_thread_create threading_darwin.c:140 (local-task_vmvx_semaphore_submission_test:arm64+0x100078ca4)
    #1 iree_task_worker_initialize worker.c:66 (local-task_vmvx_semaphore_submission_test:arm64+0x1000714f8)
    #2 iree_task_executor_create executor.c:161 (local-task_vmvx_semaphore_submission_test:arm64+0x10006b2b0)
```

The read of `thread->mach_port` at
https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L230

is not ordered relatively to the write of that variable in the parent
thread after `pthread_mach_thread_np` returns:
https://github.com/openxla/iree/blob/ccc4c3719cea467477a783f1c9e9f1fc06b4c508/runtime/src/iree/base/internal/threading_darwin.c#L140

The proposed fix is that the worker thread shouldn't need to access its
own `thread->mach_port`, it can equivalently call `mach_task_self()`.
ramiro050 pushed a commit to ramiro050/iree that referenced this pull request Dec 19, 2023
…e_thread_request_affinity` (iree-org#15534)

The thread having its affinity set is not the thread setting it;
`mach_task_self()` != `thread->mach_port`. We'll want to fix the race
while still having the same behavior when called from
https://github.com/openxla/iree/blob/5b72c3908c8c216f3eee034ca74d61b8dbf37a12/runtime/src/iree/base/internal/threading_darwin.c#L141-L143

Reverts iree-org#15499 and rolls forward with fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants