Use of mHandleSelectThread in SystemLayer is data-racy #7818
And this general |
So poking at this some:
I question whether this optimization is really worthwhile, or as complete as it "should" be, but maybe for now we should just address item 4 there by making |
mHandleSelectThread is used for an optimization: when it's set to the current thread id, WakeIOThread knows that:
1) We're in the middle of running Layer::HandleTimeout.
2) We're doing that on the same thread where WakeIOThread was called.
Hence WakeIOThread doesn't need to do anything, because we're already on the IO thread and it's already awake.

The read of this member in WakeIOThread is not synchronized in any way, but as long as we guarantee that it correctly reads a value that was actually assigned to the member (which std::atomic does), things will work correctly. Either the value we read will be equal to the current thread id, in which case we know we're on the one thread that called Layer::HandleTimeout and are inside that function, or it will not be equal to our thread id, and then we need to do the actual wakeup work, whatever that value is (including if it's null).

Fixes project-chip#7818
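A minimal C++ sketch of the pattern the commit message describes, under stated assumptions: it uses std::atomic<std::thread::id> in place of the real pthread_t member (the actual member type in SystemLayer may differ), and the class name SketchLayer, the HandleTimeout(work) signature, and the wakeup counter are hypothetical stand-ins, not the CHIP SystemLayer API. The point it shows is that the atomic read in WakeIOThread only ever sees a value that was actually stored, so the early return is taken exactly when the caller is the thread currently inside HandleTimeout.

```cpp
#include <atomic>
#include <thread>

// Hypothetical, simplified stand-in for the system layer; not the CHIP API.
class SketchLayer {
public:
    // Mirrors the HandleTimeout pattern: publish the current thread id,
    // run the timer work, then reset to the default ("not a thread") id.
    template <typename Work>
    void HandleTimeout(Work && work)
    {
        mHandleSelectThread.store(std::this_thread::get_id());
        work();
        mHandleSelectThread.store(std::thread::id());
    }

    // Mirrors WakeIOThread: skip the wakeup only when the calling thread is
    // the one currently inside HandleTimeout.
    void WakeIOThread()
    {
        if (mHandleSelectThread.load() == std::this_thread::get_id())
        {
            return; // already on the IO thread, which is awake
        }
        // Assumption: stands in for the real wakeup (e.g. writing to a pipe).
        mWakeups.fetch_add(1);
    }

    int Wakeups() const { return mWakeups.load(); }

private:
    // Atomic so that a concurrent WakeIOThread never reads a torn value.
    std::atomic<std::thread::id> mHandleSelectThread{ std::thread::id() };
    std::atomic<int> mWakeups{ 0 };
};
```

Calling WakeIOThread from inside HandleTimeout on the same thread is a no-op, while a call from any other thread (or from the same thread outside HandleTimeout) does the wakeup work, whatever value it happens to read.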
Problem
When I tried to use ScheduleWork from outside the Matter thread, I got thread races reported by TSan. And indeed, RunEventLoop ends up repeatedly calling HandleEvents (with the CHIP lock held). Each of those calls invokes Layer::HandleTimeout, which sets mHandleSelectThread, then does some work, then unsets mHandleSelectThread.

On the other hand, ScheduleWork ends up calling Layer::WakeIOThread, which no-ops via an early return if pthread_equal(this->mHandleSelectThread, pthread_self()). There's no synchronization around this bit.

This is clearly racy. Further, since mHandleSelectThread is only set transiently, the only time we take the early return is if we are in the middle of HandleTimeout. It's not clear to me why we're doing this no-opping in the pthread_equal case (I wish it were documented!), but doing so non-deterministically, depending on where in the execution of HandleTimeout we are, seems fishy.

Proposed Solution
I'm not sure. If we tried to take the lock in WakeIOThread, then we could conceivably deadlock if we're already on the thread that holds the lock. More to the point, once we get the lock we are guaranteed that mHandleSelectThread is PTHREAD_NULL...

This looks like fallout from #6561
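To illustrate why taking the lock in WakeIOThread would make the check pointless, here is a hypothetical model: LockedLayerSketch and its std::mutex (standing in for the CHIP stack lock) are assumptions, not real CHIP types. HandleTimeout runs entirely under the lock and resets the id before releasing it, so any thread that acquires the lock can only ever observe the default ("no thread") id. Separately, calling ObserveUnderLock from the thread already holding a non-recursive std::mutex would be undefined behavior (in practice usually a deadlock), which is the other concern raised above.

```cpp
#include <mutex>
#include <thread>

// Hypothetical model, not the CHIP API: the timer handler runs entirely
// under a stand-in for the stack lock.
struct LockedLayerSketch {
    std::mutex stackLock;               // assumption: models the CHIP stack lock
    std::thread::id handleSelectThread; // guarded by stackLock, so no atomic needed

    void HandleTimeoutLocked()
    {
        std::lock_guard<std::mutex> guard(stackLock);
        handleSelectThread = std::this_thread::get_id();
        // ... timer work would happen here ...
        handleSelectThread = std::thread::id(); // reset before unlocking
    }

    // What a WakeIOThread that first took the lock would observe.
    // Never call this from inside HandleTimeoutLocked: relocking a
    // std::mutex on the owning thread is undefined behavior.
    std::thread::id ObserveUnderLock()
    {
        std::lock_guard<std::mutex> guard(stackLock);
        return handleSelectThread;
    }
};
```

Because the set and the reset both happen inside one critical section, no other lock holder can see the non-default value, so a lock-taking WakeIOThread would never reach the early return.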
@kpschoedel