Use notify_one() in CallbackNotifier #140
It is unclear why, but `notify_all()` is causing futex waits never to return in some situations. This occurs very frequently in CI and is also reproducible locally, though less often.
I need to confirm that this is the only change that really (seems to have) fixed the bug. I have been testing against some other local changes for 8 hours and it didn't deadlock once, whereas before some of the runs would start deadlocking in under 1 hour. This is the latest change and the one that actually resolved the problem; I just want to confirm the other changes have no part in it. |
@wence- can you think of any reason |
I think switching to notify_one is fine. I think originally we had that, and then I suggested moving to notify_all in case we ever used it in single producer, multiple consumer mode. But we're not doing that. |
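For readers following along, here is a minimal sketch of the one-producer, one-consumer pattern discussed above, where `notify_one()` is sufficient because at most one thread ever waits. The class `SimpleNotifier` and the helper `run_demo()` are hypothetical names used only for illustration; this is not the project's actual `CallbackNotifier` implementation.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a two-thread notifier (an assumption for
// illustration, not the project's CallbackNotifier).
class SimpleNotifier {
 public:
  // Called by the single producer thread: set the flag under the lock,
  // then wake the (at most one) waiting thread.
  void set() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      flag_ = true;
    }
    cv_.notify_one();
  }

  // Called by the single consumer thread. The predicate guards against
  // spurious wakeups and against set() having run before the wait began.
  // Returns false if the timeout expires before the flag is set.
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return flag_; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool flag_{false};
};

// Returns true if the waiting thread observed the notification.
bool run_demo() {
  SimpleNotifier notifier;
  bool woke = false;
  std::thread waiter(
      [&] { woke = notifier.wait_for(std::chrono::seconds(5)); });
  notifier.set();
  waiter.join();
  return woke;
}
```

With several consumers waiting on the same flag, `notify_one()` would wake only one of them, which is the single producer, multiple consumer case that originally motivated `notify_all()`.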
Yes, I agree. I'm wondering though if you have any ideas why |
Can we adjust the docstrings in `callback_notifier.h` to match this change? They currently say things about `notify_all` and "waiting threads", whereas now (which is fine) this becomes a mechanism for communicating between two threads.
Done in 54414d1, would you mind taking another look? |
My vague understanding is that pthread_cond_broadcast is one of the tricksiest parts of the pthread signalling code to get right, so 🤷 |
Thanks @wence- for reviewing! |
/merge |
It is unclear why but for some reason `notify_all()` is causing futexes never to return in some situations. This occurs very frequently in CI and is also less frequently reproducible locally. The typical stack trace for the blocked thread is shown below: