-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document experiments on thread-safety #350
Conversation
docs/is_thread_safe/README.md
Outdated
threading.Thread(target=wo_runner).start() | ||
``` | ||
|
||
The program exits with no error. Likely an exception is being thrown in the thread, but isn't being surfaced to the user. Surfacing this exception is the first step towards making this library thread-safe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice Catch of this in this tutorial, would like to investigate this as it looks like the custom thread Exception hooks are not working the way it should in the main thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a Runtime error is being raised, because intentionally I and @aedsinger decided to mask this out in the exception hook because run-time errors felt common in async event loops in transport layer that time. Will have to revisit these decisions.
https://github.com/hello-robot/stretch_body/blob/master/body/stretch_body/robot.py#L267
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely worth trying. I'm a little wary about the custom exception hook and exception filtering we have in Stretch Body right now. In my experience, messing with the default exception behavior yields unexpected results.
However, I'm not familiar with the context that led to this, so I'm sure you have a good reason. It would be great if you could add docs that explain the design decision behind why Stretch Body has custom hooks / filtering, and how it's intended to work.
Lastly, I've seen a lot of ThreadServiceExits
in Stretch's ROS2 driver recently, and I can't recall why we added it to Stretch Body, and what it's supposed to be doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the note I had shared during the time of implementing the Exception hooks before S3 launch. It looks like I was addressing this issue #210 where Asyncio was causing trouble to exit cleanly with stalling behaviour because even after main thread exit the event loop threads used by asyncio was running (from Transport layer serial reads) and causing the stall with unhandled runtime error.
From November 2023
- Using the threading.exceptionhook feature to cleanly handle uncaught exceptions on all running threads and exit out of multithreaded applications. I have used this methodology to resolve the Robot Stalls on exit issue and the reason was uncaught exceptions from running threads that were holding to execute aioserial’s async functions in the transport layer. (still doing tests before merging to master)
- The threading library's Exception hook allows you to set a custom callback method to handle the exceptions that are usually uncaught in the running threads. I created a custom exception hook to catch these exceptions in the caller’s main thread.
The program exits with no error. Likely an exception is being thrown in the thread, but isn't being surfaced to the user. Surfacing this exception is the first step towards making this library thread-safe. | ||
|
||
Notably, if we command out the lift and arm motion (leaving only the Dxl joints), the thread works fine. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again dynamixels don't run any of it's methods in async event loop threads. They just run on regular threads which is to be noted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think asyncio could be the source of the silent error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be. I think we should try your tests with the async param disabled to see what happens and another case have the exception hooks commented out to see what happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow! Disabling use_asyncio
works!
I wonder what happens if you had used lock primitives from robot class to read and write. Will this issue go away? Would be nice additional test. |
In these tests, the issue was happening even when it was just a single process in a Python thread and no activity was happening on the main thread, so I don't think using Locks would help. |
@hello-fazil Updated this PR to show that Stretch Body is thread-safe if AsyncIO is disabled. |
This PR documents some experiments on the thread-safety of the Stretch Body library. It's not clear that thread safety is something this library needs to achieve, however, I needed to know whether Stretch Body already was thread-safe, and so the docs should help anyone in the future with the same question.