Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document experiments on thread-safety #350

Merged
merged 2 commits into from
Sep 3, 2024
Merged

Conversation

hello-binit
Copy link
Contributor

@hello-binit hello-binit commented Aug 27, 2024

This PR documents some experiments on the thread-safety of the Stretch Body library. It's not clear that thread safety is something this library needs to achieve, however, I needed to know whether Stretch Body already was thread-safe, and so the docs should help anyone in the future with the same question.

threading.Thread(target=wo_runner).start()
```

The program exits with no error. Likely an exception is being thrown in the thread, but isn't being surfaced to the user. Surfacing this exception is the first step towards making this library thread-safe.
Copy link
Contributor

@hello-fazil hello-fazil Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Catch of this in this tutorial, would like to investigate this as it looks like the custom thread Exception hooks are not working the way it should in the main thread.

Copy link
Contributor

@hello-fazil hello-fazil Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a Runtime error is being raised, because intentionally I and @aedsinger decided to mask this out in the exception hook because run-time errors felt common in async event loops in transport layer that time. Will have to revisit these decisions.
https://github.com/hello-robot/stretch_body/blob/master/body/stretch_body/robot.py#L267

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely worth trying. I'm a little wary about the custom exception hook and exception filtering we have in Stretch Body right now. In my experience, messing with the default exception behavior yields unexpected results.

However, I'm not familiar with the context that led to this, so I'm sure you have a good reason. It would be great if you could add docs that explain the design decision behind why Stretch Body has custom hooks / filtering, and how it's intended to work.

Lastly, I've seen a lot of ThreadServiceExits in Stretch's ROS2 driver recently, and I can't recall why we added it to Stretch Body, and what it's supposed to be doing.

Copy link
Contributor

@hello-fazil hello-fazil Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the note I had shared during the time of implementing the Exception hooks before S3 launch. It looks like I was addressing this issue #210 where Asyncio was causing trouble to exit cleanly with stalling behaviour because even after main thread exit the event loop threads used by asyncio was running (from Transport layer serial reads) and causing the stall with unhandled runtime error.

From November 2023

  • Using the threading.exceptionhook feature to cleanly handle uncaught exceptions on all running threads and exit out of multithreaded applications. I have used this methodology to resolve the Robot Stalls on exit issue and the reason was uncaught exceptions from running threads that were holding to execute aioserial’s async functions in the transport layer. (still doing tests before merging to master)
  • The threading library's Exception hook allows you to set a custom callback method to handle the exceptions that are usually uncaught in the running threads. I created a custom exception hook to catch these exceptions in the caller’s main thread.

The program exits with no error. Likely an exception is being thrown in the thread, but isn't being surfaced to the user. Surfacing this exception is the first step towards making this library thread-safe.

Notably, if we command out the lift and arm motion (leaving only the Dxl joints), the thread works fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again dynamixels don't run any of it's methods in async event loop threads. They just run on regular threads which is to be noted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think asyncio could be the source of the silent error?

Copy link
Contributor

@hello-fazil hello-fazil Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be. I think we should try your tests with the async param disabled to see what happens and another case have the exception hooks commented out to see what happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Disabling use_asyncio works!

@hello-fazil
Copy link
Contributor

hello-fazil commented Aug 27, 2024

I wonder what happens if you had used lock primitives from robot class to read and write. Will this issue go away? Would be nice additional test.

@hello-binit
Copy link
Contributor Author

I wonder what happens if you had used lock primitives from robot class to read and write. Will this issue go away? Would be nice additional test.

In these tests, the issue was happening even when it was just a single process in a Python thread and no activity was happening on the main thread, so I don't think using Locks would help.

@hello-binit
Copy link
Contributor Author

@hello-fazil Updated this PR to show that Stretch Body is thread-safe if AsyncIO is disabled.

@hello-binit hello-binit merged commit 01c6fbb into master Sep 3, 2024
@hello-binit hello-binit deleted the docs/is_thread_safe branch September 3, 2024 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants