Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(api): do not cache removed modules in thread_manager #7690

Merged
merged 6 commits into from
Apr 26, 2021

Conversation

sanni-t
Copy link
Member

@sanni-t sanni-t commented Apr 20, 2021

Closes #5359

Overview

Currently, when we unplug an OT module, a series of events ultimately leads to API._unregister_modules being called. This function then calls the __delete__ function on the AbstractModule instances, which is supposed to stop the associated module's pollers and disconnect cleanly. But the python garbage collector does not execute the object's destructor (__delete__), unless all references to that object are removed.

This is where the issue in #5359 comes into picture. The behavior we notice is that if a module is unplugged when the app has polled the /modules endpoint, the temperature polling thread doesn't stop. This happens because the module endpoints in robot server talk to the modules via the ThreadManager. The thread manager wraps every instance of the attached modules in a CallBridger and caches it using lru_cache. This cached instance doesn't leave the cache unless it's removed to make space for a new instance; which happens very rarely given the size of the cache. Which means that the ThreadManager keeps holding onto the AbstractModule instance, which prevents the module from being garbage-collected, resulting in the polling threads staying alive even when the module is unplugged.

To fix this behavior, we decided to implement our own caching of wrapped modules such that stale instances are removed timely. Additionally, we will explicitly call a cleanup function on all modules to stop any threads spawned by them instead of relying on the garbage collector.

Changelog

  • replaced the lru_cache with a WeakKeyDictionary that doesn't prevent objects from being garbage collected.
  • Added a cleanup method to all modules, which gets called in API._unregister_modules.
  • refactored tempdeck driver's update_temperature in order to catch exceptions better.

Review requests

Risk assessment

Medium. Could affect communication with modules if implemented poorly.

@codecov
Copy link

codecov bot commented Apr 20, 2021

Codecov Report

❗ No coverage uploaded for pull request base (edge@1968164). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             edge    #7690   +/-   ##
=======================================
  Coverage        ?   82.67%           
=======================================
  Files           ?      321           
  Lines           ?    21588           
  Branches        ?        0           
=======================================
  Hits            ?    17848           
  Misses          ?     3740           
  Partials        ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1968164...35b89b7. Read the comment docs.

@sanni-t sanni-t force-pushed the fix_module_caching branch from 1fc9969 to a0916f6 Compare April 21, 2021 15:26
@sanni-t sanni-t marked this pull request as ready for review April 22, 2021 22:30
@sanni-t sanni-t requested a review from a team as a code owner April 22, 2021 22:30
Copy link
Contributor

@mcous mcous left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Away from my robot right now (and don't have a tempdeck anyway) but this code looks solid

Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, cool use of weakref.

@sanni-t
Copy link
Member Author

sanni-t commented Apr 26, 2021

Tested with a tempdeck & a thermocycler. Threads clean up as expected.

@sanni-t sanni-t merged commit ca47a9c into edge Apr 26, 2021
@sanni-t sanni-t deleted the fix_module_caching branch April 26, 2021 20:21
@mcous mcous mentioned this pull request Jun 8, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: Input/output errors are repeatedly logged after removing a Temperature Module
4 participants