Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Robot Discovery: Discovery lags / fails if cache has many robots #2408

Closed
mcous opened this issue Oct 3, 2018 · 1 comment
Closed

Robot Discovery: Discovery lags / fails if cache has many robots #2408

mcous opened this issue Oct 3, 2018 · 1 comment
Assignees
Labels
app Affects the `app` project bug

Comments

@mcous
Copy link
Contributor

mcous commented Oct 3, 2018

overview

The discovery-client is built right now to handle around 10 (order of magnitude) robots reliably. If, however, the app is operating in an environment where it sees and then caches many (order of magnitude 100) robots, the poll intervals / cache writes start to affect performance, sometimes to the point of robots never showing up

current behavior

  • App has two polling intervals
    • The interval represents the gap between polls of a given IP
    • Fast for when the list is refreshing (3000 ms)
    • Slow for idle operation (15000 ms)
    • The interval is divided by number of robots, and a GET /health request is fired for that interval
    • e.g. if interval is 3000 ms, and there are 10 robots, the poller will make a request every 300 ms, each time to a different robot until it loops back around
    • If the robot list gets big, then our setInterval gets called with increasingly tiny numbers
  • Robot entries are never evicted from the app's cache
  • Discovery Client fires an event per individual robot update, and app writes the entire cache file on every update event

ideas

  • Establish a minimum actual polling interval at something sane (e.g. 100ms)
  • Put a maximum size on the services list, evicting IP-less services first, then unhealthy ones
    • 100 seems sane, if arbitrary
    • Maybe do 256 (or 128) for old-times sake?
  • Throttle the cache update handler in the app
  • Add feature flag to disable discovery cache for environments where the cache isn't helpful (e.g. the factory)
@mcous mcous added bug app Affects the `app` project labels Oct 3, 2018
@mcous mcous self-assigned this Oct 3, 2018
@mcous
Copy link
Contributor Author

mcous commented Oct 5, 2018

Closing because optimizations added in #2404 / #2420 effectively mitigate lag and feature ticket #2435 tracks the desired solution to the underlying problem

@mcous mcous closed this as completed Oct 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app Affects the `app` project bug
Projects
None yet
Development

No branches or pull requests

1 participant