hung subprocess causes the agent to hang as well #443

Closed
clutchski opened this issue Apr 10, 2013 · 4 comments

Comments

@clutchski
Contributor

I just encountered an agent that hung because one of its subprocesses hung. In this case, the cause was calling df -k on a hung NFS mount. When that happens, we should keep collecting the other metrics and just log a disk metrics error. This is only one of many subprocesses that could hang because of a broken dependency. Some options:

  • Add a generic subprocess wrapper that handles everything we'd want (mandatory timeouts, not trapping SystemExit or signal interrupts, etc.).
  • Add a timeout on each check as a whole. This would be nice because it would also handle network hangs.
  • Something else?

I'd vote for doing the first no matter what, because I've noticed that the exception handling around subprocesses could use some improvement in general. Perhaps the second as well.
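A minimal sketch of the first option, assuming Python 3.3+ (Popen.communicate() only accepts a timeout from 3.3 on; Python 2's subprocess does not have it). run_command and SubprocessTimeout are hypothetical names, not existing agent code. The timeout is a required argument, and only the timeout case is caught, so SystemExit, KeyboardInterrupt and launch errors such as OSError reach the caller unchanged. After killing the child, the wrapper waits only a bounded time, since a process blocked in the kernel on a hung NFS mount may not exit promptly even after SIGKILL.

```python
import subprocess


class SubprocessTimeout(Exception):
    """Raised when a command does not finish within its deadline (hypothetical name)."""


def run_command(args, timeout, kill_grace=1.0):
    """Run args and return (returncode, stdout, stderr) as text.

    timeout is mandatory. Only subprocess.TimeoutExpired is caught here;
    everything else (OSError, SystemExit, KeyboardInterrupt) propagates.
    """
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            # Bounded wait: a child stuck in the kernel (e.g. on a hung NFS
            # mount) may not exit promptly even after SIGKILL.
            proc.communicate(timeout=kill_grace)
        except subprocess.TimeoutExpired:
            pass  # give up on the child rather than hang the caller
        raise SubprocessTimeout(
            "%r did not finish within %.1fs" % (args, timeout)) from None
    return proc.returncode, out, err
```

A disk check could then call something like run_command(["df", "-k"], timeout=5) and log a disk metrics error on SubprocessTimeout instead of blocking the whole collector.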

@clofresh
Contributor

That's odd; I figured the watchdog would have killed that thread. An NFS hang might be at the kernel level and prevent Python-level timeouts from killing it.

@clutchski
Contributor Author

The watchdog killed the whole agent, but the agent would still never report any metrics, because it hit the NFS mount on every loop and hung.
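The second option, a timeout on each check as a whole, could look roughly like the sketch below. It uses hypothetical names (CheckRunner, check_fn) and is not the agent's actual design. Because Python cannot forcibly kill a thread, an overrunning check is abandoned rather than killed, and it is skipped on later loops while it is still stuck, so a check that hangs on every loop (like the NFS case here) ties up one thread instead of the whole agent.

```python
import logging
import threading

log = logging.getLogger("collector")


class CheckRunner:
    """Hypothetical runner that bounds each check's wall-clock time."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._stuck = {}  # check name -> thread still running from an earlier loop

    def run(self, name, check_fn):
        """Return check_fn()'s result, or None if it failed, timed out, or is still stuck."""
        previous = self._stuck.get(name)
        if previous is not None:
            if previous.is_alive():
                log.error("check %s is still hung from a previous run; skipping", name)
                return None
            del self._stuck[name]

        result = {}

        def target():
            try:
                result["value"] = check_fn()
            except Exception:  # not BaseException: SystemExit/KeyboardInterrupt are not swallowed
                log.exception("check %s failed", name)

        # Daemon thread, so an abandoned check does not block interpreter exit.
        worker = threading.Thread(target=target, name="check-" + name, daemon=True)
        worker.start()
        worker.join(self.timeout)
        if worker.is_alive():
            self._stuck[name] = worker
            log.error("check %s timed out after %.1fs", name, self.timeout)
            return None
        return result.get("value")
```

In a collector loop, each check would be called as runner.run(name, check_fn); the other checks keep reporting even while one of them is hung.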

@remh

remh commented Sep 14, 2015

Will be fixed by #1892

@remh remh modified the milestones: 5.6.0, Future Sep 14, 2015
@yannmh
Member

yannmh commented Nov 3, 2015

Fixed with #1892

@yannmh yannmh closed this as completed Nov 3, 2015

4 participants