Just encountered an agent that hung because one of its subprocesses hung. In this particular case the cause was calling `df -k` on a hung NFS mount. When that happens, we should log a disk metrics error and continue to collect the other metrics. This is also just one instance of many subprocesses that could hang because of broken dependencies.
Options:

1. Add a generic subprocess wrapper that handles all the things we'd want (mandatory timeouts, not trapping SystemExit or signal interrupts, etc.).
2. Add a timeout on each check as a whole. This would be nice because it would also handle network hangs.
3. Something else?
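
For the first option, here is a minimal sketch of what such a wrapper might look like in Python 3. The names `run_command`, `CommandError`, and `disk_usage_or_none` are hypothetical and not existing agent code. It deliberately avoids `subprocess.run(..., timeout=...)`, because on POSIX that call waits for the child after killing it, and a child stuck in uninterruptible I/O (such as `df` on a hung NFS mount) may not exit promptly.

```python
import subprocess
import sys


class CommandError(Exception):
    """Hypothetical error type: the command failed, timed out, or could not start."""


def run_command(args, timeout=5.0, reap_timeout=1.0):
    """Run args (a list of strings) and return its stdout as text.

    Only OSError and TimeoutExpired are caught, so SystemExit and
    KeyboardInterrupt propagate to the caller untouched.
    """
    try:
        proc = subprocess.Popen(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as exc:
        raise CommandError("could not start %r: %s" % (args, exc))

    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        try:
            # Give the child a short, bounded window to exit after the kill.
            proc.communicate(timeout=reap_timeout)
        except subprocess.TimeoutExpired:
            # The child may be blocked in the kernel; do not wait on it forever.
            pass
        raise CommandError("%r timed out after %s seconds" % (args, timeout))

    if proc.returncode != 0:
        raise CommandError(
            "%r exited with status %d: %s" % (args, proc.returncode, err.strip())
        )
    return out


def disk_usage_or_none(log=sys.stderr):
    """Return `df -k` output, or None after logging a disk metrics error."""
    try:
        return run_command(["df", "-k"])
    except CommandError as exc:
        log.write("disk metrics error: %s\n" % exc)
        return None
```

With something like this, a hung `df -k` costs the collector at most `timeout + reap_timeout` seconds, and the caller can log the error and move on to the other metrics. A child that never exits would still hold its pipes and process slot, so this bounds the hang rather than cleaning it up.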
I'd vote for doing the first no matter what, because I've noticed that the exception handling around subprocesses could use some improvement in general. Perhaps the second as well.
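
The second option, a timeout on each check as a whole, could be sketched along these lines. This assumes each check is a plain callable; `run_check`, `run_all`, and `CheckTimeout` are hypothetical names. Python cannot forcibly stop a thread, so a timed-out check is abandoned in a daemon thread rather than cancelled.

```python
import threading


class CheckTimeout(Exception):
    """Hypothetical error type: a check did not finish within its deadline."""


def run_check(check, timeout):
    """Run check() in a daemon thread and return its result, or raise."""
    outcome = {}

    def target():
        try:
            outcome["value"] = check()
        except Exception as exc:
            # Exception excludes SystemExit and KeyboardInterrupt.
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)

    if worker.is_alive():
        # The thread keeps running in the background; we just stop waiting.
        raise CheckTimeout("check did not finish within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    if "value" not in outcome:
        raise RuntimeError("check exited without producing a result")
    return outcome["value"]


def run_all(checks, timeout=10.0):
    """Run every check in the dict; one failure does not stop the others."""
    results, errors = {}, {}
    for name, check in checks.items():
        try:
            results[name] = run_check(check, timeout)
        except Exception as exc:
            errors[name] = exc
    return results, errors
```

This covers network hangs as well as subprocess hangs, but an abandoned check still holds whatever resources it had open, which is one reason to do the subprocess wrapper regardless.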