-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding timeouts to wait methods, to never allow for infinite wait() state. #198
Conversation
If you configure the TCP stack of your Jenkins Controller and Jenkins Agents to a proper value for TIME_WAIT, The connection will die and the threads. The default value in Linux is 7200 seconds, I usually recommend to set it to 120 seconds. |
BTW magic numbers are bad idea |
Add timeouts to StreamGobbler methods
Use existing DEFAULT_WAIT_TIMEOUT for timeout
Add DEFAULT_WAIT_TIMEOUT, and use it for wait() calls
Updated the all values to default 20mins of wait. Overridable by variable as was already used in the previous cases. |
The CI do not pass |
Fixed issues with DM_BOXED_PRIMITIVE_FOR_PARSING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
was root cause why this was needed related to: |
Anything that causes half closed connections cause the issue, this would mitigate some cases, probably not all. |
In a couple of important methods the Object.wait() is invoked without any timeout, which in some cases produces threads waiting forever. I've added default timeout of 30mins, in one case of 12h. I'm open to any other timeouts as long as it's not measured in days. It'd be better to have them as some variable as is already done in some cases with e.g.: DEFAULT_WAIT_TIMEOUT, but I left it as just plain values for now, to discuss about the value for the wait(), and then set it as variable.
See JENKINS-73575.
I believe that when something is wrong with the process it should communicate and not stay in the background wasting resources never to do anything again. This update may create some new issues with regards to problems with agents connectivity, where before Jenkins would just forever try to reconnect, even though the agent stopped existing, and now it would just raise error, as in any other case.
The tests for this would be incredibly hard to perform as the situations where such infinite wait() happens are very specific and I was not able to replicate them so far.
Submitter checklist