Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding timeouts to wait methods, to never allow for infinite wait() state. #198

Merged
merged 8 commits into from
Aug 27, 2024

Conversation

Experrior
Copy link
Contributor

In a couple of important methods the Object.wait() is invoked without any timeout, which in some cases produces threads waiting forever. I've added default timeout of 30mins, in one case of 12h. I'm open to any other timeouts as long as it's not measured in days. It'd be better to have them as some variable as is already done in some cases with e.g.: DEFAULT_WAIT_TIMEOUT, but I left it as just plain values for now, to discuss about the value for the wait(), and then set it as variable.

See JENKINS-73575.

I believe that when something is wrong with the process it should communicate and not stay in the background wasting resources never to do anything again. This update may create some new issues with regards to problems with agents connectivity, where before Jenkins would just forever try to reconnect, even though the agent stopped existing, and now it would just raise error, as in any other case.

The tests for this would be incredibly hard to perform as the situations where such infinite wait() happens are very specific and I was not able to replicate them so far.

Submitter checklist

  • JIRA issue is well described
  • Appropriate autotests or explanation to why this change has no tests

@Experrior Experrior requested a review from a team as a code owner August 19, 2024 13:55
@Elisedlund-ericsson
Copy link
Contributor

there's two similar commits related to wait()
#44
#28
Were the main difference with these implementations is that they allow the timeouts to be overridden with vm-argument.

@kuisathaverat
Copy link
Contributor

If you configure the TCP stack of your Jenkins Controller and Jenkins Agents to a proper value for TIME_WAIT, The connection will die and the threads. The default value in Linux is 7200 seconds, I usually recommend to set it to 120 seconds.

@kuisathaverat
Copy link
Contributor

BTW magic numbers are bad idea

Experrior and others added 6 commits August 21, 2024 15:57
Add timeouts to StreamGobbler methods
Use existing DEFAULT_WAIT_TIMEOUT for timeout
Add DEFAULT_WAIT_TIMEOUT, and use it for wait() calls
@Experrior
Copy link
Contributor Author

Updated the all values to default 20mins of wait. Overridable by variable as was already used in the previous cases.

@kuisathaverat
Copy link
Contributor

The CI do not pass

@Experrior
Copy link
Contributor Author

Fixed issues with DM_BOXED_PRIMITIVE_FOR_PARSING

Copy link
Contributor

@kuisathaverat kuisathaverat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kuisathaverat kuisathaverat merged commit 15622af into jenkinsci:main Aug 27, 2024
5 of 6 checks passed
@Elisedlund-ericsson
Copy link
Contributor

was root cause why this was needed related to:
CESNET/libnetconf2#494
?

@kuisathaverat
Copy link
Contributor

Anything that causes half closed connections cause the issue, this would mitigate some cases, probably not all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants