-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More understandable timeouts in checks when value exceeds interval, Fixes #5834 #5835
Conversation
Will fix #5834 |
…ixes hashicorp#5834 When Timeouts were greater or equal than interval for checks, Consul used to set timeout to 10s. Now, use the interval as upper boundary in that case. Also outputs a warning in logs in that case
48fbc47
to
8f99bd5
Compare
Added unit tests for both TCP and HTTP checks
8cc7fb6
to
b714005
Compare
b714005
to
30f2727
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work!
There is one case in which your computation returns a value that I did not expect:
timeout = 12s, interval = 0
: the default interval is set and the timeout gets set to 10s
. I think it should still be 12s
because the user explicitly configured it like that.
This playground has a naive implementation of what I mean: https://play.golang.org/p/uTIQBvnghWg.
Since the comments are saying that the timeout must not be greater than the interval, I was under the impression that the checks were run at this particular interval. But this is not the case, the interval is used to determine the waiting time until the next check is performed:
Lines 99 to 101 in 30f2727
case <-next: | |
c.check() | |
next = time.After(c.Interval) |
Which means that even when the timeout is bigger than the interval, there will never be the situation were unfinished checks are piling up. Which was my assumption of why we had this code in the first place. Maybe it was that way in the past.
Looking forward to hear what you think.
@i0rek Thank you for watching this PR. To be frank, I don't know what is the best. I was not sure whether having timeout bigger than interval could cause issues, I thought that compared to existing implementation, it is the less disruptive. The interval = 0 seems impossible for checks, since it would be set to checks.MinInterval which means 1s , so the code is still valid. I can change it if you want, but compared to the existing implementation, I think it is a reasonable choice. Maybe we can document more that having timeouts greater than interval will cause timeouts to be used? |
@i0rek what do you think about my last comment #5835 (comment) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When looking at this I realised again that the checks cannot overlap because the interval starts after the check returns. Which makes me think that we should just use whatever timeout was provided by the user or use the default one.
In case we are doing it like that, it would have to go into 1.6, because it is a breaking change.
What do you think about that?
@i0rek It basically means that for now: duration between 2 checks is between 1 and 2*interval (timeout which is bounded by interval + interval) So basically, this PR changes nothing to the existing behavior except for people using very long timeout (bigger than interval) that used to be time outed after 10s and will be at interval now... Sound quite OK to me and easier to understand for people without the knowledge about Consul internals without completely changing behavior in an insane way. Might probably deserves a note un documentation anyway |
@pierresouchay If the configured timeout is 12s and the interval is 10s why not set the timeout at 12s? This bounds the duration between checks at timeout + interval. We are leaning toward changing the implementation so that timeouts are configurable without being capped. (While likely maintaining the It's worth considering that the check registration docs don't mention the timeouts being restricted: |
Closing this in favor of #6094 which will go in with the 1.6 release. |
When Timeouts were greater or equal than interval for checks,
Consul used to set timeout to 10s. Now, use the interval as upper boundary in that case.
Also outputs a warning in logs in that case.
Thus:
Will now use a timeout of 30s and not 10s as it used to be the case