-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use rpcHoldTimeout to calculate blocking timeout #15541
Conversation
009ce31
to
606b72a
Compare
606b72a
to
d10fbff
Compare
// Override the default client timeout but add RPCHoldTimeout | ||
// as a buffer for retries during leadership changes. | ||
timeout = blockingTimeout + p.RPCHoldTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all this timeout calculation is confusing, in a nutshell I am re-adding RPCHoldTimeout that I removed in https://github.com/hashicorp/consul/pull/14965/files#diff-8f12aa7f72e8647fddfac936ab541982aae517babd3064a581bbcdcc2595c2dcL347-R360
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EXCEPT in the non-blocking case, which is controlled by a different config rpc_client_timeout
and should not have rpc_hold_timeout
included (since it's only relevant for blocking queries)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also guarantees that the server timeout is longer than the clients in case of leader rotation, which I guess is the whole point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other way around: assuming client and server calculate timeout the same way, it guarantees clients will always have the longer timeout (servers will timeout first, which is desired)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, yep, reversed that in my head
d10fbff
to
b3f544a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great job
err := c1.RPC("Long.Wait", &structs.NodeSpecificRequest{ | ||
QueryOptions: structs.QueryOptions{ | ||
MinQueryIndex: 1, | ||
MaxQueryTime: 20 * time.Millisecond, | ||
}, | ||
}, &out) | ||
require.Error(t, err) | ||
require.Contains(t, err.Error(), "rpc error making call: i/o deadline reached") | ||
require.ErrorContains(t, err, "rpc error making call: i/o deadline reached") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that's nice, I didn't know about ErrorContains
// Override the default client timeout but add RPCHoldTimeout | ||
// as a buffer for retries during leadership changes. | ||
timeout = blockingTimeout + p.RPCHoldTimeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also guarantees that the server timeout is longer than the clients in case of leader rotation, which I guess is the whole point?
Adds buffer to clients so that servers have time to respond to blocking queries.
Description
Fixes #15246
Fixes #15537
Reverts a bug I introduced while refactoring:
https://github.com/hashicorp/consul/pull/14965/files#diff-8f12aa7f72e8647fddfac936ab541982aae517babd3064a581bbcdcc2595c2dcL347-R360
Note that Timeout included
rpcHoldTimeout
but I extracted it out ofTimeout
intoHasTimedOut
. But we still need to consider it when calculatingBlockingTimeout
.It's hard to write a test for this due to timeouts being very timing + random jitter dependent. It should not break any existing tests and at most it adds 7s to existing timeouts.
The problem I'm trying to solve is:
if timeouts are nearly identical it is likelier for the client to timeout the connection before the server has had a chance to respond (or time out the blocking query).
With this PR:
Server will always be the one to apply timeout and return from a blocking query early