Use rpcHoldTimeout to calculate blocking timeout #15541

kisunji · 2022-11-23T16:11:16Z

Description

Fixes a regression I introduced while refactoring:
https://github.com/hashicorp/consul/pull/14965/files#diff-8f12aa7f72e8647fddfac936ab541982aae517babd3064a581bbcdcc2595c2dcL347-R360

Note that Timeout previously included rpcHoldTimeout, but I extracted it from Timeout into HasTimedOut. We still need to account for it when calculating BlockingTimeout, though.

It's hard to write a test for this because the behavior depends heavily on timing and random jitter. It should not break any existing tests, and at most it adds 7s (the RPCHoldTimeout) to existing timeouts.

The problem I'm trying to solve is:

GET /v1/catalog/services?wait=2s&index=16165

client uses BlockingTimeout to calculate read timeout
2s + 2s/16 = 2.125s

server adds max possible jitter
2s + 2s/16 = 2.125s

If the timeouts are nearly identical, the client is more likely to time out the connection before the server has had a chance to respond (or to time out the blocking query).

With this PR:

GET /v1/catalog/services?wait=2s&index=16165

client uses (*QueryOptions).BlockingTimeout to calculate read timeout
2s + 2s/16 + 7s = 9.125s
             ^ RPCHoldTimeout buffer

server adds max possible jitter in (*Server).rpcQueryTimeout
2s + 2s/16 = 2.125s

The server will always be the one to apply the timeout and return from the blocking query first.

kisunji · 2022-11-23T22:09:44Z

agent/pool/pool.go

+			// Override the default client timeout but add RPCHoldTimeout
+			// as a buffer for retries during leadership changes.
+			timeout = blockingTimeout + p.RPCHoldTimeout


If all this timeout calculation is confusing: in a nutshell, I am re-adding the RPCHoldTimeout that I removed in https://github.com/hashicorp/consul/pull/14965/files#diff-8f12aa7f72e8647fddfac936ab541982aae517babd3064a581bbcdcc2595c2dcL347-R360

EXCEPT in the non-blocking case, which is controlled by a different config, rpc_client_timeout, and should not include rpc_hold_timeout (since it's only relevant for blocking queries).

This also guarantees that the server timeout is longer than the client's in case of leader rotation, which I guess is the whole point?

Other way around: assuming the client and server calculate the timeout the same way, it guarantees that clients will always have the longer timeout (servers will time out first, which is what we want).

oh right, yep, reversed that in my head

wilkermichael

LGTM, great job

wilkermichael · 2022-11-23T22:32:26Z

agent/consul/client_test.go

 		err := c1.RPC("Long.Wait", &structs.NodeSpecificRequest{
 			QueryOptions: structs.QueryOptions{
 				MinQueryIndex: 1,
 				MaxQueryTime:  20 * time.Millisecond,
 			},
 		}, &out)
 		require.Error(t, err)
-		require.Contains(t, err.Error(), "rpc error making call: i/o deadline reached")
+		require.ErrorContains(t, err, "rpc error making call: i/o deadline reached")


oh that's nice, I didn't know about ErrorContains

wilkermichael · 2022-11-23T22:33:56Z

agent/pool/pool.go

+			// Override the default client timeout but add RPCHoldTimeout
+			// as a buffer for retries during leadership changes.
+			timeout = blockingTimeout + p.RPCHoldTimeout


This also guarantees that the server timeout is longer than the client's in case of leader rotation, which I guess is the whole point?

Adds buffer to clients so that servers have time to respond to blocking queries.

kisunji added backport/1.12 labels Nov 23, 2022

kisunji force-pushed the kisunji/blocking-timeout-bugfix branch from 009ce31 to 606b72a November 23, 2022 17:48

kisunji requested a review from banks November 23, 2022 17:52

kisunji force-pushed the kisunji/blocking-timeout-bugfix branch from 606b72a to d10fbff November 23, 2022 22:06

kisunji requested a review from a team November 23, 2022 22:08

kisunji commented Nov 23, 2022


Use rpcHoldTimeout to calculate blocking timeout

b3f544a

kisunji force-pushed the kisunji/blocking-timeout-bugfix branch from d10fbff to b3f544a November 23, 2022 22:20

wilkermichael approved these changes Nov 23, 2022


kisunji mentioned this pull request Nov 23, 2022

Large number of error message RPC deadlines after the introduction of rpc_client_timeout #15246

Closed

kisunji merged commit 386da54 into main Nov 24, 2022

kisunji deleted the kisunji/blocking-timeout-bugfix branch November 24, 2022 15:13

kisunji mentioned this pull request Nov 28, 2022

rpc error making call: i/o deadline reached #15537

Closed

jmurret pushed a commit that referenced this pull request Dec 2, 2022

Use rpcHoldTimeout to calculate blocking timeout (#15541)

b95d8cf

Adds buffer to clients so that servers have time to respond to blocking queries.
