
RetryConfig: Add MaxBackoff option (fixes #938) #939

Merged — 4 commits from stefreak's fix/master/retryBackoffMax branch into hashicorp:master on May 24, 2017

Conversation

@stefreak (Contributor) commented on May 20, 2017

This is a work-in-progress pull request, so you can already have a look.

Questions:

  • Should this also be exposed to users via the config file?
  • I did not find a test for the RetryFunc itself, did I just overlook it or should I add one?

@sethvargo (Contributor)

Hi @stefreak

Thank you for the pull request. I'm a bit confused by the expected behavior here. Which of the following are you trying to achieve:

  1. At most backoff for a total of X
  2. Never backoff for more than X

As it's written here, it says "never back off for more than X", but the behavior @dadgar described to me seemed more akin to the first option above (at most backoff for a total of X).

As the code is written, if the user chooses a backoff of 2s, attempts of 5, and a max-backoff of 10s, the uncapped sleep times would be:

attempt   sleep   total sleep
1         2s      2s
2         4s      6s
3         8s      14s
4         16s     30s
...       ...     ...

In this case, as this code is written, attempt 4 will actually sleep for 10s (16s > 10s, so return 10s). This means the total sleep time through attempt 4 is actually 24s.

Now, if implemented as option 2, the third backoff would sleep for 4s (because we've already slept for 6s, which leaves 4s remaining in the total sleep to reach the 10s "max backoff").

As a user, I can see both options as being valid. I never want my data to be more than Xs stale, but I also never want to wait more than Ys for a single backoff.
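
To make the two readings concrete, here is a minimal, self-contained Go sketch of both interpretations (illustrative only, not this PR's code; the function names perSleepCap and totalBudgetCap are made up for the example), using a backoff of 2s and a max of 10s:

package main

import (
	"fmt"
	"time"
)

// perSleepCap implements option 1: no single backoff may exceed max.
func perSleepCap(attempt int, backoff, max time.Duration) time.Duration {
	sleep := backoff * time.Duration(1<<uint(attempt)) // exponential doubling
	if max > 0 && sleep > max {
		return max
	}
	return sleep
}

// totalBudgetCap implements option 2: the sum of all sleeps may not exceed max.
func totalBudgetCap(attempt int, slept, backoff, max time.Duration) time.Duration {
	sleep := backoff * time.Duration(1<<uint(attempt))
	if max > 0 && slept+sleep > max {
		return max - slept // only whatever remains of the budget
	}
	return sleep
}

func main() {
	backoff, max := 2*time.Second, 10*time.Second
	var total1, total2 time.Duration
	for attempt := 0; attempt < 4; attempt++ {
		s1 := perSleepCap(attempt, backoff, max)
		s2 := totalBudgetCap(attempt, total2, backoff, max)
		total1 += s1
		total2 += s2
		fmt.Printf("attempt %d: option 1 sleeps %v (total %v); option 2 sleeps %v (total %v)\n",
			attempt+1, s1, total1, s2, total2)
	}
}

With these inputs, option 1 sleeps 2s, 4s, 8s, 10s (24s total), while option 2 sleeps 2s, 4s, 4s, 0s, stopping once the 10s budget is spent.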

> Should this also be exposed to users via the config file?

Yes please
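
For illustration, the option might surface in the configuration file roughly like this; the retry stanza shape and the max_backoff key name are assumptions for this sketch, not necessarily the merged syntax:

# Hypothetical retry stanza; key names here are assumptions, not the merged syntax.
retry {
  enabled     = true
  attempts    = 0       # 0 = retry indefinitely
  backoff     = "250ms" # base for the exponential backoff
  max_backoff = "10s"   # cap on any single backoff
}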

> I did not find a test for the RetryFunc itself, did I just overlook it or should I add one?

That'd be nice, but I think we need to agree on the behavior first 😄

@dadgar this is what I was trying to explain in chat. It's very hard to describe this behavior in a way that makes sense to people. In my head, "max_backoff" would mean "total maximum sleep time", but another person might think "no single backoff should exceed this value".

@stefreak (Contributor, Author) commented on May 20, 2017

@sethvargo thanks for the quick feedback. I see where the confusion is coming from; naming things is hard, right? :)

The use case I have in mind is a situation where you never want consul_template to terminate upon failure, but instead want it to retry forever.

What you are describing (limiting the total sleep) is already roughly possible by limiting the number of attempts.

When I set attempts to 0, attempts are unlimited, and the sleep grows exponentially with every attempt, potentially becoming far larger than necessary.

My idea was that this could make it simpler to change the Nomad template behaviour to retry forever while keeping sane retry intervals during longer outages.

Not sure what @dadgar has in mind, though. I just wanted to contribute this, as it was the most annoying part of Nomad I've run into, and I like the rest very much so far.

If you have a better idea on how to accomplish this, on the naming, or on how to express this in the config, I'd be happy to implement it. I will also think about it more.

(Review comments on config/retry_test.go and config/retry.go — resolved, now outdated.)
@dadgar (Contributor) commented on May 23, 2017

@stefreak Awesome work! I had this on my backlog to get done! So much appreciated!

@sethvargo The behavior would be something like this with your configuration (a backoff of 2s, attempts of 5, and a max-backoff of 10s):

attempt   sleep   total sleep
1         2s      2s
2         4s      6s
3         8s      14s
4         10s     24s
5         10s     34s
...       ...     ...

@stefreak's explanation of why you would want this is spot on.

@sethvargo (Contributor)

Can we be sure to update the README as well?

@stefreak force-pushed the fix/master/retryBackoffMax branch 2 times, most recently from bbccbba to b2ec58c on May 23, 2017 at 23:15
@stefreak force-pushed the fix/master/retryBackoffMax branch from b2ec58c to ec42a4f on May 23, 2017 at 23:17
(Review comment on config/retry.go — resolved, now outdated.)
@stefreak force-pushed the fix/master/retryBackoffMax branch from ee5d527 to cf0e39f on May 24, 2017 at 00:37
@stefreak changed the title from "[WIP] RetryConfig: Add MaxBackoff option (fixes #938)" to "RetryConfig: Add MaxBackoff option (fixes #938)" on May 24, 2017
@stefreak (Contributor, Author)

@dadgar @sethvargo thank you for the feedback. It should now be done, including a unit test for RetryFunc.

@@ -13,6 +13,9 @@ const (
 	// DefaultRetryBackoff is the default base for the exponential backoff
 	// algorithm.
 	DefaultRetryBackoff = 250 * time.Millisecond
+
+	// DefaultRetryMaxBackoff is the default maximum of backoff time
+	DefaultRetryMaxBackoff = 0 * time.Millisecond
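
For context, here is a paraphrased sketch of how such a cap could be applied inside the retry logic, where (as the default of 0 above suggests) a zero MaxBackoff means "no cap"; the sleepFor helper is hypothetical and not the PR's exact code:

package config

import (
	"math"
	"time"
)

// Paraphrased sketch for illustration; not the exact code merged in this PR.
// RetryConfig here carries only the two fields relevant to the sketch.
type RetryConfig struct {
	Backoff    *time.Duration
	MaxBackoff *time.Duration
}

// sleepFor is a hypothetical helper showing how the cap could be applied:
// a zero MaxBackoff means "no cap", preserving the old unbounded doubling.
func (c *RetryConfig) sleepFor(retry int) time.Duration {
	sleep := time.Duration(math.Pow(2, float64(retry))) * *c.Backoff
	if c.MaxBackoff != nil && *c.MaxBackoff > 0 && sleep > *c.MaxBackoff {
		sleep = *c.MaxBackoff
	}
	return sleep
}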
@sethvargo (Contributor):

Let's merge this PR as-is, but I'd like to get both your thoughts on making this a non-zero default. Obviously this would be a "breaking" change, but I think setting this to a reasonably high value (like 30 minutes) would be a good idea.

@dadgar (Contributor):

Yeah, I would change all the defaults. They are currently too short, in my opinion.

I would do:

DefaultRetryAttempts   = 12
DefaultRetryBackoff    = 250 * time.Millisecond
DefaultRetryMaxBackoff = 1 * time.Minute

Leads to:

attempt   sleep   total
1         .25s    .25s
2         .5s     .75s
3         1s      1.75s
4         2s      ~4s
5         4s      ~8s
6         8s      ~16s
7         16s     ~32s
8         32s     ~1m
9         ~1m     ~2m
10        ~1m     ~3m
11        ~1m     ~4m
12        ~1m     ~5m
...       ...     ...

That gives you five minutes of retrying with a good backoff ramp and fast early retries.
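
A quick way to sanity-check that schedule is to compute it directly; this sketch assumes the per-sleep cap agreed above and doubling from the base (illustrative, not consul-template's actual code):

package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		attempts   = 12
		backoff    = 250 * time.Millisecond
		maxBackoff = 1 * time.Minute
	)
	var total time.Duration
	for i := 0; i < attempts; i++ {
		sleep := backoff * time.Duration(1<<uint(i)) // .25s, .5s, 1s, ...
		if sleep > maxBackoff {
			sleep = maxBackoff // attempts 9 and later are capped at 1m
		}
		total += sleep
		fmt.Printf("attempt %2d: sleep %-6v total %v\n", i+1, sleep, total)
	}
}

Running this reproduces the table: roughly one minute of total sleep by attempt 8, then one additional minute per attempt, reaching about five minutes at attempt 12.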

@stefreak (Contributor, Author):

@dadgar I like that very much

@sethvargo merged commit f277f55 into hashicorp:master on May 24, 2017
@sethvargo (Contributor)

@dadgar do you wanna submit a PR for those changes?

@stefreak (Contributor, Author)

@sethvargo @dadgar thank you guys, happy to have contributed this :)
