
RetryConfig: Add MaxBackoff option (fixes #938) #939

Merged — 4 commits from stefreak's fix/master/retryBackoffMax branch into hashicorp:master on May 24, 2017

Conversation

@stefreak (Contributor) commented on May 20, 2017

This is a work-in-progress pull request, so you can already have a look.

Questions:

  • Should this also be exposed to users via the config file?
  • I did not find a test for the RetryFunc itself, did I just overlook it or should I add one?

@sethvargo (Contributor)

Hi @stefreak

Thank you for the pull request. I'm a bit confused by the expected behavior here. Which of the following are you trying to achieve:

  1. At most backoff for a total of X
  2. Never backoff for more than X

As it's written here, it says "never back off for more than X", but the behavior @dadgar described to me seemed more akin to the first option above (at most backoff for a total of X).

As the code is written, if the user chooses a backoff of 2s, attempts of 5, and a max-backoff of 10s, the uncapped sleep times would be:

attempt   sleep   total sleep
1         2s      2s
2         4s      6s
3         8s      14s
4         16s     30s
...       ...     ...

In this case, as this code is written, attempt 4 will actually sleep for 10s (16s > 10s, so return 10s). This means the total sleep time through attempt 4 is actually 24s.

Now, if implemented as option 2, the third backoff would sleep for 4s (because we've already slept for 6s, which leaves 4s remaining in the total sleep to reach the 10s "max backoff").

As a user, I can see both options as being valid. I never want my data to be more than Xs stale, but I also never want to wait more than Ys for a single backoff.
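
To make the two readings concrete, here is a minimal, self-contained Go sketch of both interpretations (illustrative only, not this PR's code; the function names perSleepCap and totalBudgetCap are made up for the example), using a backoff of 2s and a max of 10s:

package main

import (
	"fmt"
	"time"
)

// perSleepCap implements option 1: no single backoff may exceed max.
func perSleepCap(attempt int, backoff, max time.Duration) time.Duration {
	sleep := backoff * time.Duration(1<<uint(attempt)) // exponential doubling
	if max > 0 && sleep > max {
		return max
	}
	return sleep
}

// totalBudgetCap implements option 2: the sum of all sleeps may not exceed max.
func totalBudgetCap(attempt int, slept, backoff, max time.Duration) time.Duration {
	sleep := backoff * time.Duration(1<<uint(attempt))
	if max > 0 && slept+sleep > max {
		return max - slept // only whatever remains of the budget
	}
	return sleep
}

func main() {
	backoff, max := 2*time.Second, 10*time.Second
	var total1, total2 time.Duration
	for attempt := 0; attempt < 4; attempt++ {
		s1 := perSleepCap(attempt, backoff, max)
		s2 := totalBudgetCap(attempt, total2, backoff, max)
		total1 += s1
		total2 += s2
		fmt.Printf("attempt %d: option 1 sleeps %v (total %v); option 2 sleeps %v (total %v)\n",
			attempt+1, s1, total1, s2, total2)
	}
}

With these inputs, option 1 sleeps 2s, 4s, 8s, 10s (24s total), while option 2 sleeps 2s, 4s, 4s, 0s, stopping once the 10s budget is spent.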

> Should this also be exposed to users via the config file?

Yes please
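
For illustration, the option might surface in the configuration file roughly like this; the retry stanza shape and the max_backoff key name are assumptions for this sketch, not necessarily the merged syntax:

# Hypothetical retry stanza; key names here are assumptions, not the merged syntax.
retry {
  enabled     = true
  attempts    = 0       # 0 = retry indefinitely
  backoff     = "250ms" # base for the exponential backoff
  max_backoff = "10s"   # cap on any single backoff
}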

> I did not find a test for the RetryFunc itself, did I just overlook it or should I add one?

That'd be nice, but I think we need to agree on the behavior first 😄

@dadgar this is what I was trying to explain in chat. It's very hard to describe this behavior in a way that makes sense to people. In my head, "max_backoff" would mean "total maximum sleep time", but another person might think "no single backoff should exceed this value".

@stefreak (Contributor, Author) commented on May 20, 2017

@sethvargo thanks for the quick feedback. I see where the confusion is coming from; naming things is hard, right? :)

The use case I have in mind is a situation where you never want consul_template to terminate upon failure, but instead want it to retry forever.

What you are describing (limiting the total sleep) is already roughly possible by limiting the number of attempts.

When I set attempts to 0, attempts are unlimited, and the sleep grows exponentially with every attempt, potentially becoming far larger than necessary.

My idea was that this could make it simpler to change the Nomad template behaviour to retry forever while keeping sane retry intervals during longer outages.

Not sure what @dadgar has in mind, though. I just wanted to contribute this, as it was the most annoying part of Nomad I've run into, and I like the rest very much so far.

If you have a better idea on how to accomplish this, on the naming, or on how to express this in the config, I'd be happy to implement it. I will also think about it more.

(Review comments on config/retry_test.go and config/retry.go — resolved, now outdated.)
@dadgar (Contributor) commented on May 23, 2017

@stefreak Awesome work! I had this on my backlog to get done! So much appreciated!

@sethvargo The behavior would be something like this with your configuration (a backoff of 2s, attempts of 5, and a max-backoff of 10s):

attempt   sleep   total sleep
1         2s      2s
2         4s      6s
3         8s      14s
4         10s     24s
5         10s     34s
...       ...     ...

@stefreak's explanation of why you would want this is spot on.

@sethvargo (Contributor)

Can we be sure to update the README as well?

@stefreak force-pushed the fix/master/retryBackoffMax branch 2 times, most recently from bbccbba to b2ec58c on May 23, 2017 at 23:15
@stefreak force-pushed the fix/master/retryBackoffMax branch from b2ec58c to ec42a4f on May 23, 2017 at 23:17
(Review comment on config/retry.go — resolved, now outdated.)
@stefreak force-pushed the fix/master/retryBackoffMax branch from ee5d527 to cf0e39f on May 24, 2017 at 00:37
@stefreak changed the title from "[WIP] RetryConfig: Add MaxBackoff option (fixes #938)" to "RetryConfig: Add MaxBackoff option (fixes #938)" on May 24, 2017
@stefreak (Contributor, Author)

@dadgar @sethvargo thank you for the feedback. It should now be done, including a unit test for RetryFunc.

@@ -13,6 +13,9 @@ const (
 	// DefaultRetryBackoff is the default base for the exponential backoff
 	// algorithm.
 	DefaultRetryBackoff = 250 * time.Millisecond
+
+	// DefaultRetryMaxBackoff is the default maximum of backoff time
+	DefaultRetryMaxBackoff = 0 * time.Millisecond
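
For context, here is a paraphrased sketch of how such a cap could be applied inside the retry logic, where (as the default of 0 above suggests) a zero MaxBackoff means "no cap"; the sleepFor helper is hypothetical and not the PR's exact code:

package config

import (
	"math"
	"time"
)

// Paraphrased sketch for illustration; not the exact code merged in this PR.
// RetryConfig here carries only the two fields relevant to the sketch.
type RetryConfig struct {
	Backoff    *time.Duration
	MaxBackoff *time.Duration
}

// sleepFor is a hypothetical helper showing how the cap could be applied:
// a zero MaxBackoff means "no cap", preserving the old unbounded doubling.
func (c *RetryConfig) sleepFor(retry int) time.Duration {
	sleep := time.Duration(math.Pow(2, float64(retry))) * *c.Backoff
	if c.MaxBackoff != nil && *c.MaxBackoff > 0 && sleep > *c.MaxBackoff {
		sleep = *c.MaxBackoff
	}
	return sleep
}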
@sethvargo (Contributor):

Let's merge this PR as-is, but I'd like to get both your thoughts on making this a non-zero default. Obviously this would be a "breaking" change, but I think setting this to a reasonably high value (like 30 minutes) would be a good idea.

@dadgar (Contributor):

Yeah, I would change all the defaults. They are currently too short, in my opinion.

I would do:

DefaultRetryAttempts   = 12
DefaultRetryBackoff    = 250 * time.Millisecond
DefaultRetryMaxBackoff = 1 * time.Minute

Leads to:

attempt   sleep   total
1         .25s    .25s
2         .5s     .75s
3         1s      1.75s
4         2s      ~4s
5         4s      ~8s
6         8s      ~16s
7         16s     ~32s
8         32s     ~1m
9         ~1m     ~2m
10        ~1m     ~3m
11        ~1m     ~4m
12        ~1m     ~5m
...       ...     ...

That gives you five minutes of retrying with a good backoff ramp and fast early retries.
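
A quick way to sanity-check that schedule is to compute it directly; this sketch assumes the per-sleep cap agreed above and doubling from the base (illustrative, not consul-template's actual code):

package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		attempts   = 12
		backoff    = 250 * time.Millisecond
		maxBackoff = 1 * time.Minute
	)
	var total time.Duration
	for i := 0; i < attempts; i++ {
		sleep := backoff * time.Duration(1<<uint(i)) // .25s, .5s, 1s, ...
		if sleep > maxBackoff {
			sleep = maxBackoff // attempts 9 and later are capped at 1m
		}
		total += sleep
		fmt.Printf("attempt %2d: sleep %-6v total %v\n", i+1, sleep, total)
	}
}

Running this reproduces the table: roughly one minute of total sleep by attempt 8, then one additional minute per attempt, reaching about five minutes at attempt 12.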

@stefreak (Contributor, Author):

@dadgar I like that very much

@sethvargo merged commit f277f55 into hashicorp:master on May 24, 2017
@sethvargo (Contributor)

@dadgar do you wanna submit a PR for those changes?

@stefreak (Contributor, Author)

@sethvargo @dadgar thank you guys, happy to have contributed this :)
