-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RetryConfig: Add MaxBackoff option (fixes #938) #939
RetryConfig: Add MaxBackoff option (fixes #938) #939
Conversation
Hi @stefreak Thank you for the Pull Request. I'm a bit confused by the expected behavior here. Which of the following are you trying to achieve:
As it's written here, it says "never back off for more than X", but the behavior @dadgar described to me seemed more akin to the first option above (at most backoff for a total of X). As the code is written, if the user chooses a backoff of 2s, attempts of 5, and max-backoff as 10s, the total sleep time will be:
In this case, as this code is written, attempt 3 will actually sleep for 10s (24s > 10s, so return 10s). This means the total sleep time is actually 18s though. Now, if implemented as option 2, the third backoff would sleep for 2s (because we've already slept for 8s, which leaves 2s remaining in the total sleep to reach "max backoff"). As a user, I can see both options as being valid. I never want my data to be more than Xs stale, but I also never want to wait more than Ys for a single backoff.
Yes please
That'd be nice, but I think we need to agree on the behavior first 😄 @dadgar this is what I was trying to explain in chat. It's very hard to describe this behavior in a way that makes sense to people. In my head "max_backoff" would mean "total maximum sleep time", but another person might thing "no single backoff should exceed this value". |
@sethvargo thanks for the quick feedback, I see where the confusion is coming from, naming things is hard right :) The use case I have in mind is that you are in a situation where you don't want consul_template to ever terminate upon failure, but want it to retry forever. What you are describing (limiting total sleep) is already kind of possible by limiting number of attempts. When I set attempts to 0, attempts are unlimited and thus sleep will grow exponentially with every attempt, potentially way too large than necessary. My idea was that this could simplify changing the nomad template behaviour to retry forever while keeping sane retry intervals facing longer outages. Not sure what @dadgar has in mind though, I just wanted to contribute this as it was the most annoying part of nomad I discovered and I liked the rest very much so far. If You have a better idea on how to accomplish this, the naming or how to express this in the config, I'd be happy to implement it. Will also think about it more. |
@stefreak Awesome work! I had this on my backlog to get done! So much appreciated! @sethvargo The behavior would be something like this with your configuration (As the code is written, if the user chooses a backoff of 2s, attempts of 5, and max-backoff as 10s)
@stefreak explanation of why you would want this is spot on. |
Can we be sure to update the README as well? |
bbccbba
to
b2ec58c
Compare
b2ec58c
to
ec42a4f
Compare
ee5d527
to
cf0e39f
Compare
@dadgar @sethvargo thank you for the feedback, now it should be done incl. unit test for RetryFunc |
@@ -13,6 +13,9 @@ const ( | |||
// DefaultRetryBackoff is the default base for the exponential backoff | |||
// algorithm. | |||
DefaultRetryBackoff = 250 * time.Millisecond | |||
|
|||
// DefaultRetryMaxBackoff is the default maximum of backoff time | |||
DefaultRetryMaxBackoff = 0 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge this PR as-is, but I'd like to get both your thoughts on making this a non-zero default. Obviously this would be a "breaking" change, but I think setting this to a reasonably high value (like 30 minutes) would be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I would change all the defaults. They are too short currently in my opinion.
I would do:
DefaultRetryAttempts = 12
DefaultRetryBackoff = 250 * time.Milliseconds
DefaultRetryMaxBackoff = 1 * time.Minute
Leads to:
attempt | sleep | total |
---|---|---|
1 | .25s | .25s |
2 | .5s | .75s |
3 | 1s | 1.75s |
4 | 2s | ~4s |
5 | 4s | ~8s |
6 | 8s | ~16s |
7 | 16s | ~32s |
8 | 32s | ~1m |
9 | ~1m | ~2m |
10 | ~1m | ~3m |
11 | ~1m | ~4m |
12 | ~1m | ~5m |
... | ... | ... |
Gives you five minutes of retrying with a good back off ramp and fast early retries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dadgar I like that very much
@dadgar do you wanna submit a PR for those changes? |
@sethvargo @dadgar thank you guys, happy to have contributed this :) |
This is a work in progress Pull request so you can have a look already
Questions: