
Retry certain errors more times #543

Closed
wants to merge 4 commits

Conversation

@ericphanson (Member) commented Apr 18, 2022

Closes #542
Closes #541

@mattBrzezinski (Member)

Mind just version bumping in here?

@ericphanson (Member, Author)

Bumped!

@ericphanson (Member, Author)

bors merge

@bors bot (Contributor) commented Apr 25, 2022

🔒 Permission denied

Existing reviewers: click here to make ericphanson a reviewer

@mattBrzezinski (Member)

bors r+

bors bot added a commit that referenced this pull request Apr 25, 2022
543: Retry 500 errors more times r=mattBrzezinski a=ericphanson

Closes #542

Co-authored-by: Eric Hanson <[email protected]>
@mattBrzezinski (Member)

Added you as a maintainer for the repo :)

@bors bot (Contributor) commented Apr 25, 2022

Build failed:

@ericphanson (Member, Author)

Ahh, we are relying on exceeding the number of retries in some tests, I think. I'll have to update them.

@ericphanson (Member, Author)

I think maybe that was it; let's see if it goes through:

bors r+

@bors bot (Contributor) commented May 11, 2022

🔒 Permission denied

Existing reviewers: click here to make ericphanson a reviewer

@mattBrzezinski (Member)

bors r+

bors bot added a commit that referenced this pull request May 11, 2022
543: Retry 500 errors more times r=mattBrzezinski a=ericphanson

Closes #542

Co-authored-by: Eric Hanson <[email protected]>
@mattBrzezinski (Member)

Updated you to an admin, should fix your perms 🤷

@bors bot (Contributor) commented May 11, 2022

Build failed:

@ericphanson (Member, Author)

Hm, it says the build was cancelled; not really sure how that can happen

@ericphanson (Member, Author)

Going to just try again

bors r+

bors bot added a commit that referenced this pull request May 11, 2022
543: Retry 500 errors more times r=ericphanson a=ericphanson

Closes #542

Co-authored-by: Eric Hanson <[email protected]>
@bors bot (Contributor) commented May 11, 2022

Build failed:

@ericphanson (Member, Author)

Cancelled again 🤔

@ericphanson changed the title from "Retry 500 errors more times" to "Retry certain errors more times" on May 11, 2022
@ericphanson (Member, Author)

bors try

bors bot added a commit that referenced this pull request May 12, 2022
@bors bot (Contributor) commented May 12, 2022

try

Build failed:

@ericphanson (Member, Author)

Oh! I know what's happening. It is reaching the 30 minute timeout and being cancelled. So it is likely a problem in the code, not a bors problem.

@ericphanson (Member, Author) commented May 12, 2022

I wonder if Retry.jl's exponential delays just become too huge 😂

edit:

```julia
julia> sleep(x::Float64) = println("Would sleep for $x")
sleep (generic function with 1 method)

julia> @repeat n try; error()
       catch
       @delay_retry if true end
       end
Would sleep for 0.052740343728706396
Would sleep for 0.5430940435531718
Would sleep for 5.6191153198632655
Would sleep for 43.07752844528991
Would sleep for 556.7462558824232
Would sleep for 4004.774901154357
Would sleep for 54213.91981134377
Would sleep for 502438.01830805355
Would sleep for 5.625640217525721e6
```

5.6e6 seconds is roughly 65 days! That's a long time to sleep. Ok, that must be the problem. Base.retry lets you set a max delay, and AWS recommends a 20s max (with only 3 retries by default!): https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html.
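The capped exponential backoff that AWS recommends can be sketched as follows. This is a hypothetical Python illustration of the general technique (the function name and parameter defaults are invented), not AWS.jl's or any SDK's actual implementation:

```python
import random

def backoff_delays(max_attempts=3, base=1.0, cap=20.0):
    """Yield capped exponential backoff delays with full jitter:
    each delay is drawn from [0, min(cap, base * 2^attempt)]."""
    for attempt in range(max_attempts - 1):
        # Exponential growth, but never above the cap.
        uncapped = base * (2 ** attempt)
        yield random.uniform(0, min(cap, uncapped))

delays = list(backoff_delays(max_attempts=10))
# Unlike the uncapped Retry.jl delays above, every delay stays <= 20s,
# so raising max_attempts doesn't risk multi-day sleeps.
```

With the cap in place, total worst-case sleep time grows linearly in the number of attempts rather than exponentially.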

@ericphanson (Member, Author)

Closing in favor of #550

@ericphanson ericphanson deleted the eph/retry-more branch May 12, 2022 17:46
bors bot added a commit that referenced this pull request May 25, 2022
548: translate Retry to `Base.retry` r=ericphanson a=ericphanson

Suggested in #542 (comment)

I'm not sure we should merge this as-is, because before we could do immediate retries OR delay retries, and now we can only do delay retries. ~~I'm also not 100% sure the delay amounts are the same.~~

The functional approach is nicer than the macro approach, though, in my opinion, ~~since the retries don't need to be literals~~. So we could let folks request more retries on a per-request basis, or even put it in the Backend struct.

edit 1: Retry.jl expands to a for-loop, so non-literals are OK. I thought it unrolled the loop.

edit 2: the delay amounts are definitely not the same, but that's a good thing: now we copy the upstream recommendation from https://docs.aws.amazon.com/sdkref/latest/guide/feature-retry-behavior.html. In particular, a cap of 20s means we can increase the retries further without danger, xref #550 (we can't do that with Retry.jl, since its retries are hard-coded with no cap and the delays get too big; ref #543 (comment)).

---
12 May update:

I'm now in favor of merging this, because I want to get in #550. The first few retries are pretty short, e.g.
```julia
julia> collect(AWS.AWSExponentialBackoff(; max_attempts=10))
9-element Vector{Any}:
  0.3365311959885935
  1.2370247946870188
  2.925043257220695
 12.165245886682294
 20.0
 20.0
  5.5969335490336505
 20.0
 20.0
```
So I think having a short delay instead of an immediate retry in a few cases is probably OK. Also, the AWS docs don't talk about non-delayed retries, they seem to only do delayed retries.

Co-authored-by: Eric Hanson <[email protected]>
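The functional, delays-as-data style described in that commit message can be sketched like this. This is a hypothetical Python illustration of the technique (the `retry` helper and `flaky` function are invented for the example), not the actual `Base.retry` implementation:

```python
import time

def retry(f, delays, should_retry=lambda e: True):
    """Call f(), sleeping for each delay in `delays` after a retryable
    failure; the final attempt re-raises on failure."""
    for delay in delays:
        try:
            return f()
        except Exception as e:
            if not should_retry(e):
                raise
            time.sleep(delay)
    return f()  # final attempt, delay budget exhausted

# Because the delays are plain data (not hard-coded in a macro),
# callers can ask for more retries on a per-request basis.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP 500")
    return "ok"

result = retry(flaky, delays=[0.0, 0.0])
print(result)  # "ok", after two failed attempts
```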
Successfully merging this pull request may close these issues:

- Increase number of retries for 500 errors?
- Missing retry for "resolve host" error with Downloads backend?