S3 upload retrying #674
Thanks for the detailed report. I think the server errors (i.e. 500/503) are just children of the HTTPStatus error class, so they would already be retried; see: https://github.com/excon/excon/blob/1e0c0ee3e975b2470262c6c882dccc752382e03c/lib/excon/error.rb#L83. Bumping the interval and retries is probably fine, though a nicer solution might be something like exponential backoff (for some reason I thought that was what we already did, but I must be misremembering, as I can't find anything at present that looks like that). Going from 4 to 5 retries seems fine, but bumping the retry interval from 0 to 5 seconds seems like a much bigger jump. Could we start with something smaller, maybe 1 or 2? As I said above, I feel like backoff might be ideal, but that is more involved in terms of implementation. What do you think?
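For reference, a quick way to check this from a console, assuming Excon's generated status error classes (e.g. Excon::Error::InternalServerError for 500 and Excon::Error::ServiceUnavailable for 503, as defined in the error.rb linked above); this is a sanity-check sketch, not part of any library's documented examples:

```ruby
# Hedged sketch: confirm that 500/503 map to subclasses of
# Excon::Error::HTTPStatus, which the idempotent middleware treats as retryable.
require "excon"

[Excon::Error::InternalServerError,   # 500
 Excon::Error::ServiceUnavailable].each do |klass| # 503
  puts "#{klass} inherits from HTTPStatus? #{klass < Excon::Error::HTTPStatus}"
end

# The retryable error classes Excon consults by default (key name assumed):
p Excon.defaults[:retry_errors]
```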
Oh, thanks for pointing that out. You are right; I had missed it.
I totally agree.
I agree with you. I have looked into retry mechanisms in other libraries for reference. https://github.com/boto/botocore/blob/d95d441ef9b81ac4d0163fa97e736edf63a8b4b3/botocore/data/_retry.json#L219-L240
=> something like 0.5s, 1s, 1.5s, 2s, 2.5s
=> 0.1s, 0.2s, 0.4s retry (both schedules are sketched below)
I think 1s might be enough.
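To make the two schedules above concrete (this is purely illustrative arithmetic, not code taken from botocore, Excon, or any other library): the first reads as a linearly increasing interval, the second as exponential doubling.

```ruby
# Illustrative only: generate the two retry-interval schedules mentioned above.

# Linear: 0.5s, 1.0s, 1.5s, 2.0s, 2.5s
linear = (1..5).map { |attempt| 0.5 * attempt }

# Exponential doubling: 0.1s, 0.2s, 0.4s
exponential = (0..2).map { |attempt| 0.1 * (2**attempt) }

puts "linear:      #{linear.inspect}"
puts "exponential: #{exponential.inspect}"
```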
I'll go ahead and try this config!
Sounds good; just let me know if you need anything and what you find. Thanks!
This configuration seems to be working well too.
Sounds good, and thanks for sharing the context (I'm certainly not doing a million requests a day myself, so I had largely missed this error vector).
We're currently using CarrierWave with fog-aws to upload a large number of files to S3 every day.
However, we sometimes encounter errors such as 500, 503, and Excon::Error::Socket intermittently.
We've found similar issues related to S3 retries on GitHub, such as #264 and carrierwaveuploader/carrierwave#2610.
After investigating the issue, we discovered that the fog storage upload request is configured as an idempotent request and is already retrying four times using Excon, as explained in this comment. However, further investigation revealed that our earlier assumption that Excon does not retry on 500 or 503 errors wasn't true; see the conversation below. Additionally, the default retry_interval of 0 seconds may be too short.
Therefore, we changed our Excon configuration for fog storage to the following:
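A minimal sketch of that kind of change, assuming fog's connection_options are used to pass the retry options through to Excon; the retry_limit and retry_interval values are the ones discussed in this thread, and the credentials and region are placeholders, not our actual setup:

```ruby
# Minimal sketch (not the verbatim config from this report): pass Excon's
# retry options through fog's connection_options when building the storage.
require "fog/aws"

storage = Fog::Storage.new(
  provider:              "AWS",
  aws_access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
  region:                "us-east-1",
  connection_options: {
    retry_limit:    5, # Excon's default is 4 retries for idempotent requests
    retry_interval: 5  # Excon's default is 0 seconds between retries
  }
)
```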
This change greatly improved our S3 retries, and we haven't encountered an S3 error for a month with this configuration.
Do you think it would be reasonable to change the default configuration for fog storage to this?