-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override Config aws-smithy-client #567
Comments
thanks! We should figure out the right knobs to enable quicker retries for services like DynamoDb. |
Hey @ymwjbxxq, I'm going to look into this issue. While we (the Rust SDK team) aim for parity with the other SDKs, we don't necessarily want to copy their behavior exactly. There is a relatively new internal-only AWS proposal doc that defines how SDKs should handle retries and also covers service-specific defaults. I'd like to stick to their recommendations as much as possible. I'm also going to do a survey of how multiple SDKs currently handle retries and retry configuration. While I appreciate your PR, I think removing the random element is not the right approach, and I hope you don't mind me proposing and alternative. This might take me some time, but I want to make sure we solve this problem thoroughly enough that we don't have to revisit it later and cause churn for people with custom retry configuration. Some extra context on backoff and jitter: |
It is ok do not worry. Maybe it could be random if I do not overwrite the configuration but it does not really make sense if the random is bigger than the overall timeout. I will report again how I would use it to achieve high performance regardless from the service
Given these settings:
I would expect:
Current implementation with these settings:
The current implementation goes against the idea of timeout and in fact, I do not have any retry because I timeout before unless I let the SDK do their things and eventually come back with some results. To sum up:
|
Hi @Velfi I read your links about backoff and jitter, and I think a FullJitterBackoffStrategy would be:
this will generate something like:
or to change the code:
At least the overall timeout with retry included is a value that can be calculated considering the worst-case scenario. Anyway I will wait for you guys |
@ymwjbxxq Thank you for doing so much leg work! These posts are helpful. |
@Velfi I did extra tests:
The results are with my expectations. A faster response!!! My configuration for example, with max 2 retries:
What I conclude is the SDK should apply maybe the:
The same rule should applies to all services. To make the SDK life easier, I would not apply any strategy based on errors but use the FullJitterBackoffStrategy with a BASE of 500ms and allow the user to overwrite the base. This should fix all the needs of SDK users. |
Hey @ymwjbxxq, here's the latest: I've been poring over internal specs and documentation on retries and I think I finally have a pretty good view on the state of things. Here are my findings:
In the Rust SDK, we have implemented backoff delay calculation this way: // b is a random number between 0 and 1
// r is a constant that is always set to 2
let backoff = b * (r.pow(self.local.attempts - 1) as f64);
let backoff = Duration::from_secs_f64(backoff).min(self.config.max_backoff);
My next subject of investigation is to find out why AWS used to recommend retries on the order of milliseconds when the new strategy is acting on the scale of seconds. Once I can get a solid answer on that, I believe I'll have enough information to decide on a resolution. |
Hey @ymwjbxxq , my retry PR got merged and is now pending release. We'll try to get a release out this week and then you can let me know if the update improves things for your use case. |
Great @Velfi to hear it. I will test it out once it is out. |
This went out in the July 21st SDK release. |
|
Describe the feature
Hi all,
as a follow-up of this issue #558
where we can set the timeout for the client to be latency aware with the retries
I have noticed the following:
DEBUG aws_smithy_client::retry: attempt 1 failed with Error(TransientError); retrying after 435.692227ms
And the aws-smithy-client/src/retry.rs is configured as:
And I have no way to overwrite the default parameters.
From the test, I cannot get exactly where the initial value of the backoff is coming from
https://github.com/awslabs/smithy-rs/blob/a0539e20b069a7de021c84521d8f3c7ba098ad6d/rust-runtime/aws-smithy-client/src/retry.rs#L276
It seems to me that this line of code is not correct:
Because:
It should be based on milliseconds https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7deed195b0811c97887dbcd6c98f1a6e
If we can pass the backoff value it could become
Use Case
When you are latency aware, you should be able to overwrite the default parameters to configure the backoff.
In my configuration case, I have the following:
the back-off moves the second retry after 435.692227ms
DEBUG aws_smithy_client::retry: attempt 1 failed with Error(TransientError); retrying after 435.692227ms
This means that it will never retry a second time.
Proposed Solution
Be able to overwrite the backoff policy.
If I wish to have this configuration
I could setup the backoff policy at 10ms. For example, I would have the following after the first failed attempt.
Retry 1 -> retrying after 20ms
Retry 2 -> retrying after 40ms
Retry 3 -> retrying after 60ms
Now I can setup my overall duration "with_call_timeout" to 210ms, giving the retries time to happen.
Other Information
No response
Acknowledgements
A note for the community
Community Note
The text was updated successfully, but these errors were encountered: