-
Notifications
You must be signed in to change notification settings - Fork 246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimum throughput detection incorrectly raises an error #1202
Comments
Hi @GeorgeHahn, thank you for bringing this to our attention and for providing detailed reproduction steps. We will investigate to see if we can first replicate the issue on our end. |
Still trying to reproduce it on our end. In the meantime, could you provide us with the following information?
|
The objects in this case are typically 4-10MiB. Log attached. I removed the config and setup log lines for simplicity. Let me know if there are any details you'd like to see from that part. |
Thank you, we've reproduced the issue on our end. Will further investigate it. |
## Motivation and Context awslabs/aws-sdk-rust#1202 ## Description The issue above demonstrated the incorrect [BinLabel](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L100-L119) ordering in [LogBuffer](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L173-L183), the underlying data structure we use for stall stream protection. The following trace logs are generated from executing the reproduction steps in the issue above. In the file labeled "no_sleep," we have commented out `std::thread::sleep(std::time::Duration::from_millis(120));` from the reproducer so the updated code can be tested as the happy path. [s3_throughput_min_repro_no_sleep.log](https://github.com/user-attachments/files/17299373/s3_throughput_min_repro_no_sleep.log) [s3_throughput_min_repro_with_sleep.log](https://github.com/user-attachments/files/17299447/s3_throughput_min_repro_with_sleep.log) In both files, it’s important to note that `Bin`s assigned `TransferredBytes` can be overwritten by `Pending` due to [`ThroughputLogs::push`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L346). Once a `Bin` is labeled as `Pending`, it cannot be re-labeled. When this occurs, the only way to avoid the stall stream protection check going into the grace period is for time to advance beyond the current `Bin`'s resolution, the `LogBuffer` pushes a new `Bin` during [`catch_up`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/throughput.rs#L355) , and this new `Bin` hopefully [gets assigned](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L78-L79) a `TransferredBytes`. However, this new `Bin` could also be overwritten by Pending in a subsequent call to [`MinimumThroughputDownloadBody::poll_data`](https://github.com/smithy-lang/smithy-rs/blob/07fe426697cc30ad613568902528c305f953deb1/rust-runtime/aws-smithy-runtime/src/client/http/body/minimum_throughput/http_body_0_4_x.rs#L78-L79), which can trigger the the grace period if the overall `LogBuffer` looks like it's violated the stall stream protection check. The reproducer without sleep does not fail the stall stream protection obviously because the execution completes way before the grace period ends, but more importantly because the execution periodically assigns new `TransferredBytes` `Bin`s in the throughput logs. This effectively resets the grace period for the stall stream protection (search for `throughput recovered; exiting grace period` in the `s3_throughput_min_repro_no_sleep.log`). However, with sleep, `Bin`s labeled as `TransferredBytes` are frequently (and almost immediately) overwritten by `Pending`. This results in the execution being unable to exit the grace period, ultimately leading to a stall stream protection error. To resolve this, we make `TransferredBytes` be the top priority in `BinLabel`. This means once a new `Bin` has earned `TransferredBytes`, it's green for that time resolution and that it should not be revoked by `Pending` overwriting it to make it look like no bytes transferred during that time. ## Testing - Existing tests in CI - Added unit tests for `BinLabel` ordering and for `ThroughputLogs` - Passed the customer's reproduction step - To confirm the stall stream protection for download still works, I switched off WiFi while running the customer's reproducer (with sleep) and it successfully failed with the stall stream protection error: ``` ---- s3_throughput_min_repro stdout ---- 2024-10-08T23:29:24.999477Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s 2024-10-08T23:29:24.999513Z TRACE aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: received poll pending 2024-10-08T23:29:24.999530Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s 2024-10-08T23:29:25.081811Z TRACE aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: received poll pending 2024-10-08T23:29:25.081938Z DEBUG aws_smithy_runtime::client::http::body::minimum_throughput::http_body_0_4_x: current throughput: 0 B/s is below minimum: 1 B/s test s3_throughput_min_repro ... FAILED ... called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Error { kind: StreamingError(ThroughputBelowMinimum { expected: Throughput { bytes_read: 1, per_time_elapsed: 1s }, actual: Throughput { bytes_read: 0, per_time_elapsed: 1s } }) } } ``` ## Checklist <!--- If a checkbox below is not applicable, then please DELETE it rather than leaving it unchecked --> - [x] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "client," "server," or both in the `applies_to` key. - [x] For changes to the AWS SDK, generated SDK code, or SDK runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "aws-sdk-rust" in the `applies_to` key. ---- _By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice._
This should be fixed in today's release of the aws-sdk-s3 crate (1.55.00). Thank you for the reproduction, we've used it to add an integration test so we can detect if this regresses in the future! Fix: smithy-lang/smithy-rs#3871 |
Comments on closed issues are hard for our team to see. |
Describe the bug
We observe frequent low throughput errors when reading a streamed S3 body in a lambda function. Our theory is that the decompression and business logic for processing the stream cause the async runtime to behave in a CPU-constrained manner and appears to break keepalive on the underlying stream. This appears to be a worst-case scenario for the throughput estimation logic. The attached reproduction returns an error even though reads separated by 120ms are able to receive data.
Regression Issue
Expected Behavior
Minimum throughput errors should not be raised unless the connection is unable to provide data.
Current Behavior
A minimum throughput error is incorrectly raised.
Reproduction Steps
Cargo.toml
main.rs
Possible Solution
No response
Additional Information/Context
No response
Version
The text was updated successfully, but these errors were encountered: