Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(download): work around hyper hang issue by adjusting reqwest config #3855

Merged
merged 2 commits into from
Jun 4, 2024

Conversation

rami3l
Copy link
Member

@rami3l rami3l commented Jun 3, 2024

This PR tries to mitigate #3852 by applying Azure/azure-sdk-for-rust#1550 's hack to all the ClientBuilder's I can find across the repo.

The verification of the mitigation is not easy, since async execution is not so deterministic. Nevertheless, as this seems to be a widely encountered issue with hyper and reqwest, I suggest we merge this first and see how it goes.

@rami3l rami3l requested a review from djc June 3, 2024 06:57
@rami3l rami3l added this to the 1.28.0 milestone Jun 3, 2024
@rami3l rami3l requested a review from rbtcollins June 4, 2024 10:20
@@ -345,6 +345,10 @@ pub mod reqwest_be {

fn client_generic() -> ClientBuilder {
Client::builder()
// HACK: set `pool_max_idle_per_host` to `0` to avoid an issue in the underlying
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this is necessary/why there is no better solution for this?

Copy link
Member Author

@rami3l rami3l Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djc The underlying issue is hyperium/hyper#2312, see hyperium/hyper#2312 (comment) for the first appearance of this hack. It is suspected that there is a hyper pool deadlock underneath, and this workaround has been used by many to successfully mitigate the issue (see all the external references below).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a good reproduction scenario?

Copy link
Member Author

@rami3l rami3l Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djc Unfortunately not yet, and that's the tricky part of this thing I guess... If it really is a concurrency bug, then it'll be rather hard to track and to fix (that's also why that issue has been open for such a long time I guess, a previous project that I collaborated on has also hit it back in 2020, and the same hack was applied 🤔).

As far as our CI is concerned, we can see several failures in the past 5 days caused by this one, and they are all "solved" by relaunching the test task. The particular test(s) that failed varied from time to time, but I'm sure that all those tests are reqwest-related.

Anyway, this PR intentionally won't close #3852, and I'll keep an eye on hyperium/hyper#2312 as well as our CI in the upcoming weeks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djc The workaround seems to be working: our CI hasn't been failing for this reason in the past 5 days.

@rami3l rami3l added this pull request to the merge queue Jun 4, 2024
Merged via the queue into master with commit 91ca378 Jun 4, 2024
22 checks passed
@rami3l rami3l deleted the fix/ci-dispatch-task branch June 4, 2024 13:57
@rami3l rami3l mentioned this pull request Oct 6, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants