Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed failure to try next host after single-host connection timeout #7368

Merged
merged 19 commits into from
Oct 1, 2024

Conversation

brettdh
Copy link
Contributor

@brettdh brettdh commented Jul 17, 2023

What do these changes do?

See subject. Also updated the default ClientTimeout params to include sock_connect so that this correct behavior happens by default.

Are there changes in behavior for the user?

Yes; the default socket connect timeout changes from None (no timeout) to 30 seconds.

Related issue number

Closes #7342.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
    • Observed test failing before code change in connector.py; passing after.
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

Also updated the default ``ClientTimeout`` params to include ``sock_connect``
so that this correct behavior happens by default.
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jul 17, 2023
@brettdh
Copy link
Contributor Author

brettdh commented Jul 17, 2023

test_secure_https_proxy_absolute_path is failing on my fork, but only on 3.9 and only on Windows. No idea why, and I'm on a mac so I cannot repro. Please advise?

@Dreamsorcerer
Copy link
Member

test_secure_https_proxy_absolute_path is failing on my fork, but only on 3.9 and only on Windows. No idea why, and I'm on a mac so I cannot repro. Please advise?

It's failing on all Windows tests. The stderr output looks like there might be a problem with proxy.py, but not really sure. @webknjaz has more experience with the proxy stuff, maybe he has an idea.

Traceback (most recent call last):
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\proxy\core\work\threadless.py", line 380, in _run_forever
    if await self._run_once():
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\proxy\core\work\threadless.py", line 341, in _run_once
    work_by_ids, new_work_available = await self._selected_events()
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\site-packages\proxy\core\work\threadless.py", line 265, in _selected_events
    events = self.selector.select(
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\selectors.py", line 324, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "C:\hostedtoolcache\windows\Python\3.9.13\x64\lib\selectors.py", line 315, in _select
    r, w, x = select.select(r, w, w, timeout)
OSError: [WinError 10022] An invalid argument was supplied

@brettdh
Copy link
Contributor Author

brettdh commented Jul 17, 2023

Ah, I misread the Github Actions results; only one windows job reports failuire; the rest just appear "cancelled" until I drill into any of them.

Screenshot image

@Dreamsorcerer
Copy link
Member

Might also be worth testing what specific change causes the test failure. Does the error happen if we change the default timeout, without the TimeoutError change?

@brettdh
Copy link
Contributor Author

brettdh commented Jul 17, 2023

The error goes away if I temporarily revert the change to the default timeout.

@Dreamsorcerer
Copy link
Member

Yeah, I was wondering the other way. Maybe it's caused by the sock_connect timeout being reached, or maybe it's caused by the connection retry..

@brettdh
Copy link
Contributor Author

brettdh commented Jul 17, 2023

Whoops, sorry, I misread your earlier message. The same tests fail when the default timeout has sock_connect=30 set but the loop doesn't catch TimeoutError.

@Dreamsorcerer
Copy link
Member

OK, I guess it's an existing issue with sock_connect timeout then.

@brettdh
Copy link
Contributor Author

brettdh commented Aug 7, 2023

Is there anything else you need from me on this PR? I wasn't really sure about the resolution (or lack thereof) of our last exchange.

@Dreamsorcerer
Copy link
Member

I don't think we can change the default if it's going to cause errors for all Windows users. So, we'll need to figure out a fix for that. I've not had a chance to look into it yet, but if you have time to figure that out, it'd help us get this done.

Copy link

codecov bot commented Jan 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.57%. Comparing base (803d818) to head (b630750).
Report is 713 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7368   +/-   ##
=======================================
  Coverage   98.56%   98.57%           
=======================================
  Files         107      107           
  Lines       34995    35037   +42     
  Branches     4146     4150    +4     
=======================================
+ Hits        34492    34536   +44     
+ Misses        335      334    -1     
+ Partials      168      167    -1     
Flag Coverage Δ
CI-GHA 98.45% <100.00%> (+<0.01%) ⬆️
OS-Linux 98.11% <100.00%> (+<0.01%) ⬆️
OS-Windows 96.53% <100.00%> (+0.01%) ⬆️
OS-macOS 97.80% <100.00%> (+<0.01%) ⬆️
Py-3.10.11 97.67% <100.00%> (+<0.01%) ⬆️
Py-3.10.15 97.60% <100.00%> (+<0.01%) ⬆️
Py-3.11.10 97.67% <100.00%> (+<0.01%) ⬆️
Py-3.11.9 97.75% <100.00%> (+0.01%) ⬆️
Py-3.12.6 98.16% <100.00%> (+<0.01%) ⬆️
Py-3.13.0-rc.2 98.15% <100.00%> (+<0.01%) ⬆️
Py-3.9.13 97.57% <100.00%> (+<0.01%) ⬆️
Py-3.9.20 97.50% <100.00%> (+<0.01%) ⬆️
Py-pypy7.3.16 97.12% <100.00%> (+<0.01%) ⬆️
VM-macos 97.80% <100.00%> (+<0.01%) ⬆️
VM-ubuntu 98.11% <100.00%> (+<0.01%) ⬆️
VM-windows 96.53% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@webknjaz
Copy link
Member

@brettdh could you rebase the branch resolving the merge conflicts, to see if it'd pass? I think there were some fixes to the proxy tests last year.

@Dreamsorcerer
Copy link
Member

@brettdh could you rebase the branch resolving the merge conflicts, to see if it'd pass? I think there were some fixes to the proxy tests last year.

Still failing. I'm not sure it's related to the proxy anyway, I think this is just regular connections.

@Dreamsorcerer
Copy link
Member

Hmm, this seems to have changed. Windows now gives:

    def finish_connect(trans, key, ov):
>       ov.getresult()
E       OSError: [WinError 121] The semaphore timeout period has expired

While other test runs just hang and never complete the test.

@Dreamsorcerer
Copy link
Member

While other test runs just hang and never complete the test.

Locally, create_connection() is called once without a host argument. Not sure what has changed to cause the test to fail...

@bdraco bdraco added backport-3.10 backport-3.11 Trigger automatic backporting to the 3.11 release branch by Patchback robot labels Oct 1, 2024
@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

I am mildly worried about the 30 second time out affecting users with poor quality links, examples user classes are aviation, and multi backhaul rural wireless links. Still 30 seconds is a long time , and the speed of light is not that disappointingly slow so I think it’s OK to backport

CHANGES/7342.breaking.rst Outdated Show resolved Hide resolved
@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

Assuming the CI passes, I'll put this on some of my HA production systems for testing

@brettdh
Copy link
Contributor Author

brettdh commented Oct 1, 2024

Sorry for the PR abandonment here, folks, but thanks very much for carrying it forward in my absence. 🙂

@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

baseline

aiohttp_master

@bdraco

This comment was marked as outdated.

@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

oh wait, baseline is a few PRs ahead... need to wait for master to catch up before profiling

@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

This PR with master merged in. Performance is the same within the margin of error

7368

@bdraco
Copy link
Member

bdraco commented Oct 1, 2024

Seems fine on production as well

@bdraco bdraco enabled auto-merge (squash) October 1, 2024 21:15
@bdraco bdraco merged commit 8a8913b into aio-libs:master Oct 1, 2024
35 of 36 checks passed
Copy link
Contributor

patchback bot commented Oct 1, 2024

Backport to 3.10: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 8a8913b on top of patchback/backports/3.10/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368

Backporting merged PR #7368 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.10/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368 upstream/3.10
  4. Now, cherry-pick PR Fixed failure to try next host after single-host connection timeout #7368 contents into that branch:
    $ git cherry-pick -x 8a8913be2eefca3d7504880a235664f5dc44f076
    If it'll yell at you with something like fatal: Commit 8a8913be2eefca3d7504880a235664f5dc44f076 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 8a8913be2eefca3d7504880a235664f5dc44f076
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Fixed failure to try next host after single-host connection timeout #7368 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.10/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Copy link
Contributor

patchback bot commented Oct 1, 2024

Backport to 3.11: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 8a8913b on top of patchback/backports/3.11/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368

Backporting merged PR #7368 into master

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/aio-libs/aiohttp.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.11/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368 upstream/3.11
  4. Now, cherry-pick PR Fixed failure to try next host after single-host connection timeout #7368 contents into that branch:
    $ git cherry-pick -x 8a8913be2eefca3d7504880a235664f5dc44f076
    If it'll yell at you with something like fatal: Commit 8a8913be2eefca3d7504880a235664f5dc44f076 is a merge but no -m option was given., add -m 1 as follows instead:
    $ git cherry-pick -m1 -x 8a8913be2eefca3d7504880a235664f5dc44f076
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Fixed failure to try next host after single-host connection timeout #7368 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.11/8a8913be2eefca3d7504880a235664f5dc44f076/pr-7368
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

bdraco pushed a commit that referenced this pull request Oct 1, 2024
…7368)

Co-authored-by: Sam Bull <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: J. Nick Koston <[email protected]>
(cherry picked from commit 8a8913b)
bdraco pushed a commit that referenced this pull request Oct 1, 2024
…7368)

Co-authored-by: Sam Bull <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: J. Nick Koston <[email protected]>
(cherry picked from commit 8a8913b)
bdraco added a commit that referenced this pull request Oct 2, 2024
…r single-host connection timeout (#9390)

Co-authored-by: Sam Bull <[email protected]>
Co-authored-by: pre-commit-ci[bot]
Co-authored-by: J. Nick Koston <[email protected]>
Co-authored-by: Brett Higgins <[email protected]>
bdraco added a commit that referenced this pull request Oct 2, 2024
…r single-host connection timeout (#9391)

Co-authored-by: Sam Bull <[email protected]>
Co-authored-by: pre-commit-ci[bot]
Co-authored-by: J. Nick Koston <[email protected]>
Co-authored-by: Brett Higgins <[email protected]>
@bdraco
Copy link
Member

bdraco commented Oct 2, 2024

Ran in production overnight, no obvious issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-3.11 Trigger automatic backporting to the 3.11 release branch by Patchback robot bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

aiohttp does not try connecting to multiple IPs if first connect times out
4 participants