Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index URLs containing unescaped "@" in username cause "Failed to parse" error in pip 19.2 #6775

Closed
bomri opened this issue Jul 23, 2019 · 66 comments
Labels
auto-locked Outdated issues that have been locked by automation C: finder PackageFinder and index related code type: bug A confirmed bug or unintended behavior

Comments

@bomri
Copy link

bomri commented Jul 23, 2019

Environment

  • pip version: 19.2
  • Python version: 3.6.8
  • OS: ubuntu 18.04

Description
After updating to pip-19.2 from pip-19.1.1 installing packages with --extra-index-url doesn't work and I get:

ERROR: Could not find a version that satisfies the requirement package_name (from versions: none)
ERROR: No matching distribution found for package_name

going back to 19.1.1 works fine

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Jul 23, 2019
@shay108
Copy link

shay108 commented Jul 23, 2019

+1
Yup. Happened to us as well.

@TTRh
Copy link

TTRh commented Jul 23, 2019

We also have issue with pip-19.2 when using specific index url :

Looking in indexes: https://<email.com>:****@<artifactory_host>/artifactory/api/pypi/pypi-virtual/simple
Collecting setuptools
ERROR: Could not install packages due to an EnvironmentError: Failed to parse: https://<email.com>:[secure]@<artifactory_host>/artifactory/api/pypi/pypi-virtual/simple/setuptools/

It's working fine with pip-19.1.1. Could it be related ?

@cletomartin
Copy link

@TTRh just saw the same and open #6776. Not sure if it should be handled in the same issue though.

@chrahunt
Copy link
Member

@bomri, can you please re-run with --verbose and provide the output?

@pradyunsg pradyunsg changed the title error installing packages from --extra-index-url in pip 19.2 Index URLs containing @ in username are not parsed appropriately in pip 19.2 Jul 23, 2019
@brainerazer
Copy link

Might be related to:
urllib3/urllib3#1640
urllib3/urllib3#1586

@philipp-lein
Copy link

philipp-lein commented Jul 23, 2019

We had the same problem with a special character in our password.

@pradyunsg
Copy link
Member

Hi folks! Thanks for filing this issue and chiming in here.

It'd be best that if you're facing this issue, that you use GitHub's reactions on the first post and subscribe to this issue. That way we can avoid "me too" comments which don't really add much value to the discussion.

It'll make it easier for the maintainers to discuss how to go about resolving this issue, in this issue itself and provide updates to y'all more easily. 🙃

@bomri
Copy link
Author

bomri commented Jul 23, 2019

@chrahunt running with --verbose I get 404 on the pypi.org as expected (since the package doesn't exists there) but from my own pypi server I get:

403 Client Error: Forbidden for url:...

maybe it is related to special characters in the token id like @brainerazer mentioned

@slafs
Copy link

slafs commented Jul 23, 2019

This has bitten us too. Not sure if this issue is related to the ones @brainerazer linked for urllib3. Our issue appears when trying to download a released .zip package from GitHub with an access token. So stuff like pip install https://<my-access-token>@github.com/myorganization/myprivaterepo/archive/myrelease.zip#egg=mypackage doesn't work anymore with pip 19.2 (works for 19.1.1 as someone already mentioned).

I've created an empty private repo, with an empty release to test that. The output for pip install -vvv ... can be found here:

The issue/difference can be observed around Starting new HTTPS connection (1): github.com:443.

@pradyunsg pradyunsg added C: finder PackageFinder and index related code type: bug A confirmed bug or unintended behavior and removed S: needs triage Issues/PRs that need to be triaged labels Jul 23, 2019
@pradyunsg
Copy link
Member

Maybe @cjerdonek knows more about this stuff than me.

@slafs
Copy link

slafs commented Jul 23, 2019

I've also run a git bisect session for this issue and it pointed to this commit: eeb74aeb29 c63ee61

@pradyunsg
Copy link
Member

My thoughts: the current behavior is more technically correct and people should start using the new keyring support for this stuff.

At the same time, it definitely isn't nice that this broke existing reasonable workflows. I think if someone can figure out how to re-introduce the legacy behavior here, that'd be great.

Based on pip's documented deprecation policy, we don't need to have a deprecation cycle for such undocumented behaviors. None the less, we should do a deprecation cycle, warning users to switch to the more "correct" syntax or tooling here before dropping support and switching to a nicer error message.


I'm not sure that the bisection found the right commit here but none the less, thanks for doing that!

@shay108
Copy link

shay108 commented Jul 23, 2019

@pradyunsg, maybe I misunderstand.. can you please direct me to the needed "current behavior"?
If we need to refactor the way we use 3rd party repos (in our case gemfury.com) then that's fine. I just don't understand what the right syntax should be :)

@slafs
Copy link

slafs commented Jul 23, 2019

I'm not sure that the bisection found the right commit here but none the less, thanks for doing that!

you're totally right. I corrected my test script and now it points to c63ee61 which makes more sense now I guess

@vibhuyadav

This comment has been minimized.

@pradyunsg
Copy link
Member

pradyunsg commented Jul 23, 2019

can you please direct me to the needed "current behavior"?

Whoops. Yes. (This is why I needed a nap)

The issue here is that the parts of a URL should be... urlencoded.

Thus, the @ in the email used, as a username, should be %40. The following is in fact not a valid URL based on the RFC that defines these things:

The correct way to represent this information would be:

https://awesome%40pradyunsg.me:[email protected]/pypi

If someone has an issue with that... They should probably take that up with the folks who wrote that RFC. It's mentioned in the urllib issue linked further up in this thread.

Aside: @markuszoeller had pointed out the workaround in a different issue that's been closed as a duplicate of this.


What I think I'm suggesting in that comment... is that pip could detect when the user does use such invalid URLs (since it accepted that in the past), warn the user, and work like 19.1.1 -- after some time, the warning and workaround would be removed.

If y'all do think that's necessary after this breakage, do let us know.

I'll be more than happy to just skip the additional work if it's not helpful to the folks affected by it -- there's no point in doing things that are not useful for the end users. 🙃

@pradyunsg
Copy link
Member

pradyunsg commented Jul 23, 2019

I corrected my test script and now it points to c63ee61 which makes more sense now I guess

Indeed. I can see there why we weren't catching these uses earlier -- pip was doing "home grown" processing of the index URL's auth information which obviously didn't consider such input invalid. What changed is we're now using a urllib utility function to get parts of the URL, which is stricter and RFC compliant.

@slafs
Copy link

slafs commented Jul 23, 2019

pip was doing "home grown" processing of the index URL's auth information which obviously didn't consider such input invalid. What changed is we're now using a urllib3 utility function to get parts of the URL, which is stricter and RFC compliant.

right, now I get it, thanks @pradyunsg. For those watching at home ;) as a fix one can pass https://<my-access-token>:@github.com/myorganization/myprivaterepo/archive/myrelease.zip#egg=mypackage (notice the : after the access token) which makes the URL more RFC conformant (does it?) and pip will happily install the package.

EDIT:
or (even better) https://:<my-access-token>@github.com/...

@pradyunsg
Copy link
Member

makes the URL more RFC conformant (does it?)

No clue. I'm not a subject matter expert.

All that I've said above is basically a well informed guess that I've made by looking at all the relevant bits of code on my phone... because my laptop is too far away from the bed. 🙃

@pradyunsg
Copy link
Member

pradyunsg commented Jul 23, 2019

https://:@github.com/myorganization/myprivaterepo/

Noting the last thing that I'm going to say here based on (only) intuition + memory, this seems like pip incorrectly requiring both username / password components to be present.

If that's the case, it's almost certainly a bug, albeit a separate one.

@bomri bomri closed this as completed Jul 23, 2019
@bomri bomri reopened this Jul 23, 2019
@bomri
Copy link
Author

bomri commented Jul 23, 2019

I tried @slafs last comment and it seems to be working fine for me.
Adding ":" after the token works while adding it before the token does not

@slafs
Copy link

slafs commented Jul 27, 2019

Following #6775 (comment), I've opened #6796 for the "one element" auth in the package URL.

@chrahunt
Copy link
Member

Thanks @slafs. In that case I think we can close this one.

@cjerdonek
Copy link
Member

How about documenting this case in the docs (including with an example) since it seems common and can trip people up?

@chrahunt
Copy link
Member

@cjerdonek, added #6797 to track that action.

@pradyunsg
Copy link
Member

I've posted a PR (#6818), that fixes both of our URL handling issues (this and #6796).

@pradyunsg
Copy link
Member

I'd appreciate if folks from this thread could take #6818 for a spin and let me know if it works.


To install pip from that branch, into your local environment, run:

pip install https://github.com/pradyunsg/pip/archive/fix/URL-authentication-handling.zip

@cjerdonek
Copy link
Member

I don’t think that PR fixes this issue? I thought this issue was about usernames that include an @ when a password is present (it doesn’t have to do with None / non-None), and the other issue is about no password being present. But I don’t see the former among the test cases. (There would need to be two unescaped @ symbols in the test cases.) Also, I thought fixing the former would mean we’d have to “fix” / rewrite the URL, which I don’t see being done in the PR.

@pradyunsg
Copy link
Member

I don’t see the former among the test cases

Ah, yes. My bad, I omitted the test for it. I've gone ahead and added that case to the test suite -- running it locally confirms that it extracted the email address correctly.

"http://[email protected]:[email protected]/path" -> ("http://example.com/path", "[email protected]", "password")

I thought fixing the former would mean we’d have to “fix” / rewrite the URL, which I don’t see being done in the PR.

Ah, yes. Me too initially.

However, I looked at the diff of requests and urllib3 vendors and didn't see anything in a code path that affected us directly here. I decided to try seeing if reverting to the changes from the original structure worked and it seems like it did.

My understanding now, is that it's got something to do with requests' default handling of URLs that contain auth information and (somehow) we're circumventing those issues in our Session subclass.

@cjerdonek
Copy link
Member

My understanding now, is that it's got something to do with requests' default handling of URLs that contain auth information and (somehow) we're circumventing those issues in our Session subclass.

No, if you look at my comment above where I included the traceback for this issue, you'll see that we pass the original URL: #6775 (comment)
And your PR doesn't "fix" / rewrite the original URL, so that I wouldn't expect that behavior to change in your PR.

Just to be sure, I retried that command in your PR and confirmed that the same error still happens.

@cjerdonek
Copy link
Member

Here is my comment above where I talk about one approach to fixing the current issue: #6775 (comment)

@pradyunsg
Copy link
Member

Right, makes sense to me. I think I was mixing issues in my head.

@pradyunsg
Copy link
Member

How about documenting this case in the docs (including with an example) since it seems common and can trip people up?

It would also make sense to abort in this situation, and print a warning if there's an unescaped "@" in the URL. It's probably more discoverable than the documentation and is friendlier than having it in the documentation.


Summarizing the state of affairs in pip 19.2, for my stupid self:

  • Authentication with only a token (i.e. username, no password) is broken -- pip's current logic assumes both parts would be present. That's what pip >=19.2 cannot install packages via custom URLs with "one element" auth #6796 is for.
  • Authentication with a username that's not URL-encoded does not work anymore -- pip 19.2 fails with a traceback, pip 19.1.1 accepts such URLs. The source of this issue is that urllib3 has become more strict in parsing such URLs.

@amritbhat786
Copy link

Hi, we are facing the same issue with pip 18.1, urllib3(1.24.1) & Python2.7 but in the password section.
https://pypi.org/simple --extra-index-url http://user:password@@124.4.56:8086 --trusted-host 124.4.56
https://pypi.org/project/urllib3/ doesn't list any change related to the RFC 3986 specification above. Also @cjerdonek @pradyunsg could you please recommend stable versions of both the packages so that we can rather install them.

@pradyunsg
Copy link
Member

pradyunsg commented Aug 11, 2019

Released pip 19.2.2 containing a fix for the tokens needing a :.

For folks facing issues with @ in their passwordds, the suggested path forward is to escape the @ in the URLs.

@bjohnson-va
Copy link

bjohnson-va commented Aug 15, 2019

Edit: Doesn't seem to be related to the '%', as I changed my password to "password" and got the same result.


Another kink in this: my old generated password (used to) end in "%".

On version 19.2.2

Replacing the % with %25 or leaving it alone, I see the same error:

(Assuming my password was: password%)

ERROR: Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip/_internal/cli/base_command.py", line 188, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip/_internal/commands/install.py", line 345, in run
    resolver.resolve(requirement_set)
  File "/Library/Python/2.7/site-packages/pip/_internal/legacy_resolve.py", line 196, in resolve
    self._resolve_one(requirement_set, req)
  File "/Library/Python/2.7/site-packages/pip/_internal/legacy_resolve.py", line 359, in _resolve_one
    abstract_dist = self._get_abstract_dist_for(req_to_install)
  File "/Library/Python/2.7/site-packages/pip/_internal/legacy_resolve.py", line 307, in _get_abstract_dist_for
    self.require_hashes
  File "/Library/Python/2.7/site-packages/pip/_internal/operations/prepare.py", line 134, in prepare_linked_requirement
    req.populate_link(finder, upgrade_allowed, require_hashes)
  File "/Library/Python/2.7/site-packages/pip/_internal/req/req_install.py", line 211, in populate_link
    self.link = finder.find_requirement(self, upgrade)
  File "/Library/Python/2.7/site-packages/pip/_internal/index.py", line 1201, in find_requirement
    req.name, specifier=req.specifier, hashes=hashes,
  File "/Library/Python/2.7/site-packages/pip/_internal/index.py", line 1183, in find_candidates
    candidates = self.find_all_candidates(project_name)
  File "/Library/Python/2.7/site-packages/pip/_internal/index.py", line 1111, in find_all_candidates
    if self._validate_secure_origin(logger, link)
  File "/Library/Python/2.7/site-packages/pip/_internal/index.py", line 998, in _validate_secure_origin
    origin = (parsed.scheme, parsed.hostname, parsed.port)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urlparse.py", line 113, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'password%%40mydomain.com'

or

ValueError: invalid literal for int() with base 10: 'password%25%40mydomain.com'

@booleand
Copy link

Can you try encoding only the credential portion of the URL - e.g. password%[email protected]?

@bjohnson-va
Copy link

Switching to password%[email protected] does allow me to proceed.

@pradyunsg
Copy link
Member

pradyunsg commented Aug 22, 2019

There's nothing more actionable here. As a closing note, I'll summarize this issue and make a call-to-action.

What happened?

pip 19.2 now requires credentials to be URL quoted. Thus, characters like @ and % need to be URL quoted (like %40 and %25).

Thus, the following index-urls will no longer work:

https://[email protected]:[email protected]/
https://user:password%@mydomain.com/

The equivalents that do work, are:

https://pradyunsg%40email.com:[email protected]/
https://user:password%[email protected]/

Why?

pip (and requests) use urllib3 for a lot of the underlying URL handling. urllib3 has introduced stricter URL parsing in the version that pip upgraded to in 19.2. Since this strictness was only "caught" after the release, pip's developers decided to not to circumvent the stricter parsing. The workflows for affected users had already broken and none pointed out that it'd be particularly difficult for them to quote their credentials properly.

How can you help?

pip's developers do not want to be breaking their users' workflows. However, this is difficult since pip is a project maintained entirely by volunteers, with a legacy codebase that makes on-boarding additional contributors difficult.

There is a substantial lack of developer time on pip, especially given the impact that changes in it, can have on users. To help us avoid issues like this and make more substantial improvements to pip, consider funding Python Packaging improvements. Funding such projects related to pip, would allow current maintainers be able to work on pip for more time, and bring in additional experts (UX experts, project managers, technical writers etc) to make pip better.

@pradyunsg pradyunsg unpinned this issue Sep 7, 2019
@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Sep 21, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Sep 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation C: finder PackageFinder and index related code type: bug A confirmed bug or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.