--use-deprecated=html5lib does not parse links, even though they're present #10845

Closed
1 task done
notatallshaw opened this issue Jan 30, 2022 · 5 comments · Fixed by #10846
Labels
C: finder PackageFinder and index related code type: bug A confirmed bug or unintended behavior type: deprecation Related to deprecation / removal.

Comments

@notatallshaw
Member

notatallshaw commented Jan 30, 2022

Description

When using pip 22.0 with --use-deprecated=html5lib and a JFrog (Artifactory) instance as the package index, pip fails with the error: ERROR: No matching distribution found for requests

Tested with the "requests" package on Windows 10: pip 22.0 fails, pip 21.3.1 works.

Expected behavior

With --use-deprecated=html5lib, pip should be able to install from JFrog indexes, as pip 21.3.1 does.

pip version

22.0

Python version

3.10

OS

Windows

How to Reproduce

Install a package from a JFrog index using pip 22.0 with --use-deprecated=html5lib.

Output

C:\>python -m pip install -vvv requests --use-deprecated=html5lib
Using pip 22.0 from <corporate_local_path>\lib\site-packages\pip (python 3.10)
Non-user install by explicit request
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-ephem-wheel-cache-4a5e6ucc
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Initialized build tracking at <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Created build tracker: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Entered build tracker: <corporate_user_path>\AppData\Local\Temp\pip-req-tracker-p0zhtye3
Created temporary directory: <corporate_user_path>\AppData\Local\Temp\pip-install-_cnfjhxu
Looking in indexes: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple
1 location(s) to search for versions of requests:
* http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Fetching project page and analyzing links: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Getting page http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Found index url http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple
Looking up http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/ in the cache
Request header has "max_age" as 0, cache bypassed
Starting new HTTP connection (1): <corporate_domain>:80
http://<corporate_domain>:80 "GET /artifactory/api/pypi/pypi-release/simple/requests/ HTTP/1.1" 200 None
Updating cache with response from http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Skipping link: not a file: http://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/
Given no hashes to check 0 links for project 'requests': discarding no candidates
ERROR: Could not find a version that satisfies the requirement requests (from versions: none)
ERROR: No matching distribution found for requests
Exception information:
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 348, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 173, in _add_to_criteria
    raise RequirementsConflicted(criterion)
pip._vendor.resolvelib.resolvers.RequirementsConflicted: Requirements conflict: SpecifierRequirement('requests')
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 94, in resolve
    result = self._result = resolver.resolve(
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 481, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "<corporate_local_path>\lib\site-packages\pip\_vendor\resolvelib\resolvers.py", line 350, in resolve
    raise ResolutionImpossible(e.criterion.information)
pip._vendor.resolvelib.resolvers.ResolutionImpossible: [RequirementInformation(requirement=SpecifierRequirement('requests'), parent=None)]
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "<corporate_local_path>\lib\site-packages\pip\_internal\cli\base_command.py", line 165, in exc_logging_wrapper
    status = run_func(*args)
  File "<corporate_local_path>\lib\site-packages\pip\_internal\cli\req_command.py", line 205, in wrapper
    return func(self, options, args)
  File "<corporate_local_path>\lib\site-packages\pip\_internal\commands\install.py", line 339, in run
    requirement_set = resolver.resolve(
  File "<corporate_local_path>\lib\site-packages\pip\_internal\resolution\resolvelib\resolver.py", line 103, in resolve
    raise error from e
pip._internal.exceptions.DistributionNotFound: No matching distribution found for requests

Removed build tracker: '<corporate_user_path>\\AppData\\Local\\Temp\\pip-req-tracker-p0zhtye3'


@notatallshaw notatallshaw added S: needs triage Issues/PRs that need to be triaged type: bug A confirmed bug or unintended behavior labels Jan 30, 2022
@notatallshaw notatallshaw changed the title ERROR: No matching distribution found for requests with " --use-deprecated=html5lib" ERROR: No matching distribution found for requests with "--use-deprecated=html5lib" using JFrog Jan 30, 2022
@notatallshaw notatallshaw changed the title ERROR: No matching distribution found for requests with "--use-deprecated=html5lib" using JFrog "ERROR: No matching distribution found" with "--use-deprecated=html5lib" using JFrog Jan 30, 2022
@notatallshaw
Member Author

notatallshaw commented Jan 30, 2022

When I run the same command under pip 21.3.1, it sends a GET to https://<corporate_domain>/artifactory/api/pypi/pypi-release/simple/requests/ and successfully parses the response (https://gist.github.com/notatallshaw/caef03cdb0592c13fab463a9fb5223a3), logging many "Found link" results.
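To sanity-check how many links a saved index page actually contains, independently of either of pip's parsers, one option is a small standard-library parser. The sketch below is hypothetical: the class name, function name, and sample page are illustrative and not taken from pip or from the gist. It simply collects the href of every <a> tag.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag on a PEP 503 style "simple" page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names and passes attrs as (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def collect_hrefs(html: str) -> list:
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Hypothetical doctype-less project page, loosely in the style of an index response.
SAMPLE = """<html><head><title>Links for requests</title></head><body>
<h1>Links for requests</h1>
<a href="../../packages/example-1.0-py3-none-any.whl">example-1.0-py3-none-any.whl</a><br/>
<a href="../../packages/example-1.0.tar.gz">example-1.0.tar.gz</a><br/>
</body></html>"""

if __name__ == "__main__":
    print(len(collect_hrefs(SAMPLE)))
```

Comparing this count with the number of "Found link" lines pip logs for the same saved page gives a quick hint about whether pip's parser is dropping links.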

@pradyunsg pradyunsg removed type: bug A confirmed bug or unintended behavior S: needs triage Issues/PRs that need to be triaged labels Jan 30, 2022
@pradyunsg pradyunsg changed the title "ERROR: No matching distribution found" with "--use-deprecated=html5lib" using JFrog "ERROR: No matching distribution found" with --use-deprecated=html5lib Jan 30, 2022
@pradyunsg pradyunsg changed the title "ERROR: No matching distribution found" with --use-deprecated=html5lib --use-deprecated=html5lib and Artifactory Jan 30, 2022
@pradyunsg pradyunsg added C: finder PackageFinder and index related code state: needs eyes Needs a maintainer/triager to take a closer look type: deprecation Related to deprecation / removal. labels Jan 30, 2022
@pradyunsg
Member

Can you share the raw HTML document returned by Artifactory? Feel free to redact the URLs as appropriate.

@notatallshaw
Member Author

Can you share the raw HTML document returned by Artifactory? Feel free to redact the URLs as appropriate.

Already done in my previous comment: #10845 (comment)

@pradyunsg pradyunsg added type: bug A confirmed bug or unintended behavior and removed state: needs eyes Needs a maintainer/triager to take a closer look labels Jan 30, 2022
@pradyunsg
Member

I'm able to reproduce this with just pip's parsing logic:

from pathlib import Path

from pip._internal.index.collector import HTMLPage, parse_links

# The saved Artifactory response for the /simple/requests/ page.
content = Path("/tmp/page.html").read_bytes()
page = HTMLPage(content, "utf-8", "https://private.domain.example.com/index")

try:
    # pip 22.0: parse_links() accepts use_deprecated_html5lib.
    print("new", len(list(parse_links(page, use_deprecated_html5lib=True))))
except TypeError:
    # pip 21.3.1: parse_links() has no such argument and always uses html5lib.
    print("old", len(list(parse_links(page))))

21.3.1

❯ python /tmp/foo.py
old 208

22.0

❯ python /tmp/foo.py
new 0
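The thread does not state why 22.0 returns zero links here. Purely as an illustration of how a rarely used code path can silently yield nothing, the sketch below shows a common Python pitfall: `return`-ing an iterable from inside a generator function. All function names and the line-based "parsing" are hypothetical toys, not pip's code.

```python
from typing import Iterable, Iterator, List


def _legacy_parse(lines: Iterable[str]) -> Iterator[str]:
    # Hypothetical "deprecated" parser; it is itself a generator.
    for line in lines:
        if line.startswith("<a "):
            yield line


def parse_buggy(lines: List[str], use_legacy: bool):
    # This function contains `yield`, so calling it creates a generator.
    # `return <iterable>` inside a generator does NOT produce that iterable's
    # items: it ends iteration at once, and the returned object is only
    # stored on StopIteration.value, which list() and for-loops ignore.
    if use_legacy:
        return _legacy_parse(lines)
    for line in lines:
        if "href=" in line:
            yield line


def parse_fixed(lines: List[str], use_legacy: bool):
    # `yield from` re-yields every item produced by the delegated generator.
    if use_legacy:
        yield from _legacy_parse(lines)
        return
    for line in lines:
        if "href=" in line:
            yield line


page = ['<a href="a-1.0.tar.gz">a</a>', '<a href="a-1.0-py3-none-any.whl">b</a>']
print("buggy", len(list(parse_buggy(page, use_legacy=True))))
print("fixed", len(list(parse_fixed(page, use_legacy=True))))
```

Note that the non-legacy branch of `parse_buggy` still works, which is why a bug of this shape can go unnoticed when only the default path is tested.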

@pradyunsg pradyunsg added this to the 22.0.1 milestone Jan 30, 2022
@pradyunsg pradyunsg changed the title --use-deprecated=html5lib and Artifactory --use-deprecated=html5lib does not parse links, even though they're present Jan 30, 2022
@pradyunsg pradyunsg removed this from the 22.0.1 milestone Jan 30, 2022
@pfmoore
Member

pfmoore commented Jan 30, 2022

The issue seems to be simply that the HTML doesn't include a doctype, which PEP 503 and the HTML5 spec appear to require.

I'm unsure whether we should be lenient here and accept such pages.

Edit: Never mind, I missed that this was about the old parsing using html5lib.
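PEP 503 asks for index pages to be valid HTML5, and an HTML5 document starts with a `<!DOCTYPE html>` declaration. A quick way to check whether a saved index page has one is sketched below with the standard library's html.parser; the class and function names are hypothetical, and this is not how pip itself performs any doctype check.

```python
from html.parser import HTMLParser
from typing import Optional


class DoctypeSniffer(HTMLParser):
    """Record the first declaration seen before any start tag."""

    def __init__(self) -> None:
        super().__init__()
        self.doctype: Optional[str] = None
        self.seen_tag = False

    def handle_decl(self, decl: str) -> None:
        # html.parser passes the declaration without "<!" and ">",
        # e.g. "DOCTYPE html".
        if not self.seen_tag and self.doctype is None:
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.seen_tag = True


def has_html5_doctype(html: str) -> bool:
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    # The HTML5 doctype is case-insensitive: <!DOCTYPE html> or <!doctype html>.
    return sniffer.doctype is not None and sniffer.doctype.strip().lower() == "doctype html"


print(has_html5_doctype("<!DOCTYPE html><html><body></body></html>"))
print(has_html5_doctype("<html><body><a href='x'>x</a></body></html>"))
```

As the edit above points out, a missing doctype does not explain the html5lib path returning no links, since the same page parsed fine with html5lib in pip 21.3.1.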

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2022

3 participants