wheels larger than 4G fail to download/install #9549

Open
MiningMarsh opened this issue Feb 2, 2021 · 11 comments
Labels
C: cache Dealing with cache and files in it project: vendored dependency Related to a vendored dependency type: bug A confirmed bug or unintended behavior

@MiningMarsh

Environment

  • pip version: 21.0.1
  • Python version: 3.7.5
  • OS: Ubuntu 19.10

Description
Older versions of pip could handle wheels larger than 4 GiB. After upgrading pip (pip3 install --upgrade pip), downloading such a package raises an exception from msgpack when pip tries to store the wheel to disk.

Output

$ pip download <redacted>
Collecting <redacted>
  Downloading <redacted>.whl (4534.2 MB)
     |████████████████████████████████| 4534.2 MB 15.5 MB/s eta 0:00:01ERROR: Exception:
Traceback (most recent call last):
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 171, in _merge_into_criterion
    crit = self.state.criteria[name]
KeyError: '<redacted>'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 189, in _main
    status = self.run(options, args)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/cli/req_command.py", line 178, in wrapper
    return func(self, options, args)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 317, in run
    reqs, check_supported_wheels=not options.target_dir
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 122, in resolve
    requirements, max_rounds=try_to_avoid_resolution_too_deep,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 453, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 318, in resolve
    name, crit = self._merge_into_criterion(r, parent=None)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _merge_into_criterion
    crit = Criterion.from_requirement(self._p, requirement, parent)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/resolvers.py", line 82, in from_requirement
    if not cands:
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/resolvelib/structs.py", line 124, in __bool__
    return bool(self._sequence)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in __bool__
    return any(self)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 38, in _iter_built
    candidate = func()
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 169, in _make_candidate_from_link
    name=name, version=version,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 306, in __init__
    version=version,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 144, in __init__
    self.dist = self._prepare()
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 226, in _prepare
    dist = self._prepare_distribution()
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 312, in _prepare_distribution
    self._ireq, parallel_builds=True,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 457, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 482, in _prepare_linked_requirement
    self.download_dir, hashes,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 234, in unpack_url
    hashes=hashes,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 108, in get_http_url
    from_path, content_type = download(link, temp_dir.path)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/network/download.py", line 163, in __call__
    for chunk in chunks:
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/cli/progress_bars.py", line 159, in iter
    for x in it:
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_internal/network/utils.py", line 88, in response_chunks
    decode_content=False,
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 576, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/urllib3/response.py", line 519, in read
    data = self._fp.read(amt) if not fp_closed else b""
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 65, in read
    self._close()
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/cachecontrol/filewrapper.py", line 52, in _close
    self.__callback(self.__buf.getvalue())
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/cachecontrol/controller.py", line 309, in cache_response
    cache_url, self.serializer.dumps(request, response, body=body)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/cachecontrol/serialize.py", line 72, in dumps
    return b",".join([b"cc=4", msgpack.dumps(data, use_bin_type=True)])
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/__init__.py", line 35, in packb
    return Packer(**kwargs).pack(o)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 960, in pack
    self._pack(obj)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 944, in _pack
    len(obj), dict_iteritems(obj), nest_limit - 1
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 1045, in _pack_map_pairs
    self._pack(v, nest_limit - 1)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 944, in _pack
    len(obj), dict_iteritems(obj), nest_limit - 1
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 1045, in _pack_map_pairs
    self._pack(v, nest_limit - 1)
  File "/home/user/.pyenv/versions/molecule_3/envs/test/lib/python3.7/site-packages/pip/_vendor/msgpack/fallback.py", line 887, in _pack
    raise ValueError("%s is too large" % type(obj).__name__)
ValueError: bytes is too large
@pradyunsg
Member

Can you run a memory profiler, to see what's using this memory, or causing this bytes too large error?

@MiningMarsh
Author

MiningMarsh commented Feb 2, 2021

Can you run a memory profiler, to see what's using this memory, or causing this bytes too large error?

Which tool would you prefer I use for this? I am certainly willing to, but I haven't done any profiling in Python before. If you can point me at something, I can take a look at the docs and post a memory profile made with that tool.

We did a little more digging this evening: it turns out this has been a problem since pip 10.x; 9.x and earlier work fine.

As far as I can tell so far, this is failing due to msgpack under the hood, as msgpack appears to only support packing of 4GiB objects max. I think something under pip is buffering the downloaded wheel as a msgpack object and failing as a result.

@uranusjr
Member

uranusjr commented Feb 3, 2021

Judging from the traceback, this looks like a restriction in cachecontrol (which pip uses to cache network requests). One simple workaround would be to simply skip caching instead of failing.
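
A minimal sketch of the "skip caching instead of failing" idea, under stated assumptions: store_if_serializable and tiny_serializer are hypothetical names made up for illustration and are not part of cachecontrol or pip. The stand-in serializer uses an 8-byte limit in place of the real msgpack limit so the behaviour is easy to see.

```python
from typing import Callable, MutableMapping


def store_if_serializable(
    cache: MutableMapping[str, bytes],
    key: str,
    body: bytes,
    serialize: Callable[[bytes], bytes],
) -> bool:
    """Try to cache `body`; return False and cache nothing if it cannot be serialized."""
    try:
        payload = serialize(body)
    except ValueError:
        # For example "bytes is too large": skip the cache, keep the download.
        return False
    cache[key] = payload
    return True


def tiny_serializer(body: bytes, limit: int = 8) -> bytes:
    """Hypothetical stand-in for a serializer with a hard size limit."""
    if len(body) > limit:
        raise ValueError("bytes is too large")
    return b"cc=4," + body
```

With this shape, an oversized response is simply not cached and the caller still receives the downloaded bytes, which is roughly the behaviour the comment above proposes.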

@MiningMarsh
Author

Judging from the traceback, this looks like a restriction in cachecontrol (which pip uses to cache network requests). One simple workaround would be to simply skip caching instead of failing.

We re-ran the tests with --no-cache-dir, and the download succeeds, so it does seem to be a problem with the caching layer.

@pradyunsg pradyunsg added C: cache Dealing with cache and files in it project: vendored dependency Related to a vendored dependency type: bug A confirmed bug or unintended behavior labels Feb 3, 2021
@hexagonrecursion
Contributor

msgpack appears to only support packing of 4GiB objects max.

More specifically: this appears to be a limitation of the msgpack format rather than a limitation of a specific implementation.

https://github.com/msgpack/msgpack/blob/master/spec.md#bin-format-family

bin 32 stores a byte array whose length is upto (2^32)-1 bytes
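
To make the format limit concrete, here is a small sketch that builds the bin-family header by hand from the linked spec (type bytes 0xc4, 0xc5 and 0xc6 followed by a 1-, 2- or 4-byte big-endian length). bin_header is a hypothetical helper written for illustration, not a function from the msgpack library, and the wheel size assumes the "4534.2 MB" in the report means 10**6-byte megabytes (if it meant MiB, the file would be larger still).

```python
import struct

BIN32_MAX = 2**32 - 1  # largest length a 4-byte bin 32 header can encode


def bin_header(length: int) -> bytes:
    """Return the msgpack bin-family header for a payload of `length` bytes."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 0xFF:
        return struct.pack(">BB", 0xC4, length)  # bin 8
    if length <= 0xFFFF:
        return struct.pack(">BH", 0xC5, length)  # bin 16
    if length <= BIN32_MAX:
        return struct.pack(">BI", 0xC6, length)  # bin 32
    # No wider bin type exists in the format, so there is nothing to fall back to.
    raise ValueError("bytes is too large")


# Approximate size of the wheel from the report (assumption: MB = 10**6 bytes).
wheel_size = 4_534_200_000
```

Since wheel_size exceeds BIN32_MAX, bin_header(wheel_size) raises, which matches the ValueError at the bottom of the traceback.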

@bennuttall

I've seen this before and I'd consider it a bug in pip.

If you wget the large file, it will take a while to download but will download successfully, and you can use pip to install from that file.

It seems that pip's handling of a large download does not allow for a slight connection drop; it just errors out where wget doesn't.
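
The manual workaround described above (fetch the file separately, then install from the local path) can be sketched in Python as a chunked copy to disk, so the whole body is never held in memory at once. stream_to_file is a hypothetical helper name, and the 1 MiB chunk size is an arbitrary choice.

```python
import shutil
from typing import BinaryIO


def stream_to_file(src: BinaryIO, dest_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Copy `src` to `dest_path` chunk by chunk and return the number of bytes written."""
    with open(dest_path, "wb") as dest:
        # copyfileobj reads at most `chunk_size` bytes at a time from `src`.
        shutil.copyfileobj(src, dest, chunk_size)
        return dest.tell()
```

As a usage sketch, src could be the response object returned by urllib.request.urlopen for the wheel's URL; the saved file can then be installed with python -m pip install followed by the local path. Installing from a local file should not involve pip's HTTP cache for that wheel, which would be consistent with this workaround succeeding.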

@RonnyPfannschmidt
Contributor

It's a bug in cachecontrol that triggers the issue for pip.

@hexagonrecursion
Contributor

@bennuttall
It seems that pip's handling of a large download does not allow for a slight connection drop

What makes you think this has anything to do with connection drop?

@bennuttall

Perhaps not a connection drop as such, but a slight interruption to the download. wget seems to be able to continue, pip errors out.

@RonnyPfannschmidt
Contributor

Please read the issue; it's stated relatively clearly that the cache layer fails after the download, when it tries to store a blob that's larger than what msgpack can store.

@itamarst
Contributor

itamarst commented May 2, 2022

This should be fixed by the fix for #2984.
