Skip to content
This repository has been archived by the owner on Jan 8, 2024. It is now read-only.

PAR picks up system-installed version instead of bundled version #38

Closed
mattmoor opened this issue Sep 14, 2017 · 12 comments · Fixed by #49
Closed

PAR picks up system-installed version instead of bundled version #38

mattmoor opened this issue Sep 14, 2017 · 12 comments · Fixed by #49
Assignees

Comments

@mattmoor
Copy link
Contributor

In my PR to add PIP support to Bazel you can reproduce this as shown below.

On a system with:

$ pip --version
pip 1.5.4 from /usr/lib/python2.7/dist-packages (python 2.7)

(I must have gotten this through apt-get or some other super-slow release channel)

If I bazel run the py_binary form, no warning:

rules_python$ bazel run rules_python:piptool --  --name foo --input $PWD/examples/helloworld/requirements.txt --directory /tmp/pip/ --output /tmp/requirements.bzl
INFO: Analysed target //rules_python:piptool (0 packages loaded).
INFO: Found 1 target...
Target //rules_python:piptool up-to-date:
  bazel-bin/rules_python/piptool
INFO: Elapsed time: 0.184s, Critical Path: 0.00s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/rules_python/piptool --name foo --input /home/mattmoor/rules_python/examples/helloworld/requirements.txt --directory /tmp/pip/ --output /tmp/requirements.bzl
Collecting futures>=3.1 (from -r /home/mattmoor/rules_python/examples/helloworld/requirements.txt (line 1))
  File was already downloaded /tmp/pip/futures-3.1.1-py2-none-any.whl
Skipping futures, due to already being wheel.

If I bazel run the .par target, I get a warning:

rules_python$ bazel run rules_python:piptool.par --  --name foo --input $PWD/examples/helloworld/requirements.txt --
directory /tmp/pip/ --output /tmp/requirements.bzl
INFO: Analysed target //rules_python:piptool.par (0 packages loaded).
INFO: Found 1 target...
Target //rules_python:piptool.par up-to-date:
  bazel-bin/rules_python/piptool.par
INFO: Elapsed time: 0.204s, Critical Path: 0.01s
INFO: Build completed successfully, 1 total action

INFO: Running command line: bazel-bin/rules_python/piptool.par --name foo --input /home/mattmoor/rules_python/examples/helloworld/requirements.txt --directory /tmp/pip/ --output /tmp/requirements.bzl
Collecting futures>=3.1 (from -r /home/mattmoor/rules_python/examples/helloworld/requirements.txt (line 1))
  File was already downloaded /tmp/pip/futures-3.1.1-py2-none-any.whl
Skipping futures, due to already being wheel.
You are using pip version 1.5.4, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Happy to provide more info as needed.

@mattmoor
Copy link
Contributor Author

@duggelz Sorry I forgot to do this until just now, lmk if I can provide any more info.

@duggelz
Copy link
Contributor

duggelz commented Sep 15, 2017

pip 1.5.4 from most likely from the python-pip package, which is indeed ancient. I'd recommend doing pip install --upgrade --user pip virtualenv setuptools to put a newer version in ~/.local, and then apt-get remove python-pip python-virtualenv python3-pip for general sanity. But meanwhile, I'm able to repro this, fix might be complicated though.

@duggelz
Copy link
Contributor

duggelz commented Sep 15, 2017

pkg_resources doesn't recognize the packages inside the .par file, so it finds the old pip package. It's actually using the correct, newer version of pip, but I should fix this anyway.

https://github.com/pypa/pip/blob/a9d56c7734fd465d01437d61f632749a293e7805/src/pip/_internal/utils/misc.py#L835
https://github.com/pypa/pip/blob/a9d56c7734fd465d01437d61f632749a293e7805/src/pip/_internal/utils/outdated.py#L101

@duggelz
Copy link
Contributor

duggelz commented Sep 16, 2017

We'd have to do some surgery to tell pkg_resources about .par files, which are similar to, but different than .egg files. https://github.com/pypa/setuptools/blob/9f295a706590b1e3618978d6f2d83af0b893ec4d/pkg_resources/__init__.py#L1985
and
https://github.com/pypa/setuptools/blob/9f295a706590b1e3618978d6f2d83af0b893ec4d/pkg_resources/__init__.py#L692
which gets into some tricky distinctions about eggs, wheels, sdists, .par files, and zip applications.

@mattmoor
Copy link
Contributor Author

tl;dr I just linked an issue from rules_python, which winds back to this same problem.

Near as I can tell, this same problem is the root of why we cannot build .whl files for pip packages (that don't have them on pypi) unless the host has the wheel package installed.

This is because the metadata extracted from packages on the path is used to determine the "commands" that have been registered by a particular installed package.

My conclusion is that we need our own pkg_resources.register_finder(...) defined in a custom entrypoint. When we pip_import stuff, the whl_library includes the package metadata as data, so all of the information is present in the PAR, we just need to access it.

The final gotcha (because nothing is easy) is that because wheel compilation happens in a separate process, we need whatever we do to work by just setting PYTHONPATH before invoking that process. I see two possible options:

  1. The more straightforward would be to expand deps and put their directories onto PYTHONPATH, but for reasons stated elsewhere unpacking PAR files is undesirable; or

  2. Create dummy metadata outside of the PAR that aggregates the metadata of stuff bundled into the PAR and (somehow) direct their invocations through the PAR (and have the PAR entrypoint be smart enough to handle this).

Either way, I'm way outside of my depth. :)

@mattmoor
Copy link
Contributor Author

Hi there.

I missed that the first time I peeked at __main__.py in the unzipped PAR. This is probably where we hook into pkg_resources.

@mattmoor
Copy link
Contributor Author

Searching for examples of other finders, I happened across this, which seems close.

@mattmoor
Copy link
Contributor Author

Alright, hacking the hell out of that sample, I have something that seems to work for my tiny example... Basically, I added this to support.py:

+class WheelMetadata(pkg_resources.EggMetadata):
+  """Metadata provider for zipped wheels."""
+
+  @classmethod
+  def _split_wheelname(cls, wheelname):
+    split_wheelname = wheelname.split('-')
+    return '-'.join(split_wheelname[:-3])
+
+  def _setup_prefix(self):
+    path = self.module_path
+    old = None
+    whl = os.path.dirname(path)
+    dir = os.path.dirname(whl)
+    if not dir.endswith('.par'):
+        return
+
+    whlname = os.path.basename(whl)
+    with zipfile.ZipFile(dir, 'r') as par:
+        for name in par.namelist():
+            parts = name.split('/', 2)
+            if parts[0] != whlname:
+                continue
+            if parts[1].endswith('.dist-info') and parts[2] == 'metadata.json':
+                self.egg_name = whlname
+                self.egg_info = os.path.join(dir, parts[0], parts[1])
+                self.egg_root = dir
+                break
+
+
+def wheel_from_metadata(location, metadata):
+  if not metadata.has_metadata(pkg_resources.DistInfoDistribution.PKG_INFO):
+    return None
+
+  from email.parser import Parser
+  pkg_info = Parser().parsestr(metadata.get_metadata(pkg_resources.DistInfoDistribution.PKG_INFO))
+  return pkg_resources.DistInfoDistribution(
+      location=location,
+      metadata=metadata,
+      project_name=pkg_info.get('Name'),
+      version=pkg_info.get('Version'),
+      platform=None)
+
+
+def find_whl_in_par(importer, path_item, only=False):
+    """Find .whl metadata in PAR files."""
+    metadata = WheelMetadata(importer)
+    dist = wheel_from_metadata(path_item, metadata)
+    if dist:
+        yield dist

With this call in setup:

+
+    # Replace the built-in finder.
+    pkg_resources.register_finder(zipimport.zipimporter, find_whl_in_par)

Who knows what breaks, but with this, list_resources.par prints:

$ bazel run examples/par:list_resources.par
...
pip 9.0.1
wheel 0.30.0a0
...

I'm going to declare my day of Python spelunking a success and pray @duggelz finds this useful :)

mattmoor added a commit to bazelbuild/rules_k8s that referenced this issue Oct 13, 2017
This is blocking the addition of several Python samples, including the http one and the TODO controller.

The cause is tracked by [this](bazelbuild/rules_python#17) and [this](google/subpar#38) issue.

Currently Bazel CI doesn't provide very meaningful coverage, most of it happens via Travis and example e2e tests.
@duggelz duggelz self-assigned this Nov 2, 2017
@duggelz
Copy link
Contributor

duggelz commented Nov 4, 2017

After much head-scratching, it turns out pip has it's own private, independent, vendored version of pkg_resources, so my fix doesn't fix it.

duggelz pushed a commit to bazelbuild/rules_python that referenced this issue Nov 4, 2017
The warning is:
   You are using pip version 1.5.4, however version 9.0.1 is available.
   You should consider upgrading via the 'pip install --upgrade pip' command.

In reality, pip is at 9.0.1, but its bundled version of pkg_resources
doesn't know that.

See google/subpar#38
@duggelz
Copy link
Contributor

duggelz commented Nov 7, 2017

Ok, I don't think I can fix the 'bdist_wheel' issue without a ridiculous amount of hackery. The expedient solutions are:

  1. Ensure 'wheel' is installed in the normal way in the local Python interpreter [Which I guess you did?]
  2. Unpack the .par file at runtime.

Based on this and other considerations, I don't think I can avoid doing 2 any longer.

@duggelz
Copy link
Contributor

duggelz commented Nov 7, 2017

More investigation: Simply doing import pip changes sys.path in hard to predict ways, that are also hard to prevent, and hard to fixup after the fact. Also, pip can be the regular pip, or the heavily patched Debian-specific version, and it's hard to tell at runtime which one you're getting.

https://github.com/pypa/pip/blob/master/src/pip/_vendor/__init__.py

@duggelz duggelz reopened this Nov 10, 2017
@duggelz
Copy link
Contributor

duggelz commented Nov 10, 2017

I don't think I appreciated that Github automatically closes issues if you have the right wording in your PR.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants