Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve wheel support in pex. #388

Merged
merged 25 commits into from
Jul 8, 2017
Merged

Improve wheel support in pex. #388

merged 25 commits into from
Jul 8, 2017

Conversation

MarkChuCarroll
Copy link
Contributor

Wheel handling in pex files is broken. The wheel standard says
that wheel files are not designed to be importable archives,
but pex treats them as if they are. This causes many standard
compliant wheels, including things like tensorflow and opencv,
to fail to import in pexes (and thus in pants).

This change modifies wheel handling, so that when a wheel
is added to a pex, it's installed in an importable form.

@MarkChuCarroll
Copy link
Contributor Author

Ping? It's been a week - any chance anyone can look at this soon? @jsirois?

@kwlzn kwlzn mentioned this pull request May 22, 2017
Copy link
Contributor

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a reasonable first step towards wheel installation improvements, mod one comment.

if possible, it would also be nice to ensure that pex(1)-built pex files also net this improvement (they currently do not appear to afaict) so that there's consistency between pex and pants for e.g. testing and repros.

# wheel dir.
if dist_name.endswith("whl"):
wf = WheelFile(path)
wf.install(overrides=self._get_installer_paths(dist_name), force=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to me like this conditional block should be mutually exclusive with the zip explosion below.. or else land in an isolated install path?

otherwise you could end up with dirs containing both installed and unzipped wheels, which could be problematic if e.g. inner paths collide?

Copy link

@dpkp dpkp May 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or possibly just short-circuit with return CacheHelper.zip_hash(wf.zipfile)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this a bit and noticed that the wheel-installed files will be ignored if they are not also written directly to the chroot via self._chroot.write. The Chroot class maintains an internal set of written files (by label) in Chroot.filesets and files that are included in this data structure get written out to the pex zip in Chroot.zip . So any files installed directly by wheel that are not also installed in the normal path below will be ignored.

There doesn't appear to be a clean way to update chroot.filesets without writing directly, so perhaps the best approach here is to install the wheel into a temporary directory, then read the installed RECORD file and write each entry to the chroot?

"""
base = os.path.join(self.path(), self._pex_info.internal_cache, dist_name)
return {
'purelib': os.path.join(base),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could drop os.path.join here and just use base

@dpkp
Copy link

dpkp commented May 26, 2017

I am unable to get console scripts to work with this wheel install (amended to skip the additional direct zip install). using ansible for example, the current pex builder generates

.deps/ansible-2.2.2.0-py2-none-any.whl/ansible-2.2.2.0.data/scripts/ansible

while this PR generates

.deps/ansible-2.2.2.0-py2-none-any.whl/bin/ansible

and if I try to build a pex using this console script, I get an error:

pex ansible==2.2.2.0 -o ansible -c ansible

# dpowers at dpowers-m01.local in ~/Code/pex on git:PR388 ✖︎ [11:58:04]
→ ./ansible
Traceback (most recent call last):
  File ".bootstrap/_pex/pex.py", line 360, in execute
  File ".bootstrap/_pex/pex.py", line 288, in _wrap_coverage
  File ".bootstrap/_pex/pex.py", line 320, in _wrap_profiling
  File ".bootstrap/_pex/pex.py", line 401, in _execute
  File ".bootstrap/_pex/pex.py", line 430, in execute_script
_pex.pex.NotFound: Could not find script ansible in pex!

again, this is only if I modify the PR to not also unpack the wheel as a zipfile directly).

@MarkChuCarroll
Copy link
Contributor Author

Some context on this update: this properly uses WheelFile.install to extract the contents of a wheel, and then copies that into the pex chroot. I think this is a much better solution than the first try.

The alternative approach, which I think would work, but be more brittle, would be to add a wheel-based finder (including an override of the ImpImporter from pkgutils). That would allow us to include the wheel, in its original pypi form, in the pex. The catch, and the reason why I prefer this approach, is that we'd basically be reproducing some logic from the pkgutils WheelFile.install in the finder, and that would require us to make sure that we update pex logic after any changes to pkgutils. The approach in this change just uses wheelfile, and then pulls in the lesult.

Wheel handling in pex files is broken. The wheel standard says
that wheel files are not designed to be importable archives,
but pex treats them as if they are. This causes many standard
compliant wheels, including things like tensorflow and opencv,
to fail to import in pexes (and thus in pants).

This change modifies wheel handling, so that when a wheel
is added to a pex, it's installed in an importable form.
(Code disagreed about whether scripts were stored in
/bin or /scripts.)
@MarkChuCarroll
Copy link
Contributor Author

This passes on my laptop with Python 2.6. Not sure what Jenkins' problem is.

@dpkp
Copy link

dpkp commented May 31, 2017

Nice updates! My local testing is raising exception during build w/ python2 pex:

PR388> pex ansible==2.2.2.0 -o ansible -c ansible
Traceback (most recent call last):
  File "/Users/dpowers/Library/Python/2.7/bin/pex", line 11, in <module>
    sys.exit(main())
  File "/Users/dpowers/Library/Python/2.7/lib/python/site-packages/pex/bin/pex.py", line 578, in main
    pex_builder = build_pex(reqs, options, resolver_options_builder)
  File "/Users/dpowers/Library/Python/2.7/lib/python/site-packages/pex/bin/pex.py", line 542, in build_pex
    pex_builder.set_script(options.script)
  File "/Users/dpowers/Library/Python/2.7/lib/python/site-packages/pex/pex_builder.py", line 220, in set_script
    script, ', '.join(str(d) for d in self._distributions)))
pex.pex_builder.InvalidExecutableSpecification: Could not find script 'ansible' in any distribution pyparsing 2.2.0, pycrypto 2.6.1, pyasn1 0.2.3, packaging 16.8, ansible 2.2.2.0, cryptography 1.9, pycparser 2.17, enum34 1.1.6, ipaddress 1.0.18, asn1crypto 0.22.0, appdirs 1.4.3, cffi 1.10.0, six 1.10.0, idna 2.5, setuptools 35.0.2, PyYAML 3.12, paramiko 2.1.2, Jinja2 2.9.6, MarkupSafe 1.0 within PEX!

@dpkp
Copy link

dpkp commented May 31, 2017

bb4088f fixes my console script error! But I can reproduce the build failures on py26 locally, so I think those are real.

@MarkChuCarroll
Copy link
Contributor Author

Can you tell me about your py26 environment? I haven't been able to reproduce it, so I'd like to try to figure out what's different.

  • What OS?
  • What version of 2.6? (I've got 2.6.9)
  • What version of pip are you using? (9.0.1)

@dpkp
Copy link

dpkp commented Jun 1, 2017

The test failure appears to be an import error -- wheel.install is imported during pex bootstrap and is not available (it isn't part of stdlib).

> /var/folders/pb/gwlknvpx1l70js5vw_1xnyp00005fb/T/tmpQUHimR/app.pex
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "__main__.py", line 25, in <module>
  File ".bootstrap/_pex/pex_bootstrapper.py", line 82, in bootstrap_pex
  File ".bootstrap/_pex/pex.py", line 17, in <module>
  File ".bootstrap/_pex/environment.py", line 23, in <module>
  File ".bootstrap/_pex/pex_builder.py", line 12, in <module>
ImportError: No module named wheel.install

I'm not sure how pex_builder is used within the packaged pex bootstrap environment, but it seems like you want to only import during build and avoid import at runtime.

@dpkp
Copy link

dpkp commented Jun 1, 2017

fwiw, I reproduced using tox tox -v -e py26 -- tests/test_pex.py . Running system mac 2.6.9 and my python2.6 does not have wheel installed globally. pip is 9.0.1

@dpkp
Copy link

dpkp commented Jun 1, 2017

So the import error is a bug that I think needs fixing, but it is not the cause for test failures. The pex artifacts are run within the tox virtualenv which gets 'wheel' installed. So the tests should get past that issue.

I think what is happening in the test failures is WheelFile.install is not copying the full script to bin/ . When I inspect the pex artifact, the test scripts (hello_world and shell_script) are empty except for the hashbang. I confirmed that this doesn't happen on master (which just looks at the raw script in the wheel archive).

@MarkChuCarroll
Copy link
Contributor Author

I think that the wheel import is, ultimately, the root cause.

The tests that are failing on 2.6 are the tests that rely on the pex generation script successfully running Wheelfile.install; I think that what's going on under the covers is that we're winding up with an invalid pex because the generation process is failing due to wheel not being available on the 2.6 host.

setup.py Outdated
@@ -61,6 +61,7 @@
'twitter.common.lang>=0.3.1,<0.4.0',
'twitter.common.testing>=0.3.1,<0.4.0',
'twitter.common.dirutil>=0.3.1,<0.4.0',
'wheel==0.29.0',
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that WHEEL_REQUIREMENT should now be in install_requires

@dpkp
Copy link

dpkp commented Jun 2, 2017

I dug through a bit more and I'm confident this is actually hitting a bug deep within the interaction between wheel and zipfile on python2.6 . Specifically, when wheel installs a script it attempts to massage the #!python hashbang before processing the rest of the script. The code is here:
https://bitbucket.org/pypa/wheel/src/0.29.0/wheel/install.py?at=default&fileviewer=file-view-default#install.py-346

It does this by attempting to read a single line from the zipfile using zipfile.readline() . It then reads the remainder of the script using zipfile.read(). The problem is that the underlying zipfile library on python2.6 does not handle this sequence properly. You can test very easily by creating a zipfile with any script, open via z = zipfile.open(filename), then attempt to read a single line followed by the rest of the script w/ readline() and read(). read() returns nothing! Very strange. This does not happen on python2.7, likely because the zipfile module was refactored to fix the underlying issue.

I'm not sure what the best solution is here. First, I wonder if you can reproduce this zipfile bug? If so, I think the options are to (1) break pex installs of wheel files using python2.6, (2) write custom code to install the wheel package that does not have this bug, or (3) see if we can get a patch applied to upstream wheel.

@MarkChuCarroll
Copy link
Contributor Author

MarkChuCarroll commented Jun 2, 2017 via email

@MarkChuCarroll
Copy link
Contributor Author

So how can we move this forward? We'd really like to have a version of pex that works with the tensorflow distribution wheel.

My inclination is not to put resources into supporting a version of Python that no one should be using. Would it be an acceptable compromise to use a check in the wheel code, which signals an error for using wheels with .data directories on Python 2.6?

@dpkp
Copy link

dpkp commented Jun 21, 2017

I'm just an interested third-party, not a maintainer. But it seems reasonable to me to error, or perhaps just warn, if it seems like someone is trying to use python2.6 to build a pex from a wheel that has console scripts.

@kwlzn -- what are your thoughts?

@dpkp
Copy link

dpkp commented Jun 21, 2017

Also would suggest skipping the test on py2.6:

@pytest.mark.skipif(sys.version_info <= (2,6),
                    reason="wheel script installation is broken on python 2.6")

@MarkChuCarroll
Copy link
Contributor Author

Can anyone help out with the isort-test failures? Isort running locally on my machine shows no errors; and the file that it's complaining about is unchanged in this diff!

@rouge8
Copy link
Contributor

rouge8 commented Jun 22, 2017

Can anyone help out with the isort-test failures?

isort 4.2.9+ seems to catch things earlier versions of isort missed. Looks like 4.2.15 is getting installed on Travis, not sure what you have locally.

@MarkChuCarroll
Copy link
Contributor Author

Ping?

This now includes checks for python versions, to generate a warning on older versions of Python.
On those earlier versions of Python, it defaults to treating a wheel file as an object that can be added to the pythonpath (which is the current behavior for all wheels on all versions of Python.)

The build issues are fixed, and all tests pass.

@kwlzn
Copy link
Contributor

kwlzn commented Jul 7, 2017

@MarkChuCarroll apologies for the delay, just catching back up on the pex PR backlog now - will try to take a look at this tomorrow.

Copy link
Contributor

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR @MarkChuCarroll! looks pretty good to me. handful of smaller things and maybe one issue - then we can expedite to release.

# into an importable shape. We can do that by installing it into its own
# wheel dir.
if not self.interpreter.supports_wheel_install():
print("*** Wheel dependencies may not work correctly with Python 2.6.", file=sys.stderr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be great to avoid printing directly to std descriptors due to the unexpected output effect on downstream consumers like e.g. pants..

how about using the logging module - or at a minimum the warnings module instead? ideally with a one liner message.

@@ -253,7 +256,48 @@ def _add_dist_dir(self, path, dist_name):
self._copy_or_link(filename, target)
return CacheHelper.dir_hash(path)

def _get_installer_paths(self, dist_name, base):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dist_name appears unused here.

pex/finders.py Outdated
dist.get_resource_string('', script_path).replace(b'\r\n', b'\n').replace(b'\r', b'\n'))
# This can get called in different contexts; in some, it looks for files in the
# wheel archives being used to produce a pex; in others, it looks for files in the
# install wheel directory included in the pex. So we need to look at both locations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this was the only caller of the safer_name function above, so would kill that along with the removal of it's usage.

if self.interpreter.supports_wheel_install() and dist_name.endswith("whl"):
from wheel.install import WheelFile
try:
tmp = tempfile.mkdtemp()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of pex.common.safe_mkdtemp is preferred.

self._chroot.copy(fullpath, target)
finally:
shutil.rmtree(tmp)
return CacheHelper.dir_hash(whltmp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't the parent dir (tmp) of whltmp be removed before dir_hash(whltmp) is called?

assuming this was a bug and since this wasn't caught by existing tests, adding some extra test coverage here might be appropriate here if you're up for it.

Copy link
Contributor

@kwlzn kwlzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@kwlzn kwlzn merged commit 2bdd9c3 into pex-tool:master Jul 8, 2017
@kwlzn
Copy link
Contributor

kwlzn commented Jul 8, 2017

thanks a ton for the PR @MarkChuCarroll - this will go out in the 1.2.8 release tracked in #389

@kwlzn
Copy link
Contributor

kwlzn commented Jul 10, 2017

the pex version this went out in is now being consumed in pants master @ pantsbuild/pants@cdb4e5e

@dpkp
Copy link

dpkp commented Jul 11, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants