Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally validate entry point at build time #521

Merged
merged 37 commits into from
Jul 23, 2018
Merged

Optionally validate entry point at build time #521

merged 37 commits into from
Jul 23, 2018

Conversation

wisechengyi
Copy link
Contributor

@wisechengyi wisechengyi commented Jul 9, 2018

Resolves #508

Problem

Current behavior allows building a pex that cannot execute:

[omerta ~]$ pex requests -e qqqqqqqqqqqq -o test.pex
[omerta ~]$ ./test.pex
Traceback (most recent call last):
  File ".bootstrap/_pex/pex.py", line 367, in execute
  File ".bootstrap/_pex/pex.py", line 293, in _wrap_coverage
  File ".bootstrap/_pex/pex.py", line 325, in _wrap_profiling
  File ".bootstrap/_pex/pex.py", line 410, in _execute
  File ".bootstrap/_pex/pex.py", line 468, in execute_entry
  File ".bootstrap/_pex/pex.py", line 473, in execute_module
  File "/Users/kwilson/Python/CPython-2.7.13/lib/python2.7/runpy.py", line 182, in run_module
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/Users/kwilson/Python/CPython-2.7.13/lib/python2.7/runpy.py", line 107, in _get_module_details
    raise error(format(e))
ImportError: No module named qqqqqqqqqqqq

Solution

Verify entry point at build time. E.g. a.b.c:m means we will try to do from a.b.c import m in a separate process.

Result

$ find hello
hello
hello/test.py
hello/tree.py

# Invalid module
$ python bin.py -D hello -e invalid.module  -o x.pex --validate-entry-point
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named invalid.module
Traceback (most recent call last):
  File "bin.py", line 758, in <module>
    main()
  File "bin.py", line 743, in main
    pex_builder.build(tmp_name, verify_entry_point=options.validate_ep)
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 529, in build
    self.freeze(bytecode_compile=bytecode_compile, verify_entry_point=verify_entry_point)
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 514, in freeze
    self._verify_entry_point()
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 263, in _verify_entry_point
    raise self.InvalidEntryPoint('Failed to:`{}`'.format(import_statement))
pex.pex_builder.InvalidEntryPoint: Failed to:`import invalid.module`

# invalid method
$ python bin.py -D hello -e test:invalid_method  -o x.pex --validate-entry-point
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name invalid_method
Traceback (most recent call last):
  File "bin.py", line 758, in <module>
    main()
  File "bin.py", line 743, in main
    pex_builder.build(tmp_name, verify_entry_point=options.validate_ep)
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 529, in build
    self.freeze(bytecode_compile=bytecode_compile, verify_entry_point=verify_entry_point)
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 514, in freeze
    self._verify_entry_point()
  File "/Users/yic/workspace/pex/pex/pex_builder.py", line 263, in _verify_entry_point
    raise self.InvalidEntryPoint('Failed to:`{}`'.format(import_statement))
pex.pex_builder.InvalidEntryPoint: Failed to:`from test import invalid_method`

# without the flag, invalid method still works
$ python bin.py -D hello -e test:invalid_method  -o x.pex 

@wisechengyi
Copy link
Contributor Author

Hi @kwlzn please kindly take an initial look for sanity. I am still working on tests, so will ping when it's fully ready.

Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a useful feature but I think there are a few fundamental issues here to address.

@@ -238,6 +303,7 @@ def set_entry_point(self, entry_point):
The entry point may also be specified via ``PEXBuilder.set_executable``.
"""
self._ensure_unfrozen('Setting an entry point')
self._verify_entry_point(entry_point)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a builder, order of building is not guaranteed and you can't verify until after a freeze since not all requirements or files may yet be added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any pointer for a better place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In freeze, where you know there can be no more code or requirement additions.

.format(ep_method, file_candidates))

def _gather_file_candidates(self, entry_point_elements):
# 1. Find the files matching a.b.c:m, which is a/b/c.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could also be a/b/c/__init__.py, ... or be a dynamically generated module! I think you really have to try to import the symbol after a freeze to avoid all the corner cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing the code feels quite sketchy, even out of process, because they could have bad side effect or even security concerns.

How do you feel about putting verifying entry point behind an option, stating that it would only check for static method in

  • a/b/c.py
  • a/b/c/__init__.py
    and allow people to skip static check by disabling it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If validation is opt-in, it seems like import would be the way to go. The user is going to run the pex after all and suffer the side-effects soon anyhow when they do so! Basically, this is forced to be a half measure one way or the other, so the choice comes down to avoiding side-effects at build time (they'll happen at runtime) for the X% of cases that have this pathology, or not supporting Y% of cases that have bizarre dynamic module generation. I'm really not sure what these percentages are or what is the better way to go. Import just seems attractive since it's robust: ie it's what pex does at runtime. Parsing is new logic.

Copy link
Contributor Author

@wisechengyi wisechengyi Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair. Importing could also get around ast parsing difference between python 2 and 3.

So I am going to do the importing, and then put it behind an option with default disabled.

Copy link
Contributor Author

@wisechengyi wisechengyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsirois ready for a second round sanity check. test still incoming.

if bytecode_compile:
self._precompile_source()
self._frozen = True

def build(self, filename, bytecode_compile=True):
def build(self, filename, bytecode_compile=True, verify_entry_point=False):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the interface pants will call, correct?

changing it in a way to keep backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but see above.

Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but a cleanup suggestion re subprocess / Executor and a bigger change idea to consider.

else:
raise self.InvalidEntryPoint("Fail to parse: `{}`".format(entry_point))

args = [self._interpreter.binary, '-c', import_statement]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Execution via subprocess was consolidated to Executor, so this code should use that instead of subprocess directly.

A bit more true to the runtime environment the entry point will be executed in would be something like:

with named_temporary_file() as fp:
  fp.write(import_statement)
  fp.close()
  result = PEX(self.chroot().path()).run([fp.name], env={'PEX_INTERPRETER': '1'})
  if result != 0:
    raise self.InvalidEntryPoint('Failed to validate entrypoint {}'.format(entry_point))

This though unfortunately creates a circular PEX <-> PEXBuilder dep. Better I think to lift the check to the PEX constructor which could be plumbed both from pex.bin.pex and from API users (Pants).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the verification step into class PEX(object):

def __init__(self, ..., verify_entry_point=False):

if bytecode_compile:
self._precompile_source()
self._frozen = True

def build(self, filename, bytecode_compile=True):
def build(self, filename, bytecode_compile=True, verify_entry_point=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but see above.

Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the one non-isort/non-style CI shard failure is legit but not related to this PR.

LGTM mod one latent bug pointed out re not using the builder's interpreter to perform the import check and completion of test work.

pex/pex.py Outdated
ep_method = ep_split[1]
import_statement = 'from {} import {}'.format(ep_module, ep_method)
else:
raise InvalidEntryPoint("Fail to parse: `{}`".format(entry_point))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failure or Failed would make more sense here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

pex/bin/pex.py Outdated
@@ -741,7 +753,7 @@ def main(args=None):
pex_builder.freeze()

log('Running PEX file at %s with args %s' % (pex_builder.path(), cmdline), v=options.verbosity)
pex = PEX(pex_builder.path(), interpreter=pex_builder.interpreter)
pex = PEX(pex_builder.path(), interpreter=pex_builder.interpreter, verify_entry_point=options.validate_ep)
Copy link
Member

@jsirois jsirois Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

>100

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

pex/bin/pex.py Outdated
default=False,
action='store_true',
help='Validate the entry point by importing it in separate process. Warning: this could have '
'side effect. For example, entry point `a.b.c:m` will translate to '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'side effects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

pex/bin/pex.py Outdated
@@ -732,6 +741,9 @@ def main(args=None):
tmp_name = options.pex_name + '~'
safe_delete(tmp_name)
pex_builder.build(tmp_name)
if options.validate_ep:
Copy link
Member

@jsirois jsirois Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could be more uniform, more correct and fail faster with:

pex_builder = build_pex(reqs, options, resolver_options_builder)
pex_builder.freeze()
pex = PEX(pex_builder.path(), interpreter=pex_builder.interpreter, verify_entry_point=options.validate_ep)

if options.pex_name is not None:
  ...

As things stand, in the options.pex_name branch, the pex is zipped up before the entrypoint validation which could be wasted work. Worse though is that the builder's custom interpreter, if any, is not used. This could be a problem if the current running interpreter does not match what the built pex needs and syntax differs (a 2/3 split).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved things up to fail fast.

pex/pex.py Outdated
@@ -520,3 +522,31 @@ def run(self, args=(), with_chroot=False, blocking=True, setsid=False, **kwargs)
stderr=kwargs.pop('stderr', None),
**kwargs)
return process.wait() if blocking else process

def do_entry_point_verification(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given comments above re uniformity in performing the validation, I'd make this _private.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

@wisechengyi
Copy link
Contributor Author

wisechengyi commented Jul 20, 2018

looks like PEX.run would ignore -c arg.

$ python '/var/folders/yt/pxdd3gp96530xh_116kskmkh0000gn/T/tmp8rQWfk' -c 'print 123'
hello

executed the entry point (print hello) rather than -c which prints 123.
Going to try

with named_temporary_file() as fp:
  fp.write(import_statement)
  fp.close()
  result = PEX(self.chroot().path()).run([fp.name], env={'PEX_INTERPRETER': '1'})
  if result != 0:
    raise self.InvalidEntryPoint('Failed to validate entrypoint {}'.format(entry_point))

as suggested earlier.

@wisechengyi
Copy link
Contributor Author

Updated. Tests incoming.

@wisechengyi
Copy link
Contributor Author

Tests added. Ready for final review.

Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Yi!

@wisechengyi wisechengyi changed the title Validate entry point at build time Optionally validate entry point at build time Jul 23, 2018
@wisechengyi wisechengyi merged commit d1f946c into master Jul 23, 2018
@wisechengyi wisechengyi deleted the vm_pr branch July 23, 2018 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants