Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify different interpreter for py_binary rules #62

Closed
mouadino opened this issue Jan 28, 2018 · 12 comments
Closed

Specify different interpreter for py_binary rules #62

mouadino opened this issue Jan 28, 2018 · 12 comments

Comments

@mouadino
Copy link

mouadino commented Jan 28, 2018

Hey all,

Problem Statement

It's quite confusing how a user can specify that a given rule need to use Python3 or that rule need Python2, where we have multiple attributes documented like default_python_version, srcs_version and undocumented like :py_interpreter (which can't be used but out of curiosity what is the prefix : doing), however I am still confused regarding what does what, in the end, what I think we should have is a way to tell py_binary to execute using the interpreter X, I know about py_runtime which is supposed to be the future but there are multiple py_binary rules out there with unspecify python version which lead to issues of python3 and 2 compatibilities, beside there is an assumption with the introduced with --python_top argument that all code need one interepreter instead of having local to each target.

Workaround

Currently we use the following workaround, maybe useful for others that suffer same issue:

  • When building default to python2 since that's the safest bet and what most extensions requires.
  • When running a target we dynamically switch to the needed interpreter with something similar to decorator pattern, as described below:

The Decorator Pattern

  1. Create a python file with the following format, let's name it force_python3.py:
import sys
import os


PYTHON3_PATH = os.environ['PYTHON3_PATH']


if __name__ == '__main__':
    argv = sys.argv
    path, _ = os.path.split(argv[0])
    argv[0] = os.path.join(path, '<real main file>')
    argv = [PYTHON3_PATH] + argv

    os.execv(argv[0], argv)

(This assume that we expose a PYTHON3_PATH environment variable with --action_env=PYTHON3_PATH=<path>).

  1. Write py_binary rule by pointing main attribute to the file above:
py_binary(
    name = "my_target",
    srcs = ["main.py", "force_python3.py"],
    main = "force_python3.py"
)

FYI we have a workaround of our own to specify which python to use for pip requirements.

Goals

What I would like to see are the followings:

  • Add an attribute to py_binary and py_test rules to specify which python interpreter to use, it should be a label and point to a py_runtime, if not specified revert to default py_runtime (for backward compatibility too).
    • this means dropping probably --python_top argument b/c the assumption that you need one global Python version is bogus as explained above.
  • py_runtime should be specified in WORKSPACE so that rules that download dependencies can use it, e.g. https://github.com/bazelbuild/rules_python#importing-pip-dependencies.
  • Build on previous point, if py_runtime works in same say way go rules do, which use the new toolchains idea it will be awesome.
    • but keep in mind that's you may want to package the py_binary result somehow e.g. par, pex, docker image and shipping the interpreter with it will be very convenient, or at least not have a hardcoded python path (which is what you get if you point the py_runtime to an host binary path).
@MarkusTeufelberger
Copy link

When building default to python2

I strongly disagree. Python 2 should not even be supported imho, but it definitely should NOT be made default less than 2 years before upstream's extended support ends. Unfortunately a lot of Bazel's internal stuff is still thinking that Python 2 is system default, this would be a good time to update it instead of keeping it on life support even longer. End the suffering.

Build on previous point, if py_runtime works in same say way go rules do, which use the new toolchains idea it will be awesome.

I agree, especially keeping in mind that there are several Python interpreters out there (cPython just being one of them - there's also pypy, jython, cython...) and that it would be nice to be able to easily choose from a variety of them for releases, tests and benchmarks.

@mouadino
Copy link
Author

mouadino commented Jan 31, 2018

Thanks for your feedback, just to clarify when I said:

When building default to python2

I meant as a workaround (unless you don't agree with the workaround if so I would love to hear better alternatives), b/c as you said it's the current state of some Bazel extensions, I would love if we don't have to support Python2 anymore and that what we did in my current company, but since we adopted Bazel we are back to supporting it, and defaulting to Python2 in build seems to do it, for now at least, things will be much worse if there are extensions that require Python3 and other Python2, especially since setting up Python version is a global thing, so in this move toward Python3 please make sure to both are supported until transitions end.

IMHO the most important thing that I am trying to emphasize is to not make python version a global thing, which the mistake that --python_top did, Python is not like Golang in that regard so IMHO we need a way to specify Python version at target/code level, thoughts?

@MarkusTeufelberger
Copy link

I think one of the questions is if a py_binary should be a self contained executable (= even if you don't have Python installed, it should run) or just contain Python code (= using one of the installed Python interpreters).

I guess most people would expect the latter and at that point I would rather see PEP394 (https://www.python.org/dev/peps/pep-0394/) being used instead of some environment variable decorator magic. A #!/usr/bin/env python3 line should be enough.

There is the problem that Bazel itself also depends on Python being present on the host system for some of its scripts/rules and it mostly expects Python2 there (see bazelbuild/bazel#1580). I don't know if that's going to be phased out, now that the skylark language runtime has been rewritten in go and if they might be convinced to rewrite the remaining few helpers in go. Alternatively Bazel would need some way to have a vendored Python binary that it uses to run its internal rule scripts instead of relying on Python being present on its host machine.

The Python rules here seem to suffer a bit from this assumption that Python (unlike e.g. the go binary) anyways has to be present on the build system since it has to run Bazel and Python has to be installed for that. Making sure that the system Python is never used when building Python rules with Bazel is an important step in my opinion and to ensure this, it might also be beneficial to make sure that Bazel itself doesn't actually depend on any Python being installed on the system.

@mouadino
Copy link
Author

mouadino commented Jan 31, 2018

I guess most people would expect the latter and at that point I would rather see PEP394 (https://www.python.org/dev/peps/pep-0394/) being used instead of some environment variable decorator magic. A #!/usr/bin/env python3 line should be enough.

I agree about the self-contained Python IMHO self-contained Python package is nice to have but not necessary, however about using #!/usr/bin/env python3 shebang, overall I think it's the right approach but just to play the devil advocate, the few issues that I can see are:

  • This means that we will depend on PATH environment variable, which may be good and bad, to just give you an example we found out that it's necessary to limit PATH e.g with --action_env=PATH=/bin:/usr/bin:/usr/local/bin in .bazelrc else you may be surprised at what ppl have in their PATH, on the other hand, this just a workaround b/c someone may be using pyenv or such, this is just to say that sandboxing is better but expensive.
  • We should avoid naked python e.g. #!/usr/bin/env python b/c that can be either Python2 or Python3 depending on OS.
  • #!/usr/bin/env python3 can refer to the first python3 found in PATH but that can end up be the wrong Python3 since some features are specific to a version e.g. python3.6.
  • AFAIK it's not enough to put #!/usr/bin/env python3 in your script b/c you need the cooperation of Bazel, in the end, this later control which Python to use check the python_stub_template.txt below.

From the good side, I can see it fixing issues when containerizing python code b/c currently if you use host machine python path in py_runtime and you package the code in a container, then you should have the same python path inside your container b/c python_stub_template.txt template will end up pointing to py_runtime's interpreter_path variable.

Alternatively Bazel would need some way to have a vendored Python binary that it uses to run its internal rule scripts instead of relying on Python being present on its host machine.

Isn't this already feasible with py_runtime already? at least if the python scripts aren't called from WORKSPACE, as in theory, you can do:

new_http_archive(
    name = "python_version",
    urls = ["https://www.python.org/ftp/python/3.6.3/Python-3.6.3.tgz"],
    strip_prefix = "Python-3.6.3",
    build_file_content = """
genrule(
    name = "build_python",
    srcs = glob(["**"]),
    outs = ["python"],
    cmd = "./external/python_version/configure && make && cp python $@",
    visibility = ["//visibility:public"],
)""",
)

The Python rules here seem to suffer a bit from this assumption that Python (unlike e.g. the go binary) anyways has to be present on the build system since it has to run Bazel and Python has to be installed for that.

I can see that you're looking at this from the lenses of Bazel core developer, from a user perspective I can tell you that Python suffers the assumption that by just pointing code to an interpreter and that's it, sadly it's not that easy, besides it seems that Google use Python a bit different, hence the awkward support of Python in Bazel that probably was inherited from Blaze, in theory you should:

  1. Choose the right version (not an issue in Golang, at least not yet :))
  2. make sure that all dependencies are there and the chosen interpreter can find them
    2.1. this part can be annoying if you try to do things differently for import my other two issues https://github.com/bazelbuild/rules_python/issues/created_by/mouadino
    2.2. or if you depend on shared libraries and such
  3. Last packaging, which has been a standing issue in Python community with no official answer but many solution pex, par, https://github.com/takluyver/flit, ...

@MarkusTeufelberger
Copy link

I am not a Bazel core developer or even employed by Google, I'm a normal user...

@mouadino
Copy link
Author

mouadino commented Jan 31, 2018

@MarkusTeufelberger Ahh sorry for the confusion, but then just to reiterate, our experience has shown that the workaround above works for us, most of our issues concerning python version were due to extensions and not Bazel internal, but it's possible that our use cases are different than yours, that's why we see different issues.

@duggelz
Copy link

duggelz commented Feb 16, 2018

Yes, it's good to remember that Python, and software developing in general, happens in a wide variety of environments, and so people's requirements can legitimately be very different. One person's "must have feature" is another's "we would never use this", and that's fine.

Here's a discussion group that may be of interest: https://groups.google.com/forum/#!forum/bazel-sig-python

Here's a document proposing a py_toolchain target for Bazel. I also talk about the evolution of Python usage inside Google, to give some background about why Bazel is the way it is. And I talk about some ways that Bazel can evolve to serve a wider community. Let me know what you think: https://docs.google.com/document/u/1/d/1dQjbbLEJqxUIJWmH5sIZAv-_emnKksDI-VCG1v86dWA/edit#

@franzigeiger
Copy link

Is there already a solution to this feature request? I can't see how all these issues relate to the question or if they avoid the initially mentioned workaround. If it is fixed by now, can you please provide a small example code.

@sha1n
Copy link
Contributor

sha1n commented Jun 16, 2019

I managed to make that work properly for me by specifying python_version="PY3" on py_binary and py_test targets + adding the following to my .bazelrc:

build --incompatible_use_python_toolchains - this tells Bazel to use python toolchains, which should auto-detect your platform python 2 and 3 runtimes and use the appropriate one based what you specify in your target. You should be able to take than further, and specify your runtimes explicitly if you use custom paths or want to download the interpreter from a remote repository (see: https://github.com/bazelbuild/bazel/blob/master/tools/python/toolchain.bzl).

In addition, I added build --noincompatible_py3_is_default in order to avoid python 3 from being selected when no python_version is explicitly specified.

I use Bazel 0.26.0 (I think python toolchains have been introduced in 0.25-ish) and the following rules_python version:

git_repository(
    name = "io_bazel_rules_python",
    commit = "fdbb17a4118a1728d19e638a5291b4c4266ea5b8",
    remote = "https://github.com/bazelbuild/rules_python.git",
)

@thundergolfer
Copy link

Checking the "Goals" section of the original issue description:

Add an attribute to py_binary and py_test rules to specify which python interpreter to use ....

python_version=PY3

py_runtime should be specified in WORKSPACE ...

The Python toolchain is not available at WORKSPACE evaluation time, but with the pip_install repo rule you can specify a label in the python_interpreter_target attribute that is the same label as what's passed to py_runtime's interpreter (Label: optional) attribute. Eg.

pip_install(
    name = "foo_pip",
    python_interpreter_target = "@python3//:bin/python3",
    requirements = "//tools/build/dependencies/python:requirements.txt",
)

Build on previous point, if py_runtime works in same say way go rules do, which use the new toolchains idea it will be awesome.

Bazel Toolchains work with the Python rules now.

shipping the interpreter with it will be very convenient...

With --build_python_zip and an in-build interpreter target you can ship the interpreter within an executable .zip.


Given the above, I think this issue can be closed, but please let me know if there's something still missing and we can create a new more specific issue.

@dgrunwald
Copy link

Unfortunately it's not possible to use python_interpreter_target with py_binary:

no such attribute 'python_interpreter_target' in 'py_binary' rule

Toolchains seem overly complicated for our use-case (we have multiple py_binary, where some py_binaries need a different python interpreter than other py_binaries). In fact I've spent 2 days on this now and still can't figure out at all how to make this work.

@andponlin-canva
Copy link

I would also be interested in this feature being available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests