
env: fix islation when pyvenv.cfg is present #99

Closed
wants to merge 5 commits into from

Conversation

FFY00
Member

@FFY00 FFY00 commented Sep 11, 2020

Signed-off-by: Filipe Laíns [email protected]

@pganssle
Member

pganssle commented Sep 11, 2020

WRT the approach taken here, it'd be good if @gaborbernat could weigh in on this, because I am far from confident in my understanding of how virtual environment magic works.

I do think that it would be a good idea to add at least one test that actually covers the isolation behavior, though. The current test only ensures that the monkey-patch works (which I would expect to change if we can come up with a better solution), but won't detect regressions if something else breaks the behavior.

Creating a basically empty project and installing it into a virtualenv, then trying to build a project from within that virtualenv that has import <projectname> in the setup.py and testing that it fails in the right way should work. There are conceptual simplifications (some of which don't involve pulling in other packages), but they involve more complicated test rigs (e.g. build a small project with an in-tree backend that executes a random file, have the "imported project" build itself without using setuptools, etc). Might be worth building out some of those test rigs anyway, so that we aren't leaning so heavily on integration tests, but I think #97 (and #98) are probably important enough that it's not worth delaying a solution just to avoid integration tests.
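A minimal sketch of the "probe" half of such a test. The helper names `write_probe_project` and `probe_fails_on_import` and the placeholder package name are assumptions made for illustration, not part of python-build's test suite. The sketch only checks that a `setup.py` importing a missing project fails on that import; creating the virtualenv and running the real build are left out.

```python
import os
import subprocess
import sys
import tempfile


def write_probe_project(directory, imported_name):
    """Write a setup.py whose first statement imports `imported_name`.

    The helper name and file layout are illustrative assumptions.
    """
    setup_py = os.path.join(directory, 'setup.py')
    with open(setup_py, 'w') as f:
        f.write('import {}\n'.format(imported_name))
        f.write('from setuptools import setup\n')
        f.write("setup(name='probe', version='0.0.0')\n")
    return setup_py


def probe_fails_on_import(python, directory, imported_name):
    """Return True if running setup.py fails and stderr names the import."""
    process = subprocess.Popen(
        [python, 'setup.py', '--name'],
        cwd=directory,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    _, stderr = process.communicate()
    return process.returncode != 0 and imported_name.encode() in stderr


if __name__ == '__main__':
    project_dir = tempfile.mkdtemp()
    write_probe_project(project_dir, 'surely_missing_pkg_xyz')
    print(probe_fails_on_import(sys.executable, project_dir, 'surely_missing_pkg_xyz'))
```

In a real regression test, `python` would be the interpreter of the isolated build environment and `imported_name` the project installed in the outer virtualenv.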

Contributor

@gaborbernat gaborbernat left a comment

You have a typo in the title (islation) and IMHO I'd like a better description on why this is needed, what are the moving parts within python that allow it to happen and why we need to alter some of the knobs you do. The CI is failing also.

WRT the approach taken here, it'd be good if @gaborbernat could weigh in on this, because I am far from confident in my understanding of how virtual environment magic works.

I'd wait with judgment for @FFY00 to explain more in-depth why he took this approach, and what problems he's trying to fix and how.

src/build/env.py Outdated Show resolved Hide resolved
src/build/env.py Show resolved Hide resolved
Contributor

@gaborbernat gaborbernat left a comment

Wrong button, sorry!

@FFY00 FFY00 force-pushed the fix-isolation-pyvenv branch 2 times, most recently from d3f98d4 to 17e167f September 12, 2020 19:12
Contributor

@gaborbernat gaborbernat left a comment

Your isolation logic basically tries to create a virtual environment, and breaks in a lot of situations. That might be OK if this was a personal project, but for a PyPA one you need to support all major interpreters. Maybe just use virtualenv instead?

self._place_path_relative(sysconfig.get_config_var('LIBPL'))

'''
We use PYTHONHOME to relocate the Python installation to our environment,
Contributor

You need the pyvenv.cfg only on Python 3.4+. Copy on Python 2 will not work. We should raise if that happens. The same goes for a signed Python 3 (macOS and Windows Store). Do we basically not support those environments?

Member Author

Can you elaborate? We are not using pyvenv.cfg.

@FFY00
Member Author

FFY00 commented Sep 12, 2020

Can you describe how it breaks?

@gaborbernat
Contributor

For starters, Windows Store Python can't be copied for security reasons, so when symlinks are off for the OS this will break. Furthermore, as far as I read the CPython code, it checks all parent folders for pyvenv.cfg, not just the immediate parent. It's more complicated, but I'm on my phone now.

@FFY00
Member Author

FFY00 commented Sep 12, 2020

For starters, Windows Store Python can't be copied for security reasons, so when symlinks are off for the OS this will break.

The number of systems without symlinking support is already pretty small, so this won't affect many users. I am okay with not supporting this use case.

How does virtualenv deal with this anyway?

Furthermore, as far as I read the CPython code, it checks all parent folders for pyvenv.cfg, not just the immediate parent. It's more complicated, but I'm on my phone now.

That is not documented, nor is it the behavior I see. This could differ between operating systems, but on Linux, CPython only checks the same directory and one directory up.
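To make the two-directory lookup described here concrete, the following is a rough sketch, assuming the behaviour is exactly "the executable's directory, then its parent". The function name `find_pyvenv_cfg` is hypothetical and this is not CPython's actual implementation.

```python
import os


def find_pyvenv_cfg(executable):
    """Return the pyvenv.cfg a two-directory lookup would find, or None.

    Checks the directory containing `executable` and then its parent, which
    is the behaviour described in the comment above, not CPython's own code.
    """
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for candidate_dir in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(candidate_dir, 'pyvenv.cfg')
        if os.path.isfile(candidate):
            return candidate
    return None
```

With an environment laid out as `env/bin/python` and `env/pyvenv.cfg`, the lookup finds the file through the second candidate; a `pyvenv.cfg` two or more levels above the executable is ignored by this sketch.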

I would be happy to discuss this further. virtualenv is kind of a heavy dependency, so I am reluctant to pull it.

@gaborbernat
Contributor

If you say not supporting windows without symlink is ok I'll need to retract my support for this joining pypa.

@FFY00
Member Author

FFY00 commented Sep 12, 2020

If you say not supporting windows without symlink is ok I'll need to retract my support for this joining pypa.

*supporting Windows without symlink support when the interpreter cannot be copied (which is the case of Python distributed by the Windows store)

The core functionality of the package still works; you just can't build in isolation. We can actually disable the Python interpreter copy behavior outside virtual environments, which would add running python-build from a virtual environment as an additional requirement to the above.

Please let me know how virtualenv solves this issue, we may be able to apply the same approach here.

@gaborbernat
Contributor

gaborbernat commented Sep 12, 2020

Per PEP 517, building in isolated environments is considered core functionality. To be fair, maybe you can get away with using venv instead of virtualenv here, and then have some hack like this for Python 2, where copy-only might be enough. virtualenv's solution is hundreds of lines of code. Rewriting e.g. Mach-O files is non-trivial.

@pganssle
Member

It definitely would be nice to avoid pulling in virtualenv as a dependency (since it exposes console scripts and a bunch of stuff like that), but venv is in the standard library and AFAICT is basically a stripped-down version of virtualenv anyway (that satisfies all of our use cases).

If it's significantly non-trivial to get the isolation logic right, maybe we should use venv on Python 3 and either add a Python 2-only dependency on virtualenv or a hacked-together "best effort" version for Python 2 support.

With regards to platform support, I don't think PyPA has any particular requirements for universal support for all features for all platforms — I'll note that pep517 doesn't support this kind of isolation properly at all. If we're not going the venv route, do we have any numbers on how many users are on a platform without symlinking support and unable to copy their interpreter? This isn't a situation like pip where you really need to support every random platform out there because it's a core workflow. This tool is intended to be used for building artifacts for distribution (for personal consumption you can do pip wheel to create a wheelhouse). We should aim for as wide support as possible (since you need to be able to run it on all the platforms you would distribute wheels for), but if this would impact very few users and there's an easy enough workaround (install and run from a different interpreter, create your own isolated environment to run this in, use a github action, etc), I don't think it's the end of the world.

@gaborbernat
Contributor

gaborbernat commented Sep 12, 2020

I think using venv for Python 3, and something hacked together for Python 2, would be the easiest and best investment.

Paul, pip wants to use this tool to build packages, so it must support everything that pip needs to.
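The venv-on-Python-3 route suggested above could look roughly like the sketch below. It uses the standard library's `venv.EnvBuilder`; the function name `create_isolated_env` is an assumption for illustration and says nothing about how python-build would actually structure this.

```python
import os
import tempfile
import venv


def create_isolated_env(path):
    """Create a virtual environment without pip and return its interpreter path.

    `with_pip=False` skips ensurepip, so neither pip nor setuptools is
    provisioned. The function name is illustrative only.
    """
    venv.EnvBuilder(with_pip=False).create(path)
    bin_dir = 'Scripts' if os.name == 'nt' else 'bin'
    exe = 'python.exe' if os.name == 'nt' else 'python'
    return os.path.join(path, bin_dir, exe)


if __name__ == '__main__':
    env_dir = os.path.join(tempfile.mkdtemp(), 'env')
    print(create_isolated_env(env_dir))
```

The Python 2 fallback (virtualenv or a best-effort hack) is deliberately not covered here.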

@FFY00
Member Author

FFY00 commented Sep 12, 2020

virtualenv should have exactly the same issue we do, so what I am interested in here is understanding whether the way it solves the issue is complex enough that it would make sense for us to pull it in as a dependency. If it isn't complicated, we could possibly do the same thing here.

Both venv and virtualenv work by giving you a custom Python interpreter. We have already established that overriding sys.executable is bad and we want to move away from it, so I see no future in us using them. Any effort put into moving to those solutions would be better used on fixing the current workflow upstream, IMHO. If we want this to be solved properly, CPython will have to gain a way to be reliably set up with custom paths without depending on the interpreter. I don't think there is any way around that; please let me know if I missed something.

Now, I think the use cases that wouldn't be supported are obscure enough for us to justify it. And if we (or more likely, I) do manage to fix the current workflow in the Python interpreter, they would start to be supported.

As a maintainer, I think it is very important to keep in mind the scope of the project and understand that you can't do everything. I think supporting the use case in question will block progress for the project. If I had unlimited time, I would maintain all backend approaches, maximizing the use cases we support, but I don't. If anyone does, please let me know, I would be more than glad to have that.
Right now I think I should optimize my time, and where I think it is best used is in trying to fix this issue properly.

To finish, I just wanted to note that this is not a run, it's a marathon. Not supporting use cases right now doesn't mean we can't support them in the future. We need to save our breath to try to pass the finish line, not waste it trying to go fast. Sorry for this corny metaphor 😅, that is just how I feel.

@FFY00
Member Author

FFY00 commented Sep 12, 2020

Oh, I think it's important to point this out, in case anyone is forgetting: python-build would still be able to build packages in the environments in question, it's just that it won't be able to build in an isolated environment.

@gaborbernat
Contributor

gaborbernat commented Sep 14, 2020

Both venv and virtualenv work by giving you a custom Python interpreter.

Can you elaborate on this? What does custom cover here?

We have already established that overriding sys.executable is bad and we want to move away from it, so I see no future in us using them

Why do you need to override?

If we want this to be solved properly, CPython will have to gain a way to be reliably set up with custom paths without depending on the interpreter.

Not sure I follow, but we need to support Python in general, not just CPython. Please don't exclude other Python implementations.

And if we (or more likely, I) do manage to fix the current workflow in the Python interpreter, they would start to be supported.

Unlikely without tens of lines of extra code for every release type (version constrained, etc). The main reason virtualenv moved away from a single-file architecture is that there are way too many variations, and you end up with hundreds of lines of code.

As a maintainer, I think it is very important to keep in mind the scope of the project and understand that you can't do everything

Exactly. The scope of this project is to build a Python package. Creating isolated Python environments is a goal for virtual environment creation projects. So let's not try to address that within this project, but instead delegate that job to a tool that does this (venv/virtualenv). This means we can spend more time on addressing the project goals, and less on trying to get support for various Python releases.

Right now I think I should optimize my time, and where I think it is best used is in trying to fix this issue properly.

This is exactly the issue I raised. IMHO you're trying to fix a dumpster fire, and I'm just warning that if you go down this path you'll end up with a few more of these. I'd recommend instead avoiding them by delegating the isolation part to tools that are dedicated to doing this.

Not supporting use cases right now doesn't mean we can't support them in the future.

If you just switch to venv/virtualenv (I'd say use venv on py3.4+, virtualenv otherwise - pulling in virtualenv on Python 2 seems a good enough compromise considering it's EOL) you'll no longer need to revisit this issue at any point in the future. You solve this and all future isolation issues in one go. Seems the efficient thing to do.

Oh, I think it's important to point this out, in case anyone is forgetting: python-build would still be able to build packages in the environments in question, it's just that it won't be able to build in an isolated environment.

Violating the requirement of PEP 518 should not be taken lightly IMHO. Building in an isolated environment is not optional per that PEP.


Passing non PEP 508 strings will result in undefined behavior, you
*should not* rely on it. It is merely an implementation detail, it may
change any time without warning.
'''
if not requirements:
return
Contributor

I just realized that you are calling:

subprocess.check_call([sys.executable, '-m', 'ensurepip'], cwd=self.path)

This will install pip and setuptools. IMHO this is bad because now setuptools is provisioned by default, and not specifying it within your build requires will silently pass. We should ensure only pip is installed in there 🤔. Perhaps we should uninstall it if it is not specified within the build-requires list, or otherwise handle the easy-to-make error of forgetting setuptools as a build requirement while the build still passes.

Member Author

You are right, I hadn't noticed before. We should be able to just delete setuptools* from site-packages.

Contributor

We should be able to just delete setuptools* from site-packages.

I don't think that's wise, you'd be relying there on implementation details of what pip uninstall does. Probably better to just use pip uninstall directly.
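A sketch of the `pip uninstall` route, under the assumption that setuptools should be removed only when the build requirements do not list it. The helpers `requirement_name` and `setuptools_uninstall_command` are hypothetical, and the name extraction is a simplification rather than a full PEP 508 parser; `pip uninstall -y` itself is an existing pip invocation.

```python
import re


def requirement_name(requirement):
    """Extract the normalized project name from the start of a PEP 508 string.

    Simplified on purpose: it only reads the leading name and ignores extras,
    markers and URLs.
    """
    match = re.match(r'\s*([A-Za-z0-9][A-Za-z0-9._-]*)', requirement)
    if match is None:
        raise ValueError('not a PEP 508 requirement: {!r}'.format(requirement))
    return re.sub(r'[-_.]+', '-', match.group(1)).lower()


def setuptools_uninstall_command(python, requirements):
    """Return the pip command that removes setuptools, or None if it is required.

    `python` is the isolated environment's interpreter (hypothetical helper).
    """
    names = {requirement_name(r) for r in requirements}
    if 'setuptools' in names:
        return None
    return [python, '-m', 'pip', 'uninstall', '-y', 'setuptools']
```

The returned list could be passed to `subprocess.check_call` right after the `ensurepip` call quoted above.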

Member

WTF? Why does ensurepip install setuptools?

Contributor

Because in a source-only, non-PEP-518 world, pip install would break without a local setuptools. That is my guess.

@FFY00
Member Author

FFY00 commented Sep 14, 2020

Can you elaborate on this? What does custom cover here?

They give you their own Python interpreter.

Why do you need to override?

Because sys.executable might be used within the code; in fact this is exactly what pep517 does.

Not sure I follow, but we need to support Python in general, not just CPython. Please don't exclude other Python implementations.

Sorry, in Python.

Unlikely without tens of lines of extra code for every release type (version constrained, etc). The main reason virtualenv moved away from a single-file architecture is that there are way too many variations, and you end up with hundreds of lines of code.

Can you elaborate, I don't follow? Unless Python implementations are not following the PEP, I see no reason to have custom code for each release.

Exactly. The scope of this project is to build a Python package. Creating isolated Python environments is a goal for virtual environment creation projects. So let's not try to address that within this project, but instead delegate that job to a tool that does this (venv/virtualenv). This means we can spend more time on addressing the project goals, and less on trying to get support for various Python releases.

The scope of the project is to provide a CLI tool to build Python packages, that is easy to bootstrap. I understand that other people may not care about bootstrappability, and just vendor dependencies, but I do and that is exactly the reason I created this project. I want to make sure that Python environments are easily bootstrappable, preferably following all that PEP 517 says. You might also notice the 'correct' in the description, which means that we should do things correctly, I've been burned many times by projects not caring about this, especially on such critical components as build systems.

We shouldn't be relying on hacks such as overriding sys.executable, the documentation does not say it is allowed, it could be read-only for all I know. In practice, both on CPython and PyPy it can be overridden, but that might change in the future. Even though I think it is relatively low risk, I don't want to be relying on that. So I won't make any long-term decisions that explicitly require us to rely on that 😕

The long term plan can't be to use venv and virtualenv. But we could use venv and virtualenv short term, I am not arguing with that. The long term plan would be to fix this use-case in Python itself.

The current short term effects are that a few obscure use cases are not supported, and I don't think that warrants me taking the time to move to venv/virtualenv. I would rather take that time to work on fixing the workflow in Python; it takes time, and the sooner we do that, the sooner we will have this problem fixed.

With that said, if anyone does want to take the time to change the code to venv/virtualenv, please be my guest, I would really appreciate it. But that will not be the long term plan.

This is exactly the issue I raised. IMHO you're trying to fix a dumpster fire, and I'm just warning that if you go down this path you'll end up with a few more of these. I'd recommend instead avoiding them by delegating the isolation part to tools that are dedicated to doing this.

My position on this is to wait, if the maintenance burden becomes higher than the burden of moving the code to venv/virtualenv, I will consider doing that.

If you just switch to venv/virtualenv (I'd say use venv on py3.4+, virtualenv otherwise - pulling in virtualenv on Python 2 seems a good enough compromise considering it's EOL) you'll no longer need to revisit this issue at any point in the future. You solve this and all future isolation issues in one go. Seems the efficient thing to do.

As I just said above, I think that would be a fine short term approach. You are free to do it.

Violating the requirement of PEP 518 should not be taken lightly IMHO. Building in an isolated environment is not optional per that PEP.

I do not take it lightly.

@FFY00
Member Author

FFY00 commented Sep 14, 2020

Also, we are not on 1.0.0, we are on 0.0.4, having rough edges is fine.

@gaborbernat
Contributor

Because sys.executable might be used within the code; in fact this is exactly what pep517 does.

If we invoke the build operations in a subprocess of the (isolated) virtual environment, this is not a problem. You already do a lot of subprocess calls, so I see no reason not to do it here too.
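As a rough illustration of the subprocess approach: when code runs under the environment's own interpreter, `sys.executable` inside that child already points at the environment, so nothing has to be overridden in the parent. The helper name `run_in_env` is an assumption for this sketch.

```python
import subprocess
import sys


def run_in_env(env_python, code):
    """Run `code` with the environment's interpreter and return its stdout.

    `env_python` is the path to the isolated environment's Python; the child
    process sees its own sys.executable without any monkey-patching.
    """
    output = subprocess.check_output([env_python, '-c', code])
    return output.decode().strip()


if __name__ == '__main__':
    # The current interpreter stands in for the environment's interpreter.
    print(run_in_env(sys.executable, 'import sys; print(sys.executable)'))
```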

They give you their own Python interpreter.

Well it's not really their own. It's a python environment that tweaks enough parts of the global python to hide away globally installed packages. Which also sounds to me exactly what an isolated python environment is.

I understand that other people may not care about bootstrappability

I've suggested using venv, which is part of the standard library, so it should not need bootstrapping. I did say virtualenv for Python 2, but if you'd rather not have that we can use some hackish solution like the one here.

You might also notice the 'correct' in the description, which means that we should do things correctly, I've been burned many times by projects not caring about this, especially on such critical components as build systems.

I'm completely on board with you on this. This is why I'm suggesting a better isolation system than what you have here. Because what's here is not correct when symlinks are not available, or when the Python executable is not readable (but can be executed).

The long term plan can't be to use venv and virtualenv.

Why not? Those systems have been designed especially to create isolated build environments, and venv does not require any custom bootstrapping.

My position on this is to wait, if the maintenance burden becomes higher than the burden of moving the code to venv/virtualenv, I will consider doing that.

I guess we fundamentally disagree on how one should go about supporting Python interpreters. I prefer designing things to work upfront, while you're aiming to wait until enough people complain to make your effort worthwhile. From my POV, people trying to use this tool and then finding out it does not work because it does not support their use cases is hurtful to the whole packaging ecosystem, because it is another thing that they should use but cannot.

@pganssle
Member

The scope of the project is to provide a CLI tool to build Python packages, that is easy to bootstrap. I understand that other people may not care about bootstrappability, and just vendor dependencies, but I do and that is exactly the reason I created this project. I want to make sure that Python environments are easily bootstrappable, preferably following all that PEP 517 says. You might also notice the 'correct' in the description, which means that we should do things correctly, I've been burned many times by projects not caring about this, especially on such critical components as build systems.

I can understand virtualenv being a problem for bootstrapping, but venv is in the standard library. The reasons for wanting to bootstrap and devendor everything have to do with making sure you can apply (security, mostly) patches in a central location, which is actually a major plus for using venv — CPython is way more likely to get security bug reports and fixes like that than this small project, which would argue in favor of delegating to venv.

We shouldn't be relying on hacks such as overriding sys.executable, the documentation does not say it is allowed, it could be read-only for all I know.

Yeah, @gaborbernat to be clear the sys.executable hack was to make it so that we aren't blocked on this issue in pep517. None of us are happy with it and it cannot be the long-term plan.

The long term plan can't be to use venv and virtualenv. But we could use venv and virtualenv short term, I am not arguing with that. The long term plan would be to fix this use-case in Python itself.

Anything we fix in Python would only hit Python 3.10+. Assuming python-build continues to support anything not EOL upstream (and we're already actually way past EOL on a few Python versions we're supporting), I don't think we'd be able to take advantage of anything of this nature for ~4-5 years, so I think "long term" may be very long indeed.

The current short term effects are that a few obscure use cases are not supported, and I don't think that warrants me taking the time to move to venv/virtualenv. I would rather take that time to work on fixing the workflow in Python; it takes time, and the sooner we do that, the sooner we will have this problem fixed.

With that said, if anyone does want to take the time to change the code to venv/virtualenv, please be my guest, I would really appreciate it. But that will not be the long term plan.

I think we're already doing significantly better than pep517, and that this PR is a net improvement for this package. We definitely must not let the perfect be the enemy of the good here. I wonder if this might be a good compromise:

  1. For the easiest cases, we maintain our own lightweight isolating virtual environment — for people who care about bootstrapping, it's easy enough to make their use case the easy case, since they control their build environments (and are generally linux distros anyway — plus build can be built as a universal wheel, so you can always bootstrap build in a specific environment and then use it to build other stuff on your constrained environments).
  2. For the tougher cases, we fall back to venv on Python 3 and either virtualenv or just no significant isolation support on Python 2 (we can raise a warning to that effect).

The upside of this is that it should help us avoid the complicated logic involved in supporting dozens of different platforms while also giving most users the benefits of not using venv (we could also support an opt-in option to use venv unconditionally — which would certainly make it easier to test if it's hard to generate these obscure platforms). The downside here is that it's at least predicated on our ability to detect that we're in an unusual situation so we know when to fall back — and detecting that may end up being almost as complicated as supporting all the other platforms.

Another option that is possibly more palatable would be to default to using venv, but add an option to use the lighter weight option for people who want to avoid it for whatever reason.

Contributor

@gaborbernat gaborbernat left a comment

Don't consider my comments as deal breakers; I'm just noting that the project as it stands will (very likely) not work (correctly) with Windows Store or macOS framework Python implementations.

@FFY00
Member Author

FFY00 commented Sep 14, 2020

I don't think you'll be able to fix it without adding a few hundred lines of code, and even then you only support what you know. IMHO the only viable option here is to move to venv. Once you move to venv I'm not sure @FFY00's PEP is needed anymore. Also @FFY00 I think you're confusing PEP-517 and PEP-518. Build isolation comes with the latter, not the former.

I agree with Bernát here, it would be very difficult to distinguish where to use venv and where not. I think we should just use venv.

What would you propose in that future PEP to solve the problem?

Essentially I think we should have an environment variable to disable pyvenv.cfg so that it doesn't clash with PYTHONHOME, restoring the ability to rely on PYTHONHOME. But I would also like to propose to have the sysconfig variables configurable via environment variables. This would result in a weaker alternative to pyvenv.cfg.

The goal is to allow spinning up custom environments completely independently of the interpreter.

sysconfig paths:

$ python -m sysconfig
Platform: "linux-x86_64"
Python version: "3.8"
Current installation scheme: "posix_prefix"

Paths:
	data = "/usr"
	include = "/usr/include/python3.8"
	platinclude = "/usr/include/python3.8"
	platlib = "/usr/lib/python3.8/site-packages"
	platstdlib = "/usr/lib/python3.8"
	purelib = "/usr/lib/python3.8/site-packages"
	scripts = "/usr/bin"
	stdlib = "/usr/lib/python3.8"

...

With this we should be able to simply start a process from sys.executable providing something like PYTHONIGNOREVENV=1, PYTHONPLATLIB=my_env/site-packages, PYTHONPURELIB=my_env/site-packages, PYTHONDATA=my_env/data and PYTHONSCRIPTS=my_env/bin. PYTHONIGNOREVENV may not even be needed here, but I still think it's a good idea.

What do you think?
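For illustration only, the proposal above could be expressed as the environment mapping below. The `PYTHONIGNOREVENV`, `PYTHONPLATLIB`, `PYTHONPURELIB`, `PYTHONDATA` and `PYTHONSCRIPTS` variables exist only as a proposal in this thread; as far as this discussion shows, no released interpreter honors them, so passing this mapping to a subprocess has no isolating effect today. The helper name `proposed_env` is likewise hypothetical.

```python
import os


def proposed_env(env_dir, base_env=None):
    """Build the environment mapping sketched in the proposal above.

    NOTE: these PYTHON* variables are only proposed in this thread and are
    not known to be read by any released interpreter.
    """
    env = dict(os.environ if base_env is None else base_env)
    site_packages = os.path.join(env_dir, 'site-packages')
    env.update({
        'PYTHONIGNOREVENV': '1',
        'PYTHONPLATLIB': site_packages,
        'PYTHONPURELIB': site_packages,
        'PYTHONDATA': os.path.join(env_dir, 'data'),
        'PYTHONSCRIPTS': os.path.join(env_dir, 'bin'),
    })
    return env
```

Under the proposal, the mapping would be passed as `env=` when starting a process from `sys.executable`.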

@FFY00 FFY00 mentioned this pull request Sep 14, 2020
@gaborbernat
Contributor

Not sure I understand how it would be useful. How would you provision pip for such an isolation without making the entire system packages available 🤔 Maybe it would help to create a no-packages-installed venv via just env vars, but creating virtual environments without pip is not expensive 🤷‍♂️ so IMHO it would cost more than it benefits in interpreter maintenance.

@FFY00
Member Author

FFY00 commented Sep 15, 2020

Not sure I understand how it would be useful. How would you provision pip for such an isolation without making the entire system packages available 🤔 Maybe it would help to create a no-packages-installed venv via just env vars, but creating virtual environments without pip is not expensive 🤷‍♂️ so IMHO it would cost more than it benefits in interpreter maintenance.

I don't see any reason why ensurepip would not work since I am not touching stdlib and platstdlib, just the site packages.

The benefit here would be that we can run a virtual environment with any interpreter, that is the whole point. It would also solve the Windows store Python issue, as we wouldn't need to copy the executable anywhere, no weird workarounds needed.

@gaborbernat
Contributor

I guess it could work... But arguably it is only useful for tools, as it would be impractical for end users. I'm a bit skeptical core devs would accept this kind of fine tunability while you can already solve the problem by just invoking virtualenv. The benefits over direct virtualenv invocation are overall very low the way I see it.

@FFY00
Member Author

FFY00 commented Sep 15, 2020

Yes, only useful for tools. virtualenv would indeed be available, but the problem I see here is that bootstrapping virtual environments when we have some limitations regarding the interpreter, like your example of the Windows Store, is very tricky. virtualenv is great and I am very glad it exists, but I would like to see virtual environments being easily bootstrappable by anyone. The current status quo is use virtualenv or have pain.

Being able to create virtual environments without any dependency on what Python interpreter is being run would really help that.

Btw, I am still not clear on how virtualenv solves the issue of using Windows store Python when symlinks are not available. I would really appreciate if you could clarify that for me.

The main issue I see here is how most console entrypoints are generated. For tools like pipx that want to run an entrypoint from one specific environment, they would have to clear the environment before running the Python process. This is perfectly doable, but could be annoying.

I think it is at least worth exploring and getting feedback from other people. Do you agree?

@gaborbernat
Contributor

Being able to create virtual environments without any dependency on what Python interpreter is being run would really help that.

For Python 3 you can just use venv. For Python 2 you need virtualenv, but let's be realistic any PEP we'd come up would be python 3 only.

Btw, I am still not clear on how virtualenv solves the issue of using Windows store Python when symlinks are not available. I would really appreciate if you could clarify that for me.

It doesn't solve it. It delegates the job to the Windows Store Python's venv module, which knows how to handle itself. The venv module basically looks up a redirect script and uses that as the Python executable.

I think it is at least worth exploring and getting feedback from other people. Do you agree?

If you have the bandwidth for it sure. From my side not worth the gain considering the effort one needs to put in, but feel free to pursue that path.

@layday
Member

layday commented Sep 16, 2020

I'm not a packaging guru (by any measure!) but I don't understand what we're trying to do here. As I understand it, build isolation means that the builder should not have access to system- and user-level packages (anything that is not part of the stdlib). To my knowledge, you can accomplish this in two ways: minimally, by overriding PYTHONPATH (preferably in combination with the -S flag to disable the loading of the site module); or you can construct a virtual environment to take care of all of the minutiae for you. Following that, you would invoke pip, after having installed it at a temporary location using ensurepip or venv (which wraps ensurepip) or the virtualenv package, to install the build requires, at another temporary location, on the PYTHONPATH. Anything else would be outside the scope of build isolation.
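To make the "minimal" option concrete, here is a rough sketch of launching a child interpreter with `-S` and an overridden `PYTHONPATH`. The helper name `run_isolated` and its `extra_path` parameter are made up for illustration; this is not what build or pep517 actually does, just one way to express the idea.

```python
import os
import subprocess
import sys


def run_isolated(code, extra_path=""):
    """Run `code` in a child interpreter that skips site and ignores the inherited PYTHONPATH."""
    env = dict(os.environ)
    # -S only disables the site module; PYTHONPATH is still honoured,
    # so it has to be replaced explicitly.
    env["PYTHONPATH"] = extra_path
    result = subprocess.run(
        [sys.executable, "-S", "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

In this sketch, `extra_path` would point at the temporary location holding the build requirements; everything else outside the stdlib stays invisible to the child process.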

@FFY00
Member Author

FFY00 commented Sep 16, 2020

We have no control over how the Python subprocess in pep517 is called so we can't use the -S flag. Also, the backend or its dependencies might install needed console scripts or package data so we need to make sure the sysconfig paths are correct, or we might have trouble. So essentially, we need a virtual environment.
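As a small illustration of the sysconfig point, the sketch below creates a throwaway environment with the stdlib `venv` module and asks that environment's interpreter where scripts and pure-Python packages would be installed. The function name `venv_install_paths` is hypothetical, and `with_pip=False` is chosen only to keep the example quick; it is not the approach taken in this PR.

```python
import os
import subprocess
import tempfile
import venv


def venv_install_paths():
    """Create a throwaway venv and ask its interpreter where things get installed."""
    with tempfile.TemporaryDirectory() as env_dir:
        # with_pip=False keeps the sketch quick; a real build env would also need pip.
        # Symlinks are used off Windows, matching the default of `python -m venv`.
        venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt")).create(env_dir)
        bin_dir = "Scripts" if os.name == "nt" else "bin"
        python = os.path.join(env_dir, bin_dir, "python")
        code = (
            "import sysconfig;"
            "print(sysconfig.get_path('scripts'));"
            "print(sysconfig.get_path('purelib'))"
        )
        result = subprocess.run(
            [python, "-c", code], capture_output=True, text=True, check=True
        )
        scripts, purelib = result.stdout.splitlines()
        # Resolve symlinks so the paths can be compared reliably.
        return tuple(os.path.realpath(p) for p in (env_dir, scripts, purelib))
```

Both reported paths should sit under the environment directory, which is what lets console scripts and package data installed by a backend end up somewhere the build can find them.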

@layday
Member

layday commented Sep 16, 2020

Thanks for explaining. Using a virtual env sounds like a far better option than trying to account for what looks like a hotchpotch of edge cases in this package.

@ofek

ofek commented Sep 23, 2020

@gaborbernat Are there plans to port your virtualenv refactor to venv? If not, I would strongly encourage us to not use the latter simply because of the abysmal performance, especially on Windows.

@gaborbernat
Contributor

> @gaborbernat Are there plans to port your virtualenv refactor to venv? If not, I would strongly encourage us to not use the latter simply because of the abysmal performance, especially on Windows.

That's for the venv maintainers to decide. Note that venv uses ensurepip to provision pip/setuptools, so in all likelihood it's up to the ensurepip maintainers at the end of the day. These seem to be @ncoghlan @pfmoore @dstufft. So if they agree that the path taken by virtualenv is preferable, someone can make a PR against CPython (and PyPy respectively).

@pfmoore
Member

pfmoore commented Sep 23, 2020

Can someone summarise the issue here, and what it is that is being suggested ensurepip (or venv) needs to do? I don't have time right now to read the whole of this thread, but I'm happy to comment on something specific.

@gaborbernat
Contributor

Off-topic for this discussion, but currently ensurepip uses pip to install pip+setuptools. @ofek was suggesting it should do the same as virtualenv does: extract the wheel itself (potentially via the installer project) and cache it, so that venv virtual environment creation time is on par with virtualenv.
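For readers unfamiliar with the idea: a wheel is a zip archive, so the core of "extract it yourself" can be sketched with the stdlib `zipfile` module. The helper `unpack_wheel` below is hypothetical and is not virtualenv's code; it ignores RECORD handling, entry point scripts, `.data` directories, and caching, all of which a real installer has to deal with.

```python
import zipfile
from pathlib import Path


def unpack_wheel(wheel_path, target_dir):
    """Extract every file in the wheel into target_dir; return the top-level names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(wheel_path) as wheel:
        wheel.extractall(target)
        # Top-level entries are typically the package and its .dist-info directory.
        return sorted({name.split("/")[0] for name in wheel.namelist()})
```

Skipping a pip subprocess and reusing an already-unpacked copy is, roughly, where the speed difference being discussed would come from.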

@pfmoore
Member

pfmoore commented Sep 23, 2020

OK, I'd say that's not likely to be something we'd add to ensurepip. Once installer is mature, maybe it would be worth using to do the installs - but why add an extra component to the core when we already have a wheel installer included in the form of pip?

More likely would be improvements to speed up pip. We currently use --no-cache --no-index --find-links so we should be grabbing the wheels direct and just unpacking them. Very recent versions of pip unpack wheels direct to the target, rather than unpacking to a temp location and copying into place, so that will help when that version goes into core. Otherwise, someone would need to come up with a specific change (probably around the flags we use to call pip) and demonstrate the improvement, in a PR, I guess. We wouldn't object to changes that improve venv creation time, but someone needs to do the work.

@gaborbernat
Contributor

> why add an extra component to the core when we already have a wheel installer included in the form of pip?

@ofek answers this.

> simply because of the abysmal performance, especially on Windows.

> We wouldn't object to changes that improve venv creation time, but someone needs to do the work.

Your words above do object to this, and instead redirect people to make pip faster. Note that the latter is likely a task a few orders of magnitude harder, due to the legacy/complexity of pip compared to a simple wheel install/cache system.

@pfmoore
Member

pfmoore commented Sep 23, 2020

@gaborbernat OK, sorry if I'm missing things - as I say, I don't have time to read the whole thread, so I'm somewhat relying on the summary.

To repeat what I was trying to say - what is the proposed change, and do we have a demonstration that it'll increase venv creation speed sufficiently to justify any additional costs and overheads? I suspect a big proportion of the speed penalty is pip unpacking to a temp location and copying. Has anyone confirmed whether the new version of pip with unpack in place improves things here and by how much? That would be the first thing to check.

The proposal to add the installer module has a number of problems right now. First, the installer module isn't even complete. Let's look again when that's sorted out. Secondly, we would be adding a module which would be solely in support of ensurepip, and wouldn't be installed on the user's system. That's a bit of a weird thing to do - why not either bring installer into the stdlib (immediate response - it needs to mature a lot before that's worth considering) or use the tools that are already there, i.e., pip (the response being "pip's too slow", to which my counter is "why not improve pip and help everyone?")

To directly answer your question:

> So if they agree that the path taken by virtualenv is preferable, someone can make a PR against CPython (and PyPy respectively).

I'm happy to see a PR, as long as it comes with benchmarks that demonstrate the improvement, and the code isn't more of a maintenance burden than the improvement warrants. One particular question I'd want to see addressed, is who would update the code to deal with possible future changes in the wheel format? Would that task fall on the core devs?
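A benchmark along these lines does not need much machinery. The sketch below times creation of a fresh environment with the stdlib `venv` module; `time_venv_creation` and its `repeats` parameter are illustrative names, and taking the best of a few runs is just one reasonable way to reduce noise.

```python
import os
import tempfile
import time
import venv


def time_venv_creation(with_pip, repeats=3):
    """Return the best wall-clock time in seconds for creating a fresh venv."""
    timings = []
    for _ in range(repeats):
        with tempfile.TemporaryDirectory() as env_dir:
            builder = venv.EnvBuilder(with_pip=with_pip, symlinks=(os.name != "nt"))
            start = time.perf_counter()
            builder.create(env_dir)
            timings.append(time.perf_counter() - start)
    return min(timings)
```

Comparing `time_venv_creation(True)` against `time_venv_creation(False)` gives a rough estimate of how much of the total is spent in ensurepip (this assumes ensurepip is available in the interpreter being measured), and running the same comparison across pip versions would show whether unpacking in place helps.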

@gaborbernat
Contributor

@pfmoore I'll defer doing all the research you're requesting for the foreseeable future. From the POV of this project, @ofek, our only choice is to go with venv, be it as fast or slow as it is, mainly because we don't want this project to take on dependencies, for bootstrapping reasons.

@gaborbernat
Contributor

@FFY00 shall we close this in favor of #112?

@FFY00 FFY00 closed this Oct 4, 2020
@FFY00 FFY00 mentioned this pull request Oct 7, 2020
@gaborbernat gaborbernat deleted the fix-isolation-pyvenv branch October 14, 2020 09:36
@gaborbernat gaborbernat restored the fix-isolation-pyvenv branch October 14, 2020 09:36
@gaborbernat gaborbernat deleted the fix-isolation-pyvenv branch October 14, 2020 09:36
6 participants