
installing on alpine is slooow, if we can't have pre-compiled, could we have parallel builds instead? #261

Closed
samuelcolvin opened this issue May 26, 2019 · 9 comments

Comments

@samuelcolvin

  1. What is your operating system and version?

Ubuntu and alpine

  2. What is your Python version?

any, 3.7

  3. What version of pip do you have?

any, latest, 19.x

  4. Could you describe your issue in as much detail as possible?

ref:

I spend a lot of my life waiting for packages to build when building images based on alpine; it's not the download that's the problem but the build time.

If you have an image that includes uvloop, asyncpg, cryptography, pycares, aiohttp etc. you can easily be waiting 10 minutes for that build stage alone.

Could pip do those builds concurrently across multiple processes to save time?

On a modern machine this could speed up installs by 10x.

For me this would be an elegant workaround until the time when (if ever) musl binaries are available.

Is there anything fundamentally stopping this from happening (e.g. race conditions on the install directory, etc.)?

@samuelcolvin
Author

perhaps phase one would be a new package which wrapped around pip and installed packages in parallel?

It would "simply" need to:

  • work out a dependency tree, then run installs in multiple stages, making sure everything required for stage 2 was installed by stage 1, etc.
  • the actual installs could then be as simple as subprocess.run(['pip', 'install', '...']) or something marginally less ugly (see the sketch below).

Apart from being ugly, is there anything that would block this from working?
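
For what it's worth, here is a minimal sketch of that wrapper idea, assuming the dependency tree has already been resolved into stages (the `stages` list below is hand-written for illustration; computing it from package metadata is the actual hard part):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def pip_install(package: str) -> None:
    # One pip process per package; `sys.executable -m pip` keeps the
    # install inside the current interpreter's environment.
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=True)

# Hypothetical, hand-resolved stages: every package in a stage depends
# only on packages installed by earlier stages.
stages = [
    ["multidict", "cffi"],        # no unbuilt dependencies of their own
    ["aiohttp", "cryptography"],  # depend only on stage-1 packages
]

for stage in stages:
    # Build and install everything within one stage concurrently; the
    # staging is what keeps two processes from racing on a shared dependency.
    with ThreadPoolExecutor(max_workers=len(stage)) as pool:
        list(pool.map(pip_install, stage))  # surfaces the first failure
```

As far as I know pip takes no lock on the target environment, so genuinely disjoint stages would be a prerequisite for this to be safe.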

@KOLANICH

You should just prepare a CI pipeline building a docker image and then use that image in other builds.

@KOLANICH

> the actual installs could then be as simple as subprocess.run(['pip', 'install', '...'])

Arguments surely should not be passed via the command line.

@KOLANICH

For parallel downloading, aria2c can be used.

@samuelcolvin
Author

> You should just prepare a CI pipeline building a docker image and then use that image in other builds.

that's a very poor solution: it involves updating the image on every release of every package. Some of that can be done automatically with pyup etc., but it's still another repo, CI setup, pyup config, and image release - a lot more faff.

> Arguments surely should not be passed via the command line.

well, it depends how good pip's Python API is, but this was just to make the case, hence why I caveated it with "or something marginally less ugly".

> For parallel downloading, aria2c can be used.

As I explained, download is not the bottleneck: it takes <10 seconds, while the build takes minutes.

@KOLANICH

KOLANICH commented May 26, 2019

> that's a very poor solution: it involves updating the image on every release of every package

Yes. The good thing is that it doesn't need to happen too often.

I usually do the following:

  1. Determine which packages can be slightly outdated and which must always be the latest versions.
  2. The ones that can be outdated are embedded into docker images, rebuilt by cron once every 2 weeks.
  3. For the ones that must be up to date, decide whether the latest prebuilt version I can automatically obtain is satisfactory, or whether I should build them from git. If the prebuilt ones are satisfactory, I use them.
  4. For the ones that must always be up to date, determine how long the build and installation take. If it is fast, I do nothing. If it is slow, I move updating them into a separate stage of the CI pipeline: set PYTHONUSERBASE to a dir that can be cached, install them with the --user flag, and configure CI to cache $PYTHONUSERBASE (see the sketch below). If that stage succeeds, the dependencies are not reinstalled; they are already there. The next stage then uses the deps installed by the previous stage. The problem here is that dependencies specified using PEP 508 specifiers with full URIs are always fetched and installed, instead of checking the version of the remote dependency and reinstalling only if needed. Fortunately I don't have many such dependencies.
  5. After the CI pipeline succeeds, it leaves a prebuilt binary wheel artifact, which other pipelines can install without compilation.
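
A minimal sketch of step 4, assuming a CI system that caches a named directory between runs (the path and package names below are illustrative):

```python
import os
import subprocess
import sys

# Hypothetical directory that the CI is configured to cache between runs.
cached_deps = os.path.abspath("ci-cache/pydeps")

# Point the user base at the cached dir so `pip install --user` writes
# there and later interpreter runs find the packages again.
env = dict(os.environ, PYTHONUSERBASE=cached_deps)

# On a warm cache pip sees these as already satisfied and skips the
# slow source builds entirely.
subprocess.run(
    [sys.executable, "-m", "pip", "install", "--user", "uvloop", "asyncpg"],
    env=env,
    check=True,
)
```

Later stages just need the same PYTHONUSERBASE exported so the interpreter picks the cached packages up.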

@pradyunsg
Member

pradyunsg commented May 26, 2019

> if we can't have pre-compiled, could we have parallel builds instead?

You can have both! It's just that someone has to do the work of figuring out how to implement them and then actually implementing them - like any other functionality in the volunteer-run PyPA projects.

Both of those are non-trivial tasks, which is part of why they haven't been "solved" yet. Due to how pip / manylinux are positioned in the ecosystem, the solution to these problems has to be general and not affect existing workflows.

If you want to champion this effort, you are welcome to! :D

@samuelcolvin
Author

@pradyunsg, great.

My questions were:

  1. is this something others would find useful?
  2. are there likely to be any major blockers that I haven't thought about?

@KOLANICH, thanks for your response - I think the length of your description of a workaround demonstrates how much simpler pip install -r requirements.txt would be. 😉

@pradyunsg
Member

pradyunsg commented May 26, 2019

> is this something others would find useful?

Definitely.

> are there likely to be any major blockers that I haven't thought about?

I don't know. I've not looked into this lately but I'm pretty sure you can find some thoughts on the pip issue.


Since this issue is scoped to just pip for now, I'm moving this conversation to pypa/pip#825.
