Install packages in parallel #12742

Open
1 task done
notatallshaw opened this issue Jun 2, 2024 · 5 comments
Labels
type: feature request Request for a new feature type: performance Commands take too long to run

Comments

@notatallshaw
Member

What's the problem this feature will solve?

This is to improve the performance of pip.

For example, in the large install profiled in #12613 (comment), installing takes over 8% of the total time, even with resolving, downloading, and building sdists included. As resolving becomes faster, downloads run in parallel, and (hopefully) more packages ship wheels instead of sdists, installing will become a larger share of the total time.

Describe the solution you'd like

After the resolve, downloads, and sdist builds have completed, the installs could run in parallel.
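As a rough illustration of the shape this could take (not pip's code), here is a minimal sketch using a thread pool. `install_wheel` and `install_all` are hypothetical names, and "installing" is reduced to unpacking the archive; a real installer also writes RECORD, generates scripts, compiles bytecode, and so on.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import zipfile


def install_wheel(wheel_path: Path, target_dir: Path) -> str:
    """Hypothetical stand-in for a real install step: unpack the archive."""
    with zipfile.ZipFile(wheel_path) as zf:
        zf.extractall(target_dir)
    return wheel_path.name


def install_all(wheels: list[Path], target_dir: Path, max_workers: int = 4) -> list[str]:
    """Install already-downloaded, already-built wheels concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so any reporting built on
        # the return value stays deterministic even though the work is not.
        return list(pool.map(lambda w: install_wheel(w, target_dir), wheels))
```

This assumes the wheels do not write to the same paths; handling that case is a separate problem.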

Alternative Solutions

Keep as is.

Additional context

This would obviously require a PR from someone. I think we would need to make sure there is a full complement of tests for installing packages in parallel, covering different kinds of packages (e.g. multiple editables installed at the same time, editables alongside regular installs, etc.).

uv has already implemented this successfully. From following their issue tracker, this has been the last problematic part of making things parallel/concurrent.


@notatallshaw notatallshaw added S: needs triage Issues/PRs that need to be triaged type: feature request Request for a new feature labels Jun 2, 2024
@ichard26 ichard26 added type: performance Commands take too long to run and removed S: needs triage Issues/PRs that need to be triaged labels Jun 2, 2024
@ichard26
Member

ichard26 commented Jun 2, 2024

See also #8187 (comment).

@notatallshaw
Member Author

> See also #8187 (comment).

Thanks, I hadn't seen that before. I'll have a good read through and see whether this is a straight-up duplicate, and whether anything can be done to get the existing work landed in pip.

@pfmoore
Member

pfmoore commented Jun 2, 2024

As the author of the linked comment, I'll add that the key new development is that uv has implemented parallel installs. It would be interesting to know how they designed things. It's quite possible that pip could learn some useful lessons.

I've not looked at how uv implements this at all, so the following is pure speculation, but if I had to guess, I'd imagine they have the following things in their favour:

  1. They may well have designed for parallel tasks from the start. One concern I have for pip is getting reporting right, for instance, because we have[1] some stateful code that handles getting indentation correct, which might be broken by multiple threads.
  2. Rust has better thread safety than Python, so there's likely a class of issues that uv simply can't encounter (at least, not by accident).
  3. To be blunt, they may just not have worried about pathological cases. For example, installing two wheels in parallel, which both contain the same filename but with different content, is a potential race condition (writing the file itself and RECORD). But it's unlikely in practice, so maybe uv ignored the possibility. Pip has a larger user base, and a longer history of dealing with weird errors, so we may well simply be (for better or worse) more paranoid over things like this.
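To make the race in point 3 concrete, here is a hedged sketch of one possible guard: scan the archives up front and report any path that more than one wheel would write. `find_conflicting_paths` is a hypothetical helper, not something pip or uv is known to do, and it simplifies by comparing archive member names directly (it ignores how `.data` directories map to install locations).

```python
from collections import defaultdict
from pathlib import Path
import zipfile


def find_conflicting_paths(wheel_paths: list[Path]) -> dict[str, list[str]]:
    """Map each archive path claimed by two or more wheels to those wheels."""
    owners: dict[str, list[str]] = defaultdict(list)
    for wheel in wheel_paths:
        with zipfile.ZipFile(wheel) as zf:
            for name in zf.namelist():
                if name.endswith("/"):
                    continue  # skip explicit directory entries
                owners[name].append(Path(wheel).name)
    # Only paths with more than one owner can race during a parallel install.
    return {path: names for path, names in owners.items() if len(names) > 1}
```

Wheels that show up in the result could be installed serially (preserving today's last-writer-wins behaviour), while the rest go to the parallel path.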

Footnotes

  1. Or at least we used to, I haven't looked at that code since we started using rich...

@denx20

denx20 commented Jul 14, 2024

Just curious, are there any updates on this/any active work being done?

@morotti
Contributor

morotti commented Jul 17, 2024

I made a PR with a proof of concept: #12816.

The parallel installation itself is trivial to do (unless you want to handle the case of two packages trying to overwrite the same file, outside of Linux).

The gains are very small because of the global interpreter lock, unless you're installing on a very slow file system like a $HOME network drive.
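A small sketch of why the slow-file-system case differs: threads overlap well when each task mostly waits, because blocking calls release the GIL. Here `time.sleep` stands in for a slow write; the 0.05 s latency and all function names are assumptions for illustration, not measurements from the PR.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_write(_index: int) -> None:
    """Simulated write to a high-latency file system (releases the GIL)."""
    time.sleep(0.05)


def run_serial(n: int) -> None:
    for i in range(n):
        slow_write(i)


def run_threaded(n: int, workers: int) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(slow_write, range(n)))  # drain the iterator to wait for all tasks


def timed(fn) -> float:
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start
```

Comparing `timed(lambda: run_serial(8))` with `timed(lambda: run_threaded(8, 8))` should show the threaded run finishing much sooner. If `slow_write` were replaced by pure-Python CPU work, the two would be expected to take roughly the same time under the GIL.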
