Parallelizing the install process + PoC! #8187
Benchmark test details. Take the download (D/L) column time/speed with a grain of salt: download speed is not stable enough between runs, but parallel download is obviously much more efficient. Requirements variant B is actually somewhat of an outlier, it contains the

Notice that in requirements variants A/B/C, all the package versions are fixed.
Thank you for sharing the benchmark, the design, as well as the proof of concept. I am glad to see the positive benchmark for larger collections (I suppose, estimating the size of the others based on that of B; I'd love to have them to test in the future, though), compared to the ones tested under GH-3981. I made a quick run and found that the PoC still has the issue with
I am unaware if this is because you and your team have not worked on that or there's still a technical obstacle to be solved. If the latter is the case, I'd love to know what you found in addition to the quote above. I've also submitted a plan for solving GH-825 as part of GSoC; however, it's purely a plan. Theoretically it should work, but I can't be sure that it definitely will in practice. I'd be really grateful if you take a look at it and leave some comments. Note that I'll only know if it's approved by

Given the situation in GH-8169, I wonder how
Thanks @McSinyx for the references! To answer a few of your concerns:
Requirement files added here
On a different note, if you are willing to help out with this code and make it PR worthy, please don't be a stranger, feel free to push to the forked repo 😄 |
This is awesome. |
Also note that pip is in the middle of the process of integrating a new resolver. So make sure you're able to also introduce the parallelism to the new resolver, otherwise all the good work would be lost once the transition is finished. |
@uranusjr thanks for the heads up! EDIT: |
@bmartinn Your observations are correct. The new resolver can be enabled (in 20.1 or later) by passing |
@bmartinn, thank you for the requirement files as well as explanations 2 and 3. As for the termination, yesterday I noticed that after canceling in the middle of a parallel download, my cursor disappeared, so there might be something wrong. However, to make the UI user-friendly, I believe that it might be a good idea to only have the main thread take care of all of the displaying, as in my proposal, which is now accepted (yay, and thanks to PyPA maintainers and others for commenting during the drafting process). To @uranusjr as well: I'm not sure how the proposal was reviewed, but if you haven't taken a look at it, could you please skim through the implementation idea (section 3.3.3) to see if it fits your vision of coordinating with the new resolver, i.e. does it cause any trouble for the ongoing implementation of the resolver? BTW, I feel the need to say that I've just gotten (physically) back to school this week and am still trying to get used to the new and denser schedule, so please bear with me if in the next one or two days I am not able to carefully examine things to meaningfully participate in the discussion.
@McSinyx, following your suggestion, I pushed a fix for the Logger handler issue (basically forcing at least one instance to be created on the main thread), let me know if it solved the cursor problem. @uranusjr I'm testing the |
@bmartinn pip maintains a wheel cache and won’t redownload if you had downloaded a wheel previously. You probably need to supply |
Apologies @uranusjr I was seeing the |
Hi @uranusjr This design should hold with the continuous development of the new Resolver/Resolution classes. With your approval I'd like to push it as a PR :) BTW: EDIT: |
I think it would be a reasonable addition, but some considerations are needed. |
Sure, we can quite quickly protect the import and disable the feature automatically.
You mean regarding the ResolverLib?
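The import guard being suggested here could look something like this minimal sketch (illustrative names, not pip's actual code): try to import the pool, and silently fall back to serial execution when the platform lacks multiprocessing support.

```python
# Hypothetical sketch: disable the parallel feature automatically when
# multiprocessing is unavailable on the platform.
try:
    from multiprocessing.pool import ThreadPool
except ImportError:  # e.g. restricted platforms without multiprocessing
    ThreadPool = None

def map_maybe_parallel(func, items, workers=4):
    """Run func over items in parallel when possible, serially otherwise."""
    if ThreadPool is None:
        return [func(item) for item in items]
    with ThreadPool(workers) as pool:
        return pool.map(func, items)
```

Callers never need to know which path was taken, which keeps the fallback contained in one place.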
I meant regarding within ResolveLib, the parallelisation needs to reduce duplicated resolving logic implementations. In your current implementation, for example, there are two completely separate pieces of code for updating criteria in single-thread and multi-threaded modes, which would be hard to maintain if we want to revise the resolution logic. |
Yes @uranusjr, you are correct; I put it there just so the execution path is easier to see (i.e. when viewing the diff). I'll extract it out to a function and make sure we call it, obviously easier to maintain :) Still, with that in mind, should it be pushed to the pip or ResolverLib repository? It seems you already have a PR updating to the new ResolverLib version, and the parallelization itself is fully contained in the new ResolverLib, so I guess it makes the most sense (from a maintenance perspective as well) to PR it there and then just have it synced back to pip?!

EDIT:
Hi @bmartinn, it took me quite a while to get back in shape 😄 IIUC, your branch has three separate enhancements:
I don't have enough understanding to comment on 3 (I like the idea, just haven't carefully read the code), but for the other two:

**Parallel install**

It's true that the current situation is much better than in the past, where download/resolve/build/install were coupled. I believe there shouldn't be any problem with throwing packages to be installed into a pool, since currently we only display Installing collected packages and Successfully installed. It'd also be nice if we could have parallel build, but I think it's out of scope of this discussion. 3 features in a single branch is already really hard for me to follow, and I believe pip's maintainers might have the same issue.

**Parallel download**

I have two main concerns over the design:
TL;DR Hi @McSinyx
The new Resolver is more sophisticated and tries to resolve collisions by iterating over the least dependent packages first, then slowly "climbing" the dependencies tree. The actual implementation comes from the [resolvelib](https://github.com/sarugaku/resolvelib) package. UI design: the latest progress bar implementation (again, thank you for noticing the hide-cursor bug). A few thoughts:
Regarding PRs, please make multiple ones for each feature, as per

About the progress bar, as a user, frankly I feel that the current mock-up is quite confusing. Even with a changing number of downloads at the same time, we could just redraw the entire screen, but (1) it's complicated and (2) it'd be a mess on half-broken terminals, which in my experience are quite common. That being said, if that's the way to go, I'd try my best to help make it happen. However, I still have a concern over duplicate downloads of future ones, the kind of concern where I don't really know if it's avoidable. We might need to refactor the following to do convenient hashing of URL
and I'm not a fan of

I'd love to hear an elaboration on the parallel build, but this might not be the right place, since there are so many things being discussed at the same time here already. AFAIK there's no ticket dedicated to it, so feel free to create one (or just tell me to do it).
Thanks @McSinyx sounds like a plan to me :)
Where did you see this pseudo code? |
For the full context, what I was trying to say was
e.g. if A is needed by both B and C and we don't control the collection of packages to be downloaded, then if, say, B initiates the download first and the progress is not finished when C asks for A too, then the thread resolving C would need to wait for that future A to finish. There could be another way that I'm not aware of, but if that's not the case, then we'd need busy loops to wait for it, and personally I don't like that.
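The waiting need not be a busy loop. A hypothetical sketch (not pip's code) using `concurrent.futures`: in-flight downloads are keyed by URL, so the thread resolving C simply blocks on the same Future that B's request created, and nothing is downloaded twice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class DownloadCoordinator:
    """First requester of a URL starts the download; later requesters
    wait on the same Future instead of busy-looping or re-downloading."""

    def __init__(self, workers=4):
        self._executor = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}             # url -> Future
        self._lock = threading.Lock()
        self.real_downloads = 0        # for illustration only

    def _download(self, url):
        self.real_downloads += 1
        return "contents of %s" % url  # stand-in for the actual fetch

    def fetch(self, url):
        with self._lock:
            future = self._futures.get(url)
            if future is None:
                future = self._executor.submit(self._download, url)
                self._futures[url] = future
        return future.result()         # blocks until the download is done
```

The lock only guards the dictionary lookup-and-insert; the actual wait happens on the Future, so waiters sleep instead of spinning.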
Heads up: This is on my "take a look at" list, but I'll have my hands full right now at least for the next few days, with pip 20.1.1. |
I'm super excited that @bmartinn has done so much useful work on this, and filed a PR for this as well! It's much appreciated! I'm 100% onboard for doing this. However, to use Brett Cannon's excellent analogy, you're gifting a puppy here and I'm wondering how accepting this will affect my life once the puppy is all grown up. I really think we should carefully consider how this affects various other parts of pip, before moving too far with this:
I do have some thoughts on the specific implementation here (w.r.t. output etc) but I think we should address the more meaty design/technical concerns around the effects of such a change (like the ones above and possibly more!), prior to getting to the implementation/CLI details. FWIW, 4 is covered by pip's existing One thing I'll note here, is that there's a minor overlap with @McSinyx's GSoC project for this summer here (#825, parallel downloads), in that they both need parallelization. However, beyond the discussion about how pip does parallelization (#8320), I don't think there's any overlap between these two tasks, since the GSoC project is more about what happens during dependency resolution and this specific issue is more about what happens after the dependency resolution process is completed. |
@chrahunt kudos on the quick toy implementation! very nice!
WDYT? |
I think this view is fine - what matters isn't the definition but the fact that we're trying to avoid bad outcomes (for users, for maintenance burden, for debugging). The situation I gave above was one in which a race condition can occur under a multi-process installation, not a race condition itself. There are places in code that we make assumptions that could be rendered false and lead to failure in the face of multi-process installation. See #8562 (comment) for an explicit example: now if this code overlaps with this code in separate processes then the latter will raise an exception¹. We probably make similar assumptions in the rest of the wheel installation code, the internal utilities it uses, 3rd party libraries we use, and standard library modules we use. Using code in multiprocessing forces contributors to consider those situations (and more) on an ongoing basis (e.g. development, code review). That can be worth it, but we should be intentional about deciding it. ¹ In the proposed implementation an exception will be swallowed and we will try to install sequentially. IMO that approach leads to a much worse experience, because terminal failure of any one wheel installation will leave all but that one wheel installed. That environment would potentially be much more broken than today. |
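The kind of assumption being described can be illustrated with a classic check-then-act pattern (a generic sketch, not pip's actual code): two workers both ensure a directory exists, and the slower one raises unless the code is written to tolerate concurrent creation.

```python
import os
import tempfile

def unsafe_ensure_dir(path):
    # Check-then-act: between the isdir() check and the mkdir() call,
    # another process may create the directory, making mkdir() raise
    # FileExistsError even though the check passed.
    if not os.path.isdir(path):
        os.mkdir(path)

def safe_ensure_dir(path):
    # Tolerates concurrent creation by any number of processes.
    os.makedirs(path, exist_ok=True)
```

Code written for a single process can freely use the first form; running it from a process pool silently turns it into a latent failure, which is exactly the ongoing review burden being discussed.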
Point taken. If I now understand correctly, you're flagging a general problem with any form of parallel installation, not just a possible issue with this PR. Which would imply that any proposal to do parallel installs needs to start with (some form of) design document, discussing the trade-offs, integrity guarantees that we want to make, and explicit "undefined behaviour" situations where we don't offer guarantees and consider the use case unsupported. That seems fair - although in practice, I suspect it will result in parallel installs being postponed for a relatively long time - in my experience open source workflows are not good at handling the sort of broad design questions we're talking about here. I'd love to be proved wrong, though! |
@chrahunt I might be missing something here, but with all the examples, you will end up with a broken (i.e. overwritten) setup. This is bad, but unavoidable; the only thing you need is to make sure you convey it to the user. If we are, then again the implementations you pointed to are all atomicity issues, not race conditions¹, and yes, they should probably be dealt with; I think we can assume locks are supported. Anyhow, I think that at least to begin with, parallel installation / download / resolving should be turned off by default. It only makes sense to have this feature when pip is used to build an entire environment, and then the speed benefit is huge, and I hope it is safe to assume you do not have packages overwriting one another.

¹ An atomic operation assumes correctness outside of the atomic call as long as only a single atomic call is executed on the "system" at any moment. A race condition arises when there is no guarantee who will get to point A first, but that does not by itself imply incorrectness of the code; if it breaks correctness, then it falls under the need for an atomic section. In your example, even introducing an atomic section will not solve the race condition: there is no guarantee which package will be installed first, and that is okay, as long as we do not end up with two packages installed without knowing they overwrote one another.
@pfmoore I'm with you on this one, and IMHO this is exactly the opposite of the open-source spirit. If you try to "play it safe" and guarantee all scenarios, you are limiting your ability to introduce innovation. Pip dropping Python 2.7 and soon Python 3.5 support is a good example; we have to keep moving forward. I think that trying to protect against extreme scenarios, which are broken to begin with, is missing the point of speeding up the process. And since I think that these days accelerating the

BTW:
This, I disagree with. The problem here is a combination of compatibility and support, it's nothing to do with an "open-source spirit". Pip is used so widely, and in so many different workflows and environments, that we have to be extremely careful to not break things - that's true just as much for an open source project as for a paid one. IMO, the reason open source has become so popular and common, is precisely because volunteers and open source maintainers care about their projects and users, and take issues like backward compatibility seriously (typically more seriously than many commercial vendors). Open source gives us more options and flexibility to innovate, but not at the expense of our users. |
We disagree here. The point of #8119 is to introduce a deprecation notice, which would turn into a hard failure. So I think a broken setup will be avoidable in that case.
I don't think this addresses the maintenance overhead I mentioned above. Once we support multi-processing here then it's something we need to think about during code review and development, regardless of whether it is the default. On a side note, having code that is not often run increases the chances that we will break it inadvertently.
And that is totally OK! Everything is a trade-off. Having a slightly slower installs that, IMO, are easier to reason about, review code for, contribute code for, and support users using may be more aligned to our desired outcomes.
I agree 100%, this is an impactful thing to look into.
Noted! |
@pfmoore I think I failed to explain myself; I'm not saying we should break stuff or drop backwards compatibility, that would be horrible. My point is that a proper design document covering all angles is challenging and very hard to achieve in the asynchronous workflow of an open-source community. This is why these things so often evolve organically and iteratively, where each step improves on the previous one. This is basically my suggestion in regards to parallelization in general. This also connects with @chrahunt's point: it might take a few iterations until we achieve stability, and that is exactly why I'm against making parallelization the default behavior; I really think this is too complicated to end with a single PR.
@chrahunt, my bad, I was not aware this is a warning before installation, and not during/after. BTW, I would imagine that just listing files before unzipping them will take a toll on speed, but of course you can always parallelize the check itself... Anyhow, you are most definitely correct; the only way to implement warn-before-unzip is to test everything before installing. And again, I do not understand if #8119 produces a warning before the specific wheel is unpacked or should stop the installation before any package is unpacked. But I guess that discussion belongs in the #8119 thread.
It does not address the maintenance issue, and this is always challenging with multiprocessing workloads. That said, this is exactly why I think this feature should only be turned on with a specific flag when installing, since it will take time until it is stable. |
Looks like there's been some spirited meta-esque discussion here, containing words and phrases like "innovation" / "Everything is a trade-off" / "open source spirit" / "flexibility to innovate" etc and on interpretations of meaning of phrases. I don't want to add to that spirited discussion (mostly because I don't have the mental bandwidth for it right now). I have faith that everyone here will take a moment to step away if they need to, and avoid escalating things.
Well... the old one did that too! However, that logic is difficult to understand (for me at least), without spending a lot of time frowning at the screen and going "huh?" multiple times over the course of 20 minutes. (yes, I've spent a substantial amount of time on the old resolver. yes, I am eagerly waiting to delete all of that code.)
Those utilities are not going to "solve" race conditions and logical/design issues. They do not avoid the need to consider the implications of parallel/async programming. They avoid having the entire codebase riddled with calls to multiprocessing/multithreading etc, and provide a clean way to have fallback logic when a platform doesn't have support for some parallelization routine. [note 1] [note 1]: I went too far with an analogy here so I removed all of that text, but I wanted to note that the ending was: A nail gun's inactive-unless-against-a-surface logic isn't going to help if you put it on your foot and press the trigger. It's a nice package that removes the need for directly wielding a hammer, but that doesn't inherently make it always-safe. :)
I think everything I've said in #8187 (comment) is still true. Allow me to use more words for what I was trying to convey by referring to Brett's post: This would be wonderful and I would love if we can make this speedup happen. But we ought to consider both the benefits and drawbacks of it. In an ideal case, we'd have no drawbacks when comparing parallel installs to serial installs. Sadly, we're not in an ideal world (gestures at everything), so I think we ought to figure out how parallel installs would differ from serial installs. Knowing this will help us when we're implementing this, reviewing the implementation, planning the rollout and communicating to users during the rollout. Yes, all this is a lot of work (and perhaps "inertia" even), but I don't want us to release something that'll "not work" for a substantial chunk of the Python community. If someone thinks we need not be careful about avoiding disruption, please say so and elaborate on why. OTOH, I really don't care how we do this -- all I know is that I'm not volunteering to do this work myself. Whether this takes the form of prototyping where we discover things as we go, or a design document that we discuss before starting implementation, or some other workflow -- I really don't care, as long as I can provide feedback before merging the code into master (since master is where we make releases to end users). I'll note that poetry recently implemented parallel installs, and there are ideas that can be borrowed if we want to do so.
Thanks @pradyunsg ! That sounds like a plan to me. |
@bmartinn Awesome! Thanks for being willing to champion this! ^>^ If you wanna collaboratively work on a design document, I'd suggest just pick a platform that lets you put text in a place and share it -- a GitHub Gist / HackMD / Etherpad / Google Docs / DropBox Paper etc. :) The main difference between these platforms is the level of access control for various folks -- I think Google Docs with "view" access would likely work best, since folks can (publicly) suggest edits and make comments on specific parts of the content, while the author/owner is the only person who can actually make those changes. YMMV tho, and as long as it's possible to provide feedback on the content "near" it (i.e. not in this issue), I'm on board. :) |
I'm passing by just to mention that after GH-8562, caching unpacked wheels is no longer needed.
What is the meaning of GH and expansion for other labels? |
It's the same as |
After reading through the discussion here, in #825, #7819, #8320, #8161, #8169, #8448, it's clear that this is a problem that some people want to solve. It seems to me that the amount of time saved, multiplied over the number of people who wait for pip installs, makes this a fantastic way to give people across the world back the one thing we never get more of: time.
I think that the core maintainers¹ are painfully aware of all of the pitfalls that may make this problem a lot harder than people expect. This may come across as a lack of enthusiasm, but it really isn't. However, we do tend to get burned out by needing to be the ones reining in all of the (understandable) enthusiasm of people wanting to contribute to this feature. Nobody likes being the party pooper all the time 🙂
That's quite possibly true - I haven't reviewed the various threads and issues you link to so I'll take your word on this. But that's likely to be more because the original contributors hit the problem of not having good answers to all of the potential questions, and gave up. That's totally understandable, and one of the biggest problems with any project on the scale of something like pip - larger pieces of work are difficult to complete, because they need more time than the typical "interested volunteer" has available.
Well, I don't want to suggest that there's any magic answer here - core maintainer time is extremely limited and you may find that it's hard to get anyone's attention. But if you are interested in giving it a try, here are some suggestions:
If that doesn't burn you out, then we'll have made a good step forward, far better than someone just diving in with another proof of concept. As I said, though, I think you'll find this is a much harder problem than you imagine. Hopefully, trying to tackle it won't leave you too disillusioned - there are lots of much more tractable problems that could use some work, and we would definitely be glad to see some more enthusiastic contributors pitching in! Thanks for asking the question, and I'm sorry if this response feels more negative than you hoped... I really would like to see something like this happen, but it won't be a quick project 🙂 ¹ I'm speaking for myself mainly, but I imagine the other core devs have similar feelings.
@ctoth I'm hoping we are close to merging here: PR #8215. This is the lowest-hanging fruit in terms of parallelization, and it offers a great initial improvement over the standard serial installation process.
No, sorry. I'm extremely limited on time for pip at the moment, and what time I have is mostly restricted to drive-by comments on issues (like my comment here 🙂) or if I'm lucky, thinking about stuff that I've been meaning to work on myself. |
@pfmoore no worries, I can totally understand 🙂 |
@bmartinn sorry for the soft poke but has there been any progress with this? Sad to see the nearly complete PR still open, my docker builds would love the boost :) |
@tgross35 no worries, I really appreciate the poke, this is how we keep PRs alive :) |
Next steps are probably to do the research I noted in this comment above. |
Is there anyone still working on this? It's been nearly a year since any activity. The subject of this PR sounds like a very intuitive way to speed up pip processes, and the comments seem to indicate that it is nearly finished. It'd be a shame to let it be wasted. |
See the comment immediately before yours, which provides context on what the next steps here are. |
First of all, kudos guys on the v20+ release 👍 , wheel caching is much better and the addition of cached built wheels is an awesome time saver.
TL;DR
Reference implementation for `pip install --parallel`, which will download, inspect, and install packages in parallel, greatly accelerating the installation of an entire project's environment (or actually any `requirements.txt` file).

A few speedup numbers (Ubuntu w/ Python 3.6):
Details:
We heavily rely on pip to set up environments on our ever changing production systems.
Specifically either setting a new virtual environment or inside a docker, the full implementation of our ML/DL agent is open-source and can be found here: https://github.com/allegroai/trains
With the introduction of pip 20.1 we decided to try and parallelize the pip install process in order to save precious spin up time.
Downloading and resolving run on a `ThreadPool` (`RequirementSet` was given a global lock). The only caveat is the download progress bar (more on that down below). The reason for using threads is the fact that most of the speedup here is network and I/O, which parallelize well on Python threads. The second reason is the way `RequirementSet` is constantly growing with every package discovered, together with its requirements; sharing the set among all threads is quite trivial, as opposed to sharing them across processes.
Installing runs on a process `Pool`, as installing the wheels will not execute any script and can be parallelized without risk. The reason for choosing processes over threads is that the limiting factor here is actually CPU cycles in the installation process. Also, the wheel unpacking and copying is totally independent from one another and from the rest of the process, and lastly, order has no meaning in this stage, as the requirements order is only important when building packages.
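And a rough sketch of the process-based install stage (hypothetical names; the `fork` start method is pinned only to keep the example self-contained on POSIX, real code would use the platform default):

```python
from multiprocessing import get_context

def install_wheel(wheel_path):
    # Stand-in for unpacking and copying one wheel; defined at top level
    # so it can be pickled and sent to worker processes.
    return "installed " + wheel_path

def parallel_install(wheel_paths, workers=2):
    # Worker processes sidestep the GIL for the CPU-bound unpack/copy work;
    # pool.map preserves input order even though execution is unordered.
    ctx = get_context("fork")
    with ctx.Pool(workers) as pool:
        return pool.map(install_wheel, wheel_paths)
```

Since each wheel is independent, no shared state crosses process boundaries and no locking is needed in this stage.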
Unpacking the wheels was optimized to reuse the unpacked folders; this happens between the inspection part (`resolver`) on a cached wheel and the installation of the same wheel (essentially, what happened before was that the wheel was unpacked twice).

Solving the parallel progress bar: the progress bar now has a global lock; the first progress bar to acquire it will be the one outputting the progress to the tty. Once that progress bar is done, another instance can acquire the lock and will continue to report from its current progress stage.
This looks something like:
The full reference code can be found here
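The lock hand-off described above could be sketched roughly like this (hypothetical and heavily simplified; the real implementation lives in the linked branch):

```python
import threading

_display_lock = threading.Lock()  # shared by all progress bars

class LockedProgressBar:
    def __init__(self, label, total):
        self.label = label
        self.total = total
        self.count = 0
        self.owns_display = False

    def update(self, n=1):
        self.count += n
        if not self.owns_display:
            # Non-blocking attempt: only one bar at a time owns the tty.
            self.owns_display = _display_lock.acquire(blocking=False)
        if self.owns_display:
            print("\r%s: %d/%d" % (self.label, self.count, self.total), end="")

    def finish(self):
        if self.owns_display:
            print()
            _display_lock.release()
            self.owns_display = False
```

When the bar holding the lock finishes, the next bar to call `update()` takes over the tty and continues from its own current count, matching the described behavior.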
Usage example: