Concurrent pypi lookups with --update-all #101
I did a rough hack to get this to work and I'm just jotting down some notes.
The whole thing took 1.92s. The only bad thing so far is that there's an awkward little delay in the terminal whilst all this downloading is happening. You think nothing's happening, like it's stuck.
Perhaps I'm over-worrying about the nothing-happens-till-all-is-downloaded. I just tried another file and the WHOLE thing took just 2 seconds. That requirements file had 79 packages listed, and it took a total of 2 seconds to do 79 HTTP requests plus all the post-processing.
@mythmon @di What do you think about this? I haven't finished the work, but as the numbers above show, it looks promising: ~2 seconds to check 53 to 71 packages for updates. The core of it is this:

```python
import concurrent.futures

# Assumption: Requirement is packaging's Requirement class;
# _explode_package_spec and get_package_data are existing helpers
# elsewhere in this project.
from packaging.requirements import Requirement


def pre_download_packages(memory, specs, verbose=False):
    futures = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Kick off one download per package spec.
        for spec in specs:
            package, _, _ = _explode_package_spec(spec)
            req = Requirement(package)
            futures[
                executor.submit(get_package_data, req.name, verbose=verbose)
            ] = req.name
        # Collect results as they finish, keyed by package name.
        for future in concurrent.futures.as_completed(futures):
            content = future.result()
            memory[futures[future]] = content
```

It basically populates a dict with the downloaded content, so that when it starts analyzing one package at a time, the download part can be skipped. By doing all the downloads first, it keeps the atomicity and the predictability of the interactive prompt intact.
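To make the "skip the download" step concrete, here is a minimal sketch of how the per-package analysis side could consult the prefetched dict. This is an illustration, not the project's actual code; `get_package_data_cached` is a made-up name and `fetch` stands in for get_package_data:

```python
def get_package_data_cached(memory, package_name, fetch, verbose=False):
    if package_name in memory:
        # Hit: pre_download_packages() already fetched this one.
        return memory[package_name]
    # Miss: fall back to the normal, serial download path.
    memory[package_name] = fetch(package_name, verbose=verbose)
    return memory[package_name]
```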
I think the idea of prefetching the needed requests in interactive mode makes sense. I have very little experience with the new asyncio parts of Python, but the code in your latest comment seems fine to me.
I certainly have experience with it, but saying I get it is like saying I get Linux. The code I've got is not asyncio at all. Just good old regular threading. I made it so that if you're on Python 2.7 you get the backport from PyPI for it. Untested.
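For reference, one way a 2.7-only dependency like that could be declared is with a PEP 508 environment marker. This is a hypothetical setup.py fragment, not this project's actual packaging:

```python
# Hypothetical setup.py fragment, not this project's actual packaging.
from setuptools import setup

setup(
    name="example-tool",  # made-up name
    install_requires=[
        # PEP 508 marker: the 'futures' backport only matters on Python 2,
        # where concurrent.futures is not in the standard library.
        'futures; python_version < "3"',
    ],
)
```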
What I like about this is that it works on Python 2.7 and 3 without any third-party libraries (except the backport for 2.7), and it's simple. It only parallelizes the download piece, which is the only part that can be significantly boosted, because it's network I/O. I tested the error handling by messing with the spelling of a line in a requirements file (e.g. …).

A caveat is of course that the whole run is now basically at the mercy of the slowest download, since we wait for ALL downloads to complete. Also, since it's threads there's a small chance that you saturate your network, but since the individual network calls are tiny I'm not sure that's even a problem.
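On the error-handling point: concurrent.futures re-raises a worker thread's exception in the caller when future.result() is called, so one bad package name surfaces in the main thread and fails the whole prefetch. A minimal, self-contained demonstration (the fetch function and error here are made up):

```python
import concurrent.futures

def fetch(name):
    # Stand-in for a pypi.org lookup; misspelled names blow up.
    if name == "reqeusts":
        raise ValueError("no such package: %s" % name)
    return {"name": name}

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(fetch, n): n for n in ("django", "reqeusts")}
    for future in concurrent.futures.as_completed(futures):
        try:
            print(future.result())
        except ValueError as exc:
            # The worker's exception re-raises here, in the main thread.
            print("lookup failed:", exc)
```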
* Concurrent pypi lookups with --update-all. Fixes #101
* exception for python 3.4
If a requirements file has 10 packages, you have to do 10 pypi.org lookups, all in serial. When you use `--update-all --interactive`, that delay between each line is annoying.
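A back-of-envelope sketch of the serial cost being described; the round-trip time here is an assumed figure, not a measurement from this thread:

```python
# Back-of-envelope only; the round-trip time is an assumed figure.
n_lookups = 10
rtt = 0.2  # seconds of network round-trip per pypi.org lookup (assumed)

serial_wait = n_lookups * rtt  # ~2.0 s: each lookup waits on the previous
concurrent_wait = rtt          # roughly the slowest single lookup

print("serial: ~%.1fs  concurrent: ~%.1fs" % (serial_wait, concurrent_wait))
```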