-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(experimental) Add partial-wheel-download functionality, to reduce time spent downloading wheels that are eventually discarded and allow parallel downloads #8448
Conversation
e12a7c4
to
feca106
Compare
Let me know if people aren't ready for this yet -- I'm not trying to rush anything. I happened to find a neat way to link up the current state of the shallow downloading work to the in-progress unstable resolver in a usable way so I decided to create a PR. |
7debab9
to
5759b82
Compare
1dd0e15
to
1b9dfa9
Compare
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
I'm stopping by to say I'm finalizing the magical lazy file-like object and it's been a lot of fun. |
1b9dfa9
to
addd3ca
Compare
c28b3a2
to
1891d79
Compare
pip download
1891d79
to
6ce0a30
Compare
6ce0a30
to
e9c6ec3
Compare
make types pass add --shallow-wheels cli arg add news rename news make the metadata test pass on windows use --shallow-wheels unconditionally and remove the cli arg download all wheels at the end of the run add a hack to avoid signal() erroring in a background thread avoid using shallow wheels for non-remote file paths add --unstable-feature=shallow_wheels!
e9c6ec3
to
9df5c8f
Compare
Let's not touch the issue title any more. :) (hello to the GitHub folks that I've sent here) |
Need rebase fyi |
Per #8467 (comment), the patch to implement this feature can be reduced down quite a
I'm happy either way, but if you're going to take care of this, keep in mind that the parallelization of download still needs some discussion involving the UI/UX (mainly the progress bar), i.e. please don't do it now. Additionally, as mentioned earlier, I'm somewhat employed to do this, so don't feel uncomfortable if you want to leave the task to me. I'm sure people evolving in pip's development and usage are already really appreciate you from sharing the idea to proof of concepts and full-fledged PR on this enhancement. |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
@cosmicexplorer Let's close this, as per #7819 (comment)? |
Yes! |
TODO
httpfile
,zipfile
, andwheel
This change improves the time to resolve
tensorflow==1.14.0
by 25% when--unstable-feature=shallow_wheels
is specified, bringing the v2 resolver's performance up to par with the v1 resolver.Problem
See discussion in #7819 (comment) and #8442 (comment).
We would like to be able to get dependency information from very large requirements without having to download the very large wheel file first, as that can take several minutes, and the size of the dependency information is extremely small in comparison. If we can avoid blocking on downloading entire wheel files, we can perform all the time-sensitive resolution logic first, and download everything in parallel at the end. Avoiding the download of dependencies in the first place also opens up the possibility of allowing a "dry-run" resolve in the future, as described in #53.
#7819 introduces the concept of "shallowly downloading" a remote wheel file, in order to obtain the contends of its
METADATA
entry. This can be done by reading the central directory record at the end of the zip file, which contains a listing of the binary offsets of all files in the zip archive next to their filenames (see https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT).The new unstable resolver makes it very very easy to swap out parts of resolution logic, so we would like to try to "shallowly download" wheel files instead of downloading the entire file to read its metadata entries to make progress in the resolver.
Solution
pip._internal.network.shallow.{httpfile,zipfile,wheel}
which spread out the logic of shallowly downloading wheels into individually-testable modules.ShallowWheelDistribution
class and return it fromRequirementPreparer.prepare_linked_requirement()
when--unstable-feature=shallow_wheels
is specified on the command line.Result
This change improves the time to resolve
tensorflow==1.14.0
by 25% when--unstable-feature=shallow_wheels
is specified, bringing the v2 resolver's performance up to par with the v1 resolver.Note that
--unstable-feature=shallow_wheels
does not appear to work when used with the v1 resolver. I haven't investigated why, since the v1 resolver is deprecated.