Calculate candidate string versions only once in get_applicable_candidates #12664

Merged
news/12664.feature.rst: 1 addition, 0 deletions
@@ -0,0 +1 @@
+Minor performance improvement of finding applicable package candidates by not repeatedly calculating their versions
src/pip/_internal/index/package_finder.py: 14 additions, 15 deletions
@@ -452,24 +452,23 @@ def get_applicable_candidates(
         # Using None infers from the specifier instead.
         allow_prereleases = self._allow_all_prereleases or None
         specifier = self._specifier
-        versions = {
-            str(v)
-            for v in specifier.filter(
-                # We turn the version object into a str here because otherwise
-                # when we're debundled but setuptools isn't, Python will see
-                # packaging.version.Version and
-                # pkg_resources._vendor.packaging.version.Version as different
-                # types. This way we'll use a str as a common data interchange
-                # format. If we stop using the pkg_resources provided specifier
-                # and start using our own, we can drop the cast to str().
-                (str(c.version) for c in candidates),
+
+        # We turn the version object into a str here because otherwise
+        # when we're debundled but setuptools isn't, Python will see
+        # packaging.version.Version and
+        # pkg_resources._vendor.packaging.version.Version as different
+        # types. This way we'll use a str as a common data interchange
+        # format. If we stop using the pkg_resources provided specifier
+        # and start using our own, we can drop the cast to str().
+        candidates_and_versions = [(c, str(c.version)) for c in candidates]
Member:

Do we know how much memory overhead this may induce in a large install? I agree this block can likely be further optimised since it is basically filtering on one list.

Member Author:

> Do we know how much memory overhead this may induce in a large install?

I ran memray and saw no noticeable memory overhead. Peak memory for a dry-run install of apache-airflow[all]==2.9.2 on Python 3.12 was 354 MB, and memory usage was dominated by building the list of pages for all candidates (I'm going to open a separate issue about that).
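memray is the tool used for the measurement above. As a rough, standard-library alternative, this sketch uses `tracemalloc` to measure the peak allocation while building a list shaped like `candidates_and_versions`. `Candidate` and `FakeVersion` are hypothetical stand-ins (pip's real objects are `InstallationCandidate` and a packaging `Version`), so the absolute numbers will not match a real install; it only illustrates one way to isolate the cost of keeping the `(candidate, str)` pairs alive.

```python
import tracemalloc


class FakeVersion:
    """Hypothetical version object: str() builds a new string on every call."""

    __slots__ = ("major", "minor")

    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


class Candidate:
    """Hypothetical, minimal stand-in for pip's InstallationCandidate."""

    __slots__ = ("version",)

    def __init__(self, version: FakeVersion) -> None:
        self.version = version


def peak_bytes(build):
    """Run build() under tracemalloc; return (peak traced bytes, result)."""
    tracemalloc.start()
    try:
        result = build()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, result


# The candidates exist before tracing starts, so the traced allocations are
# mainly the new list, its tuples and the freshly built version strings.
candidates = [Candidate(FakeVersion(1, i)) for i in range(10_000)]
peak, pairs = peak_bytes(lambda: [(c, str(c.version)) for c in candidates])
print(f"{len(pairs)} pairs, peak ~{peak / 1024:.0f} KiB")
```

The printed figure depends on the Python build and platform, so treat it as an order-of-magnitude check rather than a benchmark.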

> I agree this block can likely be further optimised since it is basically filtering on one list.

I tried making it simpler, but the pre-release behavior made it problematic. You can't filter each version individually: filtering a single pre-release on its own allows that pre-release, but filtering a final version together with the same pre-release drops the pre-release unless allow_prereleases=True.
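A small sketch of the pre-release behavior described in this reply, using `SpecifierSet.filter` from `packaging` with an empty specifier (the case of an unconstrained requirement). `filter_together` mirrors how `get_applicable_candidates` filters the whole list in one call; `filter_one_by_one` is a hypothetical per-version approach, not something pip does. The import falls back to pip's vendored copy of `packaging` if the standalone distribution is not installed.

```python
# Prefer the standalone `packaging` distribution; fall back to the copy
# vendored inside pip if it is not installed.
try:
    from packaging.specifiers import SpecifierSet
except ImportError:
    from pip._vendor.packaging.specifiers import SpecifierSet


def filter_together(specifier, versions, allow_prereleases=None):
    """Filter the whole list in one call, as get_applicable_candidates does."""
    return list(specifier.filter(versions, prereleases=allow_prereleases))


def filter_one_by_one(specifier, versions, allow_prereleases=None):
    """Hypothetical per-version filtering (NOT what pip does)."""
    kept = []
    for v in versions:
        # Each call sees only one version, so a lone pre-release looks like
        # the only available candidate and is let through.
        kept.extend(specifier.filter([v], prereleases=allow_prereleases))
    return kept


spec = SpecifierSet("")  # empty specifier: no version constraint
versions = ["1.0", "2.0b1"]

print(filter_together(spec, versions))        # ['1.0']
print(filter_one_by_one(spec, versions))      # ['1.0', '2.0b1']
print(filter_together(spec, versions, True))  # ['1.0', '2.0b1']
```

With a final release present, the single call drops `2.0b1`; the per-version loop keeps it, which is why the filtering has to stay a single call over the whole list.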

+        versions = set(
+            specifier.filter(
+                (v for _, v in candidates_and_versions),
                 prereleases=allow_prereleases,
             )
-        }
-
-        # Again, converting version to str to deal with debundling.
-        applicable_candidates = [c for c in candidates if str(c.version) in versions]
+        )
+
+        applicable_candidates = [c for c, v in candidates_and_versions if v in versions]
         filtered_applicable_candidates = filter_unallowed_hashes(
             candidates=applicable_candidates,
             hashes=self._hashes,
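A minimal, self-contained sketch comparing the old and new shapes of this filtering step. `CountingVersion`, `Candidate`, `applicable_before` and `applicable_after` are hypothetical names invented for this illustration (the real code works on pip's `InstallationCandidate` objects and `self._specifier`); the counter only shows that the new shape calls `str()` on each version once instead of twice.

```python
# Prefer the standalone `packaging` distribution; fall back to the copy
# vendored inside pip if it is not installed.
try:
    from packaging.specifiers import SpecifierSet
except ImportError:
    from pip._vendor.packaging.specifiers import SpecifierSet


class CountingVersion:
    """Hypothetical version object that counts how often str() is called."""

    calls = 0

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        CountingVersion.calls += 1
        return self._text


class Candidate:
    """Hypothetical, minimal stand-in for pip's InstallationCandidate."""

    def __init__(self, version: str) -> None:
        self.version = CountingVersion(version)


def applicable_before(candidates, specifier, allow_prereleases=None):
    # Old shape: str(c.version) runs once while feeding filter() and
    # again for every membership test.
    versions = {
        str(v)
        for v in specifier.filter(
            (str(c.version) for c in candidates),
            prereleases=allow_prereleases,
        )
    }
    return [c for c in candidates if str(c.version) in versions]


def applicable_after(candidates, specifier, allow_prereleases=None):
    # New shape: compute each version string once and reuse it.
    candidates_and_versions = [(c, str(c.version)) for c in candidates]
    versions = set(
        specifier.filter(
            (v for _, v in candidates_and_versions),
            prereleases=allow_prereleases,
        )
    )
    return [c for c, v in candidates_and_versions if v in versions]


spec = SpecifierSet(">=1.0")
candidates = [Candidate(v) for v in ("0.9", "1.0", "1.1")]

CountingVersion.calls = 0
before = applicable_before(candidates, spec)
before_calls = CountingVersion.calls  # 3 while filtering + 3 membership tests

CountingVersion.calls = 0
after = applicable_after(candidates, spec)
after_calls = CountingVersion.calls  # 3, one per candidate

print(before_calls, after_calls)  # 6 3
print(before == after)  # True
```

Both shapes select the same candidate objects; only the number of `str()` conversions changes, which is what the PR title describes.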