Add an explicit cache on Python entry points #614

cottsay · 2024-02-07T21:56:08Z

Whenever we enumerate Python entry points to load colcon extension points, we're re-parsing metadata for every Python package found on the system. Worse yet, accessing attributes on importlib.metadata.Distribution typically results in re-reading the metadata each time, so we're hitting the disk pretty hard.

We don't generally expect the entry points available to change, so we should cache that information once and parse each package's metadata a single time.

Closes #600
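
For illustration, the caching idea described above might be sketched roughly as follows. This is not the code from this PR: `EntryPointCache` and `load_all_entry_points` are hypothetical names, and the only real API used is `importlib.metadata.entry_points()`, whose return type differs between Python versions (a plain dict before 3.10, a selectable object afterwards).

```python
from collections import defaultdict
from importlib.metadata import entry_points


def load_all_entry_points():
    """Enumerate all installed entry points once, grouped by group name.

    Hypothetical helper: returns {group: {entry point name: EntryPoint}}.
    """
    groups = defaultdict(dict)
    all_eps = entry_points()
    if hasattr(all_eps, 'select'):
        # Python 3.10+: selectable result with a `groups` attribute
        for group in all_eps.groups:
            for ep in all_eps.select(group=group):
                # Keep the first entry point seen for each name
                groups[group].setdefault(ep.name, ep)
    else:
        # Python 3.8 / 3.9: plain dict of group name -> entry points
        for group, eps in all_eps.items():
            for ep in eps:
                groups[group].setdefault(ep.name, ep)
    return dict(groups)


class EntryPointCache:
    """Explicit cache: the loader runs at most once until clear() is called."""

    def __init__(self, loader=load_all_entry_points):
        self._loader = loader
        self._groups = None

    def get(self, group):
        """Return {name: entry point} for a group, loading on first use."""
        if self._groups is None:
            self._groups = self._loader()
        return self._groups.get(group, {})

    def clear(self):
        """Drop the cached data, e.g. for tests that change the environment."""
        self._groups = None
```

Compared to wrapping the lookup in `functools.lru_cache`, an explicit object like this makes it obvious where the cached data lives and gives tests a direct way to reset it.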

codecov · 2024-02-07T22:01:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.54%. Comparing base (6cf24ea) to head (69e20f9).

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #614      +/-   ##
==========================================
+ Coverage   83.34%   83.54%   +0.20%     
==========================================
  Files          66       66              
  Lines        3794     3816      +22     
  Branches      739      745       +6     
==========================================
+ Hits         3162     3188      +26     
+ Misses        557      554       -3     
+ Partials       75       74       -1


cottsay · 2024-02-08T00:01:04Z

I think we can do better than this, actually.

I didn't know this, but the importlib.metadata API for distributions and entry_points does absolutely no caching at all, to the point where even accessing properties on distribution objects typically results in reading the metadata from disk every time. I did some fooling around and got my previous 0.4s to under 0.3s by specifically caching the underlying metadata to avoid the disk reads. Some strategic structuring of that underlying data to avoid iterating over it might yield even more savings.

I didn't realize that the startup performance had regressed so badly. SSDs and OS caching hide how much IO is happening here. I can imagine that cold invocations on spinning disks are brutal...
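
As a rough sketch of the metadata caching mentioned above (an assumption about the general approach, not the actual experiment), a thin wrapper can read each distribution attribute once and hold on to the result. `CachedDistribution` is a hypothetical name; it only relies on the wrapped object exposing `metadata` and `entry_points`, as `importlib.metadata.Distribution` does.

```python
from functools import cached_property


class CachedDistribution:
    """Hypothetical wrapper that reads each attribute of a distribution once."""

    def __init__(self, dist):
        # `dist` is expected to behave like importlib.metadata.Distribution
        self._dist = dist

    @cached_property
    def metadata(self):
        # The underlying property may re-read the metadata file on each access
        return self._dist.metadata

    @cached_property
    def name(self):
        # Derived from the cached metadata, so no additional read is needed
        return self.metadata['Name']

    @cached_property
    def entry_points(self):
        # Materialize the entry points once as an immutable sequence
        return tuple(self._dist.entry_points)
```

Wrapping the results of `importlib.metadata.distributions()` in objects like this trades a little memory for fewer disk reads, which fits the expectation that installed packages do not change during a single invocation.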

cottsay · 2024-02-16T20:29:28Z

Alright, I dropped the lru_cache stuff in favor of an explicit cache. This change brought baseline loading from 0.8s to 0.3s on my machine. Pyflame looks a lot better now.

nuclearsandwich

Nice!

colcon_core/extension_point.py

Whenever we enumerate Python entry points to load colcon extension points, we're re-parsing metadata for every Python package found on the system. Worse yet, accessing attributes on importlib.metadata.Distribution typically results in re-reading the metadata each time, so we're hitting the disk pretty hard.

We don't generally expect the entry points available to change, so we should cache that information once and parse each package's metadata a single time.

This change jumps through a lot of hoops to specifically use the `importlib.metadata.entry_points()` function wherever possible because it has an optimization that allows us to avoid reading each package's metadata while still properly handling package shadowing between paths. This has a measurable impact on extension point loading performance.
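
To make "package shadowing between paths" concrete, here is a minimal, self-contained sketch of the first-found-wins rule: when the same distribution appears in more than one search path, the copy in the earlier path hides the later ones. The function names are hypothetical, and the PEP 503 style name normalization is an assumption made for this example rather than a description of what `importlib.metadata` does internally.

```python
import re


def normalize_name(name):
    """Normalize a distribution name in the style of PEP 503."""
    return re.sub(r'[-_.]+', '-', name).lower()


def resolve_shadowing(found):
    """Keep the first path seen for each normalized distribution name.

    `found` is an iterable of (path, distribution name) pairs in search
    order, so earlier paths shadow later ones.
    """
    winners = {}
    for path, name in found:
        winners.setdefault(normalize_name(name), path)
    return winners
```

For example, if `My_Pkg` is found in an overlay directory and `my-pkg` again in a system directory searched later, only the overlay copy is kept, so entry points from the shadowed copy should not be loaded.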

cottsay added the enhancement New feature or request label Feb 7, 2024

cottsay self-assigned this Feb 7, 2024

cottsay force-pushed the cottsay/cache-extension-points branch from 568ae80 to 1b85894 Compare February 7, 2024 21:59

cottsay changed the title ~~Use functools.lru_cache to cache extension point discovery~~ Add an explicit cache on Python entry points Feb 16, 2024

cottsay marked this pull request as ready for review February 16, 2024 20:29

cottsay changed the base branch from master to cottsay/extension-point-tests February 22, 2024 17:12

cottsay force-pushed the cottsay/cache-extension-points branch from 893cc79 to 5ed75ac Compare February 22, 2024 17:12

cottsay marked this pull request as draft February 22, 2024 17:12

cottsay force-pushed the cottsay/cache-extension-points branch 4 times, most recently from f834f9a to 86c11bb Compare February 22, 2024 21:58

cottsay marked this pull request as ready for review February 22, 2024 22:04

nuclearsandwich approved these changes Mar 2, 2024

delete-merged-branch bot deleted the branch master March 11, 2024 23:00

cottsay changed the base branch from cottsay/extension-point-tests to master March 11, 2024 23:01

cottsay force-pushed the cottsay/cache-extension-points branch from 86c11bb to 69e20f9 Compare March 11, 2024 23:01

cottsay merged commit 2208a3b into master Mar 11, 2024
42 checks passed

delete-merged-branch bot deleted the cottsay/cache-extension-points branch March 11, 2024 23:18

cottsay added this to the 0.15.3 milestone Mar 11, 2024
