optimized implementation of pathlib #194
Replies: 10 comments 16 replies
-
Pull requests would be very much appreciated, of course! I just hope we can improve speed here without breaking compatibility. I recall looking into caching stat() results, but that was really tricky because sometimes the caching would break an app (e.g. if it was watching a file, waiting for changes).
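To make the caching hazard concrete, here is a minimal sketch. `CachedStat` is a hypothetical wrapper written only for this illustration (pathlib has no such class); it memoizes the first `stat()` result, so a program polling it for changes would never see the file grow.

```python
import pathlib
import tempfile


class CachedStat:
    """Hypothetical wrapper (not a pathlib API) that caches stat() forever."""

    def __init__(self, path):
        self.path = pathlib.Path(path)
        self._stat = None

    def stat(self):
        # Only the first call touches the filesystem.
        if self._stat is None:
            self._stat = self.path.stat()
        return self._stat


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        target = pathlib.Path(tmp) / "watched.txt"
        target.write_text("a")             # 1 byte on disk
        cached = CachedStat(target)
        before = cached.stat().st_size     # populates the cache
        target.write_text("abcdef")        # file changes to 6 bytes
        stale = cached.stat().st_size      # still the cached value
        fresh = target.stat().st_size      # uncached call sees the change
        return before, stale, fresh
```

A file watcher built on the cached call would keep seeing the old size, which is the kind of breakage described above.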
-
Hm, it's a huge module to reimplement in C. I'm looking forward to your
perf results though.
-
I created https://bugs.python.org/issue46148 to track improving pathlib performance.
-
Instead of a series of PRs, I recommend a single larger PR. You can use multiple commits to help with reviewing. (FWIW, I am not a pathlib expert, so I hope you find another reviewer. Maybe Antoine?)
-
These noteworthy internal methods are called when path objects are constructed:
-
The split functions in
Also, the implementation should be fixed to handle device paths correctly on Windows. For example, a volume device path such as "\\.\C:" has no root, but
-
The
Both of these operations are necessary in some cases, e.g. for

However, in some cases, step (2) is probably unnecessary. This includes the majority of

And in some further cases, both are unnecessary. Importantly this includes

If these steps were performed on demand, rather than at the time of path construction, I think we could dramatically improve pathlib performance for some common cases.
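The on-demand idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `LazyPurePath` is a hypothetical class, and `posixpath.normpath` is only a stand-in for pathlib's own parsing and normalization (which behave differently, e.g. pathlib does not collapse `..`). The point is just that construction and joining store the raw segments, and the expensive step runs once, the first time a normalized form is requested.

```python
import posixpath
from functools import cached_property


class LazyPurePath:
    """Hypothetical sketch: keep raw segments, normalize only when asked."""

    def __init__(self, *segments):
        # No parsing or normalization happens at construction time.
        self._raw = segments

    def __truediv__(self, segment):
        # Joining just appends another raw string segment.
        return LazyPurePath(*self._raw, segment)

    @cached_property
    def _normalized(self):
        # Stand-in for pathlib's real logic; computed at most once.
        if not self._raw:
            return "."
        return posixpath.normpath(posixpath.join(*self._raw))

    def __str__(self):
        return self._normalized

    @property
    def name(self):
        return posixpath.basename(self._normalized)
```

With this shape, `LazyPurePath("a//b", "./c") / "d.txt"` does no normalization until `str()` or `.name` is used; code that only builds paths and passes them along never pays for it.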
-
Here's a PR that makes pathlib defer path normalization until it's strictly needed: python/cpython#101560

As I mentioned in my previous comment, path normalization is not needed in
-
Hullo, would any core devs be available to review these three smallish PRs that improve pathlib performance?
Once these land, I'll resume work on deferring path normalization (see my previous comment).
-
Right, here are the last two general pathlib optimization PRs from myself:
The first of these is a substantial change that makes path construction 2-4x faster by deferring normalization. It speeds up use cases like

The second is a fairly subtle/technical change that improves normalization performance by up to 15%, but makes the pathlib internals less clear. I'm not too sure about this one.

Possible future avenues for improving performance:
-
In pytest we ran into a number of surprises with the performance of multiple pathlib methods/helpers.
Currently the performance profiles of pathlib objects are highly surprising, and I'd love to see something with a less surprising profile that's much closer to the performance we had with the legacy py.path stuff (which was by no means fast), or that gets close to the performance of plain string operations.
I'll add links to performance examples as I go and collect the issues we had in pytest as well as in a few other projects.
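As a rough way to measure the gap against string operations, something like the following `timeit` sketch could be used. The helper names and the sample path are made up for this example, and no timings are quoted because they depend heavily on the Python version and the machine.

```python
import posixpath
import timeit
from pathlib import PurePosixPath


def join_with_pathlib(base, name):
    # Construct a path object, join, and convert back to a string.
    return str(PurePosixPath(base) / name)


def join_with_strings(base, name):
    # The plain string-based equivalent.
    return posixpath.join(base, name)


def compare(number=100_000):
    """Return (pathlib_seconds, string_seconds) for `number` joins each."""
    args = ("some/base/dir", "file.txt")
    t_pathlib = timeit.timeit(lambda: join_with_pathlib(*args), number=number)
    t_strings = timeit.timeit(lambda: join_with_strings(*args), number=number)
    return t_pathlib, t_strings


if __name__ == "__main__":
    t_pathlib, t_strings = compare()
    print(f"pathlib: {t_pathlib:.3f}s  strings: {t_strings:.3f}s")
```

Both helpers produce the same string for this input, so the comparison isolates the cost of going through path objects.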