-
-
Notifications
You must be signed in to change notification settings - Fork 30.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-101196: Make isdir/isfile/exists faster on Windows #101324
Conversation
fb3c6ac
to
a07f1e7
Compare
A builtin |
Co-authored-by: Eryk Sun <[email protected]>
A few cases that are handled by
|
I guess this means we don't have test coverage for them...? In any event, I agree we should probably try to cover these cases. I'll see what can be done.
My initial implementation from this morning did that, which I like because it doesn't repeat what's clearly battle-tested code, but it made things slower in the event of symlinks. I was almost ok with that given that that isn't the common case, but your solution seemed to make everything a little faster (ignoring these new corner cases). |
There shouldn't be an issue with speed for common cases as long as you only fall back on
|
Lib/test/test_ntpath.py
Outdated
@unittest.skipIf(sys.platform != 'win32', "drive letters are a windows concept") | ||
def test_isfile_driveletter(self): | ||
current_drive = os.path.splitdrive(os.path.abspath(__file__))[0] + "\\" | ||
self.assertFalse(os.path.isfile(current_drive)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this is false is because a relative drive path like "C:" resolves to the working directory on the drive. Did you want to test a volume device path like r'\\.\C:'
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. And this is exactly the thing that (erroneously) returns True
without checking the error code from GetFileInformationByHandleEx
.
@eryksun: Thanks for all of your help and pointers. I've implemented support for these corner cases by calling I'm a little stuck on writing unit tests for them. I was able to cover I'm not sure how to recreate the case you describe in For |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me. Any other POVs?
Looks good to me, but I'll add that there's a lot of code duplication in here; it might be worth it to refactor the code shortly after landing this PR, so the added code is more maintainable and readable. |
I'll take a first pass at reducing some of this duplication. |
My read of the duplication is that there's just enough difference that it's probably not worth it. The main change I'd like to see (but wasn't requiring) is to add an option to If all the additional functions were using the same OS call, it would make sense to merge them all. But they're subtly different each time, and since this is for perf, we don't want to introduce more switches. |
+1 (I suggest adding that in a follow-up PR; let's land this.)
True that. |
Michael, are you still working on refactoring the code? Steve and Erlend seem to be in agreement that it's ready to merge as is. I think it's ready. |
Just heard that it's good to go as it is. |
Sorry, @mdboom and @zooba, I could not cleanly backport this to |
Sorry @mdboom and @zooba, I had trouble checking out the |
This ought to be safe to backport from a public API perspective, but I'll let the bot figure out how easy it'll be. |
Sorry, @mdboom and @zooba, I could not cleanly backport this to |
Okay, it's not going to backport trivially :) If anyone's up for it, go ahead, otherwise I'll take a look when I get time. |
(It's relatively rare that we backport performance improvements, so if it ends up being complicated maybe we shouldn't) |
* main: (82 commits) pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723) pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736) no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731) pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730) pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679) pythongh-85984: Remove legacy Lib/pty.py code. (python#92365) pythongh-98831: Use opcode metadata for stack_effect() (python#101704) pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719) pythongh-101283: Fix use of unbound variable (pythonGH-101712) pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) pythongh-101277: Port more itertools static types to heap types (python#101304) pythongh-98831: Modernize CALL and family (python#101508) pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697) pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329) pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672) pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324) pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615) pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934) pythonGH-101578: Normalize the current exception (pythonGH-101607) pythongh-47937: Note that Popen attributes are read-only (python#93070) ...
This makes
isdir
,isfile
andexists
faster on Windows, by making fewer API calls than callingos.stat
and then looking at that result.On the following microbenchmarks:
Microbenchmarks
I get the following results:
So therefore, these operations are around 13%-28% faster, depending on whether the file exists, is a symlink etc.
NOTE: This does not improve the performance of
pathlib.Path.is_dir
and friends. The behavior there is slightly different in terms of error handling, but I think a similar approach could also be applied there. If this PR is acceptable, I plan to do that as follow-up work.