Investigate directory enumeration performance #8396
Comments
@davkean points out there's a high perf directory enumeration logic here: |
@KirillOsenkov feel free to hit me up to talk about this. I specifically had MSBuild in mind when I wrote this and provided a .NET Framework build. |
@JeremyKuhne I told him the same thing internally :) |
As @ladipro pointed out internally, we should be using it on at least some codepaths: #6771. And indeed I do see
@KirillOsenkov can you give some details on what you were doing when you hit that exception? What version of MSBuild? Using |
I think we could simplify our msbuild/src/Shared/FileMatcher.cs Lines 298 to 303 in 7cfb36c
Using MS.IO.Redist's |
I see a bunch of string creation (Substring, for example) that could be avoided by wiring spans through this part of the code. And eventually perhaps this will allow some of the custom globbing to be removed |
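To illustrate the Substring point, here is a minimal sketch of span-based slicing. `SpanSplit`, `LastSegment` and `HasExtension` are hypothetical names made up for this example and are not taken from FileMatcher. It assumes `System.Memory` is available (built into modern .NET, a package reference on .NET Framework).

```csharp
using System;

public static class SpanSplit
{
    // Hypothetical helper: returns the final path segment as a slice of the
    // input, so no new string is allocated (unlike path.Substring(i + 1)).
    public static ReadOnlySpan<char> LastSegment(ReadOnlySpan<char> path)
    {
        int i = path.LastIndexOfAny('/', '\\');
        return i < 0 ? path : path.Slice(i + 1);
    }

    // Hypothetical helper: extension check on the slice, again without allocating.
    public static bool HasExtension(ReadOnlySpan<char> path, ReadOnlySpan<char> ext)
        => LastSegment(path).EndsWith(ext, StringComparison.OrdinalIgnoreCase);
}
```

Callers can pass ordinary strings, since `string` converts implicitly to `ReadOnlySpan<char>`. A string only needs to be materialized (via `ToString()`) for results that are actually kept.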
I did add Microsoft.IO.Redist to my benchmark and it is indeed even faster than my handcrafted approach! Kudos to Jeremy! I did follow up with the original stack that I'd pasted here, and it's from 2021 😱 My apologies. Most of this issue is now invalid as we have transitioned to Microsoft.IO.Redist! Remaining issues:
I won't be offended if we close this issue outright or mark it as low priority ;) Apologies, I should have checked the MSBuild version before filing the issue. |
I had figured that MSBuild could, in theory, translate its |
I noticed we're using standard Directory.EnumerateFiles() to enumerate files for globs. It's not very efficient, and also runs the risk of throwing when it hits directories or files it can't access.
Sample first-chance exception:
Also seeing the same for C:\Config.Msi when we accidentally enumerate the whole drive due to some property being empty and the glob ends up starting with a \.
I've had success with directly calling the Win32 API in parallel to reduce allocations, achieving up to 2x speed and 0.5x allocations:
https://github.com/KirillOsenkov/Benchmarks/blob/8556f92c07b9a3d211a7e72b776c324aff7e24b7/src/Tests/DirectoryEnumeration.cs#L12-L15
Also it seems that this approach doesn't run into exceptions when trying to access inaccessible directories, unlike the BCL one.
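For comparison, the newer enumeration API can also be told to skip inaccessible entries instead of throwing. This is a minimal sketch and not the Win32 approach from the benchmark. It assumes .NET Core 2.1 or later, where `System.IO.EnumerationOptions` exists; as far as I know, Microsoft.IO.Redist offers the equivalent under the `Microsoft.IO` namespace for .NET Framework. `SafeEnumeration` and `EnumerateAllFiles` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeEnumeration
{
    // Hypothetical helper, not MSBuild code: lazily lists files under root,
    // silently skipping directories and files the caller cannot access.
    public static IEnumerable<string> EnumerateAllFiles(string root, string pattern = "*")
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            // Skip access-denied entries rather than throwing
            // UnauthorizedAccessException partway through the walk.
            IgnoreInaccessible = true,
        };
        return Directory.EnumerateFiles(root, pattern, options);
    }
}
```

Note that `EnumerationOptions` skips hidden and system entries by default (`AttributesToSkip`), so results can differ from the classic `SearchOption.AllDirectories` overload unless that property is adjusted.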
Feel free to experiment with this benchmark, steal the source, try on real-world builds, see if you can tune it further, submit PRs if you can make it even faster ;)
The first place I would try this is in FileMatcher (see the stack above). Also, looking at the stack, I'd measure getting rid of the ConcurrentDictionary and try a simple collection with a lock around it. I often get much better results with a simple lock around simple collections.
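A minimal sketch of the "simple collection with a lock" idea, for measuring against ConcurrentDictionary. `LockedCache` is a hypothetical type written for this example and is not the cache FileMatcher actually uses. Whether it is faster would need to be benchmarked on real builds.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a ConcurrentDictionary-based cache:
// a plain Dictionary guarded by a single lock.
public sealed class LockedCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _map = new Dictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        lock (_gate)
        {
            if (!_map.TryGetValue(key, out var value))
            {
                // The factory runs while the lock is held, so it is called
                // at most once per key, but other callers wait meanwhile.
                value = factory(key);
                _map[key] = value;
            }
            return value;
        }
    }
}
```

One design difference worth keeping in mind when comparing: `ConcurrentDictionary.GetOrAdd` may invoke the factory more than once under contention, while this version never does, at the cost of serializing all callers behind one lock. If the factory is expensive (a directory enumeration, say), that trade-off can go either way.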
I'm noticing we do have a ManagedFileSystem abstraction, so I guess we can try replacing the implementation in a single place and see if it can make our builds faster wholesale.
One potential concern is that the parallelism in the new method does a lot of thrashing, so not sure how this performs on an HDD. But then again, do we care about HDDs anymore?