Add file enumeration extensibility points #24429
Comments
@terrajobst, @danmosemsft, @pjanotti |
I would rename |
Should |
I agree, updating.
It would be possible to do so. I was conflicted on this one because I wasn't sure whether we'd want to output it frequently: it would reallocate a string that we currently have to create anyway. Future optimizations could potentially avoid that on some platforms. Optimizing away the full path to the directory we're looking at is complicated, as we can't keep a single character array for the directory, so I'm unsure whether this is practically possible. |
This API feels very complicated. Should it be part of the platform, or should it rather live in a separate NuGet package that the few folks who really need it will reference? |
It is, but out of necessity. I couldn't find a better solution that kept performance high and allocations low. Note that I'm measuring impact against hitting the same folders more than once (which is super common, particularly in the MSBuild scenario). Once the caches below us (OS, hardware) are hot, we consume a much more significant portion of the time spent.
We use it internally already. I'm not sure if there is value in doing the gymnastics to isolate by package. If people are particularly concerned about visibility in the "normal" namespace we could put things in something like |
I am concerned about things like being able to evolve this independently on the platform, or requiring all current and future platforms to implement it.
We have a number of other cases where we use one limited flavor of the functionality internally and expose the same thing in an independent package. And we are reversing course on a few more for the reasons above. |
While I like the idea of OOB'ing this from a quick distribution standpoint (VS really wants this), there are some things that I'd want to ensure:
|
Presumably it would need a .NET Framework target that used the existing constructors, or quick distribution is not solved. |
We'd need to do some contortions and lose some perf if the APIs aren't there. I don't think any of the scenarios that are currently in play for the build teams actually use the |
Currently .NET Core implements DOS-like globbing on Windows and fnmatch-like globbing on Unix. PowerShell implementation. Case matching is needed for IntelliSense scenarios. For performance, .NET Core could use a low-level API for enumeration:
FTS supports hard/soft link cycle detection (and case matching). In PowerShell we have to implement this with an inode cache and checks. |
@iSazonov, #21362 is the general discussion around adding new globbing support. With this issue's changes we now use a unified matching algorithm for all platforms. Existing APIs use the DOS-like behavior, and the new APIs (that take FindOptions directly) default to a simple match that honors
I absolutely want to add more matching options. If anyone is game to implement the POSIX one (or others), I'll do what I can to help shepherd them along.
This is the current, checked-in shape of the API based on reviews (this is the source code from the reference assembly). All existing APIs that took
namespace System.IO.Enumeration
{
public enum MatchType
{
Simple,
Dos
}
public enum MatchCasing
{
PlatformDefault,
CaseSensitive,
CaseInsensitive
}
public class EnumerationOptions
{
public EnumerationOptions() { }
public bool RecurseSubdirectories { get { throw null; } set { } }
public bool IgnoreInaccessible { get { throw null; } set { } }
public int BufferSize { get { throw null; } set { } }
public FileAttributes AttributesToSkip { get { throw null; } set { } }
public MatchType MatchType { get { throw null; } set { } }
public MatchCasing MatchCasing { get { throw null; } set { } }
public bool ReturnSpecialDirectories { get { throw null; } set { } }
}
public ref struct FileSystemEntry
{
public ReadOnlySpan<char> Directory { get { throw null; } }
public string RootDirectory { get { throw null; } }
public string OriginalRootDirectory { get { throw null; } }
public ReadOnlySpan<char> FileName { get { throw null; } }
public FileAttributes Attributes { get { throw null; } }
public long Length { get { throw null; } }
public DateTimeOffset CreationTimeUtc { get { throw null; } }
public DateTimeOffset LastAccessTimeUtc { get { throw null; } }
public DateTimeOffset LastWriteTimeUtc { get { throw null; } }
public bool IsDirectory { get { throw null; } }
public FileSystemInfo ToFileSystemInfo() { throw null; }
public string ToSpecifiedFullPath() { throw null; }
public string ToFullPath() { throw null; }
}
public abstract class FileSystemEnumerator<TResult> : Runtime.ConstrainedExecution.CriticalFinalizerObject, Collections.Generic.IEnumerator<TResult>
{
public FileSystemEnumerator(string directory, EnumerationOptions options = null) { }
protected virtual bool ShouldIncludeEntry(ref FileSystemEntry entry) { throw null; }
protected virtual bool ShouldRecurseIntoEntry(ref FileSystemEntry entry) { throw null; }
protected abstract TResult TransformEntry(ref FileSystemEntry entry);
protected virtual void OnDirectoryFinished(ReadOnlySpan<char> directory) { throw null; }
protected virtual bool ContinueOnError(int error) { throw null; }
public TResult Current { get { throw null; } }
object System.Collections.IEnumerator.Current { get { throw null; } }
public bool MoveNext() { throw null; }
public void Reset() { throw null; }
public void Dispose() { throw null; }
protected virtual void Dispose(bool disposing) { throw null; }
}
public class FileSystemEnumerable<TResult> : Collections.Generic.IEnumerable<TResult>
{
public FileSystemEnumerable(string directory, FindTransform transform, EnumerationOptions options = null) { }
public FindPredicate ShouldRecursePredicate { get { throw null; } set { } }
public FindPredicate ShouldIncludePredicate { get { throw null; } set { } }
public Collections.Generic.IEnumerator<TResult> GetEnumerator() { throw null; }
Collections.IEnumerator Collections.IEnumerable.GetEnumerator() { throw null; }
public delegate bool FindPredicate(ref FileSystemEntry entry);
public delegate TResult FindTransform(ref FileSystemEntry entry);
}
public static class FileSystemName
{
public static string TranslateDosExpression(string expression) { throw null; }
public static bool MatchesDosExpression(ReadOnlySpan<char> expression, ReadOnlySpan<char> name, bool ignoreCase = true) { throw null; }
public static bool MatchesSimpleExpression(ReadOnlySpan<char> expression, ReadOnlySpan<char> name, bool ignoreCase = true) { throw null; }
}
}
I'm currently iterating on the Unix implementation, fixing the last few corner cases and improving the allocation count. |
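The hooks in the reference source above can be exercised with a minimal FileSystemEnumerator<TResult> subclass. This is a sketch, not code from the thread. It uses only members listed above, and imports both System.IO and System.IO.Enumeration because the shipped .NET Core 2.1 API places EnumerationOptions in System.IO. The class LargeFileEnumerator, the helper LargeFiles.Find, and the policy of continuing on every error are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

// Hypothetical enumerator: yields full paths of files at least minLength bytes long.
public sealed class LargeFileEnumerator : FileSystemEnumerator<string>
{
    private readonly long _minLength;

    public LargeFileEnumerator(string directory, long minLength)
        : base(directory, new EnumerationOptions { RecurseSubdirectories = true })
    {
        _minLength = minLength;
    }

    // Runs against the raw entry, so rejected entries never become strings or FileInfos.
    protected override bool ShouldIncludeEntry(ref FileSystemEntry entry)
        => !entry.IsDirectory && entry.Length >= _minLength;

    // Only entries that passed the filter are materialized into a string.
    protected override string TransformEntry(ref FileSystemEntry entry)
        => entry.ToFullPath();

    // Assumed policy: keep going on any native error code instead of throwing.
    protected override bool ContinueOnError(int error) => true;
}

public static class LargeFiles
{
    // Drains the enumerator into a list; disposing releases the directory handles.
    public static List<string> Find(string directory, long minLength)
    {
        var results = new List<string>();
        using (var enumerator = new LargeFileEnumerator(directory, minLength))
        {
            while (enumerator.MoveNext())
                results.Add(enumerator.Current);
        }
        return results;
    }
}
```

Filtering inside ShouldIncludeEntry is where the allocation savings come from. The filter sees the entry's name, length, and attributes before any string exists. Only matches pay for ToFullPath().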
I duplicate my post in #21362. For PowerShell we are very interested in implementing simple wildcards like '[a-z]' for paths - |
Looks good. Comments:
|
Updated original comment per latest feedback. |
* API tweaks to match latest updates to spec. Add a few new tests. See #25873
* Properly clear state when enumerating on Unix. Make sure we don't include special directories in subdir processing. Add test. Collapse helper that was only called in one place, and remove dead one.
In PowerShell we implemented this by means of a cache of device/inode information on Windows/Unix (PowerShell/PowerShell#4020). |
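A coarser alternative to an inode cache can be built on the ShouldRecursePredicate hook from the API above: never recurse into a directory that carries the ReparsePoint attribute. This is not what PowerShell does. It avoids link cycles by refusing to follow any link, including non-cyclic ones. It also assumes the platform reports symlinks and junctions as reparse points. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

public static class NoLinkRecursion
{
    // Hypothetical helper: recursively lists files, but does not descend into any
    // directory marked as a reparse point (symlink/junction), so link cycles cannot occur.
    public static IEnumerable<string> EnumerateFilesSkippingLinks(string directory)
    {
        return new FileSystemEnumerable<string>(
            directory,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            new EnumerationOptions { RecurseSubdirectories = true })
        {
            // Results: files only.
            ShouldIncludePredicate = (ref FileSystemEntry entry) => !entry.IsDirectory,
            // Recursion: only into directories that are not links.
            ShouldRecursePredicate = (ref FileSystemEntry entry) =>
                (entry.Attributes & FileAttributes.ReparsePoint) == 0,
        };
    }
}
```

Compared with a device/inode cache, this needs no extra state or syscalls per directory. The cost is that legitimate linked directories are skipped too.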
We should probably rename |
Should TranslateDosExpression and MatchesDosExpression be renamed too? |
That was my presumption. We also discussed adding |
Everything is checked in per spec. |
While reading @JeremyKuhne's blog post, I noticed something.
If those properties are UTC, as their names indicate, why use On the other hand, if for some reason the data type really has to be DateTimeOffset to account for non-zero offsets, then maybe we could remove the Utc suffix from the property names. |
This may be obvious when you look at the signature, because the name indicates UTC. But when you use the returned DateTime in subsequent operations, it would not be obvious whether the DateTime is UTC without checking carefully. Returning DateTimeOffset is always better in such situations, because it tells you the returned result is associated with the UTC zone (the offset is part of the value), so subsequent operations on the returned value are safer. In general, DateTimeOffset is much clearer than DateTime because of the confusion around the DateTimeKind stored inside a DateTime.
Removing the Utc suffix could make users think the returned date/time is in the local zone, with some offset inside the DateTimeOffset. They would then have to check the results to know that they really are UTC. So keeping the Utc suffix helps. |
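The ambiguity being discussed fits in a few lines. A DateTime built without a kind has Kind == Unspecified, and ToUniversalTime() then treats it as local time. A DateTimeOffset always carries its offset. The dates are arbitrary and the helper names are hypothetical.

```csharp
using System;

public static class UtcDemo
{
    // A DateTime constructed without a DateTimeKind argument has Kind == Unspecified:
    // the value itself does not record whether it is UTC or local time.
    public static DateTimeKind KindOfUnlabeledValue()
        => new DateTime(2018, 1, 1, 12, 0, 0).Kind;

    // ToUniversalTime() treats an Unspecified value as local time, so the result is
    // shifted by the machine's UTC offset (zero shift only when that offset is zero).
    public static TimeSpan ShiftAppliedByToUniversalTime()
    {
        var dateTime = new DateTime(2018, 1, 1, 12, 0, 0);
        return dateTime.ToUniversalTime() - dateTime;
    }

    // A DateTimeOffset carries its offset with the value; an offset of zero identifies UTC.
    public static DateTimeOffset LabeledUtcValue()
        => new DateTimeOffset(2018, 1, 1, 12, 0, 0, TimeSpan.Zero);
}
```

The result of ShiftAppliedByToUniversalTime() depends on the machine's time zone. A consumer holding only a DateTime has to remember that context. With DateTimeOffset, the context travels with the value.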
We need low-allocating high-performance extensibility points to build solutions for enumerating files.
(This is the API review for dotnet/designs#24)
Rationale and Usage
Enumerating files in .NET provides limited configurability. You can specify a simple DOS-style pattern and whether or not to look recursively. More complicated filtering requires post-filtering all results, which can introduce a significant performance drain.
Recursive enumeration is also problematic in that there is no way to handle error states such as access issues or cycles created by links.
These restrictions have a significant impact on file system intensive applications, a key example being MSBuild. This document proposes a new set of primitive file and directory traversal APIs that are optimized for providing more flexibility while keeping the overhead to a minimum, so that enumeration becomes both more powerful and more performant.
To write a wrapper that gets files with a given set of extensions you would need to write something similar to:
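Below is a minimal sketch of such a wrapper, built only on the existing DirectoryInfo.EnumerateFiles API. The class and method names are hypothetical, and matching extensions case-insensitively is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class NaiveEnumeration
{
    // Hypothetical helper using only pre-existing APIs. Every file found becomes a
    // FileInfo (with its full path string) before the extension test runs, so
    // files that are rejected still cost allocations.
    public static IEnumerable<FileInfo> GetFilesWithExtensions(
        string directory, bool recursive, params string[] extensions)
    {
        SearchOption option = recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
        return new DirectoryInfo(directory)
            .EnumerateFiles("*", option)
            .Where(file => extensions.Any(
                extension => string.Equals(file.Extension, extension, StringComparison.OrdinalIgnoreCase)));
    }
}
```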
Not complicated to write, but this can make an enormous number of extra allocations. You have to create full strings and FileInfo objects for every single item in the file system. We can cut this down significantly with the extension point. The reduction in allocations with the above solution is significant.
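Here is a sketch of the same wrapper rewritten against the FileSystemEnumerable<TResult> extension point. It assumes the API shape from the reference source earlier in this thread, with EnumerationOptions imported from System.IO as in the shipped .NET Core 2.1 API. The class and helper names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

public static class ExtensionPointEnumeration
{
    // Hypothetical helper with the same contract as a naive wrapper, except that the
    // filter runs on the raw FileSystemEntry: a full path string is built only for matches.
    public static IEnumerable<string> GetFilesWithExtensions(
        string directory, bool recursive, params string[] extensions)
    {
        return new FileSystemEnumerable<string>(
            directory,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            new EnumerationOptions { RecurseSubdirectories = recursive })
        {
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
                !entry.IsDirectory && HasAnyExtension(entry.FileName, extensions)
        };
    }

    // A plain loop rather than LINQ's Any: the ref parameter and its FileName span
    // cannot be captured by a nested lambda.
    private static bool HasAnyExtension(ReadOnlySpan<char> fileName, string[] extensions)
    {
        foreach (string extension in extensions)
        {
            if (fileName.EndsWith(extension.AsSpan(), StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

One behavioral difference to keep in mind: a freshly constructed EnumerationOptions skips hidden and system files by default, whereas the older Directory/DirectoryInfo APIs return them.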
Proposed API
Implementation Notes
Changes to existing behavior
- *.htm will no longer match *.html if 8.3 filenames exist
- *.* means any file with a period (foo.* matches foo.txt, not foo)
FileSystemEnumerable
- directory and transform will throw ArgumentNullException if null.
FileSystemEnumerator
- directory and transform will throw ArgumentNullException if null.
FileSystemEntry
- FileSystemEntry should not be cached
- FileName will only contain valid data for the duration of filter/transform calls, hence the struct being passed by ref
Matchers
- Escaping uses the \ character: \*, \\, ? (and \>, \<, \" for MatchesDosExpression)
- expression will match all
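The simple-matching rules listed above can be checked by calling FileSystemName.MatchesSimpleExpression from the API directly. The inputs are arbitrary examples, the expected values follow the rules as stated, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Enumeration;

public static class SimpleMatchDemo
{
    // Expression/name pairs with the result expected under simple matching.
    public static readonly (string Expression, string Name, bool Expected)[] Cases =
    {
        ("*.*", "foo.txt", true),       // any name containing a period
        ("*.*", "foo", false),          // no period, so no match
        ("foo.*", "foo", false),        // the period is literal
        ("*.htm", "index.html", false), // the matcher sees only the long name
        (@"a\*b", "a*b", true),         // \* matches a literal asterisk...
        (@"a\*b", "axb", false),        // ...and nothing else
    };

    // Returns a description of every case whose actual result differs from Expected.
    public static string[] Mismatches()
    {
        var mismatches = new List<string>();
        foreach (var (expression, name, expected) in Cases)
        {
            bool actual = FileSystemName.MatchesSimpleExpression(expression, name);
            if (actual != expected)
                mismatches.Add($"{expression} vs {name}: got {actual}");
        }
        return mismatches.ToArray();
    }
}
```

On a runtime that follows these rules, SimpleMatchDemo.Mismatches() returns an empty array. Any entry it does return points at a rule that behaves differently there.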