Initial draft of file enumeration design doc. #24
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,248 @@ | ||
# Extensible File Enumeration | ||
|
||
**PM** [Immo Landwerth](https://github.com/terrajobst) | **Dev** [Jeremy Kuhne](https://github.com/jeremykuhne) | ||
|
||
Enumerating files in .NET provides limited configurability. You can specify a simple DOS-style pattern and whether or not to search recursively. More complicated filtering requires post-filtering all results, which can introduce a significant performance drain. | ||
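
To make the post-filtering cost concrete, here is a minimal sketch of what a caller has to write today to select files by a set of extensions. It uses only the existing `Directory.EnumerateFiles` API plus LINQ; the class and method names (`PostFilterSample`, `EnumerateByExtension`) are hypothetical and only serve the illustration.

``` C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PostFilterSample
{
    // Hypothetical helper: returns the files under 'root' whose extension
    // (including the leading dot, e.g. ".c") is in 'extensions'.
    public static IEnumerable<string> EnumerateByExtension(string root, params string[] extensions)
    {
        var wanted = new HashSet<string>(extensions, StringComparer.OrdinalIgnoreCase);

        // A single search pattern cannot express "any of these extensions",
        // so every entry is materialized as a full path string first and
        // only then filtered.
        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(path => wanted.Contains(Path.GetExtension(path)));
    }
}
```

Every rejected entry still costs a string allocation for its full path, which is the overhead a predicate running on the raw find data is meant to avoid.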
|
||
Recursive enumeration is also problematic in that there is no way to handle error states such as access issues or cycles created by links. | ||
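
As a rough illustration of the workaround this forces on callers, the sketch below walks the tree by hand so that an inaccessible directory can be skipped. It relies only on the existing `Directory.GetFiles` and `Directory.GetDirectories` APIs; the names `TolerantWalkSample` and `EnumerateFilesSkippingDenied` are hypothetical, and the sketch makes no attempt to detect cycles created by links.

``` C#
using System;
using System.Collections.Generic;
using System.IO;

public static class TolerantWalkSample
{
    // Hypothetical helper: yields all files under 'root', skipping any
    // directory that cannot be read.
    public static IEnumerable<string> EnumerateFilesSkippingDenied(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subdirectories;

            try
            {
                files = Directory.GetFiles(current);
                subdirectories = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                // Skip this directory and carry on with the rest of the tree.
                continue;
            }

            foreach (string file in files)
                yield return file;

            foreach (string subdirectory in subdirectories)
                pending.Push(subdirectory);
        }
    }
}
```

Each directory level allocates two arrays plus a string per entry, so the caller pays in allocations for the ability to tolerate errors.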
|
||
These restrictions have a significant impact on file-system-intensive applications, a key example being MSBuild. This document proposes a new set of primitive file and directory traversal APIs that are optimized for flexibility while keeping overhead to a minimum, so that enumeration becomes both more powerful and more performant. | ||
|
||
## Scenarios and User Experience | ||
|
||
1. MSBuild can apply custom filters to file system entries with limited allocations and shape the results into any desired format. | ||
2. Users can build custom enumerations from their own filters and transforms or from the commonly used ones we provide. | ||
|
||
## Requirements | ||
Review comment: This section is intended to be empty; the requirements you want to have should go under goals and the ones you want to scope out under non-goals. #Closed
||
|
||
|
||
### Goals | ||
Review comment: The primary goal seems to be:
|
||
|
||
|
||
1. Custom filtering based on common file system data | ||
- Name | ||
- Attributes | ||
- Time stamps | ||
- File size | ||
2. Result transforms can be of any desired output type | ||
- Like LINQ `Select()`, but keeps FileData on the stack | ||
3. API minimizes allocations | ||
4. API is a cross-platform abstraction | ||
5. We provide common filters and transforms | ||
- To file/directory name | ||
- To full path | ||
- To File/Directory/FileSystemInfo | ||
- DOS-style filters (legacy `*/?` with DOS semantics, e.g. `*.` matches all files without an extension) | ||
- Simple Regex filter | ||
- Simpler globbing (`*/?` without DOS-style variants) | ||
- Set of extensions (`*.c`, `*.cpp`, `*.cc`, `*.cxx`, etc.) | ||
6. Recursive behavior is configurable | ||
- On/Off | ||
- Predicate based on FileData | ||
7. Can avoid throwing access-denied exceptions | ||
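
To illustrate what "simpler globbing" over a span could look like, here is a minimal sketch of a `*`/`?` matcher without the DOS-style variants. It is not part of the proposed API surface; `SimpleGlobSample.IsMatch` is a hypothetical name, and the matching rules (`*` matches any run of characters, `?` matches exactly one) are assumptions made for the example.

``` C#
using System;

public static class SimpleGlobSample
{
    // Hypothetical matcher: '*' matches any run of characters (including none),
    // '?' matches exactly one character. No DOS-style special cases.
    public static bool IsMatch(ReadOnlySpan<char> pattern, ReadOnlySpan<char> name, bool ignoreCase = true)
    {
        int p = 0, n = 0;
        int starPattern = -1, starName = 0;

        while (n < name.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star and first try matching it against nothing.
                starPattern = p++;
                starName = n;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || CharEquals(pattern[p], name[n], ignoreCase)))
            {
                p++;
                n++;
            }
            else if (starPattern >= 0)
            {
                // Backtrack: let the most recent star absorb one more character.
                p = starPattern + 1;
                n = ++starName;
            }
            else
            {
                return false;
            }
        }

        // Only trailing stars may remain unmatched in the pattern.
        while (p < pattern.Length && pattern[p] == '*')
            p++;

        return p == pattern.Length;
    }

    private static bool CharEquals(char a, char b, bool ignoreCase) =>
        ignoreCase ? char.ToUpperInvariant(a) == char.ToUpperInvariant(b) : a == b;
}
```

Because both arguments are spans, a predicate could call such a matcher on a file name without allocating a string. Under these rules `*.` does not match a name without an extension, which is the difference from the DOS-style filter.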
|
||
### Non-Goals | ||
Review comment: One non-goal to call out is that we don't intend to replace the existing IO APIs -- these are meant to be advanced APIs for folks that really have to care about performance. #Closed
||
|
||
1. API will not expose platform specific data | ||
2. Error handling configuration is fully customizable | ||
Review comment: That needs more detail, as a goal says error handling is configurable while a non-goal says error handling configuration is fully customizable. You need to say enough that the reader can draw a line in their head of what's in and what's out. #Closed
||
|
||
## Design | ||
|
||
### Proposed API surface | ||
|
||
``` C# | ||
namespace System.IO | ||
{ | ||
/// <summary> | ||
/// Delegate for filtering out find results. | ||
/// </summary> | ||
public delegate bool FindPredicate<TState>(ref RawFindData findData, TState state); | ||
|
||
/// <summary> | ||
/// Delegate for transforming raw find data into a result. | ||
/// </summary> | ||
public delegate TResult FindTransform<TResult, TState>(ref RawFindData findData, TState state); | ||
|
||
[Flags] | ||
public enum FindOptions | ||
{ | ||
None = 0x0, | ||
|
||
// Enumerate subdirectories | ||
Recurse = 0x1, | ||
|
||
// Skip files/directories when access is denied | ||
IgnoreAccessDenied = 0x2, | ||
|
||
// Future: Add flags for tracking cycles, etc. | ||
} | ||
|
||
public class FindEnumerable<TResult, TState> : CriticalFinalizerObject, IEnumerable<TResult>, IEnumerator<TResult> | ||
{ | ||
public FindEnumerable( | ||
string directory, | ||
FindTransform<TResult, TState> transform, | ||
FindPredicate<TState> predicate, | ||
// Only used if FindOptions.Recurse is set. Default is to always recurse. | ||
FindPredicate<TState> recursePredicate = null, | ||
TState state = default, | ||
FindOptions options = FindOptions.None); | ||
} | ||
|
||
public static class Directory | ||
{ | ||
public static IEnumerable<TResult> Enumerate<TResult, TState>( | ||
string path, | ||
FindTransform<TResult, TState> transform, | ||
FindPredicate<TState> predicate, | ||
FindPredicate<TState> recursePredicate = null, | ||
TState state = default, | ||
FindOptions options = FindOptions.None); | ||
} | ||
|
||
public class DirectoryInfo | ||
{ | ||
// Instance method: enumerates the directory this DirectoryInfo represents | ||
public IEnumerable<TResult> Enumerate<TResult, TState>( | ||
FindTransform<TResult, TState> transform, | ||
FindPredicate<TState> predicate, | ||
FindPredicate<TState> recursePredicate = null, | ||
TState state = default, | ||
FindOptions options = FindOptions.None); | ||
} | ||
|
||
/// <summary> | ||
/// Used for processing and filtering find results. | ||
/// </summary> | ||
public ref struct RawFindData | ||
{ | ||
// This will have private members that hold the native data and | ||
// will lazily fill in data for properties where such data is not | ||
// immediately available in the current platform's native results. | ||
|
||
// The full path to the directory the current result is in | ||
public string Directory { get; } | ||
|
||
// The full path to the starting directory for enumeration | ||
public string OriginalDirectory { get; } | ||
|
||
// The path to the starting directory as passed to the enumerable constructor | ||
public string OriginalUserDirectory { get; } | ||
|
||
// Note: using a span allows us to reduce unneeded allocations | ||
public ReadOnlySpan<char> FileName { get; } | ||
public FileAttributes Attributes { get; } | ||
public long Length { get; } | ||
|
||
public DateTime CreationTimeUtc { get; } | ||
public DateTime LastAccessTimeUtc { get; } | ||
public DateTime LastWriteTimeUtc { get; } | ||
} | ||
} | ||
``` | ||
|
||
### Transforms & Predicates | ||
|
||
We'll provide common predicates and transforms for building searches. | ||
|
||
``` C# | ||
namespace System.IO | ||
{ | ||
internal static partial class FindPredicates | ||
{ | ||
internal static bool NotDotOrDotDot(ref RawFindData findData); | ||
internal static bool IsDirectory(ref RawFindData findData); | ||
} | ||
|
||
public static partial class FindTransforms | ||
{ | ||
public static DirectoryInfo AsDirectoryInfo(ref RawFindData findData); | ||
public static FileInfo AsFileInfo(ref RawFindData findData); | ||
public static FileSystemInfo AsFileSystemInfo(ref RawFindData findData); | ||
public static string AsFileName(ref RawFindData findData); | ||
public static string AsFullPath(ref RawFindData findData); | ||
} | ||
} | ||
|
||
``` | ||
|
||
### DosMatcher | ||
Review comment: I assume you would have a RegexMatcher as well.
Reply: Ideally. We might have to accept less-than-ideal perf to start with if we don't get the Span overloads on Regex at first.
||
|
||
We currently have an implementation of the algorithm used for matching files on Windows in FileSystemWatcher. Providing this publicly will allow consistently matching names cross platform according to Windows rules if such behavior is desired. | ||
|
||
``` C# | ||
namespace System.IO | ||
{ | ||
public static class DosMatcher | ||
{ | ||
/// <summary> | ||
/// Change '*' and '?' to '<', '>' and '"' to match Win32 behavior. For compatibility, Windows | ||
/// changes some wildcards to provide a closer match to historical DOS 8.3 filename matching. | ||
/// </summary> | ||
public unsafe static string TranslateExpression(string expression) | ||
|
||
/// <summary> | ||
/// Return true if the given expression matches the given name. | ||
/// </summary> | ||
public unsafe static bool MatchPattern(string expression, ReadOnlySpan<char> name, bool ignoreCase = true) | ||
} | ||
} | ||
``` | ||
|
||
### Samples | ||
Review comment: I'd take both the
||
|
||
Getting the full path of all files matching a given name pattern (close to what FindFiles does, but returning the full path): | ||
|
||
``` C# | ||
public static FindEnumerable<string, string> GetFiles(string directory, | ||
    string expression = "*", | ||
    bool recursive = false) | ||
{ | ||
    return new FindEnumerable<string, string>( | ||
        directory, | ||
        // Transform: return the full path of each accepted entry | ||
        (ref RawFindData findData, string expr) => FindTransforms.AsFullPath(ref findData), | ||
        // Predicate: accept only files whose name matches the translated expression | ||
        (ref RawFindData findData, string expr) => | ||
        { | ||
            return !FindPredicates.IsDirectory(ref findData) | ||
                && DosMatcher.MatchPattern(expr, findData.FileName, ignoreCase: true); | ||
        }, | ||
        state: DosMatcher.TranslateExpression(expression), | ||
        options: recursive ? FindOptions.Recurse : FindOptions.None); | ||
} | ||
``` | ||
Review comment: This sample has a heavy cognitive load:
It's the ultimate API, but it is likely intimidating for someone who simply wants to look for files matching a regex, without caring about every drop of performance.
Reply: For sure. Doing this doesn't preclude us adding simpler overloads, and I would actually expect to eventually.
Review comment: I think it would be helpful to propose those here.
|
||
### Existing API summary | ||
|
||
``` C# | ||
namespace System.IO | ||
{ | ||
public static class Directory | ||
{ | ||
public static IEnumerable<string> EnumerateDirectories(string path, string searchPattern, SearchOption searchOption); | ||
public static IEnumerable<string> EnumerateFiles(string path, string searchPattern, SearchOption searchOption); | ||
public static IEnumerable<string> EnumerateFileSystemEntries(string path, string searchPattern, SearchOption searchOption); | ||
public static string[] GetDirectories(string path, string searchPattern, SearchOption searchOption); | ||
public static string[] GetFiles(string path, string searchPattern, SearchOption searchOption); | ||
public static string[] GetFileSystemEntries(string path, string searchPattern, SearchOption searchOption); | ||
} | ||
|
||
public sealed class DirectoryInfo : FileSystemInfo | ||
{ | ||
public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, SearchOption searchOption); | ||
public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, SearchOption searchOption); | ||
public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, SearchOption searchOption); | ||
public DirectoryInfo[] GetDirectories(string searchPattern, SearchOption searchOption); | ||
public FileInfo[] GetFiles(string searchPattern, SearchOption searchOption); | ||
public FileSystemInfo[] GetFileSystemInfos(string searchPattern, SearchOption searchOption); | ||
} | ||
|
||
public enum SearchOption | ||
{ | ||
AllDirectories, | ||
TopDirectoryOnly | ||
} | ||
} | ||
``` | ||
|
||
|
||
## Q & A |
Review comment: Your scenarios are way too short. You want to make those headings and show some sample code consuming the APIs you're proposing. Scenarios are meant to illustrate the value your APIs are adding. They shouldn't be longer than a few paragraphs, but they also shouldn't be that abstract. #Closed