Developers can enumerate directories and files using globbing patterns #21362

Open
Tracked by #44314 ...
khellang opened this issue Apr 25, 2017 · 29 comments
Labels: api-needs-work, area-System.IO, Cost:M, Priority:2, Team:Libraries, User Story

Comments

@khellang
Member

I'd like to start a discussion on including a file system globbing API in .NET (Core). If you look at implementations mentioned on Wikipedia, every "mainstream platform" has an entry, but not .NET.

There are quite a few (more or less) successful implementations around (see below), some even from Microsoft, but I think something as fundamental as this should ship with the framework.

There's already partial globbing support in the following methods:

  • Directory.GetFiles
  • Directory.EnumerateFiles
  • Directory.GetFileSystemEntries
  • Directory.EnumerateFileSystemEntries
  • Directory.GetDirectories
  • Directory.EnumerateDirectories

They all have a searchPattern argument, but it lacks support for recursive globs (** aka "globstar"), brace expansion, etc. Recursion can be achieved using the SearchOption argument, but the API is hard to use when you want to support (often user-defined) recursive patterns like /src/**/*.csproj.
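To make the awkwardness concrete, here is a minimal sketch of what a caller has to do today to honor a user-supplied pattern such as /src/**/*.csproj: split it by hand into a root directory, a file-name pattern, and a SearchOption. The class and method names are hypothetical, and the sketch deliberately handles only that one pattern shape.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class RecursivePatternSketch
{
    // Handles only "<root>/**/<file pattern>"; every other shape is rejected.
    public static IEnumerable<string> Enumerate(string pattern)
    {
        const string globstar = "/**/";
        int i = pattern.IndexOf(globstar, StringComparison.Ordinal);
        if (i < 0)
            throw new NotSupportedException("Expected '<root>/**/<file pattern>'.");

        string root = i == 0 ? "/" : pattern.Substring(0, i);
        string filePattern = pattern.Substring(i + globstar.Length);

        // searchPattern cannot express directory separators or a second globstar.
        if (filePattern.IndexOf('/') >= 0 || filePattern.Contains("**"))
            throw new NotSupportedException("Only a plain file-name pattern may follow '/**/'.");

        return Directory.EnumerateFiles(root, filePattern, SearchOption.AllDirectories);
    }
}
```

Anything beyond this shape, such as src/**/bin/*.dll, already needs custom matching on top of the enumeration.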

I'd ❤️ to hear people's opinions here...

  • Is it worth including in the framework?
  • Should new APIs be introduced, or can we "level up" the existing searchPattern in the methods mentioned above (without it being a breaking change)?

Examples

And tons of other implementations...

@khellang khellang changed the title API Request: Path Globbing Feature Request: File System Globbing Apr 25, 2017
@nil4
Contributor

nil4 commented Apr 25, 2017

This article may come in handy: Glob Matching Can Be Simple And Fast Too

@khellang
Member Author

khellang commented Apr 25, 2017

> This article may come in handy: Glob Matching Can Be Simple And Fast Too

Heh. Reading that article was exactly what prompted me to file this issue 😂

@daveaglick

I think this would make a great API addition.

I would like to see the implementation tied to an abstraction and not directly to the CoreFx IO classes. Then a specific implementation of the globbing API could be delivered to support the CoreFx IO API while leaving it open for other uses.

For example, Microsoft.Extensions.FileSystemGlobbing does a great job with this because you only have to implement two abstract classes to get it to work (DirectoryInfoBase and FileInfoBase - interfaces would be even nicer though).

Some things I think are missing from Microsoft.Extensions.FileSystemGlobbing and would love to see:

  • Pattern expansion (being able to specify multiple pattern alternatives with a single string)
  • Pattern exclusion (excluding patterns or pattern expansions with a specific syntax, something like !)

For example, pattern expansion can be represented by braces so you can have patterns like:

/src/**/*.{csproj,cs}

which selects all *.csproj files and all *.cs files in the specified folder.

Likewise, pattern exclusion would look like:

/src/**/*.{!csproj,}

which selects all *.* files (notice the comma without anything specified inside the expansion) except *.csproj in the specified folder.
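The expansion half of this (not the ! exclusion syntax) can be sketched as a small pre-processing step that turns one pattern into several plain patterns before matching. This is only an illustration with hypothetical names; it is not how Minimatch or Reliak.FileSystemGlobbingExtensions implement it, and it ignores escaping.

```csharp
using System;
using System.Collections.Generic;

public static class BraceExpansionSketch
{
    // Expands "a.{b,c}" into ["a.b", "a.c"]. Nested groups are handled by
    // recursion; escaping and the "!" exclusion syntax are not handled.
    public static List<string> Expand(string pattern)
    {
        var results = new List<string>();
        int open = pattern.IndexOf('{');
        if (open < 0)
        {
            results.Add(pattern);
            return results;
        }

        // Find the matching close brace and the top-level commas in between.
        int depth = 0, close = -1;
        var commas = new List<int>();
        for (int i = open; i < pattern.Length; i++)
        {
            char c = pattern[i];
            if (c == '{') depth++;
            else if (c == '}' && --depth == 0) { close = i; break; }
            else if (c == ',' && depth == 1) commas.Add(i);
        }
        if (close < 0)
        {
            // Unbalanced brace: treat the whole pattern as a literal.
            results.Add(pattern);
            return results;
        }

        string prefix = pattern.Substring(0, open);
        string suffix = pattern.Substring(close + 1);
        int start = open + 1;
        commas.Add(close); // sentinel so the last alternative is emitted too
        foreach (int end in commas)
        {
            string alternative = pattern.Substring(start, end - start);
            results.AddRange(Expand(prefix + alternative + suffix));
            start = end + 1;
        }
        return results;
    }
}
```

With this sketch, Expand("/src/**/*.{csproj,cs}") yields /src/**/*.csproj and /src/**/*.cs, each of which can then be handed to a matcher that knows nothing about braces.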

Both of these concepts are (mostly) implemented for Microsoft.Extensions.FileSystemGlobbing in Reliak.FileSystemGlobbingExtensions. There's also a really good implementation of brace expansion in Minimatch.

FWIW, I've also got an implementation for Microsoft.Extensions.FileSystemGlobbing based on combining features from those two libraries here.

@maartenba

Yes!

@ThatRendle

Given the shift to CLI, both for the tooling and as a target for .NET Core apps, this would make perfect sense.

@danmoseley
Member

danmoseley commented Apr 25, 2017

Is there some kind of quasi-standard written down for globbing behavior? Are forward slashes important? (They aren't for native Windows apps like MSBuild, but they are for Git on Windows, which is probably just undone work.)

Does "**" literally match "everything including slash" and "*" literally matches "everything excluding slash"? I believe that's what we did in MSBuild and I think that's Git behavior also.

Could "**" support be added to the existing File IO APIs without unacceptable breaks, and is that important? I like the idea of also offering it for use in other contexts, e.g., Git obviously does not always run globs over the file system.
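The reading of "*" and "**" asked about above can be sketched as a translation from a glob to a regular expression that runs on plain strings, with no file system involved. The names are hypothetical, and one detail is an assumption rather than something settled in this thread: "**/" is allowed to match zero directories, so /src/**/*.csproj also matches /src/x.csproj. Character classes and escaping are left out.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobRegexSketch
{
    // Translates a glob into an anchored regex: "**" may cross '/',
    // while "*" and "?" stay within one path segment.
    public static Regex ToRegex(string glob)
    {
        var sb = new StringBuilder("^");
        for (int i = 0; i < glob.Length; i++)
        {
            char c = glob[i];
            if (c == '*')
            {
                bool globstar = i + 1 < glob.Length && glob[i + 1] == '*';
                if (!globstar)
                {
                    sb.Append("[^/]*"); // single star: anything except a slash
                    continue;
                }

                i++; // consume the second '*'
                if (i + 1 < glob.Length && glob[i + 1] == '/')
                {
                    i++; // consume the slash as part of "**/"
                    sb.Append("(?:.*/)?"); // assumption: zero or more directories
                }
                else
                {
                    sb.Append(".*"); // anything, including slashes
                }
            }
            else if (c == '?')
            {
                sb.Append("[^/]");
            }
            else
            {
                sb.Append(Regex.Escape(c.ToString()));
            }
        }
        sb.Append('$');
        return new Regex(sb.ToString(), RegexOptions.CultureInvariant);
    }
}
```

Because the matcher only sees strings, the same translation could serve non-file-system callers such as the Git case mentioned above.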

@danmoseley
Member

danmoseley commented Apr 25, 2017

As for perf, of course it's faster to use the native API where possible. I believe for MSBuild we did something like this: use the OS if there were no * after the last slash; otherwise crop the pattern at the first slash after the first *, use the OS to enumerate all files below it into a list, then use a regex on the results to handle any subsequent * and **. I'm curious what Unix shells, Perl, etc. do. It would be easy to check.
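A common variant of this idea, sketched here under my own assumptions and not taken from MSBuild's code, is to split the pattern into the longest literal directory prefix and a wildcard remainder. The OS then enumerates only below that prefix, and a matcher (a regex, for instance) filters the results against the remainder. The names and the exact cut point are hypothetical.

```csharp
using System;

public static class PrefixSplitSketch
{
    private static readonly char[] Wildcards = { '*', '?', '[', '{' };

    // Splits "/src/app/**/*.cs" into the literal directory "/src/app"
    // and the wildcard remainder "**/*.cs".
    public static (string BaseDirectory, string Remainder) Split(string pattern)
    {
        int wildcard = pattern.IndexOfAny(Wildcards);

        // Last slash before the first wildcard, or the last slash overall
        // when the pattern contains no wildcard at all.
        int slash = wildcard < 0
            ? pattern.LastIndexOf('/')
            : (wildcard == 0 ? -1 : pattern.LastIndexOf('/', wildcard - 1));

        if (slash < 0) return (".", pattern);               // relative, e.g. "*.cs"
        if (slash == 0) return ("/", pattern.Substring(1)); // rooted, e.g. "/*.cs"
        return (pattern.Substring(0, slash), pattern.Substring(slash + 1));
    }
}
```

The point of the split is that nothing outside BaseDirectory ever has to be listed, however broad the wildcard part is.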

@khellang
Member Author

khellang commented Apr 25, 2017

> Is there some kind of quasi-standard written down for globbing behavior?

I don't think there's any formal specification. I guess the best bet is going by something like minimatch's test suite which seems to be pretty in-line with bash, sh, ksh etc.

There are even some comments on compliance with other fnmatch/glob implementations in their README.md.

> Are forward slashes important? (They aren't for native Windows apps like MSBuild, but they are for Git on Windows, which is probably just undone work.)

When it comes to forward/backward slashes, I think it makes sense to do what node-glob does, which means only using forward slashes in glob patterns.

@danmoseley
Member

> I think it makes sense to do what node-glob does, which means only using forward slashes in glob patterns.

Why is that? I assumed that in Git and Sublime on Windows this was just a hangover from their non-Windows heritage. It's certainly handy to be able to paste in a Windows path and add ** on the end or suchlike.

@khellang
Member Author

How do you handle escaping if you handle both forward and backward slashes as separators?

@danmoseley
Member

@JeremyKuhne this is an IO area we might want to invest in..

@TylerLeonhardt

I'd also like to point out that

Directory.EnumerateFiles(
    folderPath,
    pattern,
    SearchOption.AllDirectories)

isn't good at best-effort attempts to enumerate files.

If a path is too long or you are not authorized to access a file, EnumerateFiles will throw and you won't be able to continue enumerating past the offending file.
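For readers arriving later: .NET Core 2.1 and newer offer an EnumerateFiles overload that takes an EnumerationOptions, which covers at least the access-denied half of this complaint. A minimal sketch follows; the class and method names are hypothetical, while the EnumerationOptions properties are the real ones.

```csharp
using System.Collections.Generic;
using System.IO;

public static class BestEffortEnumerationSketch
{
    public static IEnumerable<string> EnumerateAll(string folderPath, string pattern)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,
            // Skip entries that fail with access-denied errors instead of throwing.
            IgnoreInaccessible = true,
        };
        return Directory.EnumerateFiles(folderPath, pattern, options);
    }
}
```

Note that a default EnumerationOptions also skips hidden and system entries (AttributesToSkip), which differs from the SearchOption overloads.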

@JeremyKuhne
Member

JeremyKuhne commented Jan 22, 2018

Note that we're reviewing an extensibility mechanism for enumeration that will allow building globbing solutions. See #24429.
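As a rough illustration of building a matcher on top of such an extensibility point, the sketch below uses the System.IO.Enumeration types available since .NET Core 2.1, which appear to be what came out of that review. The wrapper class, its method, and the isMatch delegate are hypothetical; any glob matcher could be plugged in as the predicate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;

public static class CustomEnumerationSketch
{
    // Enumerates files under root whose full path satisfies the predicate.
    public static IEnumerable<string> EnumerateMatching(string root, Func<string, bool> isMatch)
    {
        var options = new EnumerationOptions { RecurseSubdirectories = true };
        return new FileSystemEnumerable<string>(
            root,
            (ref FileSystemEntry entry) => entry.ToFullPath(),
            options)
        {
            // Runs for every entry; directories are still recursed into
            // even though they are not returned.
            ShouldIncludePredicate = (ref FileSystemEntry entry) =>
                !entry.IsDirectory && isMatch(entry.ToFullPath()),
        };
    }
}
```

A production matcher would likely test entry.FileName (a span) before building the full path, to avoid allocating a string for every rejected entry.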

@JeremyKuhne JeremyKuhne self-assigned this Feb 6, 2018
@iSazonov
Contributor

iSazonov commented Feb 8, 2018

Currently .NET Core implements DOS-like globbing on Windows and fnmatch-like globbing on Unix.
.NET Core assumes that applications should behave the same on all platforms, so it should implement a third model for globbing: a unified, modern one that behaves the same everywhere. Perhaps it should be POSIX 1003.2, 3.13.

PowerShell implementation
https://msdn.microsoft.com/en-us/library/aa717088%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396#Anchor_1
https://github.com/PowerShell/PowerShell/blob/master/src/System.Management.Automation/namespaces/LocationGlobber.cs
It supports wildcards ('*', '?', '[', ']') in file and directory names.
Currently it doesn't have case matching and is not optimal, so we expect to get a high-performance solution from .NET Core.

Case matching is needed for IntelliSense scenarios.
Those scenarios also require sorted file enumeration. Currently PowerShell collects paths, sorts them, and then enumerates them for IntelliSense, which can result in huge extra allocations. Perhaps .NET Core can do this more efficiently.

For performance, .NET Core could use low-level APIs for enumeration:

  • On Windows FindFirstFileEx (not clear about hard/soft link cycle detection)
  • On Linux/Unix
    • modern FTS (it seems present on Linux only)
    • glob(3)

FTS supports hard/soft link cycle detection (and case matching). In PowerShell we have to implement this with an inode cache and checks.

@danmoseley
Member

POSIX apparently doesn't support globstar - nor does PowerShell. Personally, I think it's important.

@danmoseley
Member

danmoseley commented Feb 25, 2018

We don't have time to implement a new globbing spec in 2.1, but I wonder how difficult it would be to add a generic MatchType.RegularExpression to MatchType.Win32 and MatchType.Dos. @JeremyKuhne ?

@JeremyKuhne
Member

> I wonder how difficult it would be to add a generic MatchType.RegularExpression to MatchType.Win32 and MatchType.Dos.

Very easy. Performance would not be great until we get a span Regex implementation (as we'd have to create strings), but having one would allow us to light up existing usages when we do get one...

@iSazonov
Contributor

> POSIX apparently doesn't support globstar - nor does PowerShell. Personally, I think it's important.

Currently Bash has the support.
And in the PowerShell repo we seem to have discussed adding globstar; we want this.

> to add a generic MatchType.RegularExpression

If we can add it now and optimize performance later, I vote for adding it.

/cc @SteveL-MSFT

@danmoseley
Member

@JeremyKuhne do you want to get the API approved for it (the enum member) then? It would have to be done by end of month.

@JeremyKuhne
Member

> @JeremyKuhne do you want to get the API approved for it (the enum member) then? It would have to be done by end of month.

I'll try. We'd have to take a regex dependency to do so.

@tapika

tapika commented Apr 15, 2018

One more implementation to your list:
https://sourceforge.net/p/syncproj/code/HEAD/tree/SolutionProjectBuilder.cs#l1419

What syncProj currently does not support is listing file extensions as "{cpp,h}", and I've decided not to support that syntax. It could be reduced to the regex pattern (cpp|h), but that matches either .h or .cpp, whichever exists, whereas the typical use in syncProj is to match both files, .cpp and .h, not just one of them. I could support a.{c,h} as inclusion of the files a.c and a.h, but then what about {c*,h}? That goes in a more complex direction, and for me it's easier to sort it out by actually searching for files and listing them.

But if you manage to write a function similar to mine, or even improve on it, let me know. (Without a heavy class hierarchy like the one in Microsoft.Extensions.FileSystemGlobbing.)

@StefanBertels

Just another interesting package: https://github.com/dazinator/DotNet.Glob

@JeremyKuhne JeremyKuhne removed their assignment Jan 17, 2020
@JeremyKuhne
Member

The hope is that we are able to add to MatchOptions so that it would be something like this (don't mind the terms):

    public enum MatchType
    {
        // These exist
        Simple,
        Win32,

        // New
        Globbing,
        MSBuildGlobbing,
        Regex
    }

Adding Regex may not be possible because it would mean taking a dependency on the Regex library. If it isn't, we'd add something like 'Custom' and a property in EnumerationOptions that allows you to specify a match delegate.

@iSazonov
Contributor

The question about the custom delegate is: would we want to pass custom options to the delegate?

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@carlossanlop carlossanlop added this to the Future milestone Jun 18, 2020
@jozkee jozkee self-assigned this Nov 5, 2020
@jozkee jozkee modified the milestones: Future, 6.0.0 Nov 5, 2020
@danmoseley danmoseley added the Bottom Up Work Not part of a theme, epic, or user story label Nov 6, 2020
@yufeih
Contributor

yufeih commented Nov 20, 2020

I would like to see an API that checks whether an input matches a glob pattern without touching the file system. Similar to Regex, you would create an instance and use it repeatedly against different strings:

public class Glob
{
    public Glob(string pattern);
    public bool IsMatch(string input);
    public static bool IsMatch(string input, string pattern);
}

@jozkee
Member

jozkee commented Nov 20, 2020

@yufeih did you consider using the Matcher from Microsoft.Extensions.FileSystemGlobbing? MatcherExtensions.Match does exactly what you are asking for. Also consider Dotnet.Glob, which can be used to match strings against the specified pattern, plus it has Span support.

@yufeih
Contributor

yufeih commented Nov 20, 2020

@jozkee I'm currently using Glob. Microsoft.Extensions.FileSystemGlobbing lacks support for pattern expansion such as /src/**/*.{csproj,cs}. If the BCL supports file system globbing, having a paired, file-system-independent API with the same behavior would be very helpful. In our scenario, we are not only globbing an initial set of files, but also using glob patterns to bulk-set properties.

@danmoseley
Member

Should this be labeled "User Story" and have a customer-focused title in the user story form, i.e. "PERSONA-VERB-NOUN"?

@jeffhandley jeffhandley changed the title Feature Request: File System Globbing Developers can enumerate directories and files using globbing patterns Jan 14, 2021
@jeffhandley jeffhandley added User Story A single user-facing feature. Can be grouped under an epic. Priority:2 Work that is important, but not critical for the release labels Jan 14, 2021
@jozkee jozkee added the Cost:M Work that requires one engineer up to 2 weeks label Jan 15, 2021
@jeffhandley jeffhandley modified the milestones: 6.0.0, 7.0.0 Jun 14, 2021
@jeffhandley jeffhandley removed the Bottom Up Work Not part of a theme, epic, or user story label Jan 9, 2022
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@Benjin

Benjin commented Nov 11, 2022

I'm finding a shortcoming in Microsoft.Extensions.FileSystemGlobbing, and I'm hoping that an improvement can make it into this upcoming work: I'd like an option for the includes and excludes to be evaluated in the same order in which they were added to the Matcher.

root /
  - helloWorld.txt
  - ExcludeMe /
      - notIncluded.txt
      - ButActuallyIncludeMe /
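          - hiEarth.txt

using System.IO;
using Microsoft.Extensions.FileSystemGlobbing;
using Microsoft.Extensions.FileSystemGlobbing.Abstractions;

Matcher m = new();
m.AddInclude("**/*");
m.AddExclude("ExcludeMe/**/*");
m.AddInclude("ExcludeMe/ButActuallyIncludeMe/**/*");

// Execute takes a DirectoryInfoBase rather than a plain string.
m.Execute(new DirectoryInfoWrapper(new DirectoryInfo("root")));

// Desired matches with ordered evaluation:
// root/helloWorld.txt
// root/ExcludeMe/ButActuallyIncludeMe/hiEarth.txt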
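
One way to read the ordered evaluation requested here is "the last rule that matches a path decides", which is how .gitignore treats negated patterns. The sketch below shows only that rule-ordering logic. The class is hypothetical and is not part of Microsoft.Extensions.FileSystemGlobbing, and plain predicates stand in for real glob matching.

```csharp
using System;
using System.Collections.Generic;

public sealed class OrderedRulesSketch
{
    private readonly List<(bool Include, Func<string, bool> Matches)> _rules =
        new List<(bool Include, Func<string, bool> Matches)>();

    public OrderedRulesSketch AddInclude(Func<string, bool> matches)
    {
        _rules.Add((true, matches));
        return this;
    }

    public OrderedRulesSketch AddExclude(Func<string, bool> matches)
    {
        _rules.Add((false, matches));
        return this;
    }

    // A path is selected when the last rule that matches it is an include.
    public bool IsSelected(string path)
    {
        bool selected = false;
        foreach (var rule in _rules)
        {
            if (rule.Matches(path)) selected = rule.Include;
        }
        return selected;
    }
}
```

With rules added in the order include-everything, exclude paths under ExcludeMe/, include paths under ExcludeMe/ButActuallyIncludeMe/, the sketch selects helloWorld.txt and ExcludeMe/ButActuallyIncludeMe/hiEarth.txt but not ExcludeMe/notIncluded.txt.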
