Question mark character not treated as special in path segment #39
Comments
@dlong500 thanks for investigating this! It doesn't seem like we have a test for this case. If you'd like to send a test and possibly a fix, I can get that into our next major release. We are actually making a few fixes related to the globs not catching certain things, and we need to bump a major just in case someone is relying on the mistaken behavior. |
@phated I can take a stab at it. So are you in agreement that my expected behavior is correct? It looks like the issue stems from |
@dlong500 I'm not exactly sure. The question mark seems to be an "extglob" which usually only matters when prefixed to a glob type pattern with parens. Why do you think that question mark should be considered a globbing pattern when not part of an extglob? I'd need something to reference to determine. I don't think we should turn off strict mode. |
I'm going based off of the syntax specified in the Maybe there is another way I can specify the same type of pattern using extglobs or something, but all I know is it was working with |
Hmm, well major versions are supposed to contain breaking changes and if they weren't following actual glob "specs" (for as much as there is one) then I definitely could see how that wouldn't be compatible between different libraries. We've tried to stick to libraries that follow as closely to bash as possible. |
According to the I think there could be a case for getting that matcher added to |
Are you saying I should open a PR in the Maybe there should be a strict option in |
I'm a maintainer on is-glob. I checked the strict regexp and it just seems to be missing the generic question mark path. |
I've had a tiny bit of time to work on this and it's easy to add the question mark character to the strict regex but that trips up a lot of the tests related to invalid regex patterns and capture groups. Without knowing the history of why all the existing tests are there I don't want to be messing with a bunch of other tests. What would you recommend is the best path forward? |
Yeah, I'd guess it would be a breaking change and the tests would need to be updated. We are in the process of making a breaking release to this library, too, so it could be upstreamed here. I was mostly asking for the implementation/breaking tests proposed as a discussion point. I can update all the broken tests if everyone agrees that |
But that result is correct. I assume that if you want
This should also return the correct result with I don't usually have strong opinions on these kinds of decisions, I just follow the logic and read up on prior work to see where that leads me. If the consensus is that Let me know your thoughts. |
@jonschlinkert I'm fine with whatever we decide, but I'm curious why picomatch says that ? is a globbing pattern. |
I expect there can be some other way to accomplish what I'm after and I'm perfectly happy to leave things as-is if I can adjust my patterns/regex to work. What's confusing is the Here is a more detailed example of the type of thing I've been doing that works with This would delete folders like:
but not folders like:
If I'm reading the docs correctly for extglobs, maybe I can achieve the same thing using a pattern like: Is that correct? |
Hmm, good point. Sometimes after I go back and forth on something so many times I forget what the final decision was. I'll go back and refresh myself on what the expected behavior is, then I'll come back with my thoughts. @dlong500 sorry for any confusion if I got this wrong. I'll also read over #39 (comment) and see if I have any suggestions or if this is indeed a bug. |
Here is a refresher on what
To conditionally match any character or nothing, you could do Specific number of characters Since you can match a specific number of characters using
min/max However, you can use the following "trick" to use a regex quantifier to match any number of characters: picomatch('data/100-123?\\{0,3}_files/[0-3]/', { unescape: true }); Essentially, the Please let me know if this works for you. edit: I just updated the globs to use |
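To make the quantifier idea concrete without depending on picomatch's internals, here is a minimal sketch using a hand-written regular expression. It assumes that `?` stands for one non-slash character (`[^/]`) and that the `{min,max}` suffix acts on it as an ordinary regex quantifier. The helper name `buildMatcher` is hypothetical, and this is not the regex picomatch itself generates.

```javascript
// Hypothetical helper: builds a RegExp for paths shaped like
// data/100-123<suffix>_files/<digit 0-3>/ where <suffix> is between
// `min` and `max` non-slash characters.
function buildMatcher(min, max) {
  // [^/] plays the role of the glob `?`; {min,max} is a plain regex quantifier.
  return new RegExp(`^data/100-123[^/]{${min},${max}}_files/[0-3]/$`);
}

const zeroToThree = buildMatcher(0, 3);
const exactlyOne = buildMatcher(1, 1);

console.log(zeroToThree.test('data/100-123_files/0/'));  // true
console.log(exactlyOne.test('data/100-123_files/0/'));   // false
console.log(exactlyOne.test('data/100-123A_files/2/'));  // true
console.log(exactlyOne.test('data/100-123AX_files/0/')); // false
```

Changing the lower bound from 0 to 1 is what excludes the path with no suffix character at all.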
So, does this mean that there is indeed a bug in |
@jonschlinkert Thanks for your time looking into this, and sorry to be dense, but I'm still a bit confused by the examples you gave. I'm specifically trying to match a single character (not zero or more than one) in a path segment. I want to match folders like:
But NOT the following folders:
Matching the 0-3 in the subfolders is working fine with an extglob (and as you mention can be done with other methods as well), but what is the simplest way to match the middle part of the path ( edit I tried using an extglob to match a single character like |
I’m almost certainly the one who's dense, no need to apologize. I should have used 1 instead of 0 in my examples. You should be able to use whatever numbers you need as constraints.
On Mar 9, 2021, at 4:40 PM, Davison Long ***@***.***> wrote:
@jonschlinkert Thanks for your time looking into this, and sorry to be dense, but I'm still a bit confused by the examples you gave. I'm specifically trying to match a single character (not zero or more than one) in a path segment.
I want to match folders like:
data/100-123A_files/0/
data/100-123A_files/1/
data/100-123A_files/2/
data/100-123A_files/3/
data/100-123B_files/0/
data/100-123B_files/1/
data/100-123B_files/2/
data/100-123B_files/3/
But NOT the following folders:
data/100-123_files/0/
data/100-123AX_files/0/
Matching the 0-3 in the subfolders is working fine with an extglob (and as you mention can be done with other methods as well), but what is the simplest way to match the middle part of the path (100-123?_files) where the question mark represents exactly one character.
|
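For reference, the bash-style reading of `?` as "exactly one character within a path segment" can be sketched with a deliberately simplified glob-to-regex translation. The function `globToRegExp` below is hypothetical and handles only `?`, `*`, and simple bracket classes; it is not how picomatch, fast-glob, or del translate patterns, and it ignores extglobs, braces, and negated classes.

```javascript
// Simplified, hypothetical glob-to-regex translation (not picomatch's algorithm).
// Supported: `?` (exactly one non-slash char), `*` (any run of non-slash chars),
// and bracket classes like [0-3], which are copied through unchanged.
function globToRegExp(glob) {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '?') {
      source += '[^/]';
    } else if (ch === '*') {
      source += '[^/]*';
    } else if (ch === '[') {
      const close = glob.indexOf(']', i + 1);
      if (close === -1) {
        source += '\\['; // unterminated class: treat `[` literally
      } else {
        source += glob.slice(i, close + 1);
        i = close; // skip past the copied class
      }
    } else {
      // Escape regex metacharacters so everything else matches literally.
      source += ch.replace(/[.+^${}()|\\\]]/g, '\\$&');
    }
  }
  return new RegExp(`^${source}$`);
}

const re = globToRegExp('data/100-123?_files/[0-3]/');
const shouldMatch = ['data/100-123A_files/0/', 'data/100-123B_files/3/'];
const shouldNotMatch = ['data/100-123_files/0/', 'data/100-123AX_files/0/'];

console.log(shouldMatch.every((p) => re.test(p)));   // true
console.log(shouldNotMatch.some((p) => re.test(p))); // false
```

Under this reading, the pattern accepts the `A` and `B` folders and rejects both the folder with no suffix and the one with two extra characters.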
I'm still perplexed about how to match only a single character (not zero or more than one) in a path segment. Using brace expansion requires a comma which means you can have it match: In theory using an extglob such as EDIT Ok, I'm stupid. I realize now that you were using regex quantifiers--not brace expansion--in your picomatch example. The only thing I'll have to figure out is if that will work with the UPDATE So it doesn't look like I can pass any options from |
Sorry for the late response. I'm not sure what else I can do to help but I'm happy to assist if you
Probably not. It seems fine IMHO, but I wouldn't know without actually testing the generated regex against huge sets of files. Meaning, like all code, it depends. Picomatch generates a regular expression from a user-defined pattern, so skepticism is always a healthy and natural default position. Here are a couple of things that might speed up your matches (you're probably already doing these things but in case it helps you or anyone else who sees this discussion):
Since globs are used in scripts and code that's executed inside CI/CD pipelines, thinking about performance is a good thing.
Lol indeed. I wanted to start on this and I never got around to it. My goal was to create this guide to globbing then use that to inspire the major concepts that would need to be addressed in the spec. There are many, many nuances and different similar implementations. Not just in terms of matchers (bash glob, wildmat, extglob, grep, gitignore, and many others -- not to mention regex and awk), but also differences in implementations across different languages. There are also many features that make sense when globbing files with JavaScript that do not make sense with bash, and vice versa. If anyone wants to help tackle this, I'd be thrilled to help you get started. In summary, it seems like the main blocker is how options are passed to fast-glob and del. I'm not very familiar with the internals of either, but I'd be happy to answer questions or help in whatever way I can. |
I feel like we are doing 2 separate things in this issue right now.
|
is-glob is doing the correct thing IMO.
On Mar 17, 2021, at 7:09 PM, Blaine Bublitz ***@***.***> wrote:
I feel like we are doing 2 separate things in this issue right now.
@jonschlinkert is trying to solve @dlong500's problem with no changes to this library or is-glob
I am trying to figure out if this library and is-glob should actually be treating ? as a glob pattern that matches a single character (it currently rejects ? unless it is part of an extglob).
|
I still am not sure if |
What were you expecting to happen?
The string `base/folder?/file1.txt` should return `base` as the non-magic parent path.
What actually happened?
`globParent('base/folder?/file1.txt')` returns `base/folder?`.
Please provide the following information:
`node -v`: 15.10
Additional information
I've been trying to track down the source of my issue where the `del` library wasn't deleting expected folders based on a pattern like the one mentioned above (after `del` switched to using `fast-glob`). This led me to follow the dependency tree from `del` to `fast-glob` to `glob-parent`.
Unless I'm completely mistaken about how this library works, I would expect a question mark to be treated like other special characters. For example, `globParent('base/folder*/file1.txt')` returns `base`.
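The difference between the expected and the reported result can be illustrated with a small stand-alone sketch. The function `nonMagicParent` and its `questionMarkIsMagic` option are hypothetical and exist only for this illustration; this is not glob-parent's implementation, and the set of "magic" characters is intentionally reduced to `*`, `?`, brackets, and braces.

```javascript
// Hypothetical sketch, not glob-parent's implementation: return the leading
// path segments that contain no glob "magic". `questionMarkIsMagic` toggles
// the behavior debated in this issue.
function nonMagicParent(pattern, { questionMarkIsMagic = true } = {}) {
  const magic = questionMarkIsMagic ? /[*?[\]{}]/ : /[*[\]{}]/;
  const segments = pattern.split('/');
  const parent = [];
  // The last segment is treated as a file name or pattern, never as a parent.
  for (const segment of segments.slice(0, -1)) {
    if (magic.test(segment)) break;
    parent.push(segment);
  }
  return parent.join('/') || '.';
}

console.log(nonMagicParent('base/folder*/file1.txt')); // base
console.log(nonMagicParent('base/folder?/file1.txt')); // base
console.log(
  nonMagicParent('base/folder?/file1.txt', { questionMarkIsMagic: false })
); // base/folder?
```

With the option on, `?` stops the parent at `base`, which is the expected result above; with it off, the sketch reproduces the reported `base/folder?`.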