Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path: add path.glob #47490

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 34 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
Expand Up @@ -2162,3 +2162,37 @@ The externally maintained libraries used by Node.js are:
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""

- glob(7), located at src/node_path,cc, is licensed as follows:
"""
Valid-License-Identifier: MIT
SPDX-URL: https://spdx.org/licenses/MIT.html
Usage-Guide:
To use the MIT License put the following SPDX tag/value pair into a
comment according to the placement guidelines in the licensing rules
documentation:
Comment on lines +2170 to +2173
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we following this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are instructions for maintainers of the Linux kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/LICENSES/preferred/MIT?h=v6.3-rc6
I am not really sure how we should embed this since it only really appears here
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/glob.c?h=v6.3-rc6#n10

see also the comment on top of #define MODULE_LICENSE:

/*
 * The following license idents are currently accepted as indicating free
 * software modules
 *
 *	"GPL"				[GNU Public License v2]
 *	"GPL v2"			[GNU Public License v2]
 *	"GPL and additional rights"	[GNU Public License v2 rights and more]
 *	"Dual BSD/GPL"			[GNU Public License v2
 *					 or BSD license choice]
 *	"Dual MIT/GPL"			[GNU Public License v2
 *					 or MIT license choice]
 *	"Dual MPL/GPL"			[GNU Public License v2
 *					 or Mozilla license choice]
 *
 * The following other idents are available
 *
 *	"Proprietary"			[Non free products]
 *
 * Both "GPL v2" and "GPL" (the latter also in dual licensed strings) are
 * merely stating that the module is licensed under the GPL v2, but are not
 * telling whether "GPL v2 only" or "GPL v2 or later". The reason why there
 * are two variants is a historic and failed attempt to convey more
 * information in the MODULE_LICENSE string. For module loading the
 * "only/or later" distinction is completely irrelevant and does neither
 * replace the proper license identifiers in the corresponding source file
 * nor amends them in any way. The sole purpose is to make the
 * 'Proprietary' flagging work and to refuse to bind symbols which are
 * exported with EXPORT_SYMBOL_GPL when a non free module is loaded.
 *
 * In the same way "BSD" is not a clear license information. It merely
 * states, that the module is licensed under one of the compatible BSD
 * license variants. The detailed and correct license information is again
 * to be found in the corresponding source files.
 *
 * There are dual licensed components, but when running with Linux it is the
 * GPL that is relevant so this is a non issue. Similarly LGPL linked with GPL
 * is a GPL combined work.
 *
 * This exists for several reasons
 * 1.	So modinfo can show license info for users wanting to vet their setup
 *	is free
 * 2.	So the community can ignore bug reports including proprietary modules
 * 3.	So vendors can do likewise based on their own policies
 */

Copy link
Member Author

@MoLow MoLow Apr 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did what I best understood from what I found in the Linux kernel and in other license examples so if anyone can help out with what really should be used as the license - I'd appreciate

SPDX-License-Identifier: MIT
License-Text:

MIT License

Copyright (c) <year> <copyright holders>
MoLow marked this conversation as resolved.
Show resolved Hide resolved

Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
"""
10 changes: 10 additions & 0 deletions lib/path.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ const {
StringPrototypeToLowerCase,
} = primordials;

const { glob: _glob } = internalBinding('path');

const {
CHAR_UPPERCASE_A,
CHAR_LOWERCASE_A,
Expand Down Expand Up @@ -153,6 +155,12 @@ function _format(sep, pathObject) {
return dir === pathObject.root ? `${dir}${base}` : `${dir}${sep}${base}`;
}

function glob(pattern, name) {
validateString(pattern, 'pattern');
validateString(name, 'name');
return _glob(pattern, name);
}

const win32 = {
/**
* path.resolve([from ...], to)
Expand Down Expand Up @@ -1064,6 +1072,7 @@ const win32 = {

return ret;
},
glob,

sep: '\\',
delimiter: ';',
Expand Down Expand Up @@ -1530,6 +1539,7 @@ const posix = {

return ret;
},
glob,

sep: '/',
delimiter: ':',
Expand Down
1 change: 1 addition & 0 deletions node.gyp
Original file line number Diff line number Diff line change
Expand Up @@ -106,6 +106,7 @@
'src/node_metadata.cc',
'src/node_options.cc',
'src/node_os.cc',
'src/node_path.cc',
'src/node_perf.cc',
'src/node_platform.cc',
'src/node_postmortem_metadata.cc',
Expand Down
1 change: 1 addition & 0 deletions src/node_binding.cc
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
V(mksnapshot) \
V(options) \
V(os) \
V(path) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a strong opinion, but maybe we should name the binding and file glob instead of path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjihrig I'm planning on moving a couple of functions to C++ (actually done the same exact change), so this change is beneficial for me too.

V(performance) \
V(permission) \
V(pipe_wrap) \
Expand Down
7 changes: 7 additions & 0 deletions src/node_external_reference.h
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,11 @@ namespace node {
using CFunctionCallbackWithOneByteString =
uint32_t (*)(v8::Local<v8::Value>, const v8::FastOneByteString&);
using CFunctionCallback = void (*)(v8::Local<v8::Value> receiver);
using CFunctionCallbackWithTwoOneByteStringsReturningBool =
bool (*)(v8::Local<v8::Value>,
const v8::FastOneByteString&,
const v8::FastOneByteString&);
using CFunctionCallback = void (*)(v8::Local<v8::Value> receiver);
using CFunctionCallbackReturnDouble =
double (*)(v8::Local<v8::Object> receiver);
using CFunctionCallbackWithInt64 = void (*)(v8::Local<v8::Object> receiver,
Expand All @@ -29,6 +34,7 @@ class ExternalReferenceRegistry {
#define ALLOWED_EXTERNAL_REFERENCE_TYPES(V) \
V(CFunctionCallback) \
V(CFunctionCallbackWithOneByteString) \
V(CFunctionCallbackWithTwoOneByteStringsReturningBool) \
V(CFunctionCallbackReturnDouble) \
V(CFunctionCallbackWithInt64) \
V(CFunctionCallbackWithBool) \
Expand Down Expand Up @@ -90,6 +96,7 @@ class ExternalReferenceRegistry {
V(module_wrap) \
V(options) \
V(os) \
V(path) \
V(performance) \
V(permission) \
V(process_methods) \
Expand Down
131 changes: 131 additions & 0 deletions src/node_path.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
#include "env-inl.h"
#include "node_errors.h"
#include "node_external_reference.h"
#include "util-inl.h"
#include "v8-fast-api-calls.h"

namespace node {

namespace path {
using v8::Context;
using v8::FunctionCallbackInfo;
using v8::Local;
using v8::Object;
using v8::Value;

// extracted from
MoLow marked this conversation as resolved.
Show resolved Hide resolved
// https://github.com/torvalds/linux/blob/cdc9718d5e590d6905361800b938b93f2b66818e/lib/glob.c
bool glob(char const* pat, char const* str) {
MoLow marked this conversation as resolved.
Show resolved Hide resolved
/*
* Backtrack to previous * on mismatch and retry starting one
* character later in the string. Because * matches all characters
* (no exception for /), it can be easily proved that there's
* never a need to backtrack multiple levels.
*/
char const* back_pat = nullptr;
char const* back_str = nullptr;

/*
* Loop over each token (character or class) in pat, matching
* it against the remaining unmatched tail of str. Return false
* on mismatch, or true after matching the trailing nul bytes.
*/
for (;;) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: There is no need to iterate each character. input.find_first_of("?*[") until the input is finished might produce better results if performance is the priority. If not, we can keep it as it is, and later improve it. Your call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand these comments - is there really motivation in diverging from the implementation lifted from glob rather than benefit from its updates?

If we have improvement suggestions wouldn't it be better to upstream them to linux glob?

(genuinely asking, not sure it's. just not really intuitive)

unsigned char c = *str++;
unsigned char d = *pat++;

switch (d) {
case '?': /* Wildcard: anything but nul */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This switch case can be removed and be simplified for better performance.

if (c == '\0') return false;
break;
case '*': /* Any-length wildcard */
if (*pat == '\0') /* Optimize trailing * case */
return true;
back_pat = pat;
back_str = --str; /* Allow zero-length match */
break;
case '[': { /* Character class */
bool match = false, inverted = (*pat == '!');
char const* cls = pat + inverted;
unsigned char a = *cls++;

/*
* Iterate over each span in the character class.
* A span is either a single character a, or a
* range a-b. The first span may begin with ']'.
*/
do {
unsigned char b = a;

if (a == '\0') /* Malformed */
goto literal;

if (cls[0] == '-' && cls[1] != ']') {
b = cls[1];

if (b == '\0') goto literal;

cls += 2;
/* Any special action if a > b? */
}
match |= (a <= c && c <= b);
} while ((a = *cls++) != ']');

if (match == inverted) goto backtrack;
pat = cls;
} break;
case '\\':
d = *pat++;
[[fallthrough]];
default: /* Literal character */
literal:
if (c == d) {
if (d == '\0') return true;
break;
}
backtrack:
if (c == '\0' || !back_pat) return false; /* No point continuing */
/* Try again from last *, one character later in str. */
pat = back_pat;
str = ++back_str;
break;
}
}
}

void SlowGlob(const FunctionCallbackInfo<Value>& args) {
Environment* env = Environment::GetCurrent(args);
CHECK_GE(args.Length(), 2);
CHECK(args[0]->IsString());
CHECK(args[1]->IsString());

std::string pattern = Utf8Value(env->isolate(), args[0]).ToString();
anonrig marked this conversation as resolved.
Show resolved Hide resolved
std::string str = Utf8Value(env->isolate(), args[1]).ToString();
args.GetReturnValue().Set(glob(pattern.c_str(), str.c_str()));
}
bool FastGlob(Local<Value> receiver,
const v8::FastOneByteString& pattern,
const v8::FastOneByteString& str) {
return glob(pattern.data, str.data);
}

v8::CFunction fast_glob_(v8::CFunction::Make(FastGlob));

void Initialize(Local<Object> target,
Local<Value> unused,
Local<Context> context,
void* priv) {
SetFastMethod(context, target, "glob", SlowGlob, &fast_glob_);
}

void RegisterExternalReferences(ExternalReferenceRegistry* registry) {
registry->Register(SlowGlob);
registry->Register(FastGlob);
registry->Register(fast_glob_.GetTypeInfo());
}
} // namespace path

} // namespace node

NODE_BINDING_CONTEXT_AWARE_INTERNAL(path, node::path::Initialize)
NODE_BINDING_EXTERNAL_REFERENCE(path, node::path::RegisterExternalReferences)
1 change: 1 addition & 0 deletions test/parallel/test-bootstrap-modules.js
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ const expectedModules = new Set([
'NativeModule internal/process/task_queues',
'NativeModule timers',
'Internal Binding trace_events',
'Internal Binding path',
'NativeModule internal/constants',
'NativeModule path',
'NativeModule internal/process/execution',
Expand Down
96 changes: 96 additions & 0 deletions test/parallel/test-path-glob.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
import '../common/index.mjs';
import { describe, it } from 'node:test';
import * as assert from 'node:assert';
import * as path from 'node:path';


// https://github.com/torvalds/linux/blob/cdc9718d5e590d6905361800b938b93f2b66818e/lib/globtest.c
const patterns = [
{ expected: true, pattern: 'a', name: 'a' },
{ expected: false, pattern: 'a', name: 'b' },
{ expected: false, pattern: 'a', name: 'aa' },
{ expected: false, pattern: 'a', name: '' },
{ expected: true, pattern: '', name: '' },
{ expected: false, pattern: '', name: 'a' },
/* Simple character class tests */
{ expected: true, pattern: '[a]', name: 'a' },
{ expected: false, pattern: '[a]', name: 'b' },
{ expected: false, pattern: '[!a]', name: 'a' },
{ expected: true, pattern: '[!a]', name: 'b' },
{ expected: true, pattern: '[ab]', name: 'a' },
{ expected: true, pattern: '[ab]', name: 'b' },
{ expected: false, pattern: '[ab]', name: 'c' },
{ expected: true, pattern: '[!ab]', name: 'c' },
{ expected: true, pattern: '[a-c]', name: 'b' },
{ expected: false, pattern: '[a-c]', name: 'd' },
/* Corner cases in character class parsing */
{ expected: true, pattern: '[a-c-e-g]', name: '-' },
{ expected: false, pattern: '[a-c-e-g]', name: 'd' },
{ expected: true, pattern: '[a-c-e-g]', name: 'f' },
{ expected: true, pattern: '[]a-ceg-ik[]', name: 'a' },
{ expected: true, pattern: '[]a-ceg-ik[]', name: ']' },
{ expected: true, pattern: '[]a-ceg-ik[]', name: '[' },
{ expected: true, pattern: '[]a-ceg-ik[]', name: 'h' },
{ expected: false, pattern: '[]a-ceg-ik[]', name: 'f' },
{ expected: false, pattern: '[!]a-ceg-ik[]', name: 'h' },
{ expected: false, pattern: '[!]a-ceg-ik[]', name: ']' },
{ expected: true, pattern: '[!]a-ceg-ik[]', name: 'f' },
/* Simple wild cards */
{ expected: true, pattern: '?', name: 'a' },
{ expected: false, pattern: '?', name: 'aa' },
{ expected: false, pattern: '??', name: 'a' },
{ expected: true, pattern: '?x?', name: 'axb' },
{ expected: false, pattern: '?x?', name: 'abx' },
{ expected: false, pattern: '?x?', name: 'xab' },
/* Asterisk wild cards (backtracking) */
{ expected: false, pattern: '*??', name: 'a' },
{ expected: true, pattern: '*??', name: 'ab' },
{ expected: true, pattern: '*??', name: 'abc' },
{ expected: true, pattern: '*??', name: 'abcd' },
{ expected: false, pattern: '??*', name: 'a' },
{ expected: true, pattern: '??*', name: 'ab' },
{ expected: true, pattern: '??*', name: 'abc' },
{ expected: true, pattern: '??*', name: 'abcd' },
{ expected: false, pattern: '?*?', name: 'a' },
{ expected: true, pattern: '?*?', name: 'ab' },
{ expected: true, pattern: '?*?', name: 'abc' },
{ expected: true, pattern: '?*?', name: 'abcd' },
{ expected: true, pattern: '*b', name: 'b' },
{ expected: true, pattern: '*b', name: 'ab' },
{ expected: false, pattern: '*b', name: 'ba' },
{ expected: true, pattern: '*b', name: 'bb' },
{ expected: true, pattern: '*b', name: 'abb' },
{ expected: true, pattern: '*b', name: 'bab' },
{ expected: true, pattern: '*bc', name: 'abbc' },
{ expected: true, pattern: '*bc', name: 'bc' },
{ expected: true, pattern: '*bc', name: 'bbc' },
{ expected: true, pattern: '*bc', name: 'bcbc' },
/* Multiple asterisks (complex backtracking) */
{ expected: true, pattern: '*ac*', name: 'abacadaeafag' },
{ expected: true, pattern: '*ac*ae*ag*', name: 'abacadaeafag' },
{ expected: true, pattern: '*a*b*[bc]*[ef]*g*', name: 'abacadaeafag' },
{ expected: false, pattern: '*a*b*[ef]*[cd]*g*', name: 'abacadaeafag' },
{ expected: true, pattern: '*abcd*', name: 'abcabcabcabcdefg' },
{ expected: true, pattern: '*ab*cd*', name: 'abcabcabcabcdefg' },
{ expected: true, pattern: '*abcd*abcdef*', name: 'abcabcdabcdeabcdefg' },
{ expected: false, pattern: '*abcd*', name: 'abcabcabcabcefg' },
{ expected: false, pattern: '*ab*cd*', name: 'abcabcabcabcefg' },
];

const invalid = [null, undefined, 1, Number.MAX_SAFE_INTEGER, true, false, Symbol(), {}, [], () => {}];

describe('path.glob', () => {
for (const { expected, pattern, name } of patterns) {
it(`pattern "${pattern}" should ${expected ? '' : 'not '}match "${name}"`, () => {
assert.strictEqual(path.glob(pattern, name), expected);
});
}

for (const x of invalid) {
MoLow marked this conversation as resolved.
Show resolved Hide resolved
const name = typeof x === 'symbol' ? 'Symnol()' : x;
MoLow marked this conversation as resolved.
Show resolved Hide resolved
it(`${name} should throw as a parameter`, () => {
assert.throws(() => path.glob(x, ''), { code: 'ERR_INVALID_ARG_TYPE' });
assert.throws(() => path.glob('', x), { code: 'ERR_INVALID_ARG_TYPE' });
});
}
});
3 changes: 3 additions & 0 deletions tools/license-builder.sh
Original file line number Diff line number Diff line change
Expand Up @@ -146,4 +146,7 @@ addlicense "node-fs-extra" "lib/internal/fs/cp" "$licenseText"

addlicense "base64" "deps/base64/base64/" "$(cat "${rootdir}/deps/base64/base64/LICENSE" || true)"

licenseText="$(curl -sL https://raw.githubusercontent.com/torvalds/linux/09a9639e56c01c7a00d6c0ca63f4c7c41abe075d/LICENSES/preferred/MIT)"
addlicense "glob(7)" "src/node_path,cc" "$licenseText"

mv "$tmplicense" "$licensefile"