Support for Unicode property escapes (and /u flag)
#116
Comments
Hi! I definitely appreciate why the Unicode flag is useful 😊 Moo builds a single RegExp which combines all of the tokens, so the flags effectively have to be the same for all of your tokens. Out of interest, since Babel already supports compiling RegExps with the unicode flag, if you use Babel with Moo I imagine it would "just work"... care to try? :)
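To make the "single RegExp" point concrete, here is a minimal sketch of joining several token patterns into one alternation. It is not Moo's actual implementation: `combineTokens` is a hypothetical helper, and the sticky flag is only an illustrative choice.

```javascript
// Hypothetical helper, not Moo's real code: joins token patterns into a
// single RegExp with one capture group per token.
function combineTokens(tokens, flags) {
  const parts = Object.keys(tokens).map(
    (name) => '(' + tokens[name].source + ')'
  );
  // A RegExp object has exactly one set of flags, so every token pattern
  // ends up with the same flags once the sources are joined.
  return new RegExp(parts.join('|'), flags);
}

const combined = combineTokens({ word: /[a-z]+/, num: /[0-9]+/ }, 'y');
combined.lastIndex = 0;
const firstMatch = combined.exec('abc123');
// firstMatch[1] holds the `word` capture; firstMatch[2] (`num`) is undefined.
```

Because only the `source` of each pattern survives the join, any per-token flag is lost, which is why a mix of /u and non-/u rules cannot simply be combined.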
It depends on what environment you're targeting with Babel. If you're targeting es5, it works, since it generates a RegExp without the /u flag. However, if you're targeting environments that support the […] And, of course, it doesn't work without Babel. In my branch, I apply the […] If you think this is worth discussing further, I can submit my branch as a PR, and we can continue the discussion there.
I think it makes more sense to enable the […]

    moo.compile({
      id: /[$_\p{ID_Start}][$\p{ID_Continue}]*/u,
      plus: '+',
      ws: /\p{WSpace}+/u,
    })

and this would work:

    moo.compile({
      id: /[$_a-zA-Z][$\w]*/,
      plus: '+',
      ws: /\s+/,
    })

But this would not:

    moo.compile({
      id: /[$_\p{ID_Start}][$\p{ID_Continue}]*/u,
      plus: '+',
      mostOfBmp: /./,
    })

That seems like a good thing, because adding the […] Importantly, a string converted to a regular expression does not change its meaning when the […]
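One reading of the three examples above is an all-or-nothing rule: /u is enabled only when every RegExp rule carries it, and string rules are neutral. The sketch below illustrates that reading and why a bare `/./` is sensitive to the flag. `resolveUnicodeFlag` and its error message are hypothetical, not part of Moo's API.

```javascript
// Without /u, `.` matches one UTF-16 code unit; with /u, one code point.
const astral = '\u{1F600}'; // a character outside the BMP (two code units)
const withoutU = /^.$/.test(astral); // `.` covers only half of the pair
const withU = /^.$/u.test(astral);   // `.` covers the whole code point

// Hypothetical check (not Moo's API): string rules are ignored, and the
// RegExp rules must either all have /u or all lack it.
function resolveUnicodeFlag(patterns) {
  const regexps = patterns.filter((p) => p instanceof RegExp);
  const unicodeCount = regexps.filter((re) => re.unicode).length;
  if (unicodeCount === 0) return false;
  if (unicodeCount === regexps.length) return true;
  throw new Error('Either all RegExp rules must use /u, or none');
}

// Mirrors the first example in the comment: every RegExp has /u.
const allUnicode = resolveUnicodeFlag([
  /[$_\p{ID_Start}][$\p{ID_Continue}]*/u,
  '+',
  /\p{White_Space}+/u,
]);
```

Rejecting the mixed case avoids silently changing what a rule such as `/./` matches.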
I agree with Nathan; I was going to suggest the same thing. If you'd like to PR this, that would be great :)
ES2018 added support for Unicode property escapes. This allows you to match complex Unicode ranges (e.g. chars valid in identifiers) much more compactly than with explicit Unicode ranges. For example, this regex matches all valid JS identifiers:
Compare with the regex used by acorn:
https://github.com/acornjs/acorn/blob/2ffed00236071aece0a79813b98c36f302ff1f9d/acorn/src/identifier.js#L22-L31
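As a rough illustration of the property-escape approach (not necessarily the exact regex from the issue), a simplified identifier pattern might look like the sketch below. It assumes an engine with ES2018 support and deliberately ignores details such as `\u` escapes inside identifiers.

```javascript
// Simplified identifier pattern using ES2018 property escapes.
// Assumption: this approximates the JS identifier grammar; it does not
// handle \uXXXX escapes or the ZWNJ/ZWJ characters.
const identifier = /^[$_\p{ID_Start}][$\p{ID_Continue}]*$/u;

const asciiOk = identifier.test('foo_bar1');  // letters, underscore, digit
const nonAsciiOk = identifier.test('größe');  // non-ASCII letters are allowed
const leadingDigit = identifier.test('1abc'); // digits cannot start a name
```

The whole pattern fits on one line, whereas the explicit-range version linked above needs long generated character lists.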
However, this requires the /u flag, which is currently forbidden (see moo/moo.js, lines 44 to 49 in 13e1157).
I presume the /u flag was disabled because it added complexity to the implementation but (previously) had no significant advantages; however, I believe that these new property escapes would make proper Unicode support in grammars built with moo dramatically simpler.

It has pretty good support in current browsers and with Babel. I have no idea what the performance implications of the /u flag are, but I would expect that support could be implemented as purely opt-in.
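Since engine support varies, an opt-in grammar could probe for the feature at runtime before using it. This is a minimal sketch; `supportsPropertyEscapes` is a hypothetical helper name and not something Moo provides.

```javascript
// Returns true if this engine accepts /u together with \p{...} escapes.
// `supportsPropertyEscapes` is a hypothetical helper, not part of Moo.
function supportsPropertyEscapes() {
  try {
    // Use the constructor rather than a literal so that older engines
    // throw here at runtime instead of failing to parse the whole file.
    new RegExp('\\p{ID_Start}', 'u');
    return true;
  } catch (e) {
    return false;
  }
}

const canUsePropertyEscapes = supportsPropertyEscapes();
```

A grammar could then fall back to explicit ranges when the probe returns false.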