Pattern Builder #1476

Closed
RunDevelopment opened this issue Jul 12, 2018 · 1 comment

RunDevelopment commented Jul 12, 2018

When working with language definitions, you sometimes have parts which are used across multiple patterns (e.g. names). This redundancy makes it harder to change languages and blows up the file size.

My proposal is to add a function to build patterns.

Behavior

The function build(basePattern, replacements) takes a regular expression basePattern and an object replacements whose values are regular expressions (or strings).

The source of the base pattern contains placeholders where the replacements will be inserted. Each placeholder holds the key of the replacement to insert at that position.

This is a simple text replacement, so it can produce surprising results.
To minimize this risk, there are some restrictions, and each replacement will be wrapped in a non-capturing group.

The source of the base pattern with all placeholders replaced plus the flags of the base pattern will form the pattern to be returned.

The flags of replacements will be ignored.
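
The behavior described above could be sketched roughly as follows. This is only an illustration, not an existing implementation: it assumes the <<key>> placeholder syntax used in the examples further down, and it does not perform any of the restriction checks.

```javascript
// Hypothetical sketch of build(); the placeholder syntax <<key>> is an assumption.
function build(basePattern, replacements) {
  const source = basePattern.source.replace(/<<(\w+)>>/g, (match, key) => {
    const replacement = replacements[key];
    if (replacement === undefined) {
      throw new Error('replacements["' + key + '"] undefined');
    }
    // Accept a RegExp (only its source is used, its flags are ignored) or a string.
    const text = replacement instanceof RegExp ? replacement.source : String(replacement);
    // Wrap in a non-capturing group so quantifiers after the placeholder
    // apply to the whole replacement.
    return '(?:' + text + ')';
  });
  // Only the flags of the base pattern are carried over.
  return new RegExp(source, basePattern.flags);
}
```

Because replacements[key] is a plain property lookup, the same code works for both objects and arrays of replacements.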

Restrictions

To minimize the risks of plain text replacement in the pattern source, there will be several restrictions on both the base pattern and the replacements:

Placeholder positioning:

  1. Cannot be inside a character class (e.g. [...]).
  2. Cannot be directly after an unescaped backslash.

Replacements:

  1. No capturing groups. They could mess with backreferences in the base expression.
  2. No backreferences.
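
The two replacement restrictions could be checked with a small scanner over the replacement's source. The following is a minimal sketch under a few assumptions: checkReplacement is a hypothetical name, named groups count as capturing groups, and \k< is conservatively treated as a named backreference.

```javascript
// Hypothetical checker; throws if a replacement source violates a restriction.
function checkReplacement(source) {
  let inClass = false; // true while inside a character class [...]
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      const next = source[i + 1];
      // \1..\9 and \k<name> outside a character class are backreferences.
      if (!inClass && (/[1-9]/.test(next) || (next === 'k' && source[i + 2] === '<'))) {
        throw new Error('replacement contains backreference');
      }
      i++; // skip the escaped character
    } else if (inClass) {
      if (ch === ']') inClass = false;
    } else if (ch === '[') {
      inClass = true;
    } else if (ch === '(') {
      // (?: (?= (?! (?<= (?<! do not capture; everything else does.
      const nonCapturing = /^\?(?:[:=!]|<[=!])/.test(source.slice(i + 1, i + 4));
      if (!nonCapturing) {
        throw new Error('replacement contains capturing group');
      }
    }
  }
}
```

Escaped parentheses and parentheses inside a character class are skipped, so a source like (?:b)\(c\)[(] passes the check.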

Examples

I will use the placeholder <<\w+>> where \w+ will be used as the key to get a replacement.

(If you have a better placeholder, please tell me. I only chose this one because it isn't regex syntax and can be easily spotted. But I don't really like it...)

build(/a<<b>>?/i, {b: /b+/i}) == /a(?:b+)?/i
build(/a<<0>>?/i, [/b+/.source]) == /a(?:b+)?/i
build(/<<0>>/m, [/^\w+/]) == /(?:^\w+)/m // warning: the meaning of ^ might have changed
build(/(a)<<0>>\1/, [/(b)/]) // error: replacement contains capturing group
build(/a<<1>>/, [/b/]) // error: replacements["1"] undefined

Why regular expressions all the way?

It would be cheaper to use strings as base patterns and replacements. That's true.
But regular expressions provide two things that strings do not.

  1. Regular expressions are easier to write.
    Inline regexes are more convenient than strings (no double escaping of backslashes) and are supported by IDEs with features like syntax highlighting.
  2. Regular expressions catch errors early on.
    Each of the patterns will be compiled by the browser (or node), so they are guaranteed to have correct syntax.

Of course, these things only really matter to developers and not to the end user, whose computer will have to deal with the additional overhead.

Minimizing overhead

To minimize the overhead created by using and checking a bunch of regular expressions, we can do multiple things:

  1. Replace regexes with strings using gulp.
    How does gulp know what patterns are replacements?
    Well, just write e.g. /pattern/.source and gulp will then convert it. Of course, it will only do that for the minified version. (gulp: Inline regex source #1537)
  2. Do restriction checks as tests.
    As long as patterns are created and used in a deterministic fashion, it's ok to check all of them once using npm test.
This was referenced Aug 22, 2018
@RunDevelopment
Member Author

I'm closing this now because you can have a much nicer syntax with ES6 tagged template literals.
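
The thread does not show what that syntax would look like. As a rough illustration only, a template tag could play the role of build(), with interpolated values taking the place of placeholders. The tag name re and its exact behavior are assumptions, and flags are left out of this sketch.

```javascript
// Hypothetical template tag; interpolated patterns are wrapped in (?:...).
function re(strings, ...values) {
  // strings.raw keeps backslashes as written, so \s stays \s in the pattern.
  let source = strings.raw[0];
  values.forEach((value, i) => {
    const text = value instanceof RegExp ? value.source : String(value);
    source += '(?:' + text + ')' + strings.raw[i + 1];
  });
  return new RegExp(source);
}

// Usage: reuse one sub-pattern inside a larger pattern.
const name = /\w+/;
const tag = re`<${name}\s*/?>`;
```

Compared to build(), there are no placeholder keys to look up, so a missing replacement shows up as an ordinary undefined variable.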
