Pattern Builder #1476

RunDevelopment · 2018-07-12T22:26:00Z

When working with language definitions, you sometimes have parts which are used across multiple patterns (e.g. names). This redundancy makes it harder to change languages and blows up the file size.

My proposal is to add a function to build patterns.

Behavior

The function build(basePattern, replacements) will take a regular expression basePattern and an object containing regular expressions (or strings) replacements.

The source of the base pattern contains placeholders that mark where replacements are inserted. Each placeholder holds the key of the replacement to use.

This is a simple text replacement, so it can produce surprising results.
To minimize this risk, there are some restrictions, and each replacement is wrapped in a non-capturing group.

The returned pattern consists of the source of the base pattern with all placeholders replaced, combined with the flags of the base pattern.

The flags of replacements will be ignored.

Restrictions

To reduce the risks of plain text replacement, both the base pattern and the replacements are subject to a few restrictions:

Placeholder positioning:

Cannot be inside a character set.
Cannot directly follow an unescaped backslash.

Replacements:

No capturing groups. They could mess with backreferences in the base expression.
No backreferences.
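These restrictions could be checked by walking over the pattern source one character at a time. The following is only a rough sketch: the function names `checkBasePattern` and `checkReplacement` are hypothetical, it assumes the `<<key>>` placeholder from the Examples section, and it is deliberately conservative (for example, it rejects any `\k<`, even where that is not a backreference).

```javascript
// Hypothetical helpers, not part of Prism. Both take a pattern's .source string.
const PLACEHOLDER = /^<<\w+>>/;

// Throws if a placeholder sits inside a character set or right after
// an unescaped backslash.
function checkBasePattern(source) {
    let inCharSet = false;
    for (let i = 0; i < source.length; i++) {
        const c = source[i];
        if (c === '\\') {
            if (PLACEHOLDER.test(source.slice(i + 1))) {
                throw new Error('placeholder after unescaped backslash');
            }
            i++; // skip the escaped character
        } else if (inCharSet) {
            if (c === ']') {
                inCharSet = false;
            } else if (PLACEHOLDER.test(source.slice(i))) {
                throw new Error('placeholder inside character set');
            }
        } else if (c === '[') {
            inCharSet = true;
        }
    }
}

// Throws if a replacement contains a capturing group or a backreference.
function checkReplacement(source) {
    let inCharSet = false;
    for (let i = 0; i < source.length; i++) {
        const c = source[i];
        if (c === '\\') {
            const next = source[i + 1];
            // \1 to \9 and \k<name> are treated as backreferences outside character sets.
            if (!inCharSet && (/[1-9]/.test(next) || source.startsWith('k<', i + 1))) {
                throw new Error('replacement contains backreference');
            }
            i++; // skip the escaped character
        } else if (inCharSet) {
            if (c === ']') inCharSet = false;
        } else if (c === '[') {
            inCharSet = true;
        } else if (c === '(') {
            // "(?<name>" captures; "(?:", "(?=", "(?!", "(?<=" and "(?<!" do not.
            const namedGroup = /^\?<[^=!]/.test(source.slice(i + 1, i + 4));
            if (source[i + 1] !== '?' || namedGroup) {
                throw new Error('replacement contains capturing group');
            }
        }
    }
}
```

Because both checks only need the source strings, they could run once as part of the test suite instead of on every page load.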

Examples

I will use the placeholder <<\w+>> where \w+ will be used as the key to get a replacement.

(If you have a better placeholder, please tell me. I only chose this one because it isn't regex syntax and can be easily spotted. But I don't really like it...)

build(/a<<b>>?/i, {b: /b+/i}) == /a(?:b+)?/i
build(/a<<0>>?/i, [/b+/.source]) == /a(?:b+)?/i
build(/<<0>>/m, [/^\w+/]) == /(?:^\w+)/m // warning: the meaning of ^ might have changed
build(/(a)<<0>>\1/, [/(b)/]) // error: replacement contains capturing group
build(/a<<1>>/, [/b/]) // error: replacements["1"] undefined
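A minimal sketch of how `build` could produce the results above, assuming the `<<key>>` placeholder. It only performs the text replacement and the missing-key check; the restriction checks (capturing groups, backreferences, placeholder positioning) and the `^` warning are left out, so this is an illustration rather than a finished implementation.

```javascript
// Sketch only: replaces each <<key>> in the base pattern's source with the
// matching replacement, wrapped in a non-capturing group.
function build(basePattern, replacements) {
    const source = basePattern.source.replace(/<<(\w+)>>/g, (match, key) => {
        if (!Object.prototype.hasOwnProperty.call(replacements, key)) {
            throw new Error('replacements["' + key + '"] undefined');
        }
        const replacement = replacements[key];
        // Accept RegExp objects as well as plain source strings.
        // The flags of a RegExp replacement are ignored.
        const replacementSource =
            typeof replacement === 'string' ? replacement : replacement.source;
        return '(?:' + replacementSource + ')';
    });
    // Only the flags of the base pattern are kept.
    return new RegExp(source, basePattern.flags);
}

const pattern = build(/a<<b>>?/i, { b: /b+/i });
console.log(pattern.source, pattern.flags);
```

A callback is passed to `String.prototype.replace` so that `$` sequences in a replacement source are inserted literally instead of being interpreted as substitution patterns.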

Why regular expressions all the way?

It would be cheaper to use strings as base patterns and replacements. That's true.
But regular expressions provide two things that strings do not.

Regular expressions are easier to write.
Inline regexes are more convenient than strings and are supported by IDEs with features like syntax highlighting.
Regular expressions catch errors early on.
Each of the patterns will be compiled by the browser (or node), so they are guaranteed to have correct syntax.

Of course, these things only really matter to developers, not to the end user whose computer will have to deal with the additional overhead.

Minimizing overhead

To minimize the overhead created by using and checking a bunch of regular expressions, we can do multiple things:

Replace regexes with strings using gulp.
How does gulp know what patterns are replacements?
Well, just write e.g. /pattern/.source and gulp will then convert it. Of course, it will only do that for the minified version. (gulp: Inline regex source #1537)
Do restriction checks as tests.
As long as patterns are created and used in a deterministic fashion, it's ok to check all of them once using npm test.

RunDevelopment · 2019-03-13T22:53:05Z

I'm closing this now because you can have a much nicer syntax with ES6 tagged template literals.
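The thread does not show the syntax the author had in mind. One possible shape, purely as an assumption, is a tag function (here hypothetically called `re`) that splices interpolated patterns into the raw template text:

```javascript
// Hypothetical tag function: each interpolated RegExp or string is inserted
// as a non-capturing group; strings.raw keeps backslashes exactly as written.
function re(strings, ...parts) {
    let source = strings.raw[0];
    parts.forEach((part, i) => {
        const partSource = typeof part === 'string' ? part : part.source;
        source += '(?:' + partSource + ')' + strings.raw[i + 1];
    });
    return new RegExp(source);
}

const ident = /[a-z_]\w*/;
const call = re`${ident}\s*\(`;
console.log(call.source);
```

With this approach the interpolation syntax itself serves as the placeholder, so no custom `<<key>>` marker is needed. This sketch does not handle flags.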

mAAdhaTTah added the enhancement label Jul 20, 2018

This was referenced Aug 22, 2018

Pattern builder #1538

Closed

Resources #1539

Open

RunDevelopment closed this as completed Mar 13, 2019
