Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maint: Restrict empty-matching regexes #1548

Merged
merged 5 commits into from
Oct 13, 2020

Conversation

jneen
Copy link
Member

@jneen jneen commented Jun 19, 2020

It is very common for a lexer developer to accidentally include a regex that matches the empty string, which can cause a number of issues. This patch restricts the use of empty-matching regexes in two ways:

  • A regex that matches the empty string must have a predicate - either pushing a new state, :pop!, or a custom block. Otherwise it will infinite-loop.

  • Any empty-matching regex that does not contain lookahead/lookbehind causes the state to be "closed" - no more rules can be added. This is because these regexes must serve as "default" rules, and it's guaranteed that the processing won't continue past them.

http://jneen.net/ added 3 commits June 19, 2020 12:41
* an empty-matching regex may not be used without a predicate
  (such as :pop!, :push, or a block)

* an empty-matching regex with no lookahead/lookbehind or
  anchoring "closes" the state - it is invalid to add more rules, as
  they would never be reached.
lib/rouge/regex_lexer.rb Outdated Show resolved Hide resolved
lib/rouge/regex_lexer.rb Outdated Show resolved Hide resolved
lib/rouge/regex_lexer.rb Outdated Show resolved Hide resolved
@pyrmont pyrmont added the author-action The PR has been reviewed but action by the author is needed label Jun 19, 2020
@pyrmont pyrmont added needs-review The PR needs to be reviewed and removed author-action The PR has been reviewed but action by the author is needed labels Jun 19, 2020
Copy link
Contributor

@pyrmont pyrmont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what prompted the fix? Did you come across the infinite loop bug in your own work or did you just notice it was possible?

Otherwise looks good to me. Want me to merge it?

@pyrmont pyrmont removed the needs-review The PR needs to be reviewed label Jun 19, 2020
@pyrmont pyrmont self-assigned this Jun 19, 2020
@jneen
Copy link
Member Author

jneen commented Sep 26, 2020

I don't quite remember what prompted this, but I think I had seen it come up somewhere. It's a really difficult bug to track down if you're not super familiar with the internals of Rouge.

@jneen
Copy link
Member Author

jneen commented Sep 26, 2020

But yes, I think this is ready to merge.

@pyrmont pyrmont merged commit 6fdc78c into master Oct 13, 2020
pyrmont added a commit that referenced this pull request Oct 13, 2020
The merging of #1548 caused issues with two lexers that had been added after the
PR was originally submitted. This commit fixes those errors.

It also reduces the number of calls to `String#upcase` and `String#downcase` in
the Apex lexer (this lexer was updated in #1548).
pyrmont added a commit that referenced this pull request Nov 10, 2020
The maintenance changes in #1548 resulted in updates to two rules in
the Ruby lexer, one regarding regular expression flags and the other
regarding heredoc strings. However, these patterns should not have been
changed. The change in #1548 was to prevent rules that matched an empty
string unless they contained a 'predicate' of some sort. Both rules in
the Ruby lexer contained such a predicate and so should not have been
changed. This commit fixes that error.
pyrmont added a commit that referenced this pull request Nov 10, 2020
The changes made in #1548 to avoid empty regular expression patterns,
broke the template string rules in the JavaScript lexer (and lexers
such as the TypeScript lexer that inherit from it). This commit adds the
rules necessary to fix this lexing.
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 19, 2021
This commit makes several changes to the way ‘empty’ regular expressions work in
Rouge.

1. The use of ‘empty’ regular expressions is restricted. An empty-matching regex
   may not be used without a predicate (such as `:pop!`, `:push`, or a block).

2. The use of an empty-matching regular expression with no lookahead/lookbehind
   or anchoring "closes" the state. It is invalid to add more rules, as they
   would never be reached.

3. Lexers that were using improper empty-matching regular expressions have been
   fixed.
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 19, 2021
The merging of rouge-ruby#1548 caused issues with two lexers that had been added after the
PR was originally submitted. This commit fixes those errors.

It also reduces the number of calls to `String#upcase` and `String#downcase` in
the Apex lexer (this lexer was updated in rouge-ruby#1548).
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 19, 2021
The maintenance changes in rouge-ruby#1548 resulted in updates to two rules in
the Ruby lexer, one regarding regular expression flags and the other
regarding heredoc strings. However, these patterns should not have been
changed. The change in rouge-ruby#1548 was to prevent rules that matched an empty
string unless they contained a 'predicate' of some sort. Both rules in
the Ruby lexer contained such a predicate and so should not have been
changed. This commit fixes that error.
mattt pushed a commit to NSHipster/rouge that referenced this pull request May 19, 2021
The changes made in rouge-ruby#1548 to avoid empty regular expression patterns,
broke the template string rules in the JavaScript lexer (and lexers
such as the TypeScript lexer that inherit from it). This commit adds the
rules necessary to fix this lexing.
@tancnle tancnle deleted the maint.restrict-empty-matching-regexes branch September 22, 2021 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants