Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two exponential regex backtracking vulnerabilities #158

Merged
merged 1 commit into from
Mar 11, 2019

Conversation

andersk
Copy link
Contributor

@andersk andersk commented Mar 10, 2019

ESCAPED_CHAR already matches \\, so matching it again in another alternative was just causing an exponential complexity explosion.

Fixes #157.

@jgm
Copy link
Member

jgm commented Mar 10, 2019

ESCAPED_CHAR already matches \\

Yes, but it doesn't match \ when it is not followed by an ESCAPABLE -- e.g. when it is followed by a letter. I believe that is why the extra alternative was introduced.

@jgm
Copy link
Member

jgm commented Mar 10, 2019

Of course, instead of using ESCAPED_CHAR here we could use '\\\\' + ESCAPABLE + '?'; I imagine this would fix things.

@andersk
Copy link
Contributor Author

andersk commented Mar 10, 2019

I don’t understand why letters are relevant. In both cases, the alternative that I deleted is simply \\\\, because that’s already matched. There’s no alternative for \ not followed by an ESCAPABLE, neither before nor after my change.

@jgm
Copy link
Member

jgm commented Mar 10, 2019

var ESCAPABLE = '[!"#$%&\'()*+,./:;<=>?@[\\\\\\]^_`{|}~-]';
var ESCAPED_CHAR = '\\\\' + ESCAPABLE;

Here \\ is string escape for a single \, so the regex in normal notation is

/\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]/

This will not match on a backslash followed by an alphabetic.

EDIT: So, my original reLinkLabel was

/^\[(?:[^\\\[\]]|\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]|\\){0,1000}]/

and with your modification it's

/^\[(?:[^\\\[\]]|\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]){0,1000}]/

I think the original will match a [foo\bar] but your modification won't. Am I missing something?

@andersk andersk force-pushed the exponential-regex branch 3 times, most recently from 4a94960 to 45e49ae Compare March 10, 2019 18:29
@andersk
Copy link
Contributor Author

andersk commented Mar 10, 2019

Oh, yeah, I see where the confusion occurred—this is a regex constructed from a string literal, not a regex literal, so four source backslashes matches one backslash in the text, not two.

Your original reLinkLabel is then an inefficient equivalent* of simply

/^\[(?:[^\\\[\]]|\\.){0,1000}\\?]/

except the case of the match ending with \] is filtered out by a separate check, so we can simplify even further and remove the separate check:

/^\[(?:[^\\\[\]]|\\.){0,1000}]/

Edit: This also fixes a bug where [\\\] was incorrectly accepted.

The reLinkDestinationBraces case is similar except in that case a match ending with \> was allowed—but perhaps it shouldn’t have been? Test case:

[text](<url\>)

commonmark.js says this is a link, cmark says it’s not.

How’s this patch, then? For now I’ve preserved the existing behavior.

(* Edit: My regex is less strict about checking the length of backslash + non-ESCAPABLE sequences, but there’s already a separate check for that.)

@jgm
Copy link
Member

jgm commented Mar 10, 2019

OK, great. I agree that your simplified regex does the same work, although the original has the advantage of corresponding a bit more directly to the spec. (To see that yours works, you need to reason about the sorts of things that can come after \\.) Given the performance issue, it's worth making the change.

As for [text](<url\>), I think cmark is right here, since the spec explicitly says that backslash escapes work in URLs, and that they disable the normal markdown meaning of control characters like >. To be sure, the spec could be more explicit on this, so it might be worth adding an issue to commonmark/commonmark as well.

Do you want to adjust the regex in light of this?

ESCAPED_CHAR already matches `\\`, so matching it again in another
alternative was causing exponential complexity explosion.

This makes the following behavior changes:

* `[foo\\\]` is no longer incorrectly accepted as a link reference.
* `<foo\>` is no longer incorrectly accepted as an angle-bracketed
  link destination.

Fixes commonmark#157.

Signed-off-by: Anders Kaseorg <[email protected]>
@andersk andersk force-pushed the exponential-regex branch from 45e49ae to f29e64c Compare March 10, 2019 19:18
@andersk
Copy link
Contributor Author

andersk commented Mar 10, 2019

Alright, updated.

@andersk
Copy link
Contributor Author

andersk commented Mar 10, 2019

Although there’s still a difference from cmark here. On [text](<url\>), cmark gives

<p>[text](&lt;url&gt;)</p>

while this patch gives

<p><a href="%3Curl%3E">text</a></p>

which is consistent with the second definition of link destination. I think that’s correct?

@jgm jgm merged commit ce34529 into commonmark:master Mar 11, 2019
@jgm
Copy link
Member

jgm commented Mar 11, 2019

Many thanks!

I think the second (quadratic) case in #157 is still not fixed, though.
I just added a test case for that, and when you run the test suite it slows down on that case.
It should be a separate issue, anyway.

It would also be helpful to have an issue in commonmark/commonmark noting the discrepancy for [text](<url\>) and suggesting that (a) a new test case be added and (b) cmark be fixed.

@jgm
Copy link
Member

jgm commented Mar 11, 2019

The second case is actually rather difficult.
Consider

[](
[]([]()
[]([]([]()
[]([]([]([]()

The current parsing strategy requires parsing to the end, then backtracking to the next open paren. This leads to nonlinear performance. (This affects also commonmark-hs. Note that cmark does not parse these cases as links, hence the performance issue does not affect it, but a correctness issue does.) Do you want to open a separate issue for this so we can track it?

@jgm
Copy link
Member

jgm commented Mar 11, 2019

I'd also love to hear ideas about how to parse this efficiently.

@andersk
Copy link
Contributor Author

andersk commented Mar 11, 2019

I think the second case is equivalent to the “unclosed inline links” case of #129. Do you still want me to split that into a separate issue?

@andersk
Copy link
Contributor Author

andersk commented Mar 11, 2019

It would also be helpful to have an issue in commonmark/commonmark noting the discrepancy for [text](<url\>) and suggesting that (a) a new test case be added and (b) cmark be fixed.

Opened commonmark/commonmark-spec#562 for that.

@jgm
Copy link
Member

jgm commented Mar 11, 2019 via email

colinodell added a commit to thephpleague/commonmark that referenced this pull request Mar 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants