Fix two exponential regex backtracking vulnerabilities #158

andersk · 2019-03-10T00:20:36Z

ESCAPED_CHAR already matches \\, so matching it again in another alternative was just causing an exponential complexity explosion.

Fixes #157.

jgm · 2019-03-10T06:43:03Z

ESCAPED_CHAR already matches \\

Yes, but it doesn't match \ when it is not followed by an ESCAPABLE -- e.g. when it is followed by a letter. I believe that is why the extra alternative was introduced.

jgm · 2019-03-10T06:46:46Z

Of course, instead of using ESCAPED_CHAR here we could use '\\\\' + ESCAPABLE + '?'; I imagine this would fix things.

andersk · 2019-03-10T06:56:59Z

I don’t understand why letters are relevant. In both cases, the alternative that I deleted is simply \\\\, because that’s already matched. There’s no alternative for \ not followed by an ESCAPABLE, neither before nor after my change.

jgm · 2019-03-10T16:42:00Z

var ESCAPABLE = '[!"#$%&\'()*+,./:;<=>?@[\\\\\\]^_`{|}~-]';
var ESCAPED_CHAR = '\\\\' + ESCAPABLE;

Here \\ is string escape for a single \, so the regex in normal notation is

/\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]/

This will not match on a backslash followed by an alphabetic.

EDIT: So, my original reLinkLabel was

/^\[(?:[^\\\[\]]|\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]|\\){0,1000}]/

and with your modification it's

/^\[(?:[^\\\[\]]|\\[!"#$%&\'()*+,./:;<=>?@[\\\]^_`{|}~-]){0,1000}]/

I think the original will match a [foo\bar] but your modification won't. Am I missing something?

andersk · 2019-03-10T18:31:56Z

Oh, yeah, I see where the confusion occurred—this is a regex constructed from a string literal, not a regex literal, so four source backslashes matches one backslash in the text, not two.

Your original reLinkLabel is then an inefficient equivalent* of simply

/^\[(?:[^\\\[\]]|\\.){0,1000}\\?]/

except the case of the match ending with \] is filtered out by a separate check, so we can simplify even further and remove the separate check:

/^\[(?:[^\\\[\]]|\\.){0,1000}]/

Edit: This also fixes a bug where [\\\] was incorrectly accepted.

The reLinkDestinationBraces case is similar except in that case a match ending with \> was allowed—but perhaps it shouldn’t have been? Test case:

[text](<url\>)

commonmark.js says this is a link, cmark says it’s not.

How’s this patch, then? For now I’ve preserved the existing behavior.

(* Edit: My regex is less strict about checking the length of backslash + non-ESCAPABLE sequences, but there’s already a separate check for that.)

jgm · 2019-03-10T19:14:38Z

OK, great. I agree that your simplified regex does the same work, although the original has the advantage of corresponding a bit more directly to the spec. (To see that yours works, you need to reason about the sorts of things that can come after \\.) Given the performance issue, it's worth making the change.

As for [text](<url\>), I think cmark is right here, since the spec explicitly says that backslash escapes work in URLs, and that they disable the normal markdown meaning of control characters like >. To be sure, the spec could be more explicit on this, so it might be worth adding an issue to commonmark/commonmark as well.

Do you want to adjust the regex in light of this?

ESCAPED_CHAR already matches `\\`, so matching it again in another alternative was causing exponential complexity explosion. This makes the following behavior changes: * `[foo\\\]` is no longer incorrectly accepted as a link reference. * `<foo\>` is no longer incorrectly accepted as an angle-bracketed link destination. Fixes commonmark#157. Signed-off-by: Anders Kaseorg <[email protected]>

andersk · 2019-03-10T19:18:48Z

Alright, updated.

andersk · 2019-03-10T19:24:37Z

Although there’s still a difference from cmark here. On [text](<url\>), cmark gives

<p>[text](&lt;url&gt;)</p>

while this patch gives

<p><a href="%3Curl%3E">text</a></p>

which is consistent with the second definition of link destination. I think that’s correct?

jgm · 2019-03-11T16:03:47Z

Many thanks!

I think the second (quadratic) case in #157 is still not fixed, though.
I just added a test case for that, and when you run the test suite it slows down on that case.
It should be a separate issue, anyway.

It would also be helpful to have an issue in commonmark/commonmark noting the discrepancy for [text](<url\>) and suggesting that (a) a new test case be added and (b) cmark be fixed.

jgm · 2019-03-11T16:13:57Z

The second case is actually rather difficult.
Consider

[](
[]([]()
[]([]([]()
[]([]([]([]()

The current parsing strategy requires parsing to the end, then backtracking to the next open paren. This leads to nonlinear performance. (This affects also commonmark-hs. Note that cmark does not parse these cases as links, hence the performance issue does not affect it, but a correctness issue does.) Do you want to open a separate issue for this so we can track it?

jgm · 2019-03-11T16:15:09Z

I'd also love to hear ideas about how to parse this efficiently.

andersk · 2019-03-11T17:35:46Z

I think the second case is equivalent to the “unclosed inline links” case of #129. Do you still want me to split that into a separate issue?

andersk · 2019-03-11T17:47:22Z

It would also be helpful to have an issue in commonmark/commonmark noting the discrepancy for [text](<url\>) and suggesting that (a) a new test case be added and (b) cmark be fixed.

Opened commonmark/commonmark-spec#562 for that.

jgm · 2019-03-11T17:52:36Z

Anders Kaseorg <[email protected]> writes:

I think the second case is equivalent to the “unclosed inline links” case of #129. Do you still want me to split that into a separate issue?

So it is! I had forgotten about that issue.

See commonmark/commonmark.js@ce345929 and commonmark/commonmark.js#158

andersk force-pushed the exponential-regex branch 3 times, most recently from 4a94960 to 45e49ae Compare March 10, 2019 18:29

andersk force-pushed the exponential-regex branch from 45e49ae to f29e64c Compare March 10, 2019 19:18

jgm merged commit ce34529 into commonmark:master Mar 11, 2019

andersk mentioned this pull request Mar 11, 2019

Add test for [text](<url\>) commonmark/commonmark-spec#562

Closed

colinodell added a commit to thephpleague/commonmark that referenced this pull request Mar 24, 2019

Fix two exponential backtracking issues

9248319

See commonmark/commonmark.js@ce345929 and commonmark/commonmark.js#158

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix two exponential regex backtracking vulnerabilities #158

Fix two exponential regex backtracking vulnerabilities #158

andersk commented Mar 10, 2019

jgm commented Mar 10, 2019

jgm commented Mar 10, 2019

andersk commented Mar 10, 2019 •

edited

Loading

jgm commented Mar 10, 2019 •

edited

Loading

andersk commented Mar 10, 2019 •

edited

Loading

jgm commented Mar 10, 2019

andersk commented Mar 10, 2019

andersk commented Mar 10, 2019

jgm commented Mar 11, 2019 •

edited

Loading

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

andersk commented Mar 11, 2019

andersk commented Mar 11, 2019

jgm commented Mar 11, 2019 via email

Fix two exponential regex backtracking vulnerabilities #158

Fix two exponential regex backtracking vulnerabilities #158

Conversation

andersk commented Mar 10, 2019

jgm commented Mar 10, 2019

jgm commented Mar 10, 2019

andersk commented Mar 10, 2019 • edited Loading

jgm commented Mar 10, 2019 • edited Loading

andersk commented Mar 10, 2019 • edited Loading

jgm commented Mar 10, 2019

andersk commented Mar 10, 2019

andersk commented Mar 10, 2019

jgm commented Mar 11, 2019 • edited Loading

jgm commented Mar 11, 2019

jgm commented Mar 11, 2019

andersk commented Mar 11, 2019

andersk commented Mar 11, 2019

jgm commented Mar 11, 2019 via email

andersk commented Mar 10, 2019 •

edited

Loading

jgm commented Mar 10, 2019 •

edited

Loading

andersk commented Mar 10, 2019 •

edited

Loading

jgm commented Mar 11, 2019 •

edited

Loading