Fix two exponential regex backtracking vulnerabilities #158
Yes, but it doesn't match
Of course, instead of using ESCAPED_CHAR here we could use
I don’t understand why letters are relevant. In both cases, the alternative that I deleted is simply
Here
This will not match on a backslash followed by an alphabetic. EDIT: So, my original
and with your modification it's
I think the original will match a |
Force-pushed from 4a94960 to 45e49ae.
Oh, yeah, I see where the confusion occurred: this is a regex constructed from a string literal, not a regex literal, so four source backslashes match one backslash in the text, not two.

Your original `/^\[(?:[^\\\[\]]|\\.){0,1000}\\?]/` except the case of the match ending with `/^\[(?:[^\\\[\]]|\\.){0,1000}]/`

Edit: This also fixes a bug where

The `[text](<url\>)` commonmark.js says this is a link, cmark says it’s not. How’s this patch, then? For now I’ve preserved the existing behavior.

(* Edit: My regex is less strict about checking the length of backslash + non-ESCAPABLE sequences, but there’s already a separate check for that.)
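The string-literal versus regex-literal point can be checked directly. The snippet below is only an illustration; the variable names are made up here and none of this is taken from the commonmark.js source.

```javascript
// Four source backslashes inside a string literal passed to RegExp ...
const fromString = new RegExp("^\\\\$");
// ... describe the same pattern as two source backslashes in a regex literal.
const fromLiteral = /^\\$/;

const oneBackslash = "\\";    // a single backslash character
const twoBackslashes = "\\\\"; // two backslash characters

console.log(fromString.source === fromLiteral.source); // true
console.log(fromString.test(oneBackslash));            // true
console.log(fromString.test(twoBackslashes));          // false
```

The string literal is unescaped once by the JavaScript parser and once more by the regex engine, which is why it needs twice as many backslashes as the regex literal.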
OK, great. I agree that your simplified regex does the same work, although the original has the advantage of corresponding a bit more directly to the spec. (To see that yours works, you need to reason about the sorts of things that can come after As for Do you want to adjust the regex in light of this? |
ESCAPED_CHAR already matches `\\`, so matching it again in another alternative was causing exponential complexity explosion.

This makes the following behavior changes:

* `[foo\\\]` is no longer incorrectly accepted as a link reference.
* `<foo\>` is no longer incorrectly accepted as an angle-bracketed link destination.

Fixes commonmark#157.

Signed-off-by: Anders Kaseorg <[email protected]>
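The first behavior change can be reproduced with the two link-label patterns quoted earlier in this thread. This is a sketch: the patterns are copied from the conversation and may not match the final patch exactly, and the variable names are invented for the example.

```javascript
// Patterns as quoted in the conversation, written as regex literals.
const withOptionalBackslash = /^\[(?:[^\\\[\]]|\\.){0,1000}\\?]/;
const simplified = /^\[(?:[^\\\[\]]|\\.){0,1000}]/;

// String.raw keeps backslashes literal:
// "[foo", then three backslashes, then "]".
const oddBackslashes = String.raw`[foo\\\]`;
// "[foo", then two backslashes (one escaped backslash), then "]".
const evenBackslashes = String.raw`[foo\\]`;

console.log(withOptionalBackslash.test(oddBackslashes)); // true
console.log(simplified.test(oddBackslashes));            // false
console.log(simplified.test(evenBackslashes));           // true
```

With three backslashes, the third one escapes the `]`, so no unescaped closing bracket remains. The simplified pattern rejects that input, while the variant with the optional trailing backslash accepts it.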
Force-pushed from 45e49ae to f29e64c.
Alright, updated.
Although there’s still a difference from cmark here. On <p>[text](<url>)</p> while this patch gives <p><a href="%3Curl%3E">text</a></p> which is consistent with the second definition of link destination. I think that’s correct? |
Many thanks! I think the second (quadratic) case in #157 is still not fixed, though. It would also be helpful to have an issue in commonmark/commonmark noting the discrepancy for |
The second case is actually rather difficult.
The current parsing strategy requires parsing to the end, then backtracking to the next open paren. This leads to nonlinear performance. (This also affects commonmark-hs. Note that cmark does not parse these cases as links, so the performance issue does not affect it, but a correctness issue does.) Do you want to open a separate issue for this so we can track it?
I'd also love to hear ideas about how to parse this efficiently.
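As a rough illustration of why "scan to the end, then retry from the next open paren" is nonlinear, here is a toy model. It is not the commonmark.js parser; `countVisits` is a hypothetical helper that only counts how many characters a naive scanner would look at.

```javascript
// Toy model: for every "(", scan forward looking for a ")" that may never
// come, and count how many characters are inspected along the way.
function countVisits(input) {
  let visits = 0;
  for (let start = 0; start < input.length; start++) {
    if (input[start] !== "(") continue;
    let i = start + 1;
    while (i < input.length && input[i] !== ")") {
      visits++;
      i++;
    }
  }
  return visits;
}

// With n unclosed parens, the scanner inspects n * (n - 1) / 2 characters,
// so doubling the input roughly quadruples the work.
console.log(countVisits("(".repeat(100))); // 4950
console.log(countVisits("(".repeat(200))); // 19900
```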
I think the second case is equivalent to the “unclosed inline links” case of #129. Do you still want me to split that into a separate issue?
Opened commonmark/commonmark-spec#562 for that.
Anders Kaseorg <[email protected]> writes:
> I think the second case is equivalent to the “unclosed inline links” case of #129. Do you still want me to split that into a separate issue?
So it is! I had forgotten about that issue.
`ESCAPED_CHAR` already matches `\\`, so matching it again in another alternative was just causing an exponential complexity explosion.

Fixes #157.
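One way to see where the blow-up comes from is to count how many ways a run of backslashes can be split between two overlapping alternatives. The sketch assumes, following the clarification in the conversation, that the redundant alternative consumes a single backslash while `ESCAPED_CHAR` can consume a pair. `countParses` is a hypothetical helper, not code from the project.

```javascript
// Count the ways to consume n backslashes using pieces of size 1
// (a lone backslash) or size 2 (an escaped backslash).
function countParses(n) {
  let prev = 1; // ways to consume 0 backslashes
  let curr = 1; // ways to consume 1 backslash
  for (let i = 2; i <= n; i++) {
    [prev, curr] = [curr, prev + curr];
  }
  return curr;
}

console.log(countParses(2));  // 2
console.log(countParses(10)); // 89
console.log(countParses(20)); // 10946
```

The count follows the Fibonacci sequence, so it grows exponentially with the length of the run. When the overall match fails, for example because no closing `]` follows, a backtracking engine may end up trying every one of these splits. Removing the overlapping alternative leaves only one way to consume each backslash.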