-
-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New pathological parsing case (quadratic) #178
Comments
Note, we don't get a similar problem with
because in this case the opening delimiter span quickly gets used up for matches. In the above case, it does not, because of the "multiple of 3" rule. So we have to keep coming back to it. |
The problem is that when a match fails because of the "multiple of 3" rule, we can't set |
One possible solution would be to make |
The new "multiple of 3" rule defeats one of our optimizations.
@jgm Sorry to comment something what's closed but... where is the quadratic complexity coming from? There is only one potential opener, the initial |
@mity Yeah, my current code is also not quadratic on this, it doesn't push runs that are can_close but not can_open. But you can make my code quadratic too with a similar test case, say `'***a' * n. I haven't looked at the patch code or played with the implementation, but is it enough to parametrize on just the match char and the length mod 3? It seems to me there are more cases, either the potential match is both can_open and can_close, or not. You'll get a different match depending on whether the closer is both. |
+++ Martin Mitáš [Jan 04 17 02:50 ]:
***@***.*** Sorry to comment something what's closed but... where is the
quadratic complexity coming from?
There is only one potential opener, the initial **. All the other marks
are only potential closers so they do not need to be remembered in the
openers list. So the list of potential openers should always have just
a single member.
In cmark we just have one list of potential openers and
closers, which we traverse backwards looking for an opener
for each successive closer.
Of course we immediately skip over elements in the list
that are not marked can_open, but we still traverse
them.
I avoided unnecessary traversals before by keeping track
of a lower bound for an opener. So, if we failed to match
an opener for a * delimiter at position 6, then when we're
matching an opener at position 10, we don't need to go back
before position 6.
But the new "multiple of 3" rule messed that up. When
we failed to match an opener because of that rule, we
couldn't revise the lower bound for * openers, since
the same opener might match a delimiter run with a different
length.
My solution was to parameterize not just on the character
type but on the length mod 3.
How do you handle this in md4c? I noted that you didn't
have the quadratic complexity in this case.
|
@raphlinus Well, I don't know internals of Cmark. I'm just interested because of my own iplementation where I do not need any complex handling beyond the algorithm described in appendix of CommonMark spec, and I just want to be sure I haven't overlooked something... My way of thinking is as follows:
Therefore I am wondering whether I miss something important or whether the fix is actually a workaround of some another issue inside Cmark which means the list is saturated with something what should not be in the list on the 1st place. |
@jgm That explains it. Ignore mu latest post then, I wrote that before seeing your comment. |
+++ Raph Levien [Jan 04 17 08:12 ]:
I haven't looked at the patch code or played with the implementation,
but is it enough to parametrize on just the match char and the length
mod 3? It seems to me there are more cases, either the potential match
is both can_open and can_close, or not. You'll get a different match
depending on whether the closer is both.
Let's see. The multiple-of-3 rule only kicks in
when either the opening or closing delimiter run is BOTH
can-open and can-close.
So the condition for a match is:
1. delim char is the same, AND
2. either (a) the opening delimiter run is ONLY can-open and
the closing run is ONLY can-close, or (b) the sum of their
lengths mod 3 is not 0.
We're storing our knowledge about the last delimiter
run that meets conditions 1 and 2(b). It would be better
to take into account 2(a) as well. Perhaps we could
make the second array parameter something like
(opening delimiter run is only can-open
and closing is only can-close) ? 3 : (sum of lengths mod 3)
|
@jgm I think that's enough information. When touching a closing run that is can_close only, you look at both the mod-3 and the can_open case, and take the later of the two? I was thinking of an array of 6 rather than an array of 4, where that logic was done at write time, but I think yours is better. Thanks! |
commonmark/commonmark.js@903f35e Author: John MacFarlane <[email protected]> Date: Fri Aug 30 16:24:48 2019 -0700 Fix pathological case commonmark/cmark#178. commonmark/commonmark.js@34b8f65 Author: John MacFarlane <[email protected]> Date: Fri Aug 30 16:28:09 2019 -0700
Found by @raphlinus:
That is,
The text was updated successfully, but these errors were encountered: