StackOverflowError with long base64 image and LINKS_ALLOW_MATCHED_PARENTHESES disabled #381

Closed
mrombout opened this issue Dec 19, 2019 · 3 comments

@mrombout

Describe the bug

Most likely the same as, or similar to, issue #364.

Using a Parser with LINKS_ALLOW_MATCHED_PARENTHESES = false.

When parsing Markdown that contains a reasonably large base64 image, the parser has trouble matching the link destination and fails with a StackOverflowError. Reducing the size of the base64 image eventually lets the regex cope.
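A minimal, stdlib-only sketch of the symptom, assuming the culprit is a pattern shaped like "a repeated group whose body is an alternation": java.util.regex matches such a group recursively, roughly one nesting level per iteration, so a long enough payload exhausts the thread stack while a short one matches fine. The pattern below is a hypothetical stand-in written for illustration, not the LINK_DESTINATION regex from flexmark.

```java
import java.util.regex.Pattern;

public class RegexDepthDemo {

    // Hypothetical stand-in for a link-destination regex (NOT flexmark's actual pattern):
    // a repeated group whose body is an alternation of "one ordinary character"
    // (not a backslash, whitespace or parenthesis) and "a backslash escape".
    static final Pattern DESTINATION = Pattern.compile("(?:[^\\\\\\s()]|\\\\.)*");

    // Reports whether the input matched, failed to match, or exhausted the stack.
    public static String tryMatch(String input) {
        try {
            return DESTINATION.matcher(input).matches() ? "MATCH" : "NO_MATCH";
        } catch (StackOverflowError e) {
            // The regex engine recurses for each iteration of the group,
            // so the required stack depth grows with the input length.
            return "STACK_OVERFLOW";
        }
    }

    public static void main(String[] args) {
        String prefix = "data:image/png;base64,";
        System.out.println(tryMatch(prefix + "A".repeat(100)));       // short payload
        System.out.println(tryMatch(prefix + "A".repeat(1_000_000))); // long payload
    }
}
```

With a default-sized thread stack, the short payload should report MATCH and the million-character one STACK_OVERFLOW, mirroring the size dependence described above.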

I found this because I am currently using PegdownOptionsAdapter, which applies that setting by default.

To Reproduce

Running:

openjdk version "11.0.5" 2019-10-15
OpenJDK Runtime Environment (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04)
OpenJDK 64-Bit Server VM (build 11.0.5+10-post-Ubuntu-0ubuntu1.118.04, mixed mode, sharing)

See https://github.com/mrombout/flexmark-so-sscce, or below.

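import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import com.vladsch.flexmark.util.ast.Document;
import com.vladsch.flexmark.util.data.MutableDataSet;

// Disallow matched parentheses in link destinations (the setting that triggers the overflow)
MutableDataSet options = new MutableDataSet().set(Parser.LINKS_ALLOW_MATCHED_PARENTHESES, false);

Parser parser = Parser.builder(options).build();
HtmlRenderer renderer = HtmlRenderer.builder(options).build();

// The long base64 data URI goes inside the image's parentheses
Document parse = parser.parse("![]()\n");
String render = renderer.render(parse);

System.out.println(render);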

Expected behavior

Output an image tag with the base64 data.

<p><img src="" alt="" /></p>

Additional context

Enabling Parser.LINKS_ALLOW_MATCHED_PARENTHESES avoids the regular expression, which I think is the culprit, in InlineParserImpl#parseLinkDestination (line 1168):

if (options.linksAllowMatchedParentheses) {
    // ... 
} else {
    // spec 0.27 compatibility
    BasedSequence matched = match(myParsing.LINK_DESTINATION);
    return matched != null && spaceInUrls ? matched.trimEnd(BasedSequence.SPACE) : matched;
}

@mrombout changed the title from "Stack" to "StackOverflowError with long base64 image and LINKS_ALLOW_MATCHED_PARENTHESES disabled" Dec 19, 2019
@vsch
Owner

vsch commented Dec 19, 2019

@mrombout, I think you are right.

I have been trying to figure out which option causes this, and I even have a hand-rolled parser to replace the regex-based one, which does not have the issue. However, in some cases the stack overflow still occurs and I have not been able to isolate the exact cause.

I will take another look through the code to see what other parts use this regex, but in the meantime I will disable this feature by default.

@vsch added the 🪲 bug label Dec 19, 2019
@vsch
Owner

vsch commented Dec 20, 2019

@mrombout, my bad. I completely forgot about the matched-parentheses option: my hand-rolled parser, which does not suffer from stack overflow, is bypassed when this option is not selected, hence the continued stack overflows.

I am adding a no-parentheses flag to the state machine parser so that it parses both cases without a regex.
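A rough sketch of such a regex-free approach, assuming a single loop with a parenthesis-nesting counter, where a depth limit plays the role of the "no parens" flag. The names LinkDestinationScanner, scanDestination, and maxParenDepth are hypothetical, not flexmark's API, and treating the spec 0.27 rule as "at most one level of balanced, unescaped parentheses" is an assumption.

```java
public class LinkDestinationScanner {

    // ASCII punctuation that a backslash may escape (CommonMark rule).
    private static final String ESCAPABLE = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    /**
     * Hypothetical scanner: finds the end of a link destination starting at {@code start}
     * using a plain loop (no regex, no recursion), so stack usage does not depend on length.
     *
     * @param maxParenDepth how deeply unescaped parentheses may nest:
     *                      Integer.MAX_VALUE for a "matched parentheses" mode,
     *                      1 for the assumed spec 0.27 rule
     * @return the index just past the destination, or -1 if it is not valid
     */
    public static int scanDestination(CharSequence s, int start, int maxParenDepth) {
        int depth = 0;
        int i = start;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length() && ESCAPABLE.indexOf(s.charAt(i + 1)) >= 0) {
                i += 2; // escaped punctuation, including \( and \), stays in the URL
                continue;
            }
            if (c <= ' ' || c == 0x7F) {
                break; // ASCII space or control character ends the destination
            }
            if (c == '(') {
                if (depth == maxParenDepth) {
                    return -1; // nesting deeper than this mode allows
                }
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    break; // the ')' that closes the link itself
                }
                depth--;
            }
            i++;
        }
        return depth == 0 ? i : -1; // an unclosed '(' makes the destination invalid
    }

    public static void main(String[] args) {
        String input = "a(b(c))d) rest";
        System.out.println(scanDestination(input, 0, Integer.MAX_VALUE)); // 8
        System.out.println(scanDestination(input, 0, 1));                 // -1
        String longUri = "data:image/png;base64," + "A".repeat(1_000_000) + ")";
        System.out.println(scanDestination(longUri, 0, 1) == longUri.length() - 1); // true
    }
}
```

Because the loop keeps only an index and a counter, its stack usage stays constant however long the base64 payload is, which is the property the regex path lacked.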

Thank you for bringing this up. It was on my to-do list and nagging me to figure it out. Your finding pointed me in the right direction.

@vsch
Owner

vsch commented Dec 20, 2019

A fix for this is available. The repo and Maven artifacts are updated, but they may take a while to show up in Maven Central.

@vsch closed this as completed Dec 20, 2019