Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some inputs taking long time to parse #273

Closed
manunio opened this issue Nov 13, 2023 · 23 comments
Closed

Some inputs taking long time to parse #273

manunio opened this issue Nov 13, 2023 · 23 comments

Comments

@manunio
Copy link

manunio commented Nov 13, 2023

Hi, While fuzzing using jazzer.js (for oss-fuzz) Parser().parse() was taking longer than exptected time to parse.

const commonmark = require('./dist/commonmark.js');

console.time("test")
const data = `---
pedaommen-

 >-t<!--
	This is another comm| baz | boParagraph two.

<!-- ntic: true
---

Just a [URL](/url/).

[URL and title](/url/ "title").

[URL and ttle](/url/ "title").

[URL and title](/url/  "title preceded by two spaces").

[URL and title](/url/	"title precede/ "title has spaces afterward"  ).

[URL and title]( e]( /url/has space/ "url has space and title").

[Empty]().`

new commonmark.Parser().parse(data)
console.timeEnd("test")
node test.js
test: 2:35.008 (m:ss.mmm)
@jgm
Copy link
Member

jgm commented Nov 13, 2023

I tried the input

---
pedaommen-

 >-t<!--
	This is another comm| baz | boParagraph two.

<!-- ntic: true
---

Just a [URL](/url/).

[URL and title](/url/ "title").

[URL and ttle](/url/ "title").

[URL and title](/url/  "title preceded by two spaces").

[URL and title](/url/	"title precede/ "title has spaces afterward"  ).

[URL and title]( e]( /url/has space/ "url has space and title").

[Empty]().

with the commonmark dingus and it reported a parse time of 1 ms.

I then tried your program above. It completed in 1.682 ms.

So...I can't reproduce this! Are you using the latest code from the repository?

@manunio
Copy link
Author

manunio commented Nov 13, 2023

Hi, Yes its latest, I have tested it against 97da298. I have attached the input file produced by jazzer.js maybe that helps :)

const commonmark = require('./dist/commonmark.js')
const fs = require('fs')
const data = fs.readFileSync('./timeout-64699b4ddd13b0497d6ec6ffb5f21d485800abfc.txt').toString()

console.time("test")
new commonmark.Parser().parse(data)
console.timeEnd("test")

timeout-64699b4ddd13b0497d6ec6ffb5f21d485800abfc.txt

@manunio
Copy link
Author

manunio commented Nov 13, 2023

Following are system configs

❯ node --version
v18.18.2
❯ npm --version
9.8.1
OS: Ubuntu 20.04.6 LTS x86_64

@manunio
Copy link
Author

manunio commented Nov 13, 2023

Tested it against master in github's codespace following is what it produces.

@manunio ➜ /workspaces/commonmark.js (master) $ cat fuzz.js
const commonmark = require('./dist/commonmark.js')
const fs = require('fs')
const data = fs.readFileSync('./timeout-64699b4ddd13b0497d6ec6ffb5f21d485800abfc.txt').toString()

console.time("test")
new commonmark.Parser().parse(data)
console.timeEnd("test")

@manunio ➜ /workspaces/commonmark.js (master) $ node fuzz.js 
test: 1:38.560 (m:ss.mmm)

@manunio ➜ /workspaces/commonmark.js (master) $ node --version
v20.8.1

@manunio ➜ /workspaces/commonmark.js (master) $ git --no-pager log --oneline | head -n 5
97da298 Track underscore bottom separately mod 3, like asterisk
df3ea1e Fix list tightness.
20b52e5 Fix "CommomMark" typo (#270)
9a16ff4 Declarations do not need a space, per the spec.
46538e5 Allow `<!doctype` to be case-insensitive.

@jgm
Copy link
Member

jgm commented Nov 13, 2023

I couldn't reproduce it with this fuzz.js and timeout file either. Odd! Can you reproduce it on the online "try commonmark" dingus?

I was using node v18.17.1, but it's hard to imagine that's the issue.

@manunio
Copy link
Author

manunio commented Nov 14, 2023

I was not able to reproduce it with https://spec.commonmark.org/dingus/ .
I think its using an older version as i found following changes in source introduced here 4874cb4.

- var reClosingCodeFence = /^(?:`{3,}|~{3,})(?= *$)/;
+ var reClosingCodeFence = /^(?:`{3,}|~{3,})(?=[ \t]*$)/;

Then i tried testing it locally against local dingus build, It was throwing following error in browser console

Uncaught InternalError: too much recursion
    match http://127.0.0.1:9000/commonmark.js:9270
    parseHtmlTag http://127.0.0.1:9000/commonmark.js:9385
    parseInline http://127.0.0.1:9000/commonmark.js:10115
    parseInlines http://127.0.0.1:9000/commonmark.js:10139
    processInlines http://127.0.0.1:9000/commonmark.js:11098
    parse http://127.0.0.1:9000/commonmark.js:11149
    parseAndRender http://127.0.0.1:9000/dingus.js:87
    Lodash 6
    jQuery 9
    onIframeLoad http://127.0.0.1:9000/dingus.js:122
    jQuery 8
    <anonymous> http://127.0.0.1:9000/dingus.js:141

at

var m = re.exec(this.subject.slice(this.pos));

Screenshot_20231114_072122

@jgm
Copy link
Member

jgm commented Nov 14, 2023

OK, I can reproduce it now. I had not regenerated dist/, that was the issue.

@jgm
Copy link
Member

jgm commented Nov 14, 2023

Bisected: the problem starts with commit 28c82d8 and the next one.
Here's the regex that is causing the problem:

 var HTMLCOMMENT = "<!-->|<!--->|<!--(?:[^-]+|-[^-]|--[^>])*-->"

@jgm
Copy link
Member

jgm commented Nov 14, 2023

OK, I can see why this causes pathological behavior!

@jgm jgm closed this as completed in 9f548fe Nov 14, 2023
@jgm
Copy link
Member

jgm commented Nov 14, 2023

This should fix it. Thanks for noticing the issue!

@manunio
Copy link
Author

manunio commented Nov 14, 2023

@jgm would be interested in Fuzzing this project at oss-fuzz? It's a free service run by Google to fuzz important open source projects. as an example cmark is already being fuzzed at oss-fuzz.

@jgm
Copy link
Member

jgm commented Nov 14, 2023

Sure.
Well, to be honest I'm not sure. I don't think it's as critical if a client-side library has a performance bug like this.

@manunio
Copy link
Author

manunio commented Nov 14, 2023

Sure. Well, to be honest I'm not sure. I don't think it's as critical if a client-side library has a performance bug like this.

Fair enough , I'm too more inclined towards other type of bugs for client slide libs, like the bugs found in yaml package(see below). The above mentioned performance reports were result of running it for few minutes , and i believe that through oss-fuzz and proper seed we will be able to discover other type of bugs. As for timeouts, we can fine tune input length so that it produces less noise.

Edit:
Jazzer.js was able to find prototype pollution vulnerability by their Prototype Pollution bug detector and many such bug-detectors are in works as per their earlier twitter post.

@jgm
Copy link
Member

jgm commented Nov 14, 2023

What would be required to put it on oss-fuzz?

@manunio
Copy link
Author

manunio commented Nov 14, 2023

What would be required to put it on oss-fuzz?

Only thing i need is an mail id, so that maintainers can receiver bug reports via mail which points to a private tracker(google's monorail).

Edit:
Mail id must be google id like gmail to access these, as there is an limitation which makes this a requirement (https://google.github.io/oss-fuzz/faq/#why-do-you-require-a-google-account-for-authentication)

@manunio
Copy link
Author

manunio commented Nov 14, 2023

Please note that the mail id will be part of public config at oss-fuzz for example:
https://github.com/google/oss-fuzz/blob/master/projects/ron/project.yaml

@manunio
Copy link
Author

manunio commented Nov 14, 2023

Additionally I'll submit a pr here which adds the minimal fuzzing code(which will call Parser.parse()) and add jazzer as dev dependency here. Then will submit a pr at oss-fuzz which contains necessary logic from an OSS-Fuzz perspective to integrate commonmark.js

@manunio
Copy link
Author

manunio commented Nov 15, 2023

Hi @jgm found a similar too much recursion error, should i post it here or create a new issue ?

@manunio
Copy link
Author

manunio commented Nov 15, 2023

I'm receiving following output with cmark.

❯ cmark timeout-0c457d4f5c4c88bd5ef8a65765f75140d786edb0
Traceback (most recent call last):
  File "/home/maxx/.local/bin/cmark", line 8, in <module>
    sys.exit(main())
  File "/home/maxx/.local/lib/python3.10/site-packages/commonmark/cmark.py", line 34, in main
    for line in f:
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 81: invalid continuation byte

@jgm
Copy link
Member

jgm commented Nov 16, 2023

cmark output indicates that the input is not UTF-8 encoded.

@jgm
Copy link
Member

jgm commented Nov 16, 2023

You can post it here, either way.

@manunio
Copy link
Author

manunio commented Nov 16, 2023

const commonmark = require('./dist/commonmark.js');
const fs = require('fs');

console.time("test")
const data = fs.readFileSync('./timeout-0c457d4f5c4c88bd5ef8a65765f75140d786edb0.txt').toString()
new commonmark.Parser().parse(data)
console.timeEnd("test")

timeout-0c457d4f5c4c88bd5ef8a65765f75140d786edb0.txt

@manunio
Copy link
Author

manunio commented Nov 16, 2023

@jgm Do you approve of me going ahead with oss-fuzz integration?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants