Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support htmlparser2 #352

Merged
merged 2 commits into from
Jul 8, 2020
Merged

Conversation

hansottowirtz
Copy link
Contributor

We need to inline very large svgs (~100MB). Cheerio uses parse5 by default, which is very slow and memory intensive for files of this size. Cheerio also supports (and ships with) htmlparser2, which has a slightly different API. This PR makes using htmlparser2 possible by detecting which API is present.

I'm not an expert on this and I'm not sure if the test I wrote is complete enough, but from my tests the only important API difference was that el.childNodes[0].nodeValue in parse5 corresponds to el.children[0].data in htmlparser2.

@simison
Copy link
Member

simison commented Apr 6, 2020

Why not switch to htmlparser2 for everyone?

@jrit
Copy link
Collaborator

jrit commented Jun 24, 2020

Yeah, I have the same question as @simison - why support both? Is there something that parse5 is better at?

@hansottowirtz
Copy link
Contributor Author

It seems like parse5 is a lot faster while htmlparser2 is more spec-compliant and supports XML (and for our use case, is more memory-efficient).

Cheerio uses parse5 but can fall back to htmlparser2 (e.g. needed for XML). In my opinion, as Juice can receive a Cheerio instance with both parsers, it should also support both parsers.

Cheerio parse5 parser discussion: cheeriojs/cheerio#985 and cheeriojs/cheerio#863
Cheerio supports both parsers: cheeriojs/cheerio#1259 (comment)

@TrySound
Copy link
Contributor

TrySound commented Jul 1, 2020

Btw some packages (mjml) use [email protected] which is published with latest tag in npm.
http://unpkg.com/cheerio@latest/package.json

@jrit jrit merged commit ec41c65 into Automattic:master Jul 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants