html5 compatible parser? #1690
Comments
https://github.com/karlwestin/node-gumbo-parser as an alternative too (but it requires native bindings)
Yeah, htmlparser2 is pretty clearly not spec compliant, just rather fast, and cheerio is built on top of it. That said, if the parse5 htmlparser2 adapter is good enough, I should be able to do an in-place switch. I'm going to devote a bit of time to that, as it would solve some of the thornier problems I'm having with cheerio and htmlparser2.
Hi, guys. parse5 dev here. There is a cheerio fork that uses parse5 as its underlying parser: whacko. It should work as a simple drop-in replacement for cheerio. Please let me know if you need any further assistance with this.
Hi @inikulin. My only concern with whacko is that it is a fork of an older version. There were some significant API changes between 0.13 and 0.17 with regard to looping mechanics, and I don't want to change them again if possible.
@azakus Ok, I see. I'll try to find spare time to sync whacko with upstream within this/next week.
@azakus Latest upstream changes were applied to whacko. Updated version (17.0.1) is available via npm.
Thanks!
Use parse5 automatic head/body insertion

Depend on parse5 to make proper head and body tags, and use those to insert correctly into the head and body of the master document. This mostly removes the need for the <div hidden></div> trick, except for the final concatenation into the main document body.

Fixes #73 html5 compatible parser?
Fixes #61 Vulcanizer inlines non-head content into head, generates incorrect HTML
Fixes #53 Vulcanize breaks SVG.
Fixes #67 Characters with accents are not properly converted/formated
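For readers following along, here is a rough sketch of the idea in that commit message: once the parser has produced proper head and body elements, imported head content can go to the master head and imported body content to the master body. This is not the actual vulcanize code. It assumes plain-object trees that use the `nodeName`, `childNodes`, and `parentNode` fields found in parse5's default tree format, and the helpers `findElement`, `moveChildren`, `mergeImport`, and `el` are hypothetical names made up for illustration.

```javascript
// Hypothetical helper: find the first node with the given name in a tree
// shaped like parse5's default output (nodeName + childNodes).
function findElement(node, name) {
  if (node.nodeName === name) return node;
  for (const child of node.childNodes || []) {
    const found = findElement(child, name);
    if (found) return found;
  }
  return null;
}

// Hypothetical helper: move every child of `from` to the end of `to`.
function moveChildren(from, to) {
  for (const child of from.childNodes) {
    child.parentNode = to;
    to.childNodes.push(child);
  }
  from.childNodes = [];
}

// Merge an imported document into the master document: head content goes to
// the master head, body content goes to the master body.
function mergeImport(master, imported) {
  for (const section of ["head", "body"]) {
    const source = findElement(imported, section);
    const target = findElement(master, section);
    if (source && target) moveChildren(source, target);
  }
  return master;
}

// Tiny node factory for the demo trees (an assumption for illustration only).
function el(nodeName, childNodes = []) {
  const node = { nodeName, childNodes, parentNode: null };
  for (const child of childNodes) child.parentNode = node;
  return node;
}

const master = el("#document", [
  el("html", [el("head", [el("title")]), el("body", [el("p")])]),
]);
const imported = el("#document", [
  el("html", [el("head", [el("style")]), el("body", [el("svg")])]),
]);

mergeImport(master, imported);

const names = (name) =>
  findElement(master, name).childNodes.map((n) => n.nodeName);
console.log(names("head")); // [ 'title', 'style' ]
console.log(names("body")); // [ 'p', 'svg' ]
```

In a real setup the trees would come from the parser rather than from `el`, and the merge would also need to handle attributes, text nodes, and ordering rules that this sketch ignores.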
@azakus You are welcome
I was comparing different node.js parsers for a different project, and studied the various implementations.
Based on my experience implementing an html5 parser (via porting html5lib), and looking at the Blink HTML parser, I have doubts that htmlparser2 actually follows the spec. Some of the issues on that repo seem to confirm this. Perhaps we should consider something like https://github.com/inikulin/parse5. The speed is comparable, and the code structure is what I would expect for an HTML parser (matches the specification) and they use the (very extensive!) html5lib test suite https://github.com/inikulin/parse5/tree/master/test/data