html5 compatible parser? #1690

Closed
jmesserly opened this issue Oct 10, 2014 · 8 comments · May be fixed by gcloud-lerralice/tools#94 or Exnadella/tools#26

@jmesserly

I was comparing different node.js parsers for a different project, and studied the various implementations.

Based on my experience implementing an html5 parser (via porting html5lib), and looking at the Blink HTML parser, I have doubts that htmlparser2 actually follows the spec. Some of the issues on that repo seem to confirm this. Perhaps we should consider something like https://github.com/inikulin/parse5. The speed is comparable, and the code structure is what I would expect for an HTML parser (it matches the specification). It also uses the (very extensive!) html5lib test suite: https://github.com/inikulin/parse5/tree/master/test/data

@nevir
Contributor

nevir commented Oct 13, 2014

https://github.com/karlwestin/node-gumbo-parser is an alternative too (but it requires native bindings).

@dfreedm
Member

dfreedm commented Oct 14, 2014

Yeah, htmlparser2 is pretty clearly not spec compliant, just rather fast, and cheerio is built on top of it.

That said, if the parse5 htmlparser2 adapter is good enough, I should just be able to do an in-place switch.

I'm going to devote a bit of time to that, as it would solve some of the thornier problems I'm having with cheerio and htmlparser2.
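
For readers unfamiliar with the adapter idea mentioned above, here is a minimal sketch of why an adapter could make an in-place switch possible. It assumes a parser whose tree-building stage only creates nodes through an adapter object, so the output node shape can be swapped without touching the parser. Every name here (`htmlparser2LikeAdapter`, `domLikeAdapter`, `buildTree`, the event format) is hypothetical and for illustration only; this is not parse5's or htmlparser2's actual API.

```javascript
// Hypothetical sketch: none of these names come from parse5 or htmlparser2.

// Adapter producing nodes in an htmlparser2-like shape
// (type/name/attribs/children/parent), which a cheerio-style consumer walks.
const htmlparser2LikeAdapter = {
  createElement(name, attribs) {
    return { type: 'tag', name, attribs, children: [], parent: null };
  },
  createText(data) {
    return { type: 'text', data, parent: null };
  },
  appendChild(parent, child) {
    child.parent = parent;
    parent.children.push(child);
  },
};

// Adapter producing nodes in a different, DOM-like shape.
const domLikeAdapter = {
  createElement(name, attribs) {
    return { nodeName: name, attrs: attribs, childNodes: [] };
  },
  createText(data) {
    return { nodeName: '#text', value: data };
  },
  appendChild(parent, child) {
    parent.childNodes.push(child);
  },
};

// Stand-in for a parser's tree-construction stage. It never builds nodes
// directly; it only calls the adapter, so the output format is pluggable.
function buildTree(events, adapter) {
  const root = adapter.createElement('#root', {});
  const stack = [root];
  for (const ev of events) {
    const top = stack[stack.length - 1];
    if (ev.kind === 'open') {
      const el = adapter.createElement(ev.name, ev.attribs || {});
      adapter.appendChild(top, el);
      stack.push(el);
    } else if (ev.kind === 'text') {
      adapter.appendChild(top, adapter.createText(ev.data));
    } else if (ev.kind === 'close' && stack.length > 1) {
      stack.pop();
    }
  }
  return root;
}

// Made-up event stream standing in for tokenizer output of "<p>hello</p>".
const events = [
  { kind: 'open', name: 'p' },
  { kind: 'text', data: 'hello' },
  { kind: 'close', name: 'p' },
];

const cheerioStyleTree = buildTree(events, htmlparser2LikeAdapter);
const domStyleTree = buildTree(events, domLikeAdapter);
```

The same input yields two differently shaped trees, so a consumer written against the htmlparser2-like shape would not need to change when the parser behind it does.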

@dfreedm dfreedm self-assigned this Oct 14, 2014
@inikulin

Hi, guys. parse5 dev here. There is a cheerio fork that uses parse5 as the underlying parser: whacko. This should work as a simple drop-in replacement for cheerio. Please let me know if you need any further assistance on this.

@dfreedm
Member

dfreedm commented Oct 14, 2014

Hi @inikulin. My only concern with whacko is that it is a fork of an older version. There were some significant API changes between 0.13 and 0.17 with regard to looping mechanics, and I don't want to change them again if possible.

@inikulin

@azakus OK, I see. I'll try to find spare time to sync whacko with upstream this week or next.

@inikulin

@azakus The latest upstream changes have been applied to whacko. The updated version (17.0.1) is available via npm.

@dfreedm
Member

dfreedm commented Oct 20, 2014

Thanks!

dfreedm referenced this issue in Polymer/polymer-bundler Oct 20, 2014
Use parse5 automatic head/body insertion

Depend on parse5 to make proper head and body tags, and use those to insert correctly into the head and body of the master document.

Mostly removes the need for the <div hidden></div> trick, except for the final concatenation into the main document body.

Fixes #73 html5 compatible parser?
Fixes #61 Vulcanizer inlines non-head content into head, generates incorrect HTML
Fixes #53 Vulcanize breaks SVG.
Fixes #67 Characters with accents are not properly converted/formated
@inikulin

@azakus You are welcome
