parse5 is about half the performance of htmlparser2 #1259
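Hey there, I agree this is a bit confusing right now. If you prefer to use htmlparser2, you can parse with it directly and pass the resulting DOM to Cheerio:

```js
const htmlparser2 = require('htmlparser2')
const cheerio = require('cheerio')

// Parse with htmlparser2, then hand the resulting nodes to Cheerio
const dom = htmlparser2.parseDOM(file.contents, options)
const $ = cheerio.load(dom)
```

Cheerio is capable of handling both DOM structures, thanks to @jugglinmike!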
Hey, both ended up being too slow for my particular needs, so I ended up writing my own parser, https://github.com/testimio/mhtml-parser, because I only needed very primitive processing and structure rather than a whole DOM tree. I used
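A hypothetical sketch of what "very primitive processing" of an MHTML file can look like: MHTML is a MIME `multipart/related` container, so its parts can be split by scanning for the MIME boundary string, with no DOM tree built at all. `splitMhtml` is an illustrative name, not the API of mhtml-parser; the sketch assumes a single top-level multipart and leaves part bodies undecoded (no quoted-printable or base64 handling, no folded headers inside parts).

```javascript
// Hypothetical sketch: split an MHTML document into its MIME parts by plain
// string scanning, without building a DOM. Bodies are returned undecoded.
function splitMhtml(text) {
  // The top-level header block ends at the first blank line.
  const headerEnd = text.search(/\r?\n\r?\n/);
  if (headerEnd === -1) throw new Error('no header block');
  const headers = text.slice(0, headerEnd);

  // Read the boundary="..." parameter of the top-level Content-Type
  // (searching the whole block also covers a folded Content-Type header).
  const m = /boundary="?([^";\r\n]+)"?/i.exec(headers);
  if (!m) throw new Error('no MIME boundary found');
  const delimiter = '--' + m[1];

  const parts = [];
  // Chunks between delimiters are parts; the first chunk is the preamble.
  const chunks = text.slice(headerEnd).split(delimiter).slice(1);
  for (const chunk of chunks) {
    if (chunk.startsWith('--')) break; // closing delimiter "--boundary--"
    const body = chunk.replace(/^\r?\n/, '');
    const sep = body.search(/\r?\n\r?\n/);
    const rawHeaders = sep === -1 ? body : body.slice(0, sep);
    const content = sep === -1 ? '' : body.slice(sep).replace(/^\r?\n\r?\n/, '');

    // Collect "Name: value" header lines, keyed by lower-cased name.
    const partHeaders = {};
    for (const line of rawHeaders.split(/\r?\n/)) {
      const i = line.indexOf(':');
      if (i > 0) partHeaders[line.slice(0, i).trim().toLowerCase()] = line.slice(i + 1).trim();
    }
    // The line break before the next delimiter belongs to the delimiter.
    parts.push({ headers: partHeaders, content: content.replace(/\r?\n$/, '') });
  }
  return parts;
}
```

Each returned part carries its headers (for example `content-location`) and raw body, which is often all that link rewriting or resource extraction needs.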
Also, great news, and thanks @jugglinmike! I mostly wanted to raise a flag and provide feedback. I'm going to go ahead and close this issue now, but feel free to reopen it. Thanks again for the library :)
@benjamingr lol, https://github.com/testimio/mhtml-parser is using cheerio under the hood
Take a look at this thread: #863. There are a few explanations there; not everything, but it's a good start.
@inikulin Thanks for the resource. I will create a parse5 modification API based on the basic DOM API. No one uses jQuery anymore; today `document.querySelector` is its successor.
@frank-dspeed No, it's not; it's only using it for benchmarks and for SVGs. RTFC :] https://github.com/testimio/mhtml-parser/blob/master/src/link-replacer.js#L35-L43
And honestly, jsdom is so slow for what I'm using it for that I will end up having to write my own parser (for my employer, not personally). It's just a big undertaking (~3-4 months), and we won't provide the same API (just like fast-mhtml above doesn't; it solves a subset). We have ~60-second parse times for some large websites.
I'm working on JS bindings for https://github.com/cloudflare/lol-html at the moment. It provides spec-compliant tokenisation with low output latency, along with CSS selector support, and is orders of magnitude faster than parse5. Maybe it will be useful for your case.
@inikulin I like Cloudflare, but at present they address different problems. I am working on tag-html, a template engine and construction kit for ESNext cross-environment templating needs. For me the parser is only sugar on top of the goals and patterns we have already achieved: it would allow some DOM manipulation to be done in SSR or in Web Workers. Your module is a good fit for server-side proxies such as Cloudflare's.
@inikulin I would. How can I promote this?
I'm running a benchmark that parses MDN and GitHub with `_useHtmlParser2` set to `true` and to `false`, and I'm getting considerably faster times with htmlparser2. Is there ongoing work on this that you need help with? I don't feel great passing `_useHtml5Parser: true`.
The benchmark is literally loading MDN and all (html) subresources or a GitHub issue and all (html) subresources with Cheerio.
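A minimal sketch of a harness for this kind of comparison, assuming each parser configuration is wrapped in a function that takes an HTML string. `benchmark` and its `iterations`/`warmup` options are hypothetical names for illustration, not part of Cheerio; the code relies only on the global `performance` object available in recent Node.js versions.

```javascript
// Hypothetical timing harness: runs every configuration over the same inputs
// and reports the median wall-clock time (in ms) of one full pass per config.
function benchmark(configs, inputs, { iterations = 5, warmup = 1 } = {}) {
  const results = {};
  for (const [name, parse] of Object.entries(configs)) {
    // One "pass" parses every input document once.
    const runPass = () => {
      for (const input of inputs) parse(input);
    };
    // Untimed warm-up passes so the JIT has compiled the hot paths.
    for (let i = 0; i < warmup; i++) runPass();

    const times = [];
    for (let i = 0; i < iterations; i++) {
      const start = performance.now();
      runPass();
      times.push(performance.now() - start);
    }
    times.sort((a, b) => a - b);
    // Median (the upper median when the iteration count is even).
    results[name] = times[Math.floor(times.length / 2)];
  }
  return results;
}
```

To mirror the comparison above, the two configurations could be `(html) => cheerio.load(html)` and `(html) => cheerio.load(html, { _useHtmlParser2: true })`, assuming a Cheerio release that still honors that internal option. The median is reported so that a single GC pause does not dominate the result.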
How do I work on this?
Thanks for the great library!