incorrect html4 parsing #694

Closed
pyhedgehog opened this issue Apr 5, 2015 · 2 comments · Fixed by #985

Comments

@pyhedgehog

console.log(cheerio.load('<html><body><li><ul>item1<ul>item2<ul>item3').html());

shows

<html><body><li><ul>item1<ul>item2<ul>item3</ul></ul></ul></li></body></html>

should be

<html><body><li><ul>item1</ul><ul>item2</ul><ul>item3</ul></li></body></html>
@thealjey

Also,

console.log(cheerio.load('<p>some<p>text</p>here</p>').html());

prints

<p>some</p><p>text</p>here<p></p>

instead of

<p>some<p>text</p>here</p>

There are no unclosed tags here.

I realize that it doesn't make sense to have a paragraph inside a paragraph, but it's still a perfectly valid XML structure.

@fb55 (Member) commented Dec 12, 2020

This is in line with the HTML spec.
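
For readers wondering why the `<p>` example comes out that way, here is a rough sketch of the two HTML parsing rules involved: a `<p>` start tag implicitly closes a `<p>` that is still open, and a stray `</p>` with nothing to close produces an empty `<p></p>`. This is a toy model, not cheerio's or any real parser's code. The function name `serializeWithPRules` is made up for illustration, and it assumes the input contains only `<p>`, `</p>`, and text.

```javascript
// Toy model of two HTML tree-construction rules for <p>; not a real parser.
// Assumption: the input contains only <p>, </p>, and text (no attributes, no other tags).
// `serializeWithPRules` is a hypothetical name used only for this illustration.
function serializeWithPRules(input) {
  const tokens = input.match(/<\/?p>|[^<]+/g) || [];
  let out = "";
  let pOpen = false; // whether a <p> element is currently open

  for (const token of tokens) {
    if (token === "<p>") {
      // A <p> start tag implicitly closes a <p> that is still open.
      if (pOpen) out += "</p>";
      out += "<p>";
      pOpen = true;
    } else if (token === "</p>") {
      // A stray </p> with no open <p> yields an empty <p></p>.
      if (!pOpen) out += "<p>";
      out += "</p>";
      pOpen = false;
    } else {
      out += token; // text is appended at the current position
    }
  }

  // End of input closes a <p> that is still open.
  if (pOpen) out += "</p>";
  return out;
}

console.log(serializeWithPRules("<p>some<p>text</p>here</p>"));
// prints: <p>some</p><p>text</p>here<p></p>
```

The output matches what cheerio printed above: the second `<p>` ends the first one, so the final `</p>` has nothing left to close. If XML-style nesting is what you want, cheerio's `xmlMode: true` load option parses the input as XML instead of HTML; I have not re-run it against this exact snippet.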

@fb55 fb55 closed this as completed Dec 12, 2020