Skip to content

Commit

Permalink
Merge pull request #2294 from sparklemotion/flavorjones-abruptly-clos…
Browse files Browse the repository at this point in the history
…ed-html-comments

feat(cruby): patch libxml2 to handle abruptly-closed HTML comments

---

**What problem is this PR intended to solve?**

Hackerone user [tehryanx](https://hackerone.com/tehryanx?type=user) reported a method of potentially confusing Loofah sanitizers by taking advantage of the difference between how abruptly-closed comments are handled by libxml2 and how they're handled by WHATWG-guidance-compliant modern browsers.

WHATWG advises to treat `<!-->` and `<!--->` as empty comments, but libxml2 today treats these as the _start_ of a comment.

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment

I've submitted this patch upstream at https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/124

Similar prior art is for incorrectly-closed comments:

- #2058
- https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/82


**Have you included adequate test coverage?**

Yes!


**Does this change affect the behavior of either the C or the Java implementations?**

This is a behavior change to libxml2/CRuby. Note that nekoHTML/JRuby already follow WHATWG guidance and so JRuby behavior is unchanged.
  • Loading branch information
flavorjones authored Aug 17, 2021
2 parents 7fec4fa + 7f9bb4b commit 604dad5
Show file tree
Hide file tree
Showing 3 changed files with 91 additions and 11 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ Nokogiri follows [Semantic Versioning](https://semver.org/), please see the [REA

### Improved

* [CRuby] Handle abruptly-closed HTML comments as WHATWG recommends for browsers. (Thanks to HackerOne user [tehryanx](https://hackerone.com/tehryanx?type=user) for reporting this!)
* [CRuby] `Node#line` is no longer capped at 65535. libxml v2.9.0 and later support a new parse option, exposed as `Nokogiri::XML::ParseOptions::PARSE_BIG_LINES` and set in `ParseOptions::DEFAULT_XML`, `::DEFAULT_XSLT`, `::DEFAULT_HTML`, and `::DEFAULT_SCHEMA`. (Note that JRuby never had this problem.) [[#1764](https://github.com/sparklemotion/nokogiri/issues/1764), [#1493](https://github.com/sparklemotion/nokogiri/issues/1493), [#1617](https://github.com/sparklemotion/nokogiri/issues/1617), [#1505](https://github.com/sparklemotion/nokogiri/issues/1505), [#1003](https://github.com/sparklemotion/nokogiri/issues/1003), [#533](https://github.com/sparklemotion/nokogiri/issues/533)]


Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
From 3ea8d08da310b645e37940eaae5cc28e251b155b Mon Sep 17 00:00:00 2001
From: Mike Dalessio <[email protected]>
Date: Sat, 17 Jul 2021 14:36:53 -0400
Subject: [PATCH] htmlParseComment: handle abruptly-closed comments

See guidance provided on abrutply-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
---
HTMLparser.c | 11 +++++++++++
include/libxml/xmlerror.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/HTMLparser.c b/HTMLparser.c
index b56363a..f0bf294 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -3485,10 +3485,20 @@ htmlParseComment(htmlParserCtxtPtr ctxt) {
q = CUR_CHAR(ql);
if (q == 0)
goto unfinished;
+ if (q == '>') {
+ htmlParseErr(ctxt, XML_ERR_COMMENT_ABRUPTLY_ENDED, "Comment abruptly ended", NULL, NULL);
+ cur = '>';
+ goto finished;
+ }
NEXTL(ql);
r = CUR_CHAR(rl);
if (r == 0)
goto unfinished;
+ if (q == '-' && r == '>') {
+ htmlParseErr(ctxt, XML_ERR_COMMENT_ABRUPTLY_ENDED, "Comment abruptly ended", NULL, NULL);
+ cur = '>';
+ goto finished;
+ }
NEXTL(rl);
cur = CUR_CHAR(l);
while ((cur != 0) &&
@@ -3536,6 +3546,7 @@ htmlParseComment(htmlParserCtxtPtr ctxt) {
cur = next;
l = nl;
}
+finished:
buf[len] = 0;
if (cur == '>') {
NEXT;
diff --git a/include/libxml/xmlerror.h b/include/libxml/xmlerror.h
index c101997..7b68e40 100644
--- a/include/libxml/xmlerror.h
+++ b/include/libxml/xmlerror.h
@@ -209,6 +209,7 @@ typedef enum {
XML_ERR_VERSION_MISMATCH, /* 109 */
XML_ERR_NAME_TOO_LONG, /* 110 */
XML_ERR_USER_STOP, /* 111 */
+ XML_ERR_COMMENT_ABRUPTLY_ENDED, /* 112 */
XML_NS_ERR_XML_NAMESPACE = 200,
XML_NS_ERR_UNDEFINED_NAMESPACE, /* 201 */
XML_NS_ERR_QNAME, /* 202 */
--
2.31.0

40 changes: 29 additions & 11 deletions test/html4/test_comments.rb
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,20 @@ class TestComment < Nokogiri::TestCase
let(:html) { "<html><body><div id=under-test><!--></div><div id=also-here></div></body></html>" }

if Nokogiri.uses_libxml?
it "behaves as if the comment is unterminated and doesn't exist" do # NON-COMPLIANT
assert_equal 0, subject.children.length
assert_equal 1, doc.errors.length
assert_match(/Comment not terminated/, doc.errors.first.to_s)
assert !other_div
if Nokogiri::VersionInfo.instance.libxml2_using_packaged? && Nokogiri::VERSION_INFO["libxml"]["patches"]&.include?("0008-htmlParseComment-handle-abruptly-closed-comments.patch")
it "behaves as if the comment is closed correctly" do # COMPLIANT
assert_equal 1, subject.children.length
assert subject.children.first.comment?
assert_equal "", subject.children.first.content
assert other_div
end
else
it "behaves as if the comment is unterminated and doesn't exist" do # NON-COMPLIANT
assert_equal 0, subject.children.length
assert_equal 1, doc.errors.length
assert_match(/Comment not terminated/, doc.errors.first.to_s)
assert !other_div
end
end
end

Expand All @@ -43,19 +52,28 @@ class TestComment < Nokogiri::TestCase
let(:html) { "<html><body><div id=under-test><!---></div><div id=also-here></div></body></html>" }

if Nokogiri.uses_libxml?
it "behaves as if the comment is unterminated and doesn't exist" do # NON-COMPLIANT
assert_equal 0, subject.children.length
assert_equal 1, doc.errors.length
assert_match(/Comment not terminated/, doc.errors.first.to_s)
assert !other_div
if Nokogiri::VersionInfo.instance.libxml2_using_packaged? && Nokogiri::VERSION_INFO["libxml"]["patches"]&.include?("0008-htmlParseComment-handle-abruptly-closed-comments.patch")
it "behaves as if the comment is closed correctly" do # COMPLIANT
assert_equal 1, subject.children.length
assert subject.children.first.comment?
assert_equal "", subject.children.first.content
assert other_div
end
else
it "behaves as if the comment is unterminated and doesn't exist" do # NON-COMPLIANT
assert_equal 0, subject.children.length
assert_equal 1, doc.errors.length
assert_match(/Comment not terminated/, doc.errors.first.to_s)
assert !other_div
end
end
end

if Nokogiri.jruby?
it "behaves as if the comment is closed correctly" do # COMPLIANT
assert_equal 1, subject.children.length
assert subject.children.first.comment?
assert_equal "-", subject.children.first.content # curious
assert_equal "-", subject.children.first.content # curious, potentially non-compliant?
assert other_div
end
end
Expand Down

0 comments on commit 604dad5

Please sign in to comment.