Skip to content

Commit

Permalink
fix: some Node reparent methods parse in context of parent
Browse files Browse the repository at this point in the history
The choice of context node has no functional impact for Nokogiri per
se; but it's necessary for Nokogumbo, which has a better HTML parser,
to ensure the parser is in the right state.

When testing this change against nodes without a parent, the test
suite found memory problems related to libxml2 manipulating pointers
on pickled nodes. As a result, the impacted methods now raise a
RuntimeError if they're called on a node without a parent (the concept
of a sibling or a replacement/swap doesn't make sense without the
context of a parent to hold it together, anyway).

Note that `Node#coerce` is now a __protected__ method (previously it was __private__).

Impacted Node methods:

- add_next_sibling, previous=, and before
- add_previous_sibling, next=, and after
- replace and swap

Fixes rubys/nokogumbo#160
  • Loading branch information
flavorjones committed Nov 28, 2020
1 parent a21a003 commit 46e33cc
Show file tree
Hide file tree
Showing 6 changed files with 115 additions and 35 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,7 @@ This release ends support for:
### Fixed

* The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [[#1935](https://github.com/sparklemotion/nokogiri/issues/1935)]
* The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [[nokogumbo#160](https://github.com/rubys/nokogumbo/issues/160)
* [CRuby] Fixed installation on AIX with respect to `vasprintf`. [[#1908](https://github.com/sparklemotion/nokogiri/issues/1908)]
* [CRuby] On some platforms, avoid symbol name collision with glibc's `canonicalize`. [[#2105](https://github.com/sparklemotion/nokogiri/issues/2105)]
* [JRuby] Standardize reading from IO like objects, including StringIO. [[#1888](https://github.com/sparklemotion/nokogiri/issues/1888), [#1897](https://github.com/sparklemotion/nokogiri/issues/1897)]
Expand Down
52 changes: 29 additions & 23 deletions lib/nokogiri/xml/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -267,6 +267,8 @@ def children=(node_or_tags)
#
# Also see related method +swap+.
def replace(node_or_tags)
raise("Cannot replace a node with no parent") unless parent

# We cannot replace a text node directly, otherwise libxml will return
# an internal error at parser.c:13031, I don't know exactly why
# libxml is trying to find a parent node that is an element or document
Expand All @@ -278,7 +280,7 @@ def replace(node_or_tags)
return replacee.replace node_or_tags
end

node_or_tags = coerce(node_or_tags)
node_or_tags = parent.coerce(node_or_tags)

if node_or_tags.is_a?(XML::NodeSet)
node_or_tags.each { |n| add_previous_sibling n }
Expand Down Expand Up @@ -824,7 +826,7 @@ def parse(string_or_io, options = nil)
#
# Unfortunately, this means we're no longer parsing "in context" and so namespaces that
# would have been inherited from the context node won't be handled correctly. This hack was
# writting in 2010, and I regret it, because it's silently degrading functionality in a way
# written in 2010, and I regret it, because it's silently degrading functionality in a way
# that's not easily prevented (or even detected).
#
# I think preferable behavior would be to either:
Expand Down Expand Up @@ -1141,6 +1143,28 @@ def canonicalize(mode = XML::XML_C14N_1_0, inclusive_namespaces = nil, with_comm

# @!endgroup

protected

def coerce(data)
case data
when XML::NodeSet
return data
when XML::DocumentFragment
return data.children
when String
return fragment(data).children
when Document, XML::Attr
# unacceptable
when XML::Node
return data
end

raise ArgumentError, <<-EOERR
Requires a Node, NodeSet or String argument, and cannot accept a #{data.class}.
(You probably want to select a node from the Document with at() or search(), or create a new Node via Node.new().)
EOERR
end

private

def keywordify(keywords)
Expand All @@ -1155,10 +1179,12 @@ def keywordify(keywords)
end

def add_sibling(next_or_previous, node_or_tags)
raise("Cannot add sibling to a node with no parent") unless parent

impl = (next_or_previous == :next) ? :add_next_sibling_node : :add_previous_sibling_node
iter = (next_or_previous == :next) ? :reverse_each : :each

node_or_tags = coerce node_or_tags
node_or_tags = parent.coerce(node_or_tags)
if node_or_tags.is_a?(XML::NodeSet)
if text?
pivot = Nokogiri::XML::Node.new "dummy", document
Expand Down Expand Up @@ -1195,26 +1221,6 @@ def inspect_attributes
[:name, :namespace, :attribute_nodes, :children]
end

def coerce(data)
case data
when XML::NodeSet
return data
when XML::DocumentFragment
return data.children
when String
return fragment(data).children
when Document, XML::Attr
# unacceptable
when XML::Node
return data
end

raise ArgumentError, <<-EOERR
Requires a Node, NodeSet or String argument, and cannot accept a #{data.class}.
(You probably want to select a node from the Document with at() or search(), or create a new Node via Node.new().)
EOERR
end

# @private
IMPLIED_XPATH_CONTEXTS = [".//".freeze].freeze

Expand Down
6 changes: 1 addition & 5 deletions test/html/test_node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -105,11 +105,7 @@ def test_unlink_then_swap
node = @html.at('a')
node.unlink

another_node = @html.at('div')
assert another_node, 'should have a node'

# This used to segv
assert node.add_previous_sibling another_node
assert_raises(RuntimeError) { node.add_previous_sibling @html.at('div') }
end

def test_swap
Expand Down
4 changes: 2 additions & 2 deletions test/xml/test_node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -1202,9 +1202,9 @@ def test_set_node_lang
def test_text_node_robustness_gh1426
# notably, the original bug report was about libxml-ruby interactions
# this test should blow up under valgrind if we regress on libxml-ruby workarounds
message = "<h2>BOOM!</h2>"
message = "<section><h2>BOOM!</h2></section>"
10_000.times do
node = Nokogiri::HTML::DocumentFragment.parse(message)
node = Nokogiri::HTML::DocumentFragment.parse(message).at_css("h2")
node.add_previous_sibling(Nokogiri::XML::Text.new('before', node.document))
node.add_next_sibling(Nokogiri::XML::Text.new('after', node.document))
end
Expand Down
81 changes: 81 additions & 0 deletions test/xml/test_node_reparenting.rb
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,87 @@ class TestNodeReparenting < Nokogiri::TestCase
end
end
end

describe "in-context parsing" do
specs = {
:add_child => :self,
:<< => :self,

:replace => :parent,
:swap => :parent,

:children= => :self,
:inner_html= => :self,

:add_previous_sibling => :parent,
:previous= => :parent,
:before => :parent,

:add_next_sibling => :parent,
:next= => :parent,
:after => :parent,
}

specs.each do |method, which|
describe "##{method} parsing input" do
let(:xml) do
<<~EOF
<root>
<parent><context></context></parent>
</root>
EOF
end

let(:doc) { Nokogiri::XML::Document.parse(xml) }
let(:context_node) { doc.at_css("context") }

after do
GC.start
end

describe "with a parent" do
let(:expected_callee) do
if which == :self
context_node
elsif which == :parent
context_node.parent
else
raise("unable to discern what the test means by #{which}")
end
end

before do
class << expected_callee
attr_reader :coerce_was_called
def coerce(data)
@coerce_was_called = true
super
end
end
end

it "in context of #{which}" do
context_node.__send__(method, "<child>content</child>")

assert expected_callee.coerce_was_called, "expected coerce to be called on #{which}"
end
end

if which == :parent
describe "without a parent" do
before { context_node.unlink }

it "raises an exception" do
ex = assert_raise(RuntimeError) do
context_node.__send__(method, "<child>content</child>")
end
assert_match(/no parent/, ex.message)
end
end
end
end
end
end
end

describe "ad hoc node reparenting behavior" do
Expand Down
6 changes: 1 addition & 5 deletions test/xml/test_unparented_node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -453,11 +453,7 @@ def test_replace

def test_replace_on_unparented_node
foo = Node.new('foo', @node.document)
if Nokogiri.jruby? # JRuby Nokogiri doesn't raise an exception
@node.replace(foo)
else
assert_raises(RuntimeError){ @node.replace(foo) }
end
assert_raises(RuntimeError) { @node.replace(foo) }
end

def test_illegal_replace_of_node_with_doc
Expand Down

0 comments on commit 46e33cc

Please sign in to comment.