feat: {Node,NodeSet}#wrap accept a Node argument
Duplicating an instantiated Node is significantly faster than
re-parsing a string for multiple invocations.

Closes #2657
flavorjones committed Nov 15, 2022
1 parent eda3ca9 commit 7507518
Showing 6 changed files with 236 additions and 26 deletions.
CHANGELOG.md: 2 additions, 0 deletions
@@ -33,13 +33,15 @@ This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/

### Added

* `Node#wrap` and `NodeSet#wrap` now also accept a `Node` type argument, which will be `dup`ed for each wrapper. For cases where many nodes are being wrapped, creating a `Node` once using `Document#create_element` and passing that `Node` multiple times is significantly faster than re-parsing markup on each call. [[#2657](https://github.com/sparklemotion/nokogiri/issues/2657)]
* [CRuby] Invocation of custom XPath or CSS handler functions may now use the `nokogiri` namespace prefix. Historically, the JRuby implementation _required_ this namespace but the CRuby implementation did not support it. It's recommended that all XPath and CSS queries use the `nokogiri` namespace going forward. Invocation without the namespace is planned for deprecation in v1.15.0 and removal in a future release. [[#2147](https://github.com/sparklemotion/nokogiri/issues/2147)]


### Fixed

* `SAX::Parser`'s `encoding` attribute will not be clobbered when an alternative encoding is passed into `SAX::Parser#parse_io`. [[#1942](https://github.com/sparklemotion/nokogiri/issues/1942)] (Thanks, [@kp666](https://github.com/kp666)!)
* Serialized `HTML4::DocumentFragment` will now be properly encoded. Previously this empty string was encoded as `US-ASCII`. [[#2649](https://github.com/sparklemotion/nokogiri/issues/2649)]
* `Node#wrap` now uses the parent as the context node for parsing wrapper markup, falling back to the document for unparented nodes. Previously the document was always used.
* [CRuby] UTF-16-encoded documents longer than ~4000 code points now serialize properly. Previously the serialized document was corrupted when it exceeded the length of libxml2's internal string buffer. [[#752](https://github.com/sparklemotion/nokogiri/issues/752)]
* [CRuby] The HTML5 parser now correctly handles text at the end of `form` elements.
* [CRuby] `HTML5::Document#fragment` now always uses `body` as the parsing context. Previously, fragments were parsed in the context of the associated document's root node, which allowed for inconsistent parsing. [[#2553](https://github.com/sparklemotion/nokogiri/issues/2553)]
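The `Node#wrap` entry above rests on a cost argument: a `String` wrapper is parsed once per wrapped node, while a `Node` wrapper is created once and then `dup`ed. The sketch below models only that bookkeeping in plain Ruby, so it runs without Nokogiri. `FakeParser`, `wrap_all_with_markup`, and `wrap_all_with_template` are hypothetical names, and the Hash it builds is an assumed stand-in for an element, not a Nokogiri object.

```ruby
# Hypothetical cost model, not Nokogiri code: it counts how often wrapper
# markup is parsed versus copied when wrapping `count` nodes.
class FakeParser
  attr_reader :parses

  def initialize
    @parses = 0
  end

  # Stand-in for parsing markup like "<div></div>"; returns a Hash as a fake element.
  def parse(markup)
    @parses += 1
    { name: markup[/\A<(\w+)/, 1] }
  end
end

# String-style wrapping: the markup is parsed once per wrapped node.
def wrap_all_with_markup(parser, count, markup)
  Array.new(count) { parser.parse(markup) }
end

# Node-style wrapping: build the template once, then dup it for each node.
def wrap_all_with_template(parser, count, markup)
  template = parser.parse(markup)
  Array.new(count) { template.dup }
end

by_markup = FakeParser.new
wrap_all_with_markup(by_markup, 1_000, "<div></div>")

by_template = FakeParser.new
wrappers = wrap_all_with_template(by_template, 1_000, "<div></div>")

puts by_markup.parses                     # 1000
puts by_template.parses                   # 1
puts wrappers.map(&:object_id).uniq.size  # 1000 (each wrapper is a separate copy)
```

Counting parses rather than timing them keeps the sketch deterministic; real speedups depend on the markup and the parser backend and are not measured here.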
lib/nokogiri/xml/node.rb: 65 additions, 9 deletions
@@ -176,13 +176,69 @@ def prepend_child(node_or_tags)
end
end

###
-# Add html around this node
# :call-seq:
# wrap(markup) -> self
# wrap(node) -> self
#
# Wrap this Node with the node parsed from +markup+ or a dup of the +node+.
#
# [Parameters]
# - *markup* (String)
# Markup that is parsed and used as the wrapper. This node's parent, if it exists, is used
# as the context node for parsing; otherwise the associated document is used. If the parsed
# fragment has multiple roots, the first root node is used as the wrapper.
# - *node* (Nokogiri::XML::Node)
# An element that is `#dup`ed and used as the wrapper.
#
# [Returns] +self+, to support chaining.
#
# Also see NodeSet#wrap
#
# *Example* with a +String+ argument:
#
# doc = Nokogiri::HTML5(<<~HTML)
# <html><body>
# <a>asdf</a>
# </body></html>
# HTML
# doc.at_css("a").wrap("<div></div>")
# doc.to_html
# # => <html><head></head><body>
# # <div><a>asdf</a></div>
# # </body></html>
#
# *Example* with a +Node+ argument:
#
# doc = Nokogiri::HTML5(<<~HTML)
# <html><body>
# <a>asdf</a>
# </body></html>
# HTML
# doc.at_css("a").wrap(doc.create_element("wrap"))
# doc.to_html
# # <html><head></head><body>
# # <wrap><a>asdf</a></wrap>
# # </body></html>
#
-# Returns self
-def wrap(html)
-new_parent = document.parse(html).first
-add_next_sibling(new_parent)
def wrap(node_or_tags)
case node_or_tags
when String
new_parent = if parent
parent.coerce(node_or_tags).first
else
coerce(node_or_tags).first
end
when XML::Node
new_parent = node_or_tags.dup
else
raise ArgumentError, "Requires a String or Node argument, and cannot accept a #{node_or_tags.class}"
end

if parent
add_next_sibling(new_parent)
else
new_parent.unlink
end
new_parent.add_child(self)
self
end
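To make the control flow of the new `Node#wrap` easier to follow, here is a dependency-free model of it. `TinyNode` is a hypothetical class, not Nokogiri's `Node`: its `coerce` only reads the tag name from the markup, and its `dup` returns a detached, childless copy, whereas Nokogiri's `dup` copies the whole subtree by default. What the sketch does mirror is the dispatch on `String`, node, or anything else, the `parent || self` choice of parsing context, and the insert-then-reparent sequence.

```ruby
# Hypothetical minimal tree, NOT Nokogiri's Node. It models only the control
# flow of the new Node#wrap: dispatch on argument type, then insert + reparent.
class TinyNode
  attr_accessor :name, :parent
  attr_reader :children

  def initialize(name)
    @name = name
    @parent = nil
    @children = []
  end

  # `dup` yields a detached, childless copy (Nokogiri's dup is deep by default).
  def initialize_copy(source)
    super
    @parent = nil
    @children = []
  end

  def unlink
    parent&.children&.delete(self)
    self.parent = nil
    self
  end

  def add_child(node)
    node.unlink
    node.parent = self
    children << node
    node
  end

  def add_next_sibling(node)
    node.unlink
    siblings = parent.children
    siblings.insert(siblings.index(self) + 1, node)
    node.parent = parent
    node
  end

  # Stand-in for parsing markup in this node's context: it only reads the tag name.
  def coerce(markup)
    [TinyNode.new(markup[/\A<(\w+)/, 1])]
  end

  def wrap(node_or_tags)
    new_parent =
      case node_or_tags
      when String then (parent || self).coerce(node_or_tags).first # parent is the parsing context
      when TinyNode then node_or_tags.dup # the template itself is never inserted
      else
        raise ArgumentError, "Requires a String or Node argument, and cannot accept a #{node_or_tags.class}"
      end

    if parent
      add_next_sibling(new_parent) # wrapper takes the slot right after self
    else
      new_parent.unlink
    end
    new_parent.add_child(self) # moving self under the wrapper detaches it from its old parent
    self
  end
end

root = TinyNode.new("root")
thing = root.add_child(TinyNode.new("thing"))
template = TinyNode.new("wrapper")

returned = thing.wrap(template)
wrapper = thing.parent
puts returned.equal?(thing)            # true
puts wrapper.name                      # wrapper
puts wrapper.equal?(template)          # false
puts root.children.map(&:name).inspect # ["wrapper"]

loose = TinyNode.new("loose")
loose.wrap("<section/>")
puts loose.parent.name                 # section
puts loose.parent.parent.inspect       # nil
```

Inserting the wrapper as the next sibling before calling `add_child(self)` is what keeps the wrapper in the wrapped node's original position among its siblings.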
@@ -193,7 +249,7 @@ def wrap(html)
# +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a String
# containing markup.
#
-# Returns self, to support chaining of calls (e.g., root << child1 << child2)
# Returns +self+, to support chaining of calls (e.g., root << child1 << child2)
#
# Also see related method +add_child+.
def <<(node_or_tags)
@@ -241,7 +297,7 @@ def add_next_sibling(node_or_tags)
# +node_or_tags+ can be a Nokogiri::XML::Node, a ::DocumentFragment, a ::NodeSet, or a String
# containing markup.
#
-# Returns self, to support chaining of calls.
# Returns +self+, to support chaining of calls.
#
# Also see related method +add_previous_sibling+.
def before(node_or_tags)
@@ -255,7 +311,7 @@ def before(node_or_tags)
# +node_or_tags+ can be a Nokogiri::XML::Node, a Nokogiri::XML::DocumentFragment, or a String
# containing markup.
#
-# Returns self, to support chaining of calls.
# Returns +self+, to support chaining of calls.
#
# Also see related method +add_next_sibling+.
def after(node_or_tags)
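The updated comments on `<<`, `before`, and `after` say these methods return `self` to support chaining. A toy class (hypothetical, not Nokogiri) shows why that matters: `root << child1 << child2` is evaluated as `(root << child1) << child2`, so the second append only reaches `root` if the first one returned it.

```ruby
# Hypothetical toy element (not Nokogiri) showing why returning self from
# `<<` allows `root << child1 << child2`.
class ToyElement
  attr_reader :name, :children

  def initialize(name)
    @name = name
    @children = []
  end

  def <<(child)
    @children << child
    self # returning the receiver, not the child, is what makes chaining work
  end
end

root = ToyElement.new("root")
root << ToyElement.new("child1") << ToyElement.new("child2")
puts root.children.map(&:name).inspect # ["child1", "child2"]
```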
lib/nokogiri/xml/node_set.rb: 67 additions, 4 deletions
@@ -1,3 +1,4 @@
# coding: utf-8
# frozen_string_literal: true

module Nokogiri
@@ -260,10 +261,72 @@ def inner_html(*args)
collect { |j| j.inner_html(*args) }.join("")
end

###
-# Wrap this NodeSet with +html+
-def wrap(html)
-map { |node| node.wrap(html) }
# :call-seq:
# wrap(markup) -> self
# wrap(node) -> self
#
# Wrap each member of this NodeSet with the node parsed from +markup+ or a dup of the +node+.
#
# [Parameters]
# - *markup* (String)
# Markup that is parsed and used as the wrapper. Each node's parent, if it exists, is used
# as the context node for parsing; otherwise the associated document is used. If the parsed
# fragment has multiple roots, the first root node is used as the wrapper.
# - *node* (Nokogiri::XML::Node)
# An element that is `#dup`ed and used as the wrapper.
#
# [Returns] +self+, to support chaining.
#
# ⚠ Note that if a +String+ is passed, the markup will be parsed <b>once per node</b> in the
# NodeSet. You can avoid this overhead in cases where you know exactly the wrapper you wish to
# use by passing a +Node+ instead.
#
# Also see Node#wrap
#
# *Example* with a +String+ argument:
#
# doc = Nokogiri::HTML5(<<~HTML)
# <html><body>
# <a>a</a>
# <a>b</a>
# <a>c</a>
# <a>d</a>
# </body></html>
# HTML
# doc.css("a").wrap("<div></div>")
# doc.to_html
# # => <html><head></head><body>
# # <div><a>a</a></div>
# # <div><a>b</a></div>
# # <div><a>c</a></div>
# # <div><a>d</a></div>
# # </body></html>
#
# *Example* with a +Node+ argument
#
# 💡 Note that this is faster than the equivalent call passing a +String+ because it avoids
# having to reparse the wrapper markup for each node.
#
# doc = Nokogiri::HTML5(<<~HTML)
# <html><body>
# <a>a</a>
# <a>b</a>
# <a>c</a>
# <a>d</a>
# </body></html>
# HTML
# doc.css("a").wrap(doc.create_element("div"))
# doc.to_html
# # => <html><head></head><body>
# # <div><a>a</a></div>
# # <div><a>b</a></div>
# # <div><a>c</a></div>
# # <div><a>d</a></div>
# # </body></html>
#
def wrap(node_or_tags)
map { |node| node.wrap(node_or_tags) }
self
end

###
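The rewritten `NodeSet#wrap` hands the same argument to every member and now returns the set itself rather than the `Array` built by `map`. The hypothetical `Elem` and `TinySet` below are simplified stand-ins, assuming each element merely records its wrapper as `parent`, that show the two properties a caller can rely on: every member gets a separate copy of the template, and the return value is the receiver.

```ruby
# Hypothetical, simplified model (not Nokogiri): each element just records its
# wrapper as `parent`, which is enough to show copy-per-member and `self` return.
Elem = Struct.new(:name, :parent)

class TinySet
  include Enumerable

  def initialize(*elems)
    @elems = elems
  end

  def each(&block)
    @elems.each(&block)
    self
  end

  # Same shape as the new NodeSet#wrap: apply to every member, then return self.
  def wrap(template)
    each { |elem| elem.parent = template.dup }
    self
  end
end

set = TinySet.new(Elem.new("a"), Elem.new("b"), Elem.new("c"))
template = Elem.new("div")
result = set.wrap(template)

puts result.equal?(set)                              # true
puts set.map { |e| e.parent.name }.inspect           # ["div", "div", "div"]
puts set.map { |e| e.parent.object_id }.uniq.size    # 3
puts set.none? { |e| e.parent.equal?(template) }     # true
```

Because the per-member results are discarded and `self` is returned, `each` is sufficient in this sketch; returning the set is what lets a caller keep chaining NodeSet methods after `wrap`.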
test/html5/test_api.rb: 9 additions, 0 deletions
@@ -213,6 +213,15 @@ def test_html_eh
refute_predicate(doc, :xml?)
end

def test_node_wrap
doc = Nokogiri.HTML5("<html><body><div></div></body></html>")
div = doc.at_css("div")
div.wrap("<section></section>")

assert_equal("section", div.parent.name)
assert_equal("body", div.parent.parent.name)
end

describe Nokogiri::HTML5::Document do
describe "#fragment" do
it "parses text nodes in a `body` context" do
test/xml/test_node.rb: 59 additions, 7 deletions
@@ -1297,13 +1297,65 @@ def test_text_node_robustness_gh1426
end
end

-def test_wrap
-xml = '<root><thing><div class="title">important thing</div></thing></root>'
-doc = Nokogiri::XML(xml)
-thing = doc.at_css("thing")
-thing.wrap("<wrapper/>")
-assert_equal("wrapper", thing.parent.name)
-assert_equal("thing", doc.at_css("wrapper").children.first.name)
describe "#wrap" do
let(:xml) { "<root><thing><div>important thing</div></thing></root>" }
let(:doc) { Nokogiri::XML(xml) }

describe "string markup argument" do
it "parses and wraps" do
thing = doc.at_css("thing")
rval = thing.wrap("<wrapper/>")
wrapper = doc.at_css("wrapper")

assert_equal(rval, thing)
assert_equal(wrapper, thing.parent)
assert_equal("root", wrapper.parent.name)
assert_equal(1, wrapper.children.length)
assert_equal("thing", wrapper.children.first.name)
end

it "wraps unparented nodes" do
thing = doc.create_element("thing")
thing.wrap("<wrapper/>")

assert_equal("wrapper", thing.parent.name)
assert_nil(thing.parent.parent)
end
end

describe "Node argument" do
it "wraps using a dup of the node" do
thing = doc.at_css("thing")
wrapper_template = doc.create_element("wrapper")
rval = thing.wrap(wrapper_template)
wrapper = doc.at_css("wrapper")

assert_equal(rval, thing)
refute_equal(wrapper, wrapper_template)
assert_equal(wrapper, thing.parent)
assert_equal("root", wrapper.parent.name)
assert_equal(1, wrapper.children.length)
assert_equal("thing", wrapper.children.first.name)
end

it "wraps unparented nodes" do
thing = doc.create_element("thing")
wrapper_template = doc.create_element("wrapper")
thing.wrap(wrapper_template)

refute_equal(wrapper_template, thing.parent)
assert_equal("wrapper", thing.parent.name)
assert_nil(thing.parent.parent)
end
end

it "raises an ArgumentError on other types" do
thing = doc.at_css("thing")

assert_raises(ArgumentError) do
thing.wrap(1)
end
end
end

describe "#line" do
test/xml/test_node_set.rb: 34 additions, 6 deletions
@@ -540,17 +540,45 @@ def awesome(ns)

describe "#wrap" do
it "wraps each node within a reified copy of the tag passed" do
-employees = (xml / "//employee").wrap("<wrapper/>")
-assert_equal("wrapper", employees[0].parent.name)
-assert_equal("employee", xml.search("//wrapper").first.children[0].name)
employees = xml.css("employee")
rval = employees.wrap("<wrapper/>")
wrappers = xml.css("wrapper")

assert_equal(rval, employees)
assert_equal(employees.length, wrappers.length)
employees.each do |employee|
assert_equal("wrapper", employee.parent.name)
end
wrappers.each do |wrapper|
assert_equal("staff", wrapper.parent.name)
assert_equal(1, wrapper.children.length)
assert_equal("employee", wrapper.children.first.name)
end
end

it "wraps each node within a dup of the Node argument" do
employees = xml.css("employee")
rval = employees.wrap(xml.create_element("wrapper"))
wrappers = xml.css("wrapper")

assert_equal(rval, employees)
assert_equal(employees.length, wrappers.length)
employees.each do |employee|
assert_equal("wrapper", employee.parent.name)
end
wrappers.each do |wrapper|
assert_equal("staff", wrapper.parent.name)
assert_equal(1, wrapper.children.length)
assert_equal("employee", wrapper.children.first.name)
end
end

it "handles various node types and handles recursive reparenting" do
-xml = "<root><foo>contents</foo></root>"
-doc = Nokogiri::XML(xml)
doc = Nokogiri::XML("<root><foo>contents</foo></root>")
nodes = doc.at_css("root").xpath(".//* | .//*/text()") # foo and "contents"
nodes.wrap("<wrapper/>")
wrappers = doc.css("wrapper")

assert_equal("root", wrappers.first.parent.name)
assert_equal("foo", wrappers.first.children.first.name)
assert_equal("foo", wrappers.last.parent.name)
@@ -565,7 +593,7 @@ def awesome(ns)
<employee>goodbye</employee>
</employees>
EOXML
-employees = frag.xpath(".//employee")
employees = frag.css("employee")
employees.wrap("<wrapper/>")
assert_equal("wrapper", employees[0].parent.name)
assert_equal("employee", frag.at(".//wrapper").children.first.name)