Examples
John F. Douthat edited this page Sep 11, 2019
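Problem: Finding the nearest previous element of a certain type.
Solution: Walk backward through the previous siblings of the starting node and return the first element with the requested name. Only siblings are checked. Elements nested inside those siblings, and siblings of the node's ancestors, are not searched.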
require 'nokogiri'

# Walk backward through the previous siblings of +node+ and return the
# first element named +name+, or nil if there is none. Text and comment
# nodes are skipped by the element? check.
def search_for_previous_element(node, name)
  result = node
  while (result = result.previous_sibling)
    return result if result.element? && result.name == name
  end
  nil
end

fragment = Nokogiri::HTML.fragment(DATA.read)
start_here = fragment.at('div.block#foo')

# A Nokogiri::XML::Element of the nearest, previous h1.
previous_element_h1 = search_for_previous_element(start_here, 'h1')
puts previous_element_h1 #=> <h1>this is what I want</h1>

__END__
<lorem>
  <h1>wrong one!</h1>
  <ipsum>
    <h1>wrong one!</h1>
    <dolor></dolor>
    <h1>this is what I want</h1>
    <sit></sit>
    <div class="block" id="foo">
      this is where I start
    </div>
    <amet></amet>
    <h1>wrong one!</h1>
  </ipsum>
  <h1>wrong one!</h1>
</lorem>
Problem: Given an HTML document like this...
<p>Not sure how to start your day? Let us help!</p>
<h1>1.0 Getting Started</h1>
<p>Welcome!</p>
<h2>1.1 First Things First</h2>
<p>Get out of bed.</p>
<h2>1.2 Get Dressed</h2>
<p>Put on your clothes.</p>
<h3>1.2.1 First, the undergarments</h3>
<p>...and then the rest</p>
<h1>2.0 Eating Breakfast</h1>
<p>And so on, and so on...</p>
...wrap the content of each 'section' in <div class='section'>...</div> for hierarchical styling (e.g. with CSS such as div.section { margin-left: 1em }). The end result looks like this:
<p>Not sure how to start your day? Let us help!</p>
<h1>1.0 Getting Started</h1>
<div class='section'>
<p>Welcome!</p>
<h2>1.1 First Things First</h2>
<div class='section'>
<p>Get out of bed.</p>
</div>
<h2>1.2 Get Dressed</h2>
<div class='section'>
<p>Put on your clothes.</p>
<h3>1.2.1 First, the undergarments</h3>
<div class='section'>
<p>...and then the rest</p>
</div>
</div>
</div>
<h1>2.0 Eating Breakfast</h1>
<div class='section'>
<p>And so on, and so on...</p>
</div>
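Solution: Walk the children of <body> in order while keeping a stack of open sections. Each heading closes every open section at its own level or deeper, is moved into the innermost section still open, and then opens a new section div directly after itself. Every other node is moved into the innermost open section.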
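# Assuming doc is a Nokogiri::HTML::Document
if (body = doc.at_css('body'))
  stack = [] # open sections, innermost last
  body.children.each do |node|
    # Headings get their number (1-6) as level; every other node
    # (paragraphs, text, comments) gets 99 so it never closes a section.
    level = node.name[/\Ah([1-6])\z/i, 1].to_i
    level = 99 if level == 0
    # Close every open section at the same or a deeper level.
    stack.pop while (top = stack.last) && top[:level] >= level
    # Move the node into the innermost section that is still open.
    stack.last[:div].add_child(node) if stack.last
    if level < 99
      # A heading starts a new section placed directly after it.
      div = Nokogiri::XML::Node.new('div', doc)
      div['class'] = 'section'
      node.add_next_sibling(div)
      stack << { div: div, level: level }
    end
  end
end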
Questions tagged nokogiri on stackoverflow.com are another good source of Nokogiri examples.