Merge pull request #284 from josecolella/jc-th-add-breakpoint-scrubber
[RubyConf] Create scrubber for replacing double breakpoints into paragraph nodes
flavorjones authored Jan 1, 2025
2 parents 868a852 + 4d94183 commit 2abdafc
Showing 3 changed files with 67 additions and 0 deletions.
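
To make the intended transformation concrete, here is a minimal, stdlib-only sketch of what the new scrubber is meant to produce. It is an illustration under simplifying assumptions, not the Loofah implementation: the real scrubber works on parsed Nokogiri nodes, whereas this sketch works on strings, and `DOUBLE_BREAK` and `split_paragraph_on_double_breaks` are hypothetical names introduced only for this example.

```ruby
# Hypothetical helper: matches two consecutive <br> tags, optionally
# self-closed and optionally separated by whitespace.
DOUBLE_BREAK = %r{<br\s*/?>\s*<br\s*/?>}i

# String-based approximation of the :double_breakpoint transformation.
# Assumes the input is a single, non-nested <p>...</p> element.
def split_paragraph_on_double_breaks(paragraph_html)
  # Strip the outer <p>...</p> wrapper, if present.
  inner = paragraph_html.sub(/\A<p>/i, "").sub(%r{</p>\z}i, "")

  # Each chunk between double breaks becomes its own paragraph.
  inner.split(DOUBLE_BREAK).reject(&:empty?).map { |chunk| "<p>#{chunk}</p>" }.join
end

input = "<p>Some text here in a logical paragraph.<br><br>" \
        "Some more text, apparently a second paragraph.<br><br>Et cetera...</p>"
puts split_paragraph_on_double_breaks(input)
```

A single `<br>` is left untouched by this sketch, since only a pair of consecutive breaks is treated as a paragraph boundary.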
2 changes: 2 additions & 0 deletions README.md
@@ -31,6 +31,7 @@ Active Record extensions for HTML sanitization are available in the [`loofah-act
* Add the _nofollow_ attribute to all hyperlinks.
* Add the _target=\_blank_ attribute to all hyperlinks.
* Remove _unprintable_ characters from text nodes.
* Convert _double breakpoints_ (consecutive `<br>` tags) into paragraph nodes.
* Format markup as plain text, with (or without) sensible whitespace handling around block elements.
* Replace Rails's `strip_tags` and `sanitize` view helper methods.

@@ -235,6 +236,7 @@ doc.scrub!(:noopener) # adds rel="noopener" attribute to links
doc.scrub!(:noreferrer) # adds rel="noreferrer" attribute to links
doc.scrub!(:unprintable) # removes unprintable characters from text nodes
doc.scrub!(:targetblank) # adds target="_blank" attribute to links
doc.scrub!(:double_breakpoint) # replaces double breakpoints (<br><br>) with paragraph nodes
```

See `Loofah::Scrubbers` for more details and example usage.
52 changes: 52 additions & 0 deletions lib/loofah/scrubbers.rb
@@ -350,6 +350,57 @@ def scrub(node)
end
end

#
# === scrub!(:double_breakpoint)
#
# +:double_breakpoint+ replaces each pair of consecutive +<br>+ tags inside a paragraph with closing/opening paragraph tags.
#
#    double_breakpoint_markup = "<p>Some text here in a logical paragraph.<br><br>Some more text, apparently a second paragraph.</p>"
#    Loofah.html5_fragment(double_breakpoint_markup).scrub!(:double_breakpoint)
#    => "<p>Some text here in a logical paragraph.</p><p>Some more text, apparently a second paragraph.</p>"
#
class DoubleBreakpoint < Scrubber
  def initialize # rubocop:disable Lint/MissingSuper
    @direction = :top_down
  end

  def scrub(node)
    return CONTINUE unless (node.type == Nokogiri::XML::Node::ELEMENT_NODE) && (node.name == "p")

    # NOTE: the leading "//" makes this query search the whole document, not only +node+.
    paragraph_with_break_point_nodes = node.xpath("//p[br[following-sibling::br]]")

    paragraph_with_break_point_nodes.each do |paragraph_node|
      new_paragraph = paragraph_node.add_previous_sibling("<p>").first

      # Drop whitespace-only text nodes so that "<br> <br>" counts as a double break.
      paragraph_node.children.each do |child|
        remove_blank_text_nodes(child)
      end

      paragraph_node.children.each do |child|
        # already unlinked
        next if child.parent.nil?

        # next_sibling is nil for the last child, hence the safe navigation.
        if child.name == "br" && child.next_sibling&.name == "br"
          # A double break starts a new paragraph; both <br> nodes are discarded.
          new_paragraph = paragraph_node.add_previous_sibling("<p>").first
          child.next_sibling.unlink
          child.unlink
        else
          child.parent = new_paragraph
        end
      end

      paragraph_node.unlink
    end

    CONTINUE
  end

  private

  def remove_blank_text_nodes(node)
    node.unlink if node.text? && node.blank?
  end
end
#
# A hash that maps a symbol (like +:prune+) to the appropriate Scrubber (Loofah::Scrubbers::Prune).
#
@@ -364,6 +415,7 @@ def scrub(node)
targetblank: TargetBlank,
newline_block_elements: NewlineBlockElements,
unprintable: Unprintable,
double_breakpoint: DoubleBreakpoint,
}

class << self
13 changes: 13 additions & 0 deletions test/integration/test_scrubbers.rb
@@ -50,6 +50,9 @@ class IntegrationTestScrubbers < Loofah::TestCase
ENTITY_HACK_ATTACK_TEXT_SCRUB = "Hack attack!&lt;script&gt;alert('evil')&lt;/script&gt;"
ENTITY_HACK_ATTACK_TEXT_SCRUB_UNESC = "Hack attack!<script>alert('evil')</script>"

BREAKPOINT_FRAGMENT = "<p>Some text here in a logical paragraph.<br><br>Some more text, apparently a second paragraph.<br><br>Et cetera...</p>"
BREAKPOINT_RESULT = "<p>Some text here in a logical paragraph.</p><p>Some more text, apparently a second paragraph.</p><p>Et cetera...</p>"

context "scrubbing shortcuts" do
context "#scrub_document" do
it "is a shortcut for parse-and-scrub" do
@@ -236,6 +239,16 @@ def html5?
assert_equal doc, result
end
end

context ":double_breakpoint" do
it "replaces double line breaks with paragraph tags" do
doc = klass.parse("<html><body>#{BREAKPOINT_FRAGMENT}</body></html>")
result = doc.scrub!(:double_breakpoint)

assert_equal BREAKPOINT_RESULT, doc.xpath("/html/body").inner_html.delete("\n")
assert_equal doc, result
end
end
end

context "#text" do
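
The integration test above covers only paragraphs in which every `<br>` is part of a pair. One case it does not exercise is a paragraph whose last child is a single `<br>`: in Nokogiri, `next_sibling` returns `nil` for the last child, so an adjacency check has to tolerate that. A minimal sketch of such a check, using a hypothetical `Node` struct as a stand-in for real Nokogiri nodes (the struct and the `starts_double_break?` helper are assumptions made for illustration):

```ruby
# Hypothetical stand-in for a parsed node: just a name and a link to the
# following sibling (nil when the node is the last child).
Node = Struct.new(:name, :next_sibling)

# True only when the node is a <br> immediately followed by another <br>.
# Safe navigation (&.) makes the check return false instead of raising
# NoMethodError when there is no next sibling.
def starts_double_break?(node)
  node.name == "br" && node.next_sibling&.name == "br"
end

# Models the children of: <p><br><br>text<br></p>
trailing_br = Node.new("br", nil)
text        = Node.new("text", trailing_br)
second_br   = Node.new("br", text)
first_br    = Node.new("br", second_br)

puts starts_double_break?(first_br)
puts starts_double_break?(trailing_br)
```

The same guard could also be written as an explicit `nil` check; safe navigation is used here only because it keeps the condition on one line.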