[RubyConf] Create scrubber for replacing double breakpoints into paragraph nodes #284
@torihuang @josecolella Thanks again for this PR! Looking at the failing tests, it seems like the HTML4 and the HTML5 parser handle newlines differently, and that's causing a failure one way or the other depending on whether there are newlines in the expected result or not. Demonstration:
#! /usr/bin/env ruby
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "loofah"
end
input = "<html><body><h1>Hello</h1><div>World</div></body></html>"
doc = Nokogiri::HTML4::Document.parse(input)
doc.at_css("h1").add_next_sibling("<b>there</b>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1>\n" + "<b>there</b><div>World</div>"
doc = Nokogiri::HTML4::Document.parse(input)
doc.at_css("h1").add_next_sibling("<p>there</p>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1>\n" + "<p>there</p>\n" + "<div>World</div>"
doc = Nokogiri::HTML5::Document.parse(input)
doc.at_css("h1").add_next_sibling("<b>there</b>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1><b>there</b><div>World</div>"
doc = Nokogiri::HTML5::Document.parse(input)
doc.at_css("h1").add_next_sibling("<p>there</p>")
doc.at_css("body").inner_html
# => "<h1>Hello</h1><p>there</p><div>World</div>"
In any case, now that I understand what's going on, I'll wrap this up by tomorrow!
Sounds good. Thanks @flavorjones
@flavorjones were you still interested in getting this merged?
@josecolella Totally! I've been really distracted the last few weeks, but I will absolutely circle back on this.
@flavorjones Any update here?
@josecolella Really sorry for the delay. This was harder than I expected to wrap up (at least in a way that I didn't think was gross). I will be spending a few weeks (at my new job!) on the sanitizer stack starting in late October and will do my best to get this merged then.
59aac52 to 5ac043d
because the html4 and html5 parsers just handle tags and insert newlines differently, and their presence/absence is orthogonal to wrapping in a `p` tag.
5ac043d to 4d94183
@torihuang and @josecolella -- Sorry this took so long to get back to! Thank you so much for the work, and for your patience. Merging, will be in v2.24.0 shortly.
Thank you @flavorjones for your time in getting this merged!
Why?
What?
A scrubber (`:double_breakpoint`) that replaces double breakpoints with paragraph nodes (thank you @flavorjones)
How did we test?
Important
There is a failing test right now where the expectation and the actual result match except for newline characters. Per discussion with @flavorjones, this might be related to minitest and how it formats HTML.
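For illustration only, here is a minimal sketch of one way a test could compare markup while ignoring the newlines that the parsers insert between tags. The helper name `normalize_inter_tag_whitespace` is hypothetical and is not part of Loofah, Nokogiri, or this PR; the two sample strings are taken from the demonstration in the conversation.

```ruby
# Hypothetical helper (an assumption, not part of Loofah or Nokogiri):
# remove whitespace that sits directly between two tags, so newlines added
# by a serializer do not affect string equality.
def normalize_inter_tag_whitespace(html)
  html.gsub(/>\s+</, "><").strip
end

# Serialized output shapes reported in the conversation for the same input.
html4_style = "<h1>Hello</h1>\n<p>there</p>\n<div>World</div>"
html5_style = "<h1>Hello</h1><p>there</p><div>World</div>"

# A plain comparison fails only because of the newlines.
puts html4_style == html5_style
# prints "false"

# After normalizing, the two strings compare equal.
puts normalize_inter_tag_whitespace(html4_style) ==
     normalize_inter_tag_whitespace(html5_style)
# prints "true"
```

Note that this also removes whitespace between inline elements (for example a space between `</b>` and `<i>`), where it can be significant, so it only suits assertions on block-level markup like the above.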
References #279