PermitScrubber treats ProcessingInstructions as Elements #115

dmpotter44 · 2021-05-24T21:35:37Z

The PermitScrubber class treats ProcessingInstructions as if they were Elements. (Strictly speaking, it doesn't check the node type at all beyond special handling for CDATA nodes; it just checks the name field.) This leads to some odd behavior:

require 'rails-html-sanitizer'
sanitizer = Rails::Html::SafeListSanitizer.new
# Sanitizer is using PermitScrubber with default properties
# The actual issue is within PermitScrubber, but for simplicity in showing the
# issue, these examples use a SafeListSanitizer
sanitizer.sanitize('<a href=example.html>Expected link</a><?a href=pi.html>not expected PI')
# => "<a href=\"example.html\">Expected link</a><?a href=pi.html>not expected PI"

However, the PI is only kept because its name (a) is white-listed. A PI whose name is not on the list is removed:

sanitizer.sanitize('<b>Bold</b><?made up>')
# => "<b>Bold</b>"

I haven't figured out any way to do anything evil with this in modern browsers because modern browsers seem to correctly parse and ignore processing instructions. However, the contents of the PI are passed through to the final result unmodified, so:

sanitizer.sanitize('<?a <script>alert("Hello")//<?a </script>')
# => "<?a <script>alert(\"Hello\")//<?a </script>"

There was apparently an issue in older versions of Internet Explorer where PIs were not parsed correctly, potentially allowing the script to be run. It does not appear to run in modern browsers, though. (Chrome just changes the PIs into comments.)

It's worth noting that, yes, HTML has processing instructions, because SGML has processing instructions. <?a something> is a valid HTML processing instruction. It doesn't do anything, but it's valid SGML and may be parsed as such. (Unlike XML PIs, SGML PIs end with just > and not ?>.)

The solution is likely to update the scrub method in PermitScrubber to remove PIs. Alternatively, the allowed_node? method could be updated to determine if the node is really an element before checking it against the whitelist.
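
As a rough illustration of the second option, here is a minimal sketch of checking the node type before checking the name. It does not use the library's code: `SketchNode`, `SKETCH_ALLOWED_TAGS`, `allowed_by_name_only?`, and `allowed_element?` are all hypothetical names made up for this example, and the struct only stands in for a parsed node (Nokogiri nodes do expose an `element?` predicate, which is what the stand-in imitates).

```ruby
# Hypothetical stand-in for a parsed node; a real scrubber receives Nokogiri nodes.
SketchNode = Struct.new(:type, :name) do
  def element?
    type == :element
  end
end

# Illustrative allow-list, not the sanitizer's real default list.
SKETCH_ALLOWED_TAGS = %w[a b].freeze

# Name-only check: roughly the behavior described in this issue.
def allowed_by_name_only?(node)
  SKETCH_ALLOWED_TAGS.include?(node.name)
end

# Proposed check: confirm the node is an element before consulting the list.
def allowed_element?(node)
  node.element? && SKETCH_ALLOWED_TAGS.include?(node.name)
end

link = SketchNode.new(:element, "a")
pi   = SketchNode.new(:processing_instruction, "a")

puts allowed_by_name_only?(pi)  # true: the PI slips through on its name alone
puts allowed_element?(pi)       # false: rejected because it is not an element
puts allowed_element?(link)     # true: a real <a> element is still allowed
```

The point of the sketch is only that the type check has to come first; the name comparison itself does not need to change.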


flavorjones · 2021-05-26T04:33:21Z

@dmpotter44 Thanks for opening this issue! I've reproduced what you're seeing.

I'll take a deeper look when I have a bit more time, but I think I agree with your assessment that these nodes should be stripped when sanitizing.


flavorjones · 2021-07-19T03:17:31Z

PR submitted at #116 to address this.

Some scrubbers want to allow comments through, but in v1.4.0 they didn't get the chance because only elements were passed through to `keep_node?`. This change passes both comments and elements to `keep_node?`, but still omits other non-elements like processing instructions (see #115).
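
The dispatch described above can be sketched roughly as follows. This is an assumption-laden illustration, not the merged code: `DemoNode` and `DemoScrubber` are hypothetical names, the `:keep`/`:strip` symbols are placeholders for whatever the real scrubber returns, and the block passed to `DemoScrubber.new` plays the role of `keep_node?`.

```ruby
# Hypothetical stand-in for a parsed node; not the library's node class.
DemoNode = Struct.new(:kind, :name)

# Illustrative dispatcher: only elements and comments reach the keep_node hook.
class DemoScrubber
  def initialize(&keep_node)
    @keep_node = keep_node
  end

  # Returns :keep or :strip for a single node.
  def scrub(node)
    case node.kind
    when :text
      :keep # plain text is left alone in this sketch
    when :element, :comment
      @keep_node.call(node) ? :keep : :strip
    else
      :strip # processing instructions and other node types never reach the hook
    end
  end
end

# A scrubber that chooses to keep comments plus a small set of tags.
comment_friendly = DemoScrubber.new do |node|
  node.kind == :comment || %w[a b].include?(node.name)
end

puts comment_friendly.scrub(DemoNode.new(:comment, "comment"))         # keep
puts comment_friendly.scrub(DemoNode.new(:element, "a"))               # keep
puts comment_friendly.scrub(DemoNode.new(:processing_instruction, "a")) # strip
```

Even though the hook in this example would accept a node named `a`, the processing instruction is stripped because it is never offered to the hook.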

flavorjones self-assigned this May 26, 2021

flavorjones added a commit that referenced this issue Jul 19, 2021

PermitScrubber does not permit Processing Instructions

c06d465

Fixes #115

flavorjones mentioned this issue Jul 19, 2021

PermitScrubber does not permit Processing Instructions #116

Merged

flavorjones closed this as completed in #116 Jul 20, 2021

flavorjones mentioned this issue Aug 18, 2021

fix: pass comment nodes to the scrubber #117

Merged
