Error handling for string functions when parsing context element would be blocked by config. #101

otherdaniel · 2021-06-18T16:17:22Z

I'm presently mucking around with an implementation of the API along the lines of #42 (comment) , to see whether it "feels" right. It mostly does, except for error handling: Since we now have an explicit context element... should the config apply to the context or not?

The issue comes up since several fairly simple test cases now break: There's a new test cases along the lines of:

  new Sanitizer({allowElements: ["p"]}).sanitize("<p>bla<em>bla</p>")  // "<p>blabla</p>

For re-writing the tests, I now need to pick a context (regardless of whether directly in the API, or indirectly by parsing separately). Most test don't use anything fancy, so I thought I'd just pick a <div> or <template>. In my current implementation, I'm also checking the context against the configuration (built-in & supplied), which means that neither context would work, since I'm only allowing <p>. That semms logical, but... fairly annoying. In particular, it means that you e.g. cannot sanitize into a <template> element without also allowing <template> in the tree.

On the other hand, if we would not check the context element against the configuration, then you could do something like .sanitizeFor("script", "alert(1)"), which seems like even worse.

  // Should this be: <template><p>blabla</p></template>, or null?
  new Sanitizer({allowElements: ["p"]}).sanitizeFor("template", "<p>bla<em>bla</p>")

  // Is this different from:
  ... .sanitizeFor("script", ...)
  some_script_element.SetInnerHTML("alert(1)")

What I have presently implemented:

The context element gets checked against the configuration. (And thus also against the baseline & defaults).
.sanitizeFor returns null if the context would have been removed.
.SetSanitizedHTML / SetHTML throws if the context element would have been removed (since it can't fullfil the contract.)

Solutions I'm seeing:

Check the context, as described above.
Never check the context.
Check the context, but with a more restricted check. (E.g. only against the baseline.)
???

The text was updated successfully, but these errors were encountered:

mozfreddyb · 2023-08-30T08:40:31Z

We've discussed this in the meeting. We're leaning towards ignoring the context element as part of the sanitizer pass. Essentially, we'd parse the input into a document fragment and only perform the sanitizer pass only on that resulting fragment.

Also, this would be inconsistent if we'd pay attention to a context element but not it's attributes.

@otherdaniel is going to think this through and incorporate this into e.g. #196.

otherdaniel · 2023-11-14T17:02:42Z

We've discussed this in the meeting. We're leaning towards ignoring the context element as part of the sanitizer pass. Essentially, we'd parse the input into a document fragment and only perform the sanitizer pass only on that resulting fragment.

Yes, but I for one was a bit unhappy with that. The part that I think we agree on is:

setHTML (& friends) will call replaceChildren(nodes) (or somesuch) on the context node, and will not otherwise modify the context node (or its attributes).

The part that I think we don't agree on:

Should the sanitize operation take the context element into account when making allow/remove decisions?

The case where this makes a substantial difference is if the context element is <script>: If $0 points to an HTMLScriptElement, what does $0.setHTML("alert(1);") do?

The options are:

The sanitize algorithm will remove the text node (because it wouldn't allow a text node in a <script> anywhere else), yielding an empty fragment, and will then $0.replaceChildren with that empty fragment, resulting in an empty <script>. That requires the sanitization algorithm to know about the context element.
The sanitize algorithm will allow the text node (because it always allows text nodes and only allows/disallows elements or attributes) and will then $0.replaceChilden with that fragment, resulting in <script>alert(1);</script>.
It could also throw, on grounds that it can't fulfill its safe-by-default contract.

The middle one is an XSS (by design), which seems like a terrible choice for a safe-by-default feature designed to be XSS-safe. My preference is the first; but I would also be happy with the third option.

In algorithm-y terms, those would require one extra step to check the context element against the defaults (or the config) at the beginning of the sanitize operation.

I'll add these three options to the explainer PR, so we have specific text to discuss..

annevk · 2023-11-15T12:29:46Z

In the meeting we decided on a fourth option, to return early (but probably after sanitizer dictionary validation). In particular to avoid throwing for a case people might not expect to throw (they can branch on the element local name if they care about identifying the no-op).

mozfreddyb added the v1 label Mar 23, 2022

annevk mentioned this issue Oct 18, 2023

Add more examples. #196

Merged

mozfreddyb added this to the v1 milestone Jan 23, 2024

otherdaniel mentioned this issue Mar 15, 2024

Update the spec to match the current group consensus. #208

Merged

otherdaniel closed this as completed in #208 Mar 19, 2024

otherdaniel closed this as completed in db21b50 Mar 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error handling for string functions when parsing context element would be blocked by config. #101

Error handling for string functions when parsing context element would be blocked by config. #101

otherdaniel commented Jun 18, 2021

mozfreddyb commented Aug 30, 2023

otherdaniel commented Nov 14, 2023

annevk commented Nov 15, 2023

Error handling for string functions when parsing context element would be blocked by config. #101

Error handling for string functions when parsing context element would be blocked by config. #101

Comments

otherdaniel commented Jun 18, 2021

mozfreddyb commented Aug 30, 2023

otherdaniel commented Nov 14, 2023

annevk commented Nov 15, 2023