Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling for string functions when parsing context element would be blocked by config. #101

Closed
otherdaniel opened this issue Jun 18, 2021 · 3 comments · Fixed by #208
Milestone

Comments

@otherdaniel
Copy link
Collaborator

I'm presently mucking around with an implementation of the API along the lines of #42 (comment) , to see whether it "feels" right. It mostly does, except for error handling: Since we now have an explicit context element... should the config apply to the context or not?

The issue comes up since several fairly simple test cases now break: There's a new test cases along the lines of:

  new Sanitizer({allowElements: ["p"]}).sanitize("<p>bla<em>bla</p>")  // "<p>blabla</p>

For re-writing the tests, I now need to pick a context (regardless of whether directly in the API, or indirectly by parsing separately). Most test don't use anything fancy, so I thought I'd just pick a <div> or <template>. In my current implementation, I'm also checking the context against the configuration (built-in & supplied), which means that neither context would work, since I'm only allowing <p>. That semms logical, but... fairly annoying. In particular, it means that you e.g. cannot sanitize into a <template> element without also allowing <template> in the tree.

On the other hand, if we would not check the context element against the configuration, then you could do something like .sanitizeFor("script", "alert(1)"), which seems like even worse.

  // Should this be: <template><p>blabla</p></template>, or null?
  new Sanitizer({allowElements: ["p"]}).sanitizeFor("template", "<p>bla<em>bla</p>")

  // Is this different from:
  ... .sanitizeFor("script", ...)
  some_script_element.SetInnerHTML("alert(1)") 

What I have presently implemented:

  • The context element gets checked against the configuration. (And thus also against the baseline & defaults).
  • .sanitizeFor returns null if the context would have been removed.
  • .SetSanitizedHTML / SetHTML throws if the context element would have been removed (since it can't fullfil the contract.)

Solutions I'm seeing:

  1. Check the context, as described above.
  2. Never check the context.
  3. Check the context, but with a more restricted check. (E.g. only against the baseline.)
  4. ???
@mozfreddyb mozfreddyb added the v1 label Mar 23, 2022
@mozfreddyb
Copy link
Collaborator

We've discussed this in the meeting. We're leaning towards ignoring the context element as part of the sanitizer pass. Essentially, we'd parse the input into a document fragment and only perform the sanitizer pass only on that resulting fragment.

Also, this would be inconsistent if we'd pay attention to a context element but not it's attributes.

@otherdaniel is going to think this through and incorporate this into e.g. #196.

@otherdaniel
Copy link
Collaborator Author

We've discussed this in the meeting. We're leaning towards ignoring the context element as part of the sanitizer pass. Essentially, we'd parse the input into a document fragment and only perform the sanitizer pass only on that resulting fragment.

Yes, but I for one was a bit unhappy with that. The part that I think we agree on is:

  • setHTML (& friends) will call replaceChildren(nodes) (or somesuch) on the context node, and will not otherwise modify the context node (or its attributes).

The part that I think we don't agree on:

  • Should the sanitize operation take the context element into account when making allow/remove decisions?

The case where this makes a substantial difference is if the context element is <script>: If $0 points to an HTMLScriptElement, what does $0.setHTML("alert(1);") do?

The options are:

  • The sanitize algorithm will remove the text node (because it wouldn't allow a text node in a <script> anywhere else), yielding an empty fragment, and will then $0.replaceChildren with that empty fragment, resulting in an empty <script>. That requires the sanitization algorithm to know about the context element.
  • The sanitize algorithm will allow the text node (because it always allows text nodes and only allows/disallows elements or attributes) and will then $0.replaceChilden with that fragment, resulting in <script>alert(1);</script>.
  • It could also throw, on grounds that it can't fulfill its safe-by-default contract.

The middle one is an XSS (by design), which seems like a terrible choice for a safe-by-default feature designed to be XSS-safe. My preference is the first; but I would also be happy with the third option.

In algorithm-y terms, those would require one extra step to check the context element against the defaults (or the config) at the beginning of the sanitize operation.

I'll add these three options to the explainer PR, so we have specific text to discuss..

@annevk
Copy link
Collaborator

annevk commented Nov 15, 2023

In the meeting we decided on a fourth option, to return early (but probably after sanitizer dictionary validation). In particular to avoid throwing for a case people might not expect to throw (they can branch on the element local name if they care about identifying the no-op).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants