Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizer API creating mock context-element can cause XSS when used in different context #42

Closed
annevk opened this issue Oct 13, 2020 · 29 comments
Labels

Comments

@annevk
Copy link
Collaborator

annevk commented Oct 13, 2020

The fragment parsing algorithm takes two arguments, a context element and a string. This API only takes a string. I guess that means it makes up its own context element, but this might not always be correct and I have a slight worry that the difference could be exploited at worst and at best lead to confusion.

(Related to #37.)

@otherdaniel
Copy link
Collaborator

This reference, I think: https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments

Thanks. I hadn't considered that. I guess one could have a sanitizeFor(DOMString, Element) variant.

My gut feeling is that predictability and ease-of-use of Sanitzer will be of higher value than expressivness, and that defining this based on a synthetic context element for all cases will be the better choice. That said, it's a subtle area with real bug potential, and I don't think I fully understand the consequences yet.

@annevk
Copy link
Collaborator Author

annevk commented Oct 13, 2020

For instance, obj.innerHTML = sanitize(input) would do completely the wrong thing when obj is an SVG or MathML element.

The more fundamental problem is that the API parses and creates a tree internally (presumably using the HTML body element as context or some such) and then expects a serialized version of that tree to parse identically, but it very much depends on context how that serialization will parse in actuality.

(This is one of the reasons I do not think the API should go string -> string.)

@otherdaniel
Copy link
Collaborator

Not sure if I follow...

String -> String: I agree. (And I think so does everyone else.) I'm not sure we're all agreed on whether we can afford to drop this entirely or not, but at the very least this shouldn't be the default API. (PR #41 tries to help a little, by at least removing string -> string from the motivating examples.)

"expects a serialized version of that tree to parse identically": I'm not sure this is so. As I understand it, with MathML or SVG as context element, the parser processes the children differently (e.g. puts them into MathML or SVG namespace, and apparently some name mangling for attributes). So if we'd use the DocumentFragement version, wouldn't it do the exact same wrong thing it'd do with the string version?

That is, if I change your example to obj.appendChild(s.sanitize(input)), wouldn't it mis-parse in the same fashion for an SVG obj? Or am I misunderstanding your argument?

In case I don't: Then we really have a problem that parsing (by itself; even without unparsing) has multiple modes of operation, and we can choose to either drop all except one for simplicity, or alternatively provide an API that accounts for them. (Maybe sanitizeFor with a context element, or maybe some sort of mode flag in the options.). My gut feeling is that there aren't all that super many use cases for user input directly into top-level SVG or MathML elements, and hence simplicity over expressivity might do fine; but I'm not super sure, either.

@annevk
Copy link
Collaborator Author

annevk commented Oct 13, 2020

obj.appendChild(s.sanitize(input)) would potentially result in a tree you cannot get with the parser, but it would not result in a tree that the sanitizer did not expect (due to a reparse resulting in a different tree).

Anyway, yes, the problem is also that the parser takes more inputs than this API allows for. https://html.spec.whatwg.org/#html-fragment-parsing-algorithm has a bunch of branching and initializing around the context element so I think it's definitely going to be a problem for some developers, especially if they use innerHTML on those elements today.

@mozfreddyb mozfreddyb changed the title API is less expressive than innerHTML Sanitizer API creating mock context-element can cause XSS when used in different context Oct 28, 2020
@mozfreddyb
Copy link
Collaborator

The mXSS aspect of this (see also #37) is mostly unavoidable as long as folks use sanitizeToString. We're going to nudge folks to use the non-string API as much as possible.

@annevk
Copy link
Collaborator Author

annevk commented Mar 23, 2021

While you changed the summary in October, I think the original issue is still valid. At least https://github.com/WICG/sanitizer-api#proposed-api still seems strictly less expressive than innerHTML.

@mozfreddyb mozfreddyb reopened this May 17, 2021
@mozfreddyb
Copy link
Collaborator

mozfreddyb commented May 17, 2021

Despite the mXSS issue, @annevk noted that accepting an input might be error-prone and surprising to a developer.

As an example, a developer wants to insert input like <td>test<td>test into a <table> element and wants to sanitize it before.

  • When doing so directly, using foo.innerHTML, the parser will have the right context element (foo) and parse accordingly.
  • When going through sanitize() however, the parser will make up a context-element on the fly (we currently create a mock body) and parse accordingly. With this specific input, the result of foo.append(s.sanitize('<td>test<td>test') will be testtest 😕

I don't think tables are the most exciting use case, but I'm concerned if there are more subtle bugs lurking - bugs which aren't only confusing but maybe worse.

So, if we want to make sure there is a context element supplied, I can see two options:
Change the sanitize() function to accept a second, optional parameter that denotes the context element (how exactly would you denote a parsing context? Reference an existing DOM node?) That would still leave the risk of someone sanitizing for one context and then re-using the sanitized output on another context.

@annevk suggested the (imo slightly more radical) alternative of proposing a new SetinnerHTML method for the Element interface that auto-sanitizes. This would solve the context issue, but would make custom sanitizer configurations more complicated.
It would leave the developer with a pattern that requires referencing a sanitizer config for every sanitizer invocation (whereas the
existing proposed API uses a new Sanitizer(/*options*/) pattern which captures them once for continued reuse). Anne noted developers who need to use custom sanitizer options may just write their own wrapper function which accepts an element and some input but hard-codes the config to be used.

I think we might get away with the former (accepting a second, optional context parameter), but it's arguable a less cleaner design. @annevk suggested I tag @rniwa and @domenic to see what they think.

@domenic
Copy link

domenic commented May 17, 2021

Would contextElement.setInnerHTML(html, { sanitizer: s }) work? That seems better than the bug-prone contextElement.append(s.sanitize(html)) and more elegant than contextElement.append(s.sanitize(html, contextElement)).

Perhaps orthogonally, I think a context element should be mandatory, not optional.

@koto
Copy link

koto commented May 17, 2021

Wouldn't rearchitecting this mean that the sanitizer API would always be coupled to the DOM (i.e. you can't sanitize without passing it a node)?

I see the value of a standalone sanitizer API that simply returns values, without also inserting them to the DOM directly - for example, that way you could sanitize in a Worker, or in general separate your sanitization and rendering layers. Sanitizing-and-returning is how DOMPurify works by default, and it seems to be a useful primitive for the devs -- regardless of the quirks that some context mixing introduces. Without a standalone "context free" return value (be it a string, TrustedHTML or a fragment), migrating existing sanitizers libs to the API would force them to create a string (likely from innerHTML getter) anyway, so I'm not sure how much that practically buys us. Dropping the context-free method is more elegant, but might make the API less applicable to what the devs need.

@otherdaniel
Copy link
Collaborator

Hmm. The way I'm thinking about this is that we need a good forward-looking solution, and one for all the existing code. I think for looking forward, we have the string-less .sanitize plus node manipulation (like .replaceChildren). So far I haven't seen much that would improve on this.

The bigger -- much bigger -- problem is what to offer to existing code bases. IMHO Trusted Types - which allows you to gradually replace the string usage and to eventually enforce string-less DOM manipulation - is a pretty good answer, but let's just ignore TT for now since not all browsers support it. The problem with offering "radical" new APIs for legacy code is that legacy code tends to not pick them up. Libraries in particular want to be able to run on both older and newer browsers without maiintaining two versions, and that will turn them away from a sanitizing-setter until it's supported everywhere.

Looking at the proposals with those two thoughts in mind: If we want to add a context-dependent string setter, why not make it node-based in the first place? If we want to replace string-based setters with context-dependent ones... I think we'll have an adoption problem.

Maybe an alternative would be a Document.setSanitizer(.), where an app can opt-in to having .innerHTML being sanitized with the chosen Sanitizer? That would allow existing libraries to (mostly?) keep working, both with or without Sanitizer. It'd also allow the top-level page to make a pro-Sanitizer decision without having to wait for all libraries being adapted first.


Related, but not directly: It seems mXSS revolve around a handful of usual suspects. Basically, a number of elements with special parsing rules attached to them. With the split between defaults and a (non-overrridable) baseline we can be much more aggressive about the defaults. I wonder if we can make a fairly aggressive, usually-mXSS-safe default config.

@rniwa
Copy link

rniwa commented May 18, 2021

I see the value of a standalone sanitizer API that simply returns values, without also inserting them to the DOM directly - for example, that way you could sanitize in a Worker

Given most browser engine's HTML parsing / tree building code isn't thread safe, I don't see how that would be possible at least in short to mid term although designing API with future use cases is always a good idea.

or in general separate your sanitization and rendering layers. Sanitizing-and-returning is how DOMPurify works by default, and it seems to be a useful primitive for the devs -- regardless of the quirks that some context mixing introduces.

The issue here is that without knowing the context, we can't really initialize the parser. It's like being asked to play music without knowing key or tempo.

Maybe an alternative would be a Document.setSanitizer(.), where an app can opt-in to having .innerHTML being sanitized with the chosen Sanitizer? That would allow existing libraries to (mostly?) keep working, both with or without Sanitizer. It'd also allow the top-level page to make a pro-Sanitizer decision without having to wait for all libraries being adapted first.

This kind of one-size-fits-all solution rarely works for something as context dependent as HTML parsing sanitation rules. This will quickly lead us to a situation in which we need to allow a new element in some context but then we'd have to study every dependency ever used to be confident that it's safe to do so.

Related, but not directly: It seems mXSS revolve around a handful of usual suspects. Basically, a number of elements with special parsing rules attached to them. With the split between defaults and a (non-overrridable) baseline we can be much more aggressive about the defaults. I wonder if we can make a fairly aggressive, usually-mXSS-safe default config.

Addressing most use cases is a good idea in most API designs but not so with security features. It's actively harmful to have a security API that's safe most of the time or unsafe in some edge cases. That's how almost all security bugs get introduced.

@rniwa
Copy link

rniwa commented May 18, 2021

One solution here might be to not directly modify the input string in sanitize method but instead SanitizedHTML object containing the exact same string. Only when that object is based to the setter of innerHTML and other DOM API, it would then get passed to the fragment parsing algorithm. This avoids the issue of having to know the context object at the time of invoking the sanitizing algorithm.

@annevk
Copy link
Collaborator Author

annevk commented May 18, 2021

What I suggested to @mozfreddyb at one point was to split the feature in two, in a way:

  • setInnerHTML() (open to bikeshedding to something like setHTML() to make it more attractive than innerHTML) that is safe by default and also takes configuration
  • A Sanitizer object of sorts that takes a DocumentFragment and returns a DocumentFragment. This exposes what setInnerHTML() would use internally to sanitize the parsed string. This could be the migration path for libraries that do not want to use the context-sensitive setInnerHTML() for some reason.

@otherdaniel
Copy link
Collaborator

I like the suggested options, but we probably haven't exhausted the solution space yet.

 

I find one aspect important: The API usability/adoption angle. For security APIs, API usability has been a major problem. The whole XSS/mXSS space is arguably an example, since DOM node-based APIs have been largely XSS safe, but are apparently too annoying to use and so people use .innerHTML instead. The problem here isn't edge cases, but a lack of easy to use options. It's very much the job of Sanitizer to make security easy. If we restrict the API too much, we loose that.

The particular danger I'm seeing here is fusing sanitization with tree insertion. That gives us a known-good context. But it also forces a particular code structure, where the sanitization must be the last step before tree manipulation. That fits some use cases, but not others.

 

Wrapping the sanitizer result is an option. It kinda re-invents Trusted Types. Looking at our TT experience, the good news is that it works, the bad news is that it has led to problems (where code that wasn't written to expect a wrapper threw exceptions), which we've compensated for by having deployability features. (E.g. CSP report-only, error events, etc.)

 

I do wonder whether we can have a simpler solution. The root cause of the security issues seems to be context confusion. (E.g. parse in one context; re-parse in another.) If so, then we can reliably fix it by ensuring a defined, generic context is always applied. (There's also usability issues, like no table cells supported outside of tables, but IMHO we can trade those against usability gains elsewhere.)

If we had a .predictableInnerHTML= accessor that uses a generic context - like DOMParser does, or the current Sanitizer API - then we'd have a very easy way to sanitize while supporting existing code. Basically, an accessor that does .replaceChildren(new DOMParser.parse(argument)).

Arguably, this is more consistent with the DOM than actual .innerHTML= behaviour, which changes behaviour in subtle ways depending on its callee. And .predictableInnerHTML is even polyfill-able.

If the introductory assumption above - security issues come from context confusion - is false and there's already issues with a single context e.g. from only parsing, then I'd really appreciate concrete examples/pocs so I can understand.

@rniwa
Copy link

rniwa commented May 22, 2021

If we had a .predictableInnerHTML= accessor that uses a generic context - like DOMParser does, or the current Sanitizer API - then we'd have a very easy way to sanitize while supporting existing code. Basically, an accessor that does .replaceChildren(new DOMParser.parse(argument)).

Given this is already possible, what is stopping authors from adopting that? Is it simply that people aren't aware of this option? Or don't realize the subtle difference between that and innerHTML?

If the introductory assumption above - security issues come from context confusion - is false and there's already issues with a single context e.g. from only parsing, then I'd really appreciate concrete examples/pocs so I can understand.

There are are a few examples. For example, the following will result in the literal string hello<script>world</script> to be inserted as a child of body element.

document.body.appendChild(document.createElement('plaintext')).innerHTML = hello<script>world</script>';

The context of where things are getting inserted matters as well:

document.body.appendChild(document.createElement('script')).innerHTML = 'alert("oops")';
document.body.setAttribute('onmousemove', 'alert("oops")');

So simply sanitizing HTML may not be enough without knowing where it can be inserted.

@annevk
Copy link
Collaborator Author

annevk commented May 23, 2021

I don't think predictableInnerHTML makes sense in the context of the HTML parser, as it's context-sensitive by nature. If you wanted an actual analogue to replaceChildren() you'd need something like E4X.

@koto
Copy link

koto commented May 25, 2021

There are are a few examples. For example, the following will result in the literal string hello<script>world</script> to be inserted as a child of body element.

document.body.appendChild(document.createElement('plaintext')).innerHTML = hello<script>world</script>';

Are there any examples that demonstrate how this is possible in the context of the Sanitizer API? It is purposefully very restrictive. For example, script nodes are never in the output, so this vector just isn't applicable.

@otherdaniel
Copy link
Collaborator

otherdaniel commented Jun 9, 2021

So... after pondering this for a bit, I think we now have a better idea of where this is headed. And I think we can cover all of the feedback, at the cost of changing the API around a good bit. I'll post an outline here, and will send a spec PR with the details fleshed out. Comments are very welcome, either here or on the PR.

The corner stones are:

  • We didn't find an agreeable way in which we could divorce HTML parsing/unparsing from its context. Nor did there seem to be an obviously good way to create a "synthetic" context for general use.
  • Thus, all Sanitizer operations that parse HTML will require a context element, either implicitly or explicitly.
  • The Sanitizer will not "disappear" the context. If an operation result reflects a specific parse context, this context will be reflected in the value returned.

With those guidelines, the API would look like so:

  Sanitizer.sanitize( (Document or DocumentFragment) input) => DocumentFragment  // No more strings as input.
  Element.setInnerHTML(DOMString markup, Sanitizer s)  // Context implied by 'this'. Name needs discussion.
  Sanitizer.sanitizeFor(DOMString elementName, DOMString markup) => Element  // Returns element (type elementName)
  • The first one should be uncontroversial. But since it no longer takes a string argument, I'm guessing it will be less useful in practice.
  • The second one should also be uncontroversial - maybe except its name, and/or exact signature - but it requires that sanitization always occurs as the last step, just before insertion into the DOM. This perfectly matches some use cases, but others not at all.
  • The final one is arguably a bit surprising: It creates a DOM node, then parses into that node, and returns it. The thinking here is that parsing from a string - without modifying the DOM - is such an essential need, that we require some operation to support it. In otder to do so, the user must give us a context. We then realized that a container for the parse context plus the parse tree already exists, and it's the element node itself. Thus, that's what we return.

Those should largely solve the mXSS issues, at least when used as intended. The user still has a choice to get a plain string out, and thus "disappearing" the context on their own: .sanitizeFor(....).innerHTML. However, this requires an active step by the user, and the API is structured to at least show a better path.

wdyt?

@otherdaniel
Copy link
Collaborator

otherdaniel commented Jun 9, 2021

The promised PR is #99. It still leaves a few things open (like error conditions), but is a more fleshed out version of what I suggested above.

@annevk
Copy link
Collaborator Author

annevk commented Jun 14, 2021

Sanitizer.sanitize, it seems weird to return a DocumentFragment when a Document is passed. Is that really what callers would want? Also, why does this need to return anything to begin with? Can it not mutate the input?

Sanitizer.sanitizeFor seems like it would be extremely easy to implement yourself. I'd rather wait with this until we've established that's a common wrapper utility people write around setInnerHTML().

@otherdaniel
Copy link
Collaborator

Sanitizer.sanitize: My assumption is that the user wants to, sooner or later, take that result and stick it into the DOM somewhere, which is sort of what a fragment is there for. I guess if we return a copy, it would make sense to consistently use the same root node type for the copy. That said, I don't feel strongly here.

The bigger question is IMHO mutating vs copy: Currently, we've assumed (and spec-ed) this as always retuning a copy. My thinking here is that one can do a bunch of "fancy" stuff to the JS wrappers of a DOM node, like assigning properties or functions. If we mutate the tree, all of those would remain whenever the node itself remains. If we make a copy, the resulting copy would be "clean" DOM nodes, as they would have been returned by .createElement. Note sure whether (or how much) that matters, though. I'd really like to hear opinions on that.

Sanitizer.sanitzeFor: One can implement either .SetInnerHTML or .sanitizeFor with the respective other. I don't think there's a natural pick for the base. My plan would be to put use counters on both in some early trial, and to then compare usage. That is a fairer comparison than picking one as the default and then waiting for people to invent the other, without a means to even figure out whether people have done that or not.

@mozfreddyb
Copy link
Collaborator

  1. Can we easily implement & spec it with overloading so that there is a sanitize() which accepts a document and returns a document and another sanitize that accepts and returns a DocumentFragment?
  2. I like the idea of designing both sanitizeFor and the setter and seeing which one "wins".

@annevk
Copy link
Collaborator Author

annevk commented Jun 14, 2021

  1. I think letting the caller do the clone is cleaner. We should be exposing the building blocks and if it always cloned there's no access to the faster alternative. And in that case the API can accept any Node or any ParentNode.
  2. It seems rather clear to me that setInnerHTML() is the smaller building block here. Otherwise you are forced to create a new element you might not want.

See https://extensiblewebmanifesto.org/ for why I favor exposing the smaller building blocks first.

@mozfreddyb
Copy link
Collaborator

  1. Can you expand on the latest bit, @annevk? Are you saying that we can easily accept any of {Node, ParentNode, Document, DocumentFragment} when we don't clone? I think what @otherdaniel was getting at is that we were concerned that we might run into weird side effects (e.g., scripts running? MutationObservers?) if we are using a "live node". Can you help us expand on that?

@annevk
Copy link
Collaborator Author

annevk commented Jun 14, 2021

Mutation observers could run. We'd probably want to explicitly say that mutation events are not fired (still not specified, ...).

I suspect that the cost of cloning would be prohibitive for some users. To make use of this you would essentially parse the HTML yourself. You wouldn't want the resulting tree to be redundantly cloned.

@otherdaniel
Copy link
Collaborator

Can we easily implement & spec it with overloading so that there is a sanitize() which accepts a document and returns a document and another sanitize that accepts and returns a DocumentFragment?

I think WebIDL allows this, if at most one type is nullable.

It seems rather clear to me that setInnerHTML() is the smaller building block here.

In terms of developer ergonomics, .setInnerHTML and .sanitizeFor cover different use cases. While you can emulate each with the other, I don't think we're doing developers a favour that way.

In terms of smaller building blocks, .setInnerHTML does parsing, sanitization, plus DOM mutation, while .sanitizeFor does only the first two. The freshly created node cannot have children already, so parsing into an unconnected, fresh node is a substantially less complex operation than replacing an existing, potentially live sub-tree.

The etensible web manifesto speaks of high vs low level features. I note that all options on the table are at a roughly comprable semantic level.

But I think all of that is a wrong argument anyhow: If we want a theoretically minimal API, we have it already: The DOM. But what we want isn't theoretical minimalism, what we want is usable security. If a solution gets in the way of that goal, it's not a minimal solution. And as I've had to learn during this project there's a substantial number of pitfalls with parsing and tree manipulation (based on parsed strings). We should offer developers a package that offers them easy to use security primitives. That's the minimal set; not a smaller decomposition where we'll then have to hope that they correctly assemble the supplied pieces after all.

@shhnjk
Copy link

shhnjk commented Jun 14, 2021

I don't like the naming of setInnerHTML. In the security industry, we said so much about avoid using innerHTML, that library like React use the name called dangerouslySetInnerHTML. It's super hard to tell people now that "Hey! setInnerHTML is safe!". At the very least, it's confusing for developers. setHTML (or setSanitizedHTML) sounds better.

@mozfreddyb
Copy link
Collaborator

Thanks, Jun! I want to acknowledge that we haven't discussed the setter's name (parameters, etc.) that much and we really need to do that.
But let's please move the naming discussion into #100, so that this issue here remains focused on ensuring that our API parses with the right context.

@annevk
Copy link
Collaborator Author

annevk commented Jun 17, 2021

I'm not sure I follow. As I understand it sanitizeFor() does these things:

  1. Creates an element.
  2. Parses.
  3. Sanitizes.
  4. Mutates the element created in step 1.

Step 2-4 are "setInnerHTML()", step 1 is a novel element-creation API and I'd rather not try to bite that off as part of this effort. See whatwg/dom#150 for what that might entail.

Perhaps your point is that the mutations are not observable? How does that change the complexity implementation-wise? (Assuming we'd not fire mutation events as I mentioned earlier.)

otherdaniel added a commit that referenced this issue Jul 13, 2021
Specify new string handling, to reduce edge cases and strengthen our mXSS posture:
- Remove DOMString from in SanitizerInput.
- Remove sanitizeToString.
- Add Sanitizer.sanitizeFor, which explicitly adds a parsing context.
- Add Element.setHTML, which implicitly knows the parsing context.

This addresses discussion in #42 (and also #37). It does change the API substantially.
This will require a good bit of follow-on work, like changing examples in the readme & faq, etc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants