Sanitizer API creating mock context-element can cause XSS when used in different context #42
Comments
This reference, I think: https://html.spec.whatwg.org/multipage/parsing.html#parsing-html-fragments Thanks. I hadn't considered that. I guess one could have a My gut feeling is that predictability and ease of use of the Sanitizer will be of higher value than expressiveness, and that defining this based on a synthetic context element for all cases will be the better choice. That said, it's a subtle area with real bug potential, and I don't think I fully understand the consequences yet. |
For instance, The more fundamental problem is that the API parses and creates a tree internally (presumably using the HTML (This is one of the reasons I do not think the API should go string -> string.) |
Not sure if I follow... String -> String: I agree. (And I think so does everyone else.) I'm not sure we're all agreed on whether we can afford to drop this entirely or not, but at the very least this shouldn't be the default API. (PR #41 tries to help a little, by at least removing string -> string from the motivating examples.) "expects a serialized version of that tree to parse identically": I'm not sure this is so. As I understand it, with MathML or SVG as context element, the parser processes the children differently (e.g. puts them into MathML or SVG namespace, and apparently some name mangling for attributes). So if we'd use the DocumentFragment version, wouldn't it do the exact same wrong thing it'd do with the string version? That is, if I change your example to In case I don't: Then we really have a problem that parsing (by itself; even without unparsing) has multiple modes of operation, and we can choose to either drop all except one for simplicity, or alternatively provide an API that accounts for them. (Maybe |
Anyway, yes, the problem is also that the parser takes more inputs than this API allows for. https://html.spec.whatwg.org/#html-fragment-parsing-algorithm has a bunch of branching and initializing around the context element so I think it's definitely going to be a problem for some developers, especially if they use |
The mXSS aspect of this (see also #37) is mostly unavoidable as long as folks use |
While you changed the summary in October, I think the original issue is still valid. At least https://github.com/WICG/sanitizer-api#proposed-api still seems strictly less expressive than |
Despite the mXSS issue, @annevk noted that accepting an input might be error-prone and surprising to a developer. As an example, a developer wants to insert input like
I don't think tables are the most exciting use case, but I'm concerned if there are more subtle bugs lurking - bugs which aren't only confusing but maybe worse. So, if we want to make sure there is a context element supplied, I can see two options: @annevk suggested the (imo slightly more radical) alternative of proposing a new I think we might get away with the former (accepting a second, optional context parameter), but it's arguably a less clean design. @annevk suggested I tag @rniwa and @domenic to see what they think. |
Would Perhaps orthogonally, I think a context element should be mandatory, not optional. |
Wouldn't rearchitecting this mean that the sanitizer API would always be coupled to the DOM (i.e. you can't sanitize without passing it a node)? I see the value of a standalone sanitizer API that simply returns values, without also inserting them to the DOM directly - for example, that way you could sanitize in a Worker, or in general separate your sanitization and rendering layers. Sanitizing-and-returning is how DOMPurify works by default, and it seems to be a useful primitive for the devs -- regardless of the quirks that some context mixing introduces. Without a standalone "context free" return value (be it a string, |
Hmm. The way I'm thinking about this is that we need a good forward-looking solution, and one for all the existing code. I think for looking forward, we have the string-less The bigger -- much bigger -- problem is what to offer to existing code bases. IMHO Trusted Types - which allows you to gradually replace the string usage and to eventually enforce string-less DOM manipulation - is a pretty good answer, but let's just ignore TT for now since not all browsers support it. The problem with offering "radical" new APIs for legacy code is that legacy code tends to not pick them up. Libraries in particular want to be able to run on both older and newer browsers without maintaining two versions, and that will turn them away from a sanitizing-setter until it's supported everywhere. Looking at the proposals with those two thoughts in mind: If we want to add a context-dependent string setter, why not make it node-based in the first place? If we want to replace string-based setters with context-dependent ones... I think we'll have an adoption problem. Maybe an alternative would be a Related, but not directly: It seems mXSS revolves around a handful of usual suspects. Basically, a number of elements with special parsing rules attached to them. With the split between defaults and a (non-overridable) baseline we can be much more aggressive about the defaults. I wonder if we can make a fairly aggressive, usually-mXSS-safe default config. |
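To make the "defaults vs. non-overridable baseline" split a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `BASELINE_BLOCKED`, `DEFAULT_ALLOWED` and `effectiveAllowList` are hypothetical, and the element lists are placeholders rather than anything the spec defines. The only point is that a caller can replace the defaults, but cannot get past the baseline.

```typescript
// Hypothetical names and placeholder lists; not the Sanitizer API's actual config.

// Baseline: elements that are never allowed, whatever the caller asks for.
const BASELINE_BLOCKED: ReadonlySet<string> = new Set(["script"]);

// Defaults: used only when the caller supplies no allow-list of their own.
const DEFAULT_ALLOWED: readonly string[] = ["p", "b", "i", "a"];

// Computes the allow-list that would actually be applied.
function effectiveAllowList(userAllowed?: readonly string[]): string[] {
  const requested = userAllowed ?? DEFAULT_ALLOWED;
  // The caller may override the defaults, but the baseline always wins.
  return requested.filter((name) => !BASELINE_BLOCKED.has(name.toLowerCase()));
}

const withDefaults = effectiveAllowList();
const withOverride = effectiveAllowList(["script", "svg", "p"]);
```

Because the baseline is applied last, the defaults can be as strict as we like without taking anything away from callers who knowingly opt into a wider list.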
Given most browser engines' HTML parsing / tree building code isn't thread safe, I don't see how that would be possible, at least in the short to mid term, although designing an API with future use cases in mind is always a good idea.
The issue here is that without knowing the context, we can't really initialize the parser. It's like being asked to play music without knowing key or tempo.
This kind of one-size-fits-all solution rarely works for something as context dependent as HTML parsing sanitation rules. This will quickly lead us to a situation in which we need to allow a new element in some context but then we'd have to study every dependency ever used to be confident that it's safe to do so.
Addressing most use cases is a good idea in most API designs but not so with security features. It's actively harmful to have a security API that's safe most of the time or unsafe in some edge cases. That's how almost all security bugs get introduced. |
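As a rough illustration of why the parser can't be initialized without a context, here is a toy model in TypeScript. It is not the HTML fragment parsing algorithm. It imitates a single behavior: a `<td>` start tag is ignored when the context element is a `div`, but produces a cell when the context is a `tr`. The helper `toyParseFragment` and the `ToyNode` type are made up for this sketch.

```typescript
// Toy model only: NOT the HTML fragment parsing algorithm.
// It imitates one behavior: <td> tags are dropped in a <div> context
// and kept in a <tr> context.

type ToyNode = { tag: string; text: string } | { text: string };

// Hypothetical helper; name and signature are assumptions for illustration.
function toyParseFragment(contextName: string, markup: string): ToyNode[] {
  const match = /^<td>(.*)<\/td>$/.exec(markup);
  if (match === null) {
    // The toy only understands a single <td>...</td>; anything else is text.
    return [{ text: markup }];
  }
  const cellText = match[1] ?? "";
  if (contextName === "tr") {
    // Row context: the cell survives as an element.
    return [{ tag: "td", text: cellText }];
  }
  // Any other context: the tags are ignored and only the text remains.
  return [{ text: cellText }];
}

const inRow = toyParseFragment("tr", "<td>hello</td>");
const inDiv = toyParseFragment("div", "<td>hello</td>");
```

The same input string yields two different trees, so a result produced for one context says little about what the string will become in another.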
One solution here might be to not directly modify the input string in |
What I suggested to @mozfreddyb at one point was to split the feature in two, in a way:
|
I like the suggested options, but we probably haven't exhausted the solution space yet.
I find one aspect important: The API usability/adoption angle. For security APIs, API usability has been a major problem. The whole XSS/mXSS space is arguably an example, since DOM node-based APIs have been largely XSS safe, but are apparently too annoying to use and so people use The particular danger I'm seeing here is fusing sanitization with tree insertion. That gives us a known-good context. But it also forces a particular code structure, where the sanitization must be the last step before tree manipulation. That fits some use cases, but not others.
Wrapping the sanitizer result is an option. It kinda re-invents Trusted Types. Looking at our TT experience, the good news is that it works, the bad news is that it has led to problems (where code that wasn't written to expect a wrapper threw exceptions), which we've compensated for by having deployability features. (E.g. CSP report-only, error events, etc.)
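For the wrapping option, a minimal sketch of what such a wrapper might look like, assuming it records the context the markup was parsed for and refuses to hand the string out for a different one. `ContextualMarkup` and `markupFor` are hypothetical names, not part of the Sanitizer API or Trusted Types.

```typescript
// Hypothetical wrapper; not part of the Sanitizer API or Trusted Types.
class ContextualMarkup {
  readonly contextName: string;
  readonly markup: string;

  constructor(contextName: string, markup: string) {
    // Element names are compared case-insensitively in this sketch.
    this.contextName = contextName.toLowerCase();
    this.markup = markup;
  }

  // Hands the markup out only if the target matches the recorded context.
  markupFor(targetName: string): string {
    if (targetName.toLowerCase() !== this.contextName) {
      throw new Error(
        `markup was prepared for <${this.contextName}>, not <${targetName}>`
      );
    }
    return this.markup;
  }
}

const cell = new ContextualMarkup("tr", "<td>hello</td>");
const forRow = cell.markupFor("TR"); // same context, so the string is returned
```

Calling `cell.markupFor("div")` throws in this sketch, which is exactly the kind of exception in unprepared code that made deployability features necessary for TT.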
I do wonder whether we can have a simpler solution. The root cause of the security issues seems to be context confusion. (E.g. parse in one context; re-parse in another.) If so, then we can reliably fix it by ensuring a defined, generic context is always applied. (There's also usability issues, like no table cells supported outside of tables, but IMHO we can trade those against usability gains elsewhere.) If we had a Arguably, this is more consistent with the DOM than actual If the introductory assumption above - security issues come from context confusion - is false and there's already issues with a single context e.g. from only parsing, then I'd really appreciate concrete examples/pocs so I can understand. |
Given this is already possible, what is stopping authors from adopting that? Is it simply that people aren't aware of this option? Or don't realize the subtle difference between that and
There are a few examples. For example, the following will result in the literal string
The context of where things are getting inserted matters as well:
So simply sanitizing HTML may not be enough without knowing where it can be inserted. |
I don't think |
Are there any examples that demonstrate how this is possible in the context of the Sanitizer API? It is purposefully very restrictive. For example, script nodes are never in the output, so this vector just isn't applicable. |
So... after pondering this for a bit, I think we now have a better idea of where this is headed. And I think we can cover all of the feedback, at the cost of changing the API around a good bit. I'll post an outline here, and will send a spec PR with the details fleshed out. Comments are very welcome, either here or on the PR. The cornerstones are:
With those guidelines, the API would look like so:
Sanitizer.sanitize( (Document or DocumentFragment) input) => DocumentFragment // No more strings as input.
Element.setInnerHTML(DOMString markup, Sanitizer s) // Context implied by 'this'. Name needs discussion.
Sanitizer.sanitizeFor(DOMString elementName, DOMString markup) => Element // Returns element (type elementName)
Those should largely solve the mXSS issues, at least when used as intended. The user still has a choice to get a plain string out, and thus "disappearing" the context on their own: wdyt? |
The promised PR is #99. It still leaves a few things open (like error conditions), but is a more fleshed out version of what I suggested above. |
The bigger question is IMHO mutating vs copy: Currently, we've assumed (and spec-ed) this as always returning a copy. My thinking here is that one can do a bunch of "fancy" stuff to the JS wrappers of a DOM node, like assigning properties or functions. If we mutate the tree, all of those would remain whenever the node itself remains. If we make a copy, the resulting copy would be "clean" DOM nodes, as they would have been returned by
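The mutate-vs-copy difference can be sketched with a toy tree in TypeScript. This is not the real DOM: `TreeNode`, `sanitizeInPlace` and `sanitizeCopy` are hypothetical, and the `secret` field stands in for whatever a script might have attached to a node's JS wrapper.

```typescript
// Toy tree standing in for DOM nodes; not the real DOM.
interface TreeNode {
  name: string;
  children: TreeNode[];
  // Script-assigned extras, like properties set on a node's JS wrapper.
  [expando: string]: unknown;
}

const BLOCKED_NAMES: ReadonlySet<string> = new Set(["script"]);

// Mutating variant: drops blocked children in place.
// Nodes that survive keep whatever extras were attached to them.
function sanitizeInPlace(node: TreeNode): TreeNode {
  node.children = node.children.filter((c) => !BLOCKED_NAMES.has(c.name));
  for (const child of node.children) {
    sanitizeInPlace(child);
  }
  return node;
}

// Copying variant: builds fresh nodes that carry only name and children.
function sanitizeCopy(node: TreeNode): TreeNode {
  return {
    name: node.name,
    children: node.children
      .filter((c) => !BLOCKED_NAMES.has(c.name))
      .map((c) => sanitizeCopy(c)),
  };
}

const makeTree = (): TreeNode => ({
  name: "div",
  secret: "attached by script",
  children: [
    { name: "script", children: [] },
    { name: "p", children: [] },
  ],
});

const mutated = sanitizeInPlace(makeTree());
const copied = sanitizeCopy(makeTree());
```

Both variants remove the `script` child, but only `mutated` still carries `secret`. The copy starts clean, at the cost of building a second tree.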
|
See https://extensiblewebmanifesto.org/ for why I favor exposing the smaller building blocks first. |
Mutation observers could run. We'd probably want to explicitly say that mutation events are not fired (still not specified, ...). I suspect that the cost of cloning would be prohibitive for some users. To make use of this you would essentially parse the HTML yourself. You wouldn't want the resulting tree to be redundantly cloned. |
I think WebIDL allows this, if at most one type is nullable.
In terms of developer ergonomics, In terms of smaller building blocks, The extensible web manifesto speaks of high vs low level features. I note that all options on the table are at a roughly comparable semantic level. But I think all of that is a wrong argument anyhow: If we want a theoretically minimal API, we have it already: The DOM. But what we want isn't theoretical minimalism, what we want is usable security. If a solution gets in the way of that goal, it's not a minimal solution. And as I've had to learn during this project there's a substantial number of pitfalls with parsing and tree manipulation (based on parsed strings). We should offer developers a package that offers them easy to use security primitives. That's the minimal set; not a smaller decomposition where we'll then have to hope that they correctly assemble the supplied pieces after all. |
I don't like the naming of |
Thanks, Jun! I want to acknowledge that we haven't discussed the setter's name (parameters, etc.) that much and we really need to do that. |
I'm not sure I follow. As I understand it
Step 2-4 are " Perhaps your point is that the mutations are not observable? How does that change the complexity implementation-wise? (Assuming we'd not fire mutation events as I mentioned earlier.) |
Specify new string handling, to reduce edge cases and strengthen our mXSS posture:
- Remove DOMString from SanitizerInput.
- Remove sanitizeToString.
- Add Sanitizer.sanitizeFor, which explicitly adds a parsing context.
- Add Element.setHTML, which implicitly knows the parsing context.
This addresses discussion in #42 (and also #37). It does change the API substantially. This will require a good bit of follow-on work, like changing examples in the readme & faq, etc.
The fragment parsing algorithm takes two arguments, a context element and a string. This API only takes a string. I guess that means it makes up its own context element, but this might not always be correct and I have a slight worry that the difference could be exploited at worst and at best lead to confusion.
(Related to #37.)