clarifications for the pickling design proposal #393

Open
polx opened this issue Feb 18, 2022 · 24 comments

@polx

polx commented Feb 18, 2022

Hello authors of the pickling clipboard api extension design,

The document came to my attention thanks to a comment about the clipboard APIs, which definitely suffer from a very strict set of sanitized formats and a strict sanitization procedure (one that some security-aware people would even call not strict enough, e.g. for HTML inclusions where a picture that was not shown starts to be shown when pasted inside a mail composition window). I very much appreciate the possibility of opening the boundaries of clipboard exchanges and would love to help make it a reality, for mathematical content at least.

One clarification that would be useful is the amount of user consent that would be needed. Each consent is one more rejection chance for a newbie user. Only then would the security considerations be complete, I think.
I seem to understand that some consents are going to be "I, the geometry app Xyz, want to accept any mathematical expression (application/mathml) offered from site a.b.c". Is this a correct interpretation? The problem of representing the type to an end user seems open.

Are we running into a dangerous archive of consents that users would forget about? Maybe expiring them would be useful.

The problem of counting the formats' names is a bit strange to me. 100 sounds too small for extreme cases... I would prefer several hundreds.

In the pickling format, I am not sure I understood how binary formats will be stored... but I guess it's ok for a design.

Finally, would the set of (web and desktop) apps that have been visited by a user not be something useful for the user itself? E.g. so as to explore the possibilities? I guess that this is just speculation...

@polx
Author

polx commented Mar 2, 2022

one more question @snianu : Do I understand that it makes sense for content types communities to actually agree on naming pickles' names if they want interoperability in advance?

@snianu
Contributor

snianu commented Mar 2, 2022

Do I understand that it makes sense for content types communities to actually agree on naming pickles' names if they want interoperability in advance?

That is a good question. We haven't actually thought about how a web custom format could become part of the mandatory data list. Adding the agenda label to discuss this in the EditingWG meeting. Also cc'ing @annevk @whsieh @BoCupp-Microsoft @mbrodesser

@snianu added the Agenda+ (Agenda item to be inserted in the Editing TF meeting queue) label Mar 2, 2022
@snianu
Contributor

snianu commented Mar 9, 2022

@polx

Are we running into a dangerous archive of consents that users would forget about? Maybe expiring them would be useful.

Chromium browsers follow the permission model described here. This is per origin and gets triggered regardless of the formats being read/written via the async API.

The problem of counting the formats' names is a bit strange to me. 100 sounds too small for extreme cases... I would prefer several hundreds.

This limit was chosen because of the atom table size restriction on Windows. It is system wide and not limited to any particular app. Once the atom table is exhausted, the system basically goes into a bad state that is not recoverable. Here is a blog post about it.

In the pickling format, I am not sure I understood how binary formats will be stored

The payload is in terms of raw bytes. That should work for binary formats?

Finally, would the set of (web and desktop) apps that have been visited by a user not be something useful for the user itself?

I don't think I understand the question. Are you asking what happens if the destination app doesn't have support for the custom format copied from another app? The answer to this question is, custom formats by definition are not readable by apps that don't have support for it.

@css-meeting-bot
Member

The Web Editing Working Group just discussed clarification on pickling.

The full IRC log of that discussion
<Travis> Topic: clarification on pickling
<Travis> github: https://github.com//issues/393
<Travis> annevk: I haven't had time to look at this.
<Travis> Anupam: opener hasn't responded to my comments.
<Travis> .. concerns about writing 1000s of formats. (I tested and it does indeed bog down my computer.)
<Travis> .. I think a hundred is reasonable.
<Travis> .. Went through security review and they were OK with that.
<Travis> Annevk: are you saying global total is 100?
<Travis> Anupam: I think there may be a security problem on your hands...
<Travis> (Sorry that comment was Annevk)
<Travis> Anupam: new windows APIs have a global limit.
<Travis> Anupam: So, attack vector is that two origins use different custom formats to communicate. (Similar to socket connections.)
<Travis> Travis: can you explain the attack?
<Travis> annevk: one origin takes all 100 formats, then another tries to use a custom format and is denied.
<Travis> .. Then the first origin can know which formats were attempted based on which ones had been added previously.
<Travis> (editor's note: Sorry didn't capture that very well)
<Travis> Annevk: suggests looking over: https://xsleaks.dev/
<Travis> whsieh: Yep, this is why Webkit blocks cross-origin custom pasteboard access.
<Travis> Travis: so some of us will need to revisit restrictions...?
<Travis> Anupam: raising the limit to 16K is Windows' limit--that could be a problem.
<Travis> annevk: you could add a limit-per-origin
<whsieh> platform info is in the UA already, no?
<Travis> .. Each type that the origin uses adds a "salt" to add randomization to prevent the other origin from guessing.
<Travis> +1 (I like that)
<Travis> johanneswilm: This is just some advice to chromium folks.
<Travis> .. anything spec-wise?
<Travis> Anupam: I think we need more discussion? Needs to be a limit and have it documented somewhere.

@polx
Author

polx commented Mar 10, 2022

Thanks @snianu : I am still not convinced by the 100 limit but I will not fight over it. I understood it was per origin but the log seems to discuss it differently.

So, attack vector is that two origins use different custom formats to communicate. (Similar to socket connections.)

I didn't understand that one.

Finally, would the set of (web and desktop) apps that have been visited by a user not be something useful for the user itself?
I don't think I understand the question. Are you asking what happens if the destination app doesn't have support for the custom format copied from another app? The answer to this question is, custom formats by definition are not readable by apps that don't have support for it.

Nope. I am asking if there could not be a UI offered/conceived/discussed at browsers or OSs that say: "Ah, you have a clipboard-pickle of type zzz; you could open it at the website x/a/b and using app zz/zz."
This is currently done with files (though it does not always work very well).

@snianu
Contributor

snianu commented Mar 10, 2022

@polx

Nope. I am asking if there could not be a UI offered/conceived/discussed at browsers or OSs that say: "Ah, you have a clipboard-pickle of type zzz; you could open it at the website x/a/b and using app zz/zz."

Sorry, I'm still confused. Maybe I'll ask specific questions that I have which might be helpful:

  1. Chromium browsers require permission prompts to read/write clipboard formats regardless of whether the format is custom or well defined. The permission is only required once per origin, so I don't think we would want to show UIs for each format, as that would be intrusive and not a good experience for the user. Also, I'm not sure how having yet another UI for custom formats improves security. If a web author chooses to display a custom dialog during a copy or paste operation, then they are free to do so, but I don't think we would want this experience on our platform.
  2. If a destination app doesn't support a format, then how is this new UI going to help the user? We don't know the paste target during copy, so how could a browser generate this custom message?

I am still not convinced by the 100 limit but I will not fight over it.

Having this per origin doesn't address the security issues listed here.

On Windows & Linux, generation of clipboard formats dynamically risks exhaustion of the atom pool. On Windows, there is room for around [16,000 registered window messages and clipboard formats](https://devblogs.microsoft.com/oldnewthing/20150319-00/?p=44433). Once those are exhausted, things will start behaving erratically because window classes use the same pool of atoms as clipboard formats. Applications will not be able to register window classes until the user logs off and back on. Linux has a limitation on the atom space as well, so an approach to overcome these limitations needs to be devised.

This could just be UA specific and doesn't have to be in the spec, but our internal security reviewers want a restriction per user session as this affects the OS and not just the browser or origin.

@BoCupp-Microsoft
Contributor

Apologies for missing today's editing WG meeting. Looking at the minutes it seems there was some confusion over the impact of the proposed cap of 100 simultaneous custom formats described here in the explainer. I'm looking at this particular comment from the minutes:

annevk: one origin takes all 100 formats, then another tries to use a custom format and is denied.
.. Then the first origin can know which formats were attempted based on which ones had been added previously.
(editor's note: Sorry didn't capture that very well)
Annevk: suggests looking over: https://xsleaks.dev/
whsieh: Yep, this is why Webkit blocks cross-origin custom pasteboard access.

What the explainer says is that a UA should reserve up to 100 generically named slots in which to store custom clipboard content from web apps in order to work around OS limitations on Windows and Linux (and possibly other platforms). Additionally, it goes on to say that a Web Custom Format Map entry (also on the clipboard) will contain a mapping of mime-type to the corresponding generic clipboard slot into which the UA stored the clipboard contents for that mime-type. In this way, no matter what custom mime-type is written to the clipboard by the author, there's always a fixed cap on the system resources the browser can consume for the web custom format feature.

Code examples...

Site A does:

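// Build 100 custom MIME types, each with its own text payload.
const map = {};
for (let i = 0; i < 100; i++) {
  map[`text/custom${i}`] = `clipboard content of text/custom${i}`;
}
// A single write places all 100 formats on the clipboard; write() returns a promise.
navigator.clipboard.write([new ClipboardItem(map)]);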

This produces 101 clipboard entries on Windows. 100 of the clipboard entries are named "Web Custom Format0..N", each of which contains a text value "clipboard content of text/custom0..N". The 101st entry is named "Web Custom Format Map" and contains JSON as follows:

{
  "text/custom0" : "Web Custom Format0",
  "text/custom1" : "Web Custom Format1",
  "text/custom2" : "Web Custom Format2",
  ...
  "text/custom99": "Web Custom Format99"
}

When site B writes similar code but with different mime-types, for example foo/bar0..N, it will still be using the same previously reserved system clipboard slots named "Web Custom Format0..N"; no new clipboard format name registrations take place with the system even though different custom mime-types are being placed on the clipboard, and no exhaustion of system resources occurs. Further, there should be no side channel attack where an attacker discovers a mime-type previously written by another origin, since the name of the mime-type never triggers a rejection. The only rejection would come if an author tried to write more than 100 custom clipboard formats to the clipboard all at once.
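The slot scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `buildClipboardEntries` and `MAX_WEB_CUSTOM_FORMATS` are hypothetical names, and a plain object stands in for the system clipboard. Only the slot names, the map entry, and the cap of 100 come from the description above.

```javascript
// Hypothetical constant: the cap on custom formats in a single write.
const MAX_WEB_CUSTOM_FORMATS = 100;

// Maps author-supplied MIME types to generic slot names and returns the
// entries a UA might place on the system clipboard (modelled as an object).
function buildClipboardEntries(items) {
  const mimeTypes = Object.keys(items);
  // The only rejection is based on how many formats are written at once.
  if (mimeTypes.length > MAX_WEB_CUSTOM_FORMATS) {
    throw new Error(`Cannot write more than ${MAX_WEB_CUSTOM_FORMATS} custom formats at once`);
  }
  const map = {};
  const entries = {};
  mimeTypes.forEach((mimeType, index) => {
    const slot = `Web Custom Format${index}`;
    map[mimeType] = slot;             // MIME type -> generic slot name
    entries[slot] = items[mimeType];  // payload stored under the generic name
  });
  entries["Web Custom Format Map"] = JSON.stringify(map);
  return entries;
}

// Two sites with different MIME types end up using the same slot names.
const siteA = buildClipboardEntries({ "text/custom0": "a", "text/custom1": "b" });
const siteB = buildClipboardEntries({ "foo/bar0": "x", "foo/bar1": "y" });
```

In this sketch both `siteA` and `siteB` contain the keys "Web Custom Format0", "Web Custom Format1" and "Web Custom Format Map"; only the map contents differ.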

Tagging @annevk and @whsieh since it was your comments which drew my attention in the minutes.

@annevk
Member

annevk commented Mar 11, 2022

So there's no clipboard history essentially? And each write overwrites so you can't really observe timing differences? Seems like that would work, in principle.

@polx
Author

polx commented Mar 13, 2022

If the limit is system-wide then 100 is really a small fraction compared to common limits such as 16000. But then...

Maybe the comment of @annevk is the right question: Is a history not intended to be stored? Beyond history, doesn't this give a possibility for one site in an open tab to destroy the experience made in another open tab because the registrations would be removed or dropped? It really seems to me that limiting by origin is necessary, and, I guess, limiting the number of registering origins.

Tagging @snianu and @BoCupp-Microsoft for insights.

@snianu
Contributor

snianu commented Mar 14, 2022

If the limit is system-wide then 100 is really a small fraction compared to common limits such as 16000.

The global atom table is available to all applications, and it is used for things like class registration, window message registration etc, so it isn't just for clipboard formats. See here for more details.

Beyond history, doesn't this give a possibility for one site in an open tab to destroy the experience made in another open tab because the registrations would be removed or dropped?

How is this different than writing just text/html in one tab and then writing text/plain in another tab? In this case text/html will be replaced by text/plain.

@BoCupp-Microsoft
Contributor

So there's no clipboard history essentially? And each write overwrites so you can't really observe timing differences? Seems like that would work, in principle.

The only potentially detectable timing difference I can think of would be the maximum number of Web Custom Formats ever simultaneously written to the clipboard. That assumes that a browser would call RegisterClipboardFormat (on Windows) only for the number of slots needed to store an author's custom mime-types and not just reserve all 100 slots at once. It also assumes that the first call to register "Web Custom FormatN" takes a detectably different amount of time than subsequent calls. If that turns out to be a problem, the browser can just reserve all slots at once.
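As a rough sketch of the "reserve all slots at once" option: the function below registers every slot name up front, so that no later write triggers a first-time registration. `reserveAllSlots` and the `registerFormat` callback are hypothetical; the callback merely stands in for a platform call such as RegisterClipboardFormat and is assumed to return some identifier for the name.

```javascript
// Registers all generic slot names (plus the map format) in one go.
// `registerFormat` is a hypothetical stand-in for the platform registration call.
function reserveAllSlots(registerFormat, slotCount = 100) {
  const ids = new Map();
  for (let i = 0; i < slotCount; i++) {
    const name = `Web Custom Format${i}`;
    ids.set(name, registerFormat(name));
  }
  // Assumption: the map entry is a named format too, so it is reserved as well.
  ids.set("Web Custom Format Map", registerFormat("Web Custom Format Map"));
  return ids;
}

// Usage with a fake registrar that records each name and returns a counter.
const registered = [];
const slotIds = reserveAllSlots((name) => {
  registered.push(name);
  return registered.length;
});
```

With eager reservation the number of registrations is constant (101 in this sketch), whatever authors later write.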

doesn't this give a possibility for one site in an open tab to destroy the experience made in another open tab because the registrations would be removed or dropped?

No. At least on Windows, registrations are only additive and non-destructive. Also keep in mind the author isn't registering new formats, the browser is, and the formats always have the same names "Web Custom FormatN".

@annevk
Member

annevk commented Mar 15, 2022

What about the history question?

@BoCupp-Microsoft
Contributor

What about the history question?

I'm guessing you're asking this question more plainly: can attacker.com write code to take all remaining mime-types from the 100 allowable mime-types, and then test arbitrary mime-types to see if they were used previously by observing which ones aren't rejected by navigator.clipboard.write?

The answer to that question is no, because there's no such thing as 100 allowable mime-types. The number of mime-types that can be written to the clipboard is infinite, but you can't write more than 100 to the clipboard in a single navigator.clipboard.write operation. A subsequent write destroys the previous clipboard contents and puts a new set in its place.
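To illustrate why probing by name should reveal nothing, here is a small simulation. It is a sketch only: `FakeClipboard` is a hypothetical in-memory stand-in, not a browser API, and the MIME types and origins in the comments are made up for the example.

```javascript
// Hypothetical in-memory stand-in for the system clipboard.
class FakeClipboard {
  constructor(maxFormats = 100) {
    this.maxFormats = maxFormats;
    this.contents = new Map();
  }

  // Rejects only on the number of formats in this one write, never on names.
  write(items) {
    const types = Object.keys(items);
    if (types.length > this.maxFormats) {
      return false;
    }
    // A new write replaces everything the previous write put on the clipboard.
    this.contents = new Map(types.map((type) => [type, items[type]]));
    return true;
  }

  types() {
    return [...this.contents.keys()];
  }
}

const clipboard = new FakeClipboard();
// One origin writes a custom type.
clipboard.write({ "application/x-secret": "payload" });
// Another origin fills a whole write with 100 types of its own...
const filler = {};
for (let i = 0; i < 100; i++) filler[`text/filler${i}`] = "";
const fillerAccepted = clipboard.write(filler);
// ...and a probe for the first origin's type is accepted like any other name.
const probeAccepted = clipboard.write({ "application/x-secret": "probe" });
```

Both writes succeed, so acceptance or rejection carries no information about which names were used earlier.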

@annevk
Member

annevk commented Mar 15, 2022

What I'm wondering about is whether such a model allows for history. At least some clipboards support pasting earlier written items, for instance.

@BoCupp-Microsoft
Contributor

Oh, sorry, I didn't realize what you were asking. That's a great question. I believe the model supports it on platforms that support including arbitrary formats into history.

From my testing on Windows, only a handful of well-known formats are included in clipboard history; there is no support for custom formats. This StackOverflow post seems to corroborate my findings.

But if Windows did support custom formats in clipboard history, or if we are talking about another platform that allows it, we could include the "Web Custom Format Map" and the applicable "Web Custom Format0..N" entries into history based on some hint from the author as discussed in this issue. Apps, like the browser, would see these custom formats restored on the clipboard and handle them in the same way as when they were first written to the clipboard - with the exception that we might not restore all "Web Custom Format" slots. Apps should be prepared for the case when the "Web Custom Format Map" has mime-types that weren't included in the history, i.e. a mime-type may reference a "Web Custom Format" slot that wasn't restored so the app should handle that case by ignoring that entry from the "Web Custom Format Map".

@annevk
Member

annevk commented Mar 16, 2022

Thanks, could you elaborate on the exception you mention? If they are indeed stored off on the side it seems like in principle you could restore all of them, right? Anyway, it does seem like there should be no issue here in theory.

@BoCupp-Microsoft
Contributor

Yes, you could restore all of them assuming we have platform support for adding custom formats to history and that there's no limit more restrictive than the 100 simultaneous formats that we already impose. The browser could be in control of what is marked to be included in history and we could choose to not provide the author any control.

If we do allow the author to opt out of storing some formats in history though, we'll need to mark the formats they allow for history and our "Web Custom Format Map" as the set of custom formats to be included in a clipboard history entry. The "Web Custom Format Map" will have a complete list of the mappings from mime-types to all "Web Custom Format0..N" slots, but since the author chose to not add some of those to history, some mime-types will be dangling (the slot they refer to won't be restored to the clipboard when the user chooses to paste a clipboard history item). So the exception I was talking about is that apps which process the "Web Custom Format Map" must not assume that just because a slot is named in the map that it is also present on the clipboard.

I think that's reasonable defensive code for any app to write even if we weren't talking about the clipboard history scenario, so I don't see it as a problem - dangling "Web Custom Format Map" mime-types should be ignored by apps reading the clipboard. I was just calling it out for completeness since we're thinking about potential problems when this model interacts with clipboard history.
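The defensive handling of dangling map entries might look something like the sketch below. `readWebCustomFormats` is a hypothetical helper, and `clipboardEntries` is assumed to be a plain object of format name to data, standing in for whatever a platform clipboard API actually returns.

```javascript
// Returns MIME type -> payload for every map entry whose slot is present.
function readWebCustomFormats(clipboardEntries) {
  const rawMap = clipboardEntries["Web Custom Format Map"];
  if (rawMap === undefined) return {};
  const map = JSON.parse(rawMap);
  const result = {};
  for (const [mimeType, slot] of Object.entries(map)) {
    // Ignore dangling entries: the map names a slot that was not restored.
    if (Object.prototype.hasOwnProperty.call(clipboardEntries, slot)) {
      result[mimeType] = clipboardEntries[slot];
    }
  }
  return result;
}

// A restored history item where only slot 0 came back.
const restored = {
  "Web Custom Format Map": JSON.stringify({
    "text/custom0": "Web Custom Format0",
    "text/custom1": "Web Custom Format1",
  }),
  "Web Custom Format0": "kept",
};
const readable = readWebCustomFormats(restored);
```

Here `readable` contains only "text/custom0"; the entry for "text/custom1" is skipped because its slot is absent.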

@BoCupp-Microsoft
Contributor

There's been async discussion since the Agenda+ on this issue was added. Reading through it again I think the issues raised have been addressed. OK to close?

@polx
Author

polx commented Apr 14, 2022

As far as I can tell, I see the following questions not yet cleared:

  • one more question @snianu : Do I understand that it makes sense for content types communities to actually agree on naming pickles' names if they want interoperability in advance? (here)

  • Shouldn't permissions be persisted? E.g. The browser would remember that copying pickled-MathML-content from site Y into app Z was allowed. If yes, there should be a recommendation to expire and/or offer a way to manage such permissions. (this is a rephrase of the misunderstandings above)

  • I am still entirely perplexed by the 100 limit; maybe we should open a new issue on this. When I see that more and more platforms consider the web as the centre of everything, reducing the number of atoms it could write for that purpose to 0.6% sounds a bit ridiculous. I'm only asking to raise it.

Thanks in advance.

@igandrews

I am still entirely perplexed by the 100 limit; maybe we should open a new issue on this. When I see that more and more platforms consider the web as the centre of everything, reducing the number of atoms it could write for that purpose to 0.6% sounds a bit ridiculous. I'm only asking to raise it.

From what I understand of this conversation, the only limitation imposed by the 100 limit is that one cannot write more than 100 different formats in a single copy operation, and I have never seen any application even come close to that. Perhaps you're thinking that this means only the first 100 unique clipboard format names will work, but I don't think that is what is proposed. Instead, one will be writing the same 0-n names (e.g. "Web Custom Format 0", "Web Custom Format 1", etc.), and there is another format ("Web Custom Format Map") whose data specifies which actual clipboard format names those point to. See this reply for more on that.

@snianu
Contributor

snianu commented Apr 14, 2022

Do I understand that it makes sense for content types communities to actually agree on naming pickles' names if they want interoperability in advance?

I think the custom format design supports this use case. If a browser supports the custom format feature, then the site should be able to read a custom clipboard MIME type if it's available on the clipboard. Native apps need to add support for the custom format (in this case MathML), and this is orthogonal to the custom format proposal for the web. However, if popular native apps (let's say on Windows) like Office, Adobe etc. add support for a custom type, then it would probably be adopted by all the other apps on that specific platform. Having said that, I think we should leave it up to the OS to decide whether they really want to standardize the format or not. We (EditingWG) don't have control over the clipboard format standardization process in the OS. I think that is what this comment alluded to, but maybe @whsieh can shed some more light on this.

@polx
Author

polx commented May 11, 2022

Having said that, I think we should leave this decision up to the OS to decide whether they really want to standardize the format or not.

All good. That means that, as long as OSs don't try to rule it out and pickled formats exposed by the web browser remain in the clipboard (i.e. are not wiped out), then apps (and other sites) have a chance to receive that content. Right?

Currently, while there have been standards specifying the clipboard UTIs/Windows names, browsers have decided that it is not safe enough.
How can it be ensured that this is not the case for pickled formats? Lobbying that the receiving implementation is in a position to inform the user about the security implications?

@BoCupp-Microsoft removed the Agenda+ (Agenda item to be inserted in the Editing TF meeting queue) label May 12, 2022
@css-meeting-bot
Member

The Web Editing Working Group just discussed clarifying pickling design.

The full IRC log of that discussion
<Travis> topic: clarifying pickling design
<Travis> github: https://github.com//issues/393
<Travis> johanneswilm: (reviewing where we are...)
<Travis> .. related to 100 custom format limit?
<Travis> .. anything new on this issue?
<Travis> .. or can we close it now?
<Travis> BoCupp: I see a new question we haven't responded to..
<Travis> .. maybe take off the Agenda label and continue discussing async.
<Travis> johanneswilm: Bo will you answer?
<Travis> BoCupp: snianu is on it.
<Travis> .. OK to let the discussion continue. will bring back to the group if we need to talk about it here (if it needs consensus)

@snianu
Contributor

snianu commented Jul 6, 2022

That means that, as long as OSs don't try to rule it out and pickled formats exposed by the web browser remain in the clipboard (i.e. are not wiped out), then apps (and other sites) have a chance to receive that content. Right?

This is only possible if the native apps add support for web custom format. The custom mime type would be in the web custom format map. The app needs to read the map to fetch the native web custom clipboard format corresponding to the mime type, and use that format to read the content from the clipboard. This is explained here.
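The three steps above could be sketched as follows for a single MIME type. This is a hedged illustration, not any app's actual code: `readCustomFormat` is a hypothetical helper, a plain object stands in for the platform clipboard, and the MathML payload is made up for the example.

```javascript
// Hypothetical reader for a native app; `clipboardEntries` models the
// platform clipboard as a plain object of format name -> data.
function readCustomFormat(clipboardEntries, mimeType) {
  // Step 1: read the map that the browser wrote alongside the payloads.
  const rawMap = clipboardEntries["Web Custom Format Map"];
  if (rawMap === undefined) return undefined;
  // Step 2: find the native format name recorded for this MIME type.
  const slot = JSON.parse(rawMap)[mimeType];
  if (slot === undefined) return undefined;
  // Step 3: read the payload stored under that native format name.
  return clipboardEntries[slot];
}

const entries = {
  "Web Custom Format Map": JSON.stringify({ "application/mathml": "Web Custom Format0" }),
  "Web Custom Format0": "<math><mi>x</mi></math>",
};
const mathml = readCustomFormat(entries, "application/mathml");
const missing = readCustomFormat(entries, "text/unknown");
```

An app without this lookup simply never sees the custom payload, which matches the point that custom formats are unreadable to apps that do not support them.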

How can it be ensured that this is not the case for pickled formats? Lobbying that the receiving implementation is in a position to inform the user about the security implications?

I don't think we as browser vendors can make OS decisions, but AFAIK Windows is not going to drop support for custom formats as that would cause serious regressions in many apps including Chromium/Firefox browsers that rely on this behavior. So, I guess we could do some level of "lobbying" to ensure that the custom formats stay as is on all supported Windows versions.

6 participants