Add getCurrentBrowsingContextMedia #148

eladalon1983 · 2020-10-07T13:53:28Z

getCurrentBrowsingContextMedia is equivalent to getDisplayMedia, other than that it may only capture the tab from which it is called. This allows for a simpler selection to be displayed for the user - rather than an elaborate picker, a simple dialog box is used. This simplifies things for the user and reduces the risk of the user sharing something other than what they had intended.
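
As a rough illustration of the call shape described above, the sketch below assumes the new method sits on `navigator.mediaDevices` and takes the same kind of constraints object as getDisplayMedia. That signature is an assumption drawn from the "equivalent to getDisplayMedia" wording, and `captureCurrentTab` is a hypothetical helper name, not part of the proposal.

```javascript
// Hypothetical helper: request capture of the calling tab.
// Assumption: getCurrentBrowsingContextMedia accepts getDisplayMedia-style
// constraints and returns a Promise that resolves to a MediaStream.
async function captureCurrentTab(mediaDevices, constraints = { video: true }) {
  if (typeof mediaDevices?.getCurrentBrowsingContextMedia !== 'function') {
    throw new Error('getCurrentBrowsingContextMedia is not available');
  }
  // The user agent is expected to show a simple confirmation dialog here,
  // rather than the full getDisplayMedia picker.
  return mediaDevices.getCurrentBrowsingContextMedia(constraints);
}
```

In a page this might be called as `await captureCurrentTab(navigator.mediaDevices)`; passing the object in explicitly keeps the helper easy to exercise with a stub.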

We think that the security properties of own-tab capture are no worse than the version that goes via the picker. We note that the application will have control over the surface that is being displayed, and that can cause some sharing of information that would otherwise be inaccessible, such as colors on visited links, or content of embedded frames, but we think that the risks are no bigger than for regular sharing, and that the proposed, simple, prompt is good enough to mitigate this concern.

This new API will be subject to access-permissions laid down by the display-capture feature-policy. (Support will be added in Chrome for this feature-policy as part of the work on this feature.)

dontcallmedom · 2020-10-07T14:12:49Z

@eladalon1983 fwiw, the IPR checker is flagging this because Google needs to rejoin the group - cf https://lists.w3.org/Archives/Public/public-webrtc/2020Oct/0005.html

henbos · 2020-10-07T14:59:54Z

I filed an issue for this PR to be able to reference: #149

eladalon1983 · 2020-10-07T22:17:51Z

@dontcallmedom: Thanks for explaining. Until Google rejoins, do I want to skip the check by marking it non-substantive? Or possibly there's another way to avoid waiting for Google to rejoin?

@henbos: Thanks.

dontcallmedom · 2020-10-08T08:51:25Z

@eladalon1983 do you know how long it might take for Google to rejoin? if this is only short term, I think the simplest approach is to wait until it happens; if there is a risk that it might take longer, I'll suggest an alternative approach.

eladalon1983 · 2020-10-08T12:57:23Z

I don't know how long it's going to take. Could potentially be a few days, I hear through the grapevine.

annevk · 2020-10-12T11:32:17Z

Does this allow for capturing the browsing context or the document/global within it? What if it's navigated across the origin boundary for instance?

eladalon1983 · 2020-10-25T12:54:57Z

@annevk, this allows capturing the browsing context, but I don't think navigation is an issue. If tab X captures itself, then navigates to another URL, then the app which captured tab X unloads and the capture ends. (I am not aware of a mechanism for ownership of the MediaStreams to be passed from the capturing application to another application/service-worker before it unloads.)

annevk · 2020-10-26T08:53:41Z

In that case it doesn't capture the browsing context. The browsing context typically outlives a navigation.

eladalon1983 · 2020-10-26T09:10:59Z

The browsing context outlives the navigation. It's the capturing entity which is unloaded when one navigates away, even though the captured entity remains.

annevk · 2020-10-26T09:13:51Z

I don't think that's really true, since it seems to me you are capturing a rendered document. A browsing context is just a container for a sequence of documents. It doesn't really have the capacity to be captured.

alvestrand · 2020-10-26T09:19:46Z

Implementation influencing API: The method of capturing done by GetDisplayMedia (and this version) involves grabbing the framebuffer after the browsing context has been rendered, I believe. So the capturing operation doesn't see any DOM object within the document.

It's been suggested to put this API on the document object instead of on the navigator object, but I'm not sure that makes sense.

annevk · 2020-10-26T09:39:16Z

Right, the theoretical model is such that documents get rendered. Don't really have a strong opinion on which object you put it, but I don't think this qualifies for adding "browsing context" as a web developer-exposed term.

(To be clear, I understand @jan-ivar has other concerns with this API and if any of this is in conflict with him I'll happily defer.)

TL;DR:
* This is an API for capturing the current tab.
* This CL handles the Blink part.

Explainer: https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI
Design doc: go/get-current-browsing-context-media
Intent-to-Prototype: https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ
PR against spec: w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.
* Graduate this to an origin-trial.

Bug: 1136942
Change-Id: I81333274075cd56d7e628a8a0eb025b1ae08645a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2500841
Reviewed-by: Daniel Cheng <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Commit-Queue: Elad Alon <[email protected]>
Cr-Commit-Position: refs/heads/master@{#823498}

This is the second step in implementing getCurrentBrowsingContextMedia behind a runtime flag.

TL;DR: This is an API for capturing the current tab.

Explainer: https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI
Design doc: go/get-current-browsing-context-media
Intent-to-Prototype: https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ
PR against spec: w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.

Bug: 1136942
Change-Id: I8b25baa85565999ec44ed2f1b0bd1e19d6f148c4
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2502628
Reviewed-by: Guido Urdaneta <[email protected]>
Reviewed-by: Yuri Wiitala <[email protected]>
Reviewed-by: Robert Kaplow <[email protected]>
Reviewed-by: Sergey Ulanov <[email protected]>
Commit-Queue: Sergey Ulanov <[email protected]>
Cr-Commit-Position: refs/heads/master@{#824534}

…edia API."

This reverts commit 5aea604.

Reason for revert: This CL is likely the cause of build failure for Linux ChromiumOS MSan Tests and Linux Chromium OS ASan LSan Tests (1)

First occurrence: https://ci.chromium.org/p/chromium/builders/ci/Linux%20ChromiumOS%20MSan%20Tests/21455 and https://ci.chromium.org/p/chromium/builders/ci/Linux%20Chromium%20OS%20ASan%20LSan%20Tests%20%281%29/38935

Failed tests:
GetCurrentBrowsingContextMediaDialogTest.DefaultAudioSelection
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWhenWindowClosed
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWhenWindowClosedWithoutCheckboxTicked
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithAudioShare
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithAudioShareFalse
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithNoAudioShare
GetCurrentBrowsingContextMediaDialogTest.DoubleTapOnShare
GetCurrentBrowsingContextMediaDialogTest.ShareButtonAccepts

Original change's description:
> Implement the confirmation-box for getCurrentBrowsingContextMedia API.
>
> Rebased on top of https://chromium-review.googlesource.com/c/chromium/src/+/2502628
>
> UI without audio capture: https://drive.google.com/file/d/1SA9vuDOkQjnioBfmAaiOjqoXGBUmVw22/view?usp=sharing
> UI with audio capture: https://drive.google.com/file/d/1jcncgHsF6L_o3D5Jc3UJ6pkjwQAKtoVl/view?usp=sharing
>
> This change relates to the UI code that is added to support the new getCurrentBrowsingContextMedia API.
>
> This is an API for capturing the current tab.
>
> Design doc:
> go/get-current-browsing-context-media
>
> Intent-to-Prototype:
> https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ
>
> PR against spec:
> w3c/mediacapture-screen-share#148
>
> Bug: 1136942
>
> Change-Id: I8e72023d944df3d7e996ad3acea7527c34569868
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2489991
> Commit-Queue: Palak Agarwal <[email protected]>
> Reviewed-by: Guido Urdaneta <[email protected]>
> Reviewed-by: Peter Boström <[email protected]>
> Reviewed-by: Elad Alon <[email protected]>
> Reviewed-by: Elly Fong-Jones <[email protected]>
> Cr-Commit-Position: refs/heads/master@{#831017}

[email protected],[email protected],[email protected],[email protected],[email protected]

Change-Id: I25b9e79d7eb61b5e43961df61999fd8c20954c8f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 1136942
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2560358
Reviewed-by: Maggie Cai <[email protected]>
Commit-Queue: Maggie Cai <[email protected]>
Cr-Commit-Position: refs/heads/master@{#831228}

jan-ivar · 2020-11-30T19:08:13Z

I agree with @annevk here on naming. The document is the largest object being captured.

> If tab X captures itself, ...

@eladalon1983 When we say something "captures itself", that something is the document, not the tab. → getDocumentMedia.

> Implementation influencing API: The method of capturing done by GetDisplayMedia (and this version) involves grabbing the framebuffer after the browsing context has been rendered, I believe.

@alvestrand Ah, I couldn't figure out why the API allowed an iframe to capture its embedder, when none of the use cases require it. → getTopLevelDocumentMedia would also be a mouthful.

I'd humbly suggest an iframe only capture itself. A simpler story, and a useful cropping tool to boot.

For implementations, cutting out the relevant rectangle from the framebuffer is hopefully not too difficult.

eladalon1983 · 2020-11-30T20:43:59Z

An iframe only capturing itself would not cover some interesting use cases. For example, a document being presented to a VC, by embedding an iframe of the VC's application inside the document-editor application. Or a game capturing footage of itself by embedding an iframe which also contains code and visible controls for managing the capture, annotating it, and uploading it to a remote server, possibly one which streams it to remote viewers.

annevk · 2020-12-01T08:30:44Z

Isn't the document of the frame being captured there? Or do you mean it would include nested documents? Again though, "browsing context" doesn't capture that at all. They're just an abstract holder of a sequence of documents, only one of which is currently active. (And depending on how history is revamped that model might change a bit still.)

eladalon1983 · 2020-12-01T14:38:34Z

I leave discussions of the API's name to those with more experience in this field than me. I have no strong opinions in this matter, other than a mild preference for brevity.

jan-ivar · 2020-12-01T15:21:28Z

> An iframe only capturing itself would not cover some interesting use cases. For example, a document being presented to a VC, by embedding an iframe of the VC's application inside the document-editor application.

@eladalon1983 A security property I like about "capture itself", is requiring explicit code in the capture target. Easy to grasp, ensures buy-in of the target, and no-one has to worry about being captured by other origins (except through the existing hyper-user-driven getDisplayMedia picker API).

I was even wondering about enforcing this by checking current global object == relevant global object, to enforce this:

await iframe.contentWindow.navigator.mediaDevices.getDocumentMediaWhatever() // SecurityError

> Or a game capturing footage of itself by embedding an iframe which also contains code and visible controls for managing the capture, annotating it, and uploading it to a remote server, possibly one which streams it to remote viewers.

I don't understand this use of "itself". Can't it put code in the parent as well, use postMessage etc? We seem to be making much stronger assumptions about the capture target's involvement elsewhere, e.g. "Since the app can take for granted that the captured content is of itself, it knows how to crop sensibly".

I feel strongly we shouldn't be over-capturing, only to create a need for inventing cropping APIs next.

I think we can lean on apps to use iframes to capture exactly what they want to send and no more. This is the web after all, so I'd aim for stronger integration than the existing modality.

eladalon1983 · 2020-12-01T15:53:25Z

Cropping is a different issue that we happen to be interested in, and only one example of why it's useful for the application to know that it is capturing its own tab. Shelving it for the time being, let's please examine the scenario of a game running in the browser, and wanting to stream itself to a service like Twitch. The streaming service could "publish" an iframe or a script that can be embedded in a game, implementing that functionality. The alternatives I can think of are less preferable. They include:

(a) Each game re-implementing the streaming capability, probably by importing the streaming service's code into their own codebase. I don't think that's a reasonable alternative, as the copied code would be running *same-origin*, and would have to be scrutinized before being imported; likely it either won't be scrutinized, or it would be running an older, already-scrutinized revision; also not very secure.

(b) Passing frames and audio via postMessage, which would be prohibitively inefficient.

Also, since iframes can embed additional iframes, I am not sure that "capture only this iframe" solves the issue of isolating what is captured. I think an opt-out mechanism for being captured by one's embedder would still be necessary. And once such a mechanism is provided (for other readers - I have suggested such a mechanism elsewhere), I think it can be used for making a safe implementation of capture-entire-tab.

youennf · 2020-12-01T17:48:18Z

I am still unclear about the goal of the API, which makes it hard to discuss the API surface.
Either we are talking of a privileged API, thus there is a prompt somewhere. In that case, we should investigate how much different it would be from getDisplayMedia, how the UI would be more intuitive and so on.
If we are talking about a no-prompt approach, this is another story where API could be at element level for instance like fullscreen, and we could constrain the element properties.
Can somebody clarify which approach is actually envisioned?

eladalon1983 · 2020-12-01T18:52:16Z

With a prompt. There is a mock in the explainer <https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI/edit?usp=sharing>. The UA is of course free to make the dialog even more "frictive," e.g. by presenting a preview image of the tab that will be captured in the dialog, possibly also requiring that it be selected prior to pressing the "Share" button (similarly to what Chrome currently does for selecting a full-desktop capture).

jan-ivar · 2020-12-02T15:31:34Z

I don't think we should introduce artificial reasons to crop, nor assume cropping will ever be added as a feature since it's a slippery slope to image processing, something this WG appears to be leaning more toward raw media access to solve.

> Shelving it for the time being, let's please examine the scenario of a game running in the browser, and wanting to stream itself to a service like Twitch. The streaming service could "publish" an iframe or a script that can be embedded in a game, implementing that functionality. The alternatives I can think of are less preferable. They include:
>
> (a) Each game re-implementing the streaming capability, probably by importing the streaming service's code into their own codebase. I don't think that's a reasonable alternative, as the copied code would be running same-origin, and would have to be scrutinized before being imported;

Exactly. To protect users from dubious information-harvesting JS libraries, I think I'd prefer this to receive the same level of scrutiny that a service provider performs to protect itself.

I don't think we should make it easier to export user trust to entities the service provider itself doesn't trust, because if a service provider doesn't trust a library then users probably shouldn't either.

eladalon1983 · 2020-12-02T15:50:10Z

In the scenario described, the service provider trusts the (specific) third-party enough to (1) embed it and (2) provide its iframe with allow=display-capture. The service provider should not, IMHO, be forced to the decision of either not trusting the (specific) third-party at all, or trusting it enough to allow it to run same-origin.

eladalon1983 · 2020-12-02T16:06:27Z

I'd also like to clarify that cropping is not the main issue with capture-this-tab, but rather just an example. I see the advantages of capture-this-tab as being the following, and in this order:

1. [Main advantage:] Nearly eliminates the risk of users sharing the wrong thing (choosing the wrong thing in a chooser). IMHO, this is a big selling point.
2. [Important, but not main advantage:] Allows the application the knowledge of the contents of the capture, allowing processing of any sort, cropping (even in JS) being just one example. Other (hypothetical) examples include blurring part of the captured stream without blurring what the local user sees, censoring private information that's in the middle of the captured content (can't be cropped) without obscuring it for the local users, etc. This is just a short list of hypotheticals I could think of, however.
3. [Lower importance:] Provides a streamlined interface for sharing, making both users' and developers' lives easier. Contrast the "share this document" flow using getCurrentBrowsingContextMedia with the flow using getDisplayMedia. With getDisplayMedia, if the user chooses the wrong source in the picker, the application has to (a) detect it, (b) discard the MediaStream and (c) explain to the user what mistake he had made, and how to avoid it in the future, then (d) prompt the user to try again. Cumbersome.
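
To make the getDisplayMedia side of that contrast concrete, here is a minimal sketch of steps (a) through (d). The `isCurrentTab` and `explain` callbacks are hypothetical and application-supplied; how an application would actually detect the wrong surface is left open here.

```javascript
// Sketch of the getDisplayMedia flow: (a) detect a wrong pick, (b) discard
// the stream, (c) explain the mistake, (d) prompt again.
// `isCurrentTab` and `explain` are hypothetical application callbacks.
async function shareThisDocument(mediaDevices, isCurrentTab, explain, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const stream = await mediaDevices.getDisplayMedia({ video: true });
    if (await isCurrentTab(stream)) {
      return stream; // the user picked the intended surface
    }
    // Wrong source: stop every track so the unwanted capture actually ends.
    stream.getTracks().forEach((track) => track.stop());
    explain('Please choose the tab that contains this document.');
  }
  throw new Error('User did not select the current tab');
}
```

With a confirmation-only API, the loop and both callbacks would presumably disappear, leaving a single call and a single rejection path.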

jan-ivar · 2020-12-02T17:39:32Z

> The service provider should not, IMHO, be forced to the decision of either not trusting the (specific) third-party at all, or trusting it enough to allow it to run same-origin.

@eladalon1983 but this feature undermines many of the same-origin protections you get from iframing in the first place.

> ... Allows the application the knowledge of the contents of the capture, ...

Right, and this knowledge adds risk.

Whatabout getDisplayMedia?

While getDisplayMedia already has this problem IF the user chooses the same tab, at least that scope violation is 100% user-driven, and some form of social engineering is needed to make it a reliable exploit (it doesn't help that Chrome fails to warn about this when it should).

I think it's fair to say getDisplayMedia is flying close to the sun already and exists because it is backed by such a highly compelling use case (web conference presentations). If you were to ask me if I think it has too many security mitigations, I'd say no. That's why I don't find it compelling to remove any of them.

I do however find the idea of enabling apps to stream themselves into web conferences appealing, provided it can be integrated safely.

domenic · 2021-02-24T16:43:42Z

Where did the naming discussion on this land? The term "browsing context" has not been exposed to web developers in any web platform API so far, and it's a safe bet that web developers won't already know precisely what it is.

It's also worth pointing out that the definition of browsing context is somewhat in flux (precisely because it's not web-developer exposed); we're currently working on transitioning that single concept into three: browsing context, browsing session, and top-level navigable. You can learn more about that in whatwg/html#5767 and whatwg/html#6356.

eladalon1983 · 2021-03-16T15:49:22Z

Chrome Security is of the opinion that a confirmation-only flow for gCBCM (getCurrentBrowsingContextMedia) would require security measures similar to those which @jan-ivar has suggested. Namely, a combination of (a) site-isolation and (b) a new COEP-like header for opting in to capturability by embedder. I think we have the following issues to resolve, then:

Name

We are open-minded about this. I personally like navigator.mediaDevices.getCurrentTabMedia. I think it’s succinct and clear. I have also considered getThisTabMedia, but I find that it’s not as clear whether “this” refers to “tab” or to “media.” Wdyt?

Security Measures

I think we can continue this discussion in thread #155.

Behavior when security-measures do not hold

The application might try to call gCBCM from almost any context.

If gDM is not allowed to be called from that context, then gCBCM should also not be permitted. (For example, calling gCBCM from an iframe that does not have the prerequisite display-capture permission from its embedder.)

gCBCM introduces new requirements in addition to gDM’s - (a) site-isolation and (b) a new HTTP header (definition pending in separate thread). If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior. That is, display a dialog that does not limit the user's selection to just this tab.

Rationale for Fallback
We expect the most common failure of gCBCM calls to occur due to a third-party iframe served without the new HTTP header. This will be especially true in the beginning, before the new header has had time to gain wide adoption, but even after adoption is widespread, it will be difficult for complex applications to ensure that all embedded content “plays nicely” at all times and includes the necessary header. Third-parties could be migrating between servers, falling back on misconfigured servers, etc. So failure to call gCBCM should be expected even in otherwise well-structured applications.

One possibility for applications is to inspect the failure reason of gCBCM calls, and if it’s due to the missing header, to call getDisplayMedia “manually” as a fallback. This is possible, but clunky. It’s also arguable that exposing the exact reason gCBCM was rejected is undesirable. (Normally, a top-level application would not be allowed to see what HTTP headers embedded content uses, let alone embedded content twice-removed. Normally, things either load or don’t.)
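
For concreteness, the "manual" fallback could look roughly like the sketch below. The error name is purely hypothetical: nothing in this thread defines how (or whether) a missing header would be reported, which is exactly the exposure concern raised above.

```javascript
// Hypothetical error name, invented for this sketch only.
const MISSING_OPT_IN_ERROR = 'HypotheticalMissingOptInError';

// Try the confirmation-only API first; fall back to the regular picker
// only when the (hypothetical) missing-header failure is reported.
async function captureWithFallback(mediaDevices, constraints = { video: true }) {
  try {
    const stream = await mediaDevices.getCurrentBrowsingContextMedia(constraints);
    return { stream, usedFallback: false };
  } catch (err) {
    if (err.name !== MISSING_OPT_IN_ERROR) {
      throw err; // e.g. the user declined; do not re-prompt in that case
    }
    const stream = await mediaDevices.getDisplayMedia(constraints);
    return { stream, usedFallback: true };
  }
}
```

The `usedFallback` flag matters because, after a fallback, the application can no longer assume the captured surface is its own tab.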

We believe that it will be helpful if we specify that the user agent SHOULD default to a gDM-like dialog (or possibly MAY). The only new problem introduced - uncertainty by the application over whether the user really chose the current tab - can be resolved in several ways. It may be left to the application (e.g., pixel test), or we could discuss more ergonomic solutions - using a returned value, using the label of MediaStreamTrack (Firefox currently uses windows’ titles as the label), etc.

Audio Playback Suppression

I will be using “playback-suppress” as shorthand for “mute the audio from the speaker’s point-of-view, but still make this audio capturable.”

While a user is capturing audio from a tab, it’s sometimes useful to prevent that tab from performing audio playback through the speakers. This is useful, for example, for echo cancellation, which works better if the audio captured from the playback-suppressed tab is sent back and played out like other participants’ audio, so that the echo canceller sees it as just another remote-sourced audio stream played out over the speakers. I see some options here:

1. Allow a parameter or a constraint to determine whether the captured tab is playback-suppressed or not. (Note that capturing is user-controlled, so malicious applications cannot just use this to mute other tabs willy-nilly.)
2. Reinterpret an existing control, constraint or setting to mean that the captured tab should be playback-suppressed (e.g. echoCancellation).
3. Specify that the behavior of capturing a tab playback-suppresses its audio, always, and that’s it.
4. The most ambitious change (goes beyond the scope of this WG), but IMHO the most desirable - we should note that atm, it is not possible for an app to suppress its own speaker-playback. Or an iframe’s playback. That means that by embedding an iframe, the application accepts whatever audio-playback the embedded frame happens to perform. It would be good if documents could playback-suppress (a) themselves or (b) specific embedded iframes.

annevk · 2021-03-16T16:02:34Z

Name: I think the concerns raised with browsing context apply equally to tab. This is much more narrow-scoped than a tab. "Page" might be okay and has some precedent in CSS.

(Also, thanks for the update!)

jan-ivar · 2021-03-23T21:34:26Z

> Name: I think the concerns raised with browsing context apply equally to tab. This is much more narrow-scoped than a tab.

@annevk I think this is a case where "tab" is narrower-scoped than page, because a page may have a much larger surface area, and we only want to capture what the user sees at the moment. That is: the intersection of the top-level browsing context's viewport and the rendering boundaries of the requesting document, including any content overlaid by CSS (i.e. excluding any content 100% occluded by CSS at the moment):

...OR (to complicate matters) depending on the outcome of the above discussion on letting an iframe capture its parent:

In either case, this seems best for both privacy and efficiency (we don't want to have to re-render a page for this). If the user scrolls the page they may reveal more info.
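
Setting occlusion aside, the intersection described above is plain rectangle arithmetic. A minimal sketch, assuming both rectangles are expressed in top-level viewport coordinates (the example numbers are made up):

```javascript
// Rectangles are { x, y, width, height } in top-level viewport coordinates.
// Returns the overlapping region, or null when the rectangles do not overlap.
function intersectRects(a, b) {
  const left = Math.max(a.x, b.x);
  const top = Math.max(a.y, b.y);
  const right = Math.min(a.x + a.width, b.x + b.width);
  const bottom = Math.min(a.y + a.height, b.y + b.height);
  if (right <= left || bottom <= top) return null;
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// An iframe that is partly scrolled out of a 1280x720 viewport.
const viewport = { x: 0, y: 0, width: 1280, height: 720 };
const iframeBox = { x: 1000, y: 500, width: 400, height: 400 };
const captured = intersectRects(viewport, iframeBox);
// captured is { x: 1000, y: 500, width: 280, height: 220 }
```

Only the visible 280x220 portion of the iframe would be captured in this example; scrolling changes the result.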

So I've been calling this getTabMedia. Though perhaps getCurrentTabMedia is more precise? — In either case, the requester cannot outlive its target, so I wouldn't worry about "tab" implying capture past navigation.

jan-ivar · 2021-03-24T01:26:33Z

> If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior.

I'd rather keep this separate from gDM, even using a separate permissions policy, since the security properties are quite different.

The callsites may not even be the same always. E.g. this is the target being told to capture itself and beam into a meeting, vs the main window where gDM may be called to present today.

I see no way around apps needing to check for errors. Especially if we go with the model where capture may terminate on a non-opt-in iframe loading. Apps would need to catch that error too, and respond appropriately.
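
A sketch of what that error handling might involve: one path for a rejected request, another for capture ending later. It relies only on the existing `ended` event of MediaStreamTrack; whether a non-opt-in iframe load would surface that way is an assumption, and `captureAndWatch` and `onFailure` are hypothetical names.

```javascript
// Hypothetical wrapper: report both a rejected request and a later,
// unexpected end of capture through a single `onFailure` callback.
async function captureAndWatch(requestCapture, onFailure) {
  let stream;
  try {
    stream = await requestCapture();
  } catch (err) {
    onFailure('request-rejected', err);
    return null;
  }
  for (const track of stream.getTracks()) {
    // "ended" fires when the source stops for reasons other than the app
    // calling stop() itself, e.g. the user agent terminating the capture.
    track.addEventListener('ended', () => onFailure('capture-ended', track));
  }
  return stream;
}
```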

annevk · 2021-03-24T06:28:33Z

That's a fair criticism of page, but I don't think tab really captures it either. This doesn't match the lifetime of a tab and tab is rather implementation-specific and might not exist on all platforms. (Also, tab in implementations is rather analogous to top-level browsing context (or browsing session, once we have that) and this clearly isn't that.)

dontcallmedom · 2021-03-24T06:33:10Z

re naming, wouldn't viewport be a better characterization of the target of the capture?

annevk · 2021-03-24T07:43:22Z

Yeah, I think viewport would work, though with the caveat that in theory nested documents have their own viewport and I don't think there is a clearly defined term for the composition of them.

jan-ivar · 2021-03-24T12:04:17Z

The lifetime of a capture target >= lifetime of a capture.

Sites today can capture the display (with getDisplayMedia) and the user (with getUserMedia), both of which (hopefully!) outlive the page and its capture of them. I don't feel users or developers are confused by that.

getViewportMedia could work, but is it distinct enough from getDisplayMedia?

FWIW, "screen", "window", and "tab" are the layterms around screen-sharing UX in browsers today.

AFAIK, screen-sharing isn't available yet on mobile, but I believe the term "tab" exists there as well as an organizing container/layterm for browsing context.

My objection to "browsing context" wasn't its scope, but it being a technical under-the-hood term previously unexposed in the platform.

annevk · 2021-03-24T12:15:34Z

My objection is the scope. 😊 Both screen (has precedent with window.screen too) and user are clear, but with browsing context it's not clear that capture is canceled if the user navigates; navigation is just a detail of a browsing context. (I hope we don't expose "tab" anywhere as a thing either.) That navigation cancels capture suggests it's a smaller unit, which would be document/page/viewport.

(To add, I wouldn't find it problematic to expose browsingContext or something equivalent as a term, if there was a concept that matched it.)

jan-ivar · 2021-03-24T12:41:48Z

Fair enough. Of those I think I'd pick viewport, to emphasize we're not necessarily capturing the whole document.

eladalon1983 · 2021-03-24T14:06:25Z

I also like viewport. There are certain browser UI elements that are bound to specific tabs, but which do not get captured. The developer console, for instance. Using "tab" does not make immediately clear that those are not captured. So I assume it's getViewportMedia, then? Or some variation thereof?

jan-ivar · 2021-03-24T21:11:21Z

A precedent for the term "viewport" being exposed to the web is window.visualViewport.

I guess it remains to be seen whether we'd capture the layout viewport or the visual viewport on mobile.

eladalon1983 · 2021-03-25T15:52:48Z

> > If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior.
>
> I'd rather keep this separate from gDM, even using a separate permissions policy, since the security properties are quite different.

Requiring a separate permission policy is fine by me.

Let's assume that getViewportMedia is called from a context that has both the old display-capture permission that gates getDisplayMedia, as well as the new permission that we end up introducing. I think in this case, calls to getViewportMedia should still result in some gDM-like user-prompt if [either site-isolation or the new header] is missing.

Rationale - we expect this to happen >99% of the time, at least in the early days, and we don't believe the feature will be useful without it. The compromise that Chrome has reached internally between the demands from Security and the needs of potential feature-customers, is that a confirmation-only dialog is displayed if all of the new security requirements are satisfied, and an explicit-selection dialog is shown otherwise, which is generally gDM-like, but highlights in some spec-compliant way that the application would like to get the current tab. (For example, consider a UA that normally offers windows as the first option if gDM is called, but offers tabs as the first option if gVM-fallback-mode is used.)
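To make the two modes concrete, here is a rough model of the selection logic described above. It is only a sketch: `choosePrompt`, `siteIsolated`, `hasOptInHeader`, and the returned labels are hypothetical names, not spec text or Chrome code.

```javascript
// Hypothetical model of the prompt selection described above; not spec text.
// Both inputs are assumptions standing in for "[site-isolation and the new header]".
function choosePrompt({ siteIsolated, hasOptInHeader }) {
  if (siteIsolated && hasOptInHeader) {
    // All new security requirements hold: a simple accept/reject dialog.
    return { kind: "confirm-only", firstOption: "current-tab" };
  }
  // Otherwise: a gDM-like picker, which might list tabs first.
  return { kind: "picker", firstOption: "tab" };
}
```

The point of the sketch is that the user always keeps an unconstrained choice in the fallback branch; only the ordering of the options differs from a UA's usual gDM picker.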

At the bottom of my comment is an illustration of what Chrome thinks of using. I mention Chrome-specifics only so as to explain our motivation. Spec-wise, Chrome's specific dialog is of course out of scope. For the spec-change, I think the right way to go about it is to say that the user agent SHOULD/MAY fall back to any behavior that complies with the restrictions placed on gDM, but that this behavior MAY differ from the specific UA's usual gDM behavior. (Or maybe we can leave this "MAY differ..." part implicit.)

Lastly, we can make this fallback behavior temporary, giving sites time to adopt the security requirements that we introduce.

Here's what a gDM-like fall back can look like:

jan-ivar · 2021-03-25T20:30:25Z

a confirmation-only dialog is displayed if all of the new security requirements are satisfied, and an explicit-selection dialog is shown otherwise, ... but offers tabs as the first option if gVM-fallback-mode is used.)

This would make getViewportMedia a weakened version of getDisplayMedia, which seems problematic.

I don't think we can infer that an app calling one API wants to fall back to calling the other in all circumstances. This seems app-specific, and a few lines of code:

let stream;
try {
  stream = await navigator.mediaDevices.getViewportMedia();
} catch (e) {
  if (e.name != "SecurityError") throw e; // rethrow anything other than a security refusal
  stream = await navigator.mediaDevices.getDisplayMedia(); // ¹
}

This would be well-tested, because as you say: "we expect this to happen >99% of the time, at least in the early days".

¹ If Chrome wants to weaken the already strained security properties of getDisplayMedia, it can do so here without melding APIs together, by ignoring spec recommendations and detecting this situation.

eladalon1983 · 2021-03-25T21:05:04Z

Why "weakened"? It presents a different choice that's equally unconstrained. It's not, IMHO, weaker. (And if called from the right kind of context, a completely different, confirmation-only choice.)
We know of at least one feature-customer that wants this - Google Slides. The code you present that calls gDM if gVM fails, would not serve their needs, because it would still produce the usual gDM media-picker when gVM fails and gDM is called immediately after. Unless, that is, we implement it as "calls to gDM immediately after a gVM failure yield the new gDM behavior." But it would be tricky to define "immediately" in a way that could satisfy all customers. So:
Another option is to add a parameter that controls this. Maybe something like any_surface_allowed. We could say that gVM with any_surface_allowed = true is allowed to be called from a context that does not satisfy [site-isolation and new-header], but MUST allow the user an unconstrained choice, and MAY present a different order of options compared to gDM. (The MAY part can be implicit.)

jan-ivar · 2021-03-25T21:59:36Z

Why "weaker"?

From the spec: "User Agents are encouraged to warn users against sharing browser display devices as well as monitor display devices where browser windows are visible, or otherwise try to discourage their selection on the basis that these represent a significantly higher risk when shared.".

See also crbug 920752.

it would be tricky to define "immediately" in a way that could satisfy all customers

I'm merely making the point that it's completely doable, from the same justifications made for wanting to standardize it. Except Chrome would (hopefully) be alone in making this convenience/safety tradeoff. So any definition would do, e.g. the same run of the event loop. If Chrome would rather not do it because it's non-standard, that would be understandable, and desirable from my point of view.

I'd be opposed to standardizing any parameter related to this, because I think it's bad for privacy for the reasons stated.

would not serve their needs,

We are standardizing gVM specifically to serve this need. I see no new information to revisit gDM.

eladalon1983 · 2021-03-26T00:09:36Z

I'd be opposed to standardizing any parameter related to this, because I think it's bad for privacy for the reasons stated.

The attacks we have discussed so far all required a single frame to perform. A malicious application can preload occluded cross-origin iframes and flash them to the screen for the duration of a single frame immediately after the user approves screen-capture. As soon as the user approves, it becomes too late to hide anything from the app. Switching tabs, minimizing windows, etc. - such steps do not offer protection from a malicious app. The decisive moment is when the user accepts.

Currently, Safari offers only the entire screen; Chrome and Edge offer screen/window/tab, with the first option on offer being screen. Most users have a single screen, and it's showing the current tab at the moment capture starts. Any danger that exists with capturing the current tab, also exists when capturing the current screen - and more (e.g. see titles of other tabs).

Dialogs offering unconstrained choice to the user, but with focus moved away from current-screen towards current-tab, are more secure than dialogs that push towards sharing the entire screen. Helping browsers move to more secure options creates a more secure Web. In order to be implemented, it helps if work on getViewportMedia in Chrome is motivated by a Google product that has immediate need for it. A product that intends to use it, and is therefore interested in funding that work.

If you can help me find a variation that satisfies everyone¹, or that can be an acceptable compromise for everyone, I would be very grateful. This can include any old/new idea, or any temporary compromise. I believe it will also be good for security and privacy on the Web.

¹ Including one customer for this feature which will only be able to adopt COOP+COEP in the mid-term future, and the new header only in the long-term future. And this customer is the motivation for our investment of headcount in this.

dontcallmedom · 2021-03-26T07:24:59Z

I don't think falling back to gDM when gVP fails is a good approach:

it assumes gDM will always be the right fallback; there may be cases when doing nothing or falling back to captureStream would be better alternative
as @jan-ivar says, the fallback option of gVP can't be a path to something that has weaker security characteristics than both gDM & gVP, esp since it's an easy to activate path for a possible attacker
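One way to picture the first point is a small wrapper where the app supplies its own fallback. This is a sketch under assumptions: `getViewportMedia` is the API proposed in this thread (not something shipping), it is assumed to reject with a `SecurityError` as in the earlier snippet, and `captureViewport`/`onRefused` are hypothetical names.

```javascript
// Sketch: the app, not the API, decides what happens when getViewportMedia is refused.
// `mediaDevices` is injected (e.g. navigator.mediaDevices) so the logic can be exercised
// without a browser. getViewportMedia is the proposed API and is an assumption here.
async function captureViewport(mediaDevices, onRefused) {
  try {
    return await mediaDevices.getViewportMedia();
  } catch (e) {
    if (e.name !== "SecurityError") throw e; // unrelated failures propagate unchanged
    // App-specific choice, for example:
    //   () => null                             (do nothing)
    //   () => mediaDevices.getDisplayMedia()   (fall back to the picker)
    return onRefused(e);
  }
}
```

An app that prefers to do nothing could call `captureViewport(navigator.mediaDevices, () => null)`, while another could pass a callback that calls getDisplayMedia.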

Regarding the fact that sharing a full screen is as dangerous as, or more dangerous than, sharing a tab in a single-screen scenario, I think part of the reasoning is that users understand much better that sharing your entire screen is potentially scary, whereas they might think it is benign to share the tab that is asking for screen sharing. So it is not that it is safer, but that users will have a more accurate understanding of the risks. And conversely, because developers might expect users will be offered a scary choice, it makes it a less attractive option for attackers.

Separately, we've heard several times that the current lack of ability for developers to guide the capture surface in gDM leads to suboptimal UX - I wonder if we should look into reinstating a way for developers to give a hint, which UAs could choose whether and how to take into account (e.g. based on previous interactions of the user with the site, maybe based on the cross-origin isolation status of the tab if a tab is being requested, …) - but that would need to be a separate discussion.

youennf · 2021-03-26T10:01:59Z

As for gDM as a back-up, I think we should first let websites experiment with it themselves. UAs can learn from it as well. It is easier to add additional parameters later if we think there is value based on that.

I see two requirements from that discussion:

We should gate gvp on transient activation like done for gdm
We should guarantee that even if gvp fails, transient activation stays valid in the reject handler so that gdm can be called right after

I wonder if we should look into reinstating a way for developers to give a hint

Before going down that road, I think UA implementors should investigate what they can do on their own.
There is nothing in the spec preventing UAs from specialising prompts based on what users actually did in the past, even though this might be tricky from a usability and security point of view. Adding parameters given by the web site to the mix might make things more complex.

jan-ivar · 2021-03-26T16:35:56Z

If you can help me find a variation that satisfies everyone¹, or that can be an acceptable compromise for everyone, I would be very grateful.

@eladalon1983 I showed how this customer can accomplish their workflow with a few lines of JS, and how browsers that want to can detect and optimize the UX flow in that case, even though we don't recommend it. Can you help me understand why they'd need this behavior standardized for all apps?

As to browser UX, specs have a hard time mandating it. Where strong recommendations have had teeth in the past, they're paired with good privacy or security arguments, to convince (or shame) implementers with. So proposing a parameter to induce UX (going against strong privacy recommendations) seems like a dead-end.

We should gate gvp on transient activation like done for gdm

@youennf Agreed, among a list of other things.

if gvp fails, transient activation stays valid in the reject handler so that gdm can be called right after

Transient activation is time-based, so we may not need to say anything about it since gvp would fail immediately.

youennf · 2021-03-26T17:10:07Z

Transient activation is time-based, so we may not need to say anything about it since gvp would fail immediately.

I do not see how it can be guaranteed today with a time-based approach.

The fact that gvp fails immediately is an implementation decision. Asynchronous queries might be required to actually make it fail. Even if it fails synchronously, there might be some time spent in doing this computation.
Also, even if gvp fails immediately, the reject handler will not be called immediately but later on, once other JS is executed.
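The last point can be checked with plain promises. The sketch below uses no capture API at all: `failImmediately` is a hypothetical stand-in for a call that rejects right away, and the labels pushed into `order` are just for illustration.

```javascript
// Shows that a reject handler runs after the remaining synchronous code,
// even when the promise is already rejected at the time it is returned.
function failImmediately() {
  const err = new Error("refused");
  err.name = "SecurityError";
  return Promise.reject(err); // stand-in for a call that fails "immediately"
}

async function demoOrder() {
  const order = [];
  const attempt = failImmediately().catch(() => order.push("reject handler"));
  order.push("rest of the click handler"); // still runs before the handler above
  await attempt;
  return order;
}
```

Here `demoOrder()` resolves to `["rest of the click handler", "reject handler"]`. In a browser, one could additionally read `navigator.userActivation.isActive` inside the handler to see whether activation is still valid at that point.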

eladalon1983 · 2021-03-26T22:35:00Z

(One more suggestion, probably the last one on this topic. The main difference here is that all security restrictions now always apply.)

What if any_surface_allowed did not result in a gDM-like fall-back, but rather, if it specified something along the lines of:

...all security restrictions to gVM apply. If any doesn't hold, abort. If they all hold, proceed.
In response to gVM, the user agent SHOULD use a confirmation-only dialog¹² (accept/reject).
If any_surface_allowed = true is specified, the user agent MAY present the user the ability to choose other sources. (The user agent still SHOULD start out with a confirmation-only dialog, but MAY display an option to "click here to select another source." Clicking that can then transition to a normal gDM dialog.)

With this, the standard does not allow gVM to become a different version of gDM. Rather, it allows gVM to become a normal version of gDM. It makes the process more user-driven. Wdyt?

Here is a mock where any_surface_allowed is unspecified or is set to false:

Here is a mock with any_surface_allowed set to true:

Important differences from previous suggestion:

Fails if the security restrictions are not met, for either any_surface_allowed=false/true.
Note that the user agent MAY do this.

It essentially restricts the user-selection to only the current tab.
Audio notwithstanding; this much the user should be able to refuse independently of video, as with gDM. E.g. using a checkbox.
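As a rough model of this variant (a sketch only; `planPrompt`, `securityRequirementsMet`, and the returned labels are hypothetical, and `anySurfaceAllowed` simply mirrors the any_surface_allowed parameter under discussion):

```javascript
// Hypothetical model of the proposal above; nothing here is spec text.
function planPrompt({ securityRequirementsMet, anySurfaceAllowed = false }) {
  if (!securityRequirementsMet) {
    return { outcome: "abort" }; // fails regardless of anySurfaceAllowed
  }
  return {
    outcome: "prompt",
    dialog: "confirm-only", // SHOULD: accept/reject for the current tab
    // MAY: a "select another source" affordance that leads to a regular gDM picker
    mayOfferOtherSources: anySurfaceAllowed,
  };
}
```

Unlike the earlier suggestion, there is no branch in which a context that fails the security requirements still gets a picker.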

jan-ivar · 2021-05-13T15:24:51Z

Let's close this and move getViewportMedia to https://github.com/w3c/mediacapture-screen-share/issues/155 since we've reached WG consensus (slide) to site-isolate the API.

This issue gets credit for birthing getViewportMedia, but has become a source of confusion. The (early) name "getCurrentBrowsingContextMedia" is both a Chrome origin trial (without site-isolation), and now also a different competing "hybrid" picker-based API proposal from Google without support from this WG.

TL;DR:
* This is an API for capturing the current tab.
* This CL handles the Blink part.

Explainer: https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI
Design doc: go/get-current-browsing-context-media
Intent-to-Prototype: https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ
PR against spec: w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.
* Graduate this to an origin-trial.

Bug: 1136942
Change-Id: I81333274075cd56d7e628a8a0eb025b1ae08645a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2500841
Reviewed-by: Daniel Cheng
Reviewed-by: Guido Urdaneta
Commit-Queue: Elad Alon
Cr-Commit-Position: refs/heads/master@{#823498}
GitOrigin-RevId: bc949e9d94ea6496b15153f5486a12608db7152b

Update index.html

de2305e

aboba added the TPAC 2020 label Oct 8, 2020

ame1234 approved these changes Oct 25, 2020

eladalon1983 mentioned this pull request Apr 23, 2021

API for display-capturing the current tab w3ctag/design-reviews#625

Closed

1 task

jan-ivar closed this May 13, 2021

This was referenced Apr 21, 2022

Recognize safer & better-integrated web presentations in getDisplayMedia w3c/mediacapture-screen-share-extensions#9

Open

preferCurrentTab mozilla/standards-positions#538

Open

Add preferCurrentTab to getDisplayMedia WICG/proposals#32

Closed
