Add getCurrentBrowsingContextMedia #148

Closed

eladalon1983
Member

@eladalon1983 eladalon1983 commented Oct 7, 2020

getCurrentBrowsingContextMedia is equivalent to getDisplayMedia, other than that it may only capture the tab from which it is called. This allows for a simpler selection to be displayed for the user - rather than an elaborate picker, a simple dialog box is used. This simplifies things for the user and reduces the risk of the user sharing something other than what they had intended.
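As a rough illustration of the call shape described above, the sketch below feature-detects the proposed method and calls it. It assumes the method sits on `navigator.mediaDevices` next to `getDisplayMedia` and takes similar constraints. `captureCurrentTab` is a hypothetical helper name, and the `mediaDevices` object is passed in so the snippet can be exercised outside a browser.

```javascript
// Sketch only: getCurrentBrowsingContextMedia is the method proposed in this PR,
// not a shipped API. The helper name and the constraints shape are assumptions.
async function captureCurrentTab(mediaDevices, constraints = { video: true }) {
  if (!mediaDevices || typeof mediaDevices.getCurrentBrowsingContextMedia !== "function") {
    // The proposal is not available in this user agent.
    return null;
  }
  // Expected to resolve with a MediaStream once the user confirms the dialog,
  // and to reject if the user declines.
  return mediaDevices.getCurrentBrowsingContextMedia(constraints);
}
```

In a page this would be called as `const stream = await captureCurrentTab(navigator.mediaDevices);`, with a `null` result meaning the method is unavailable.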

See also:

  1. Explainer
  2. Public discussion

We think that the security properties of own-tab capture are no worse than the version that goes via the picker. We note that the application will have control over the surface that is being displayed, and that can cause some sharing of information that would otherwise be inaccessible, such as colors on visited links, or content of embedded frames, but we think that the risks are no bigger than for regular sharing, and that the proposed, simple, prompt is good enough to mitigate this concern.

This new API will be subject to access-permissions laid down by the display-capture feature-policy. (Support will be added in Chrome for this feature-policy as part of the work on this feature.)
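To make the policy requirement concrete, here is a minimal sketch of an embedder delegating `display-capture` to a frame through the iframe `allow` attribute. `createCaptureCapableFrame` is a hypothetical helper, and the document object is passed in as a parameter purely so the snippet does not depend on a browser global.

```javascript
// Sketch: delegate the display-capture policy to an embedded frame.
// `doc` is a Document-like object; in a page you would pass `document`.
function createCaptureCapableFrame(doc, src) {
  const frame = doc.createElement("iframe");
  // Without this delegation, a cross-origin frame is expected to be denied
  // access to getDisplayMedia (and, per this PR, to the new method as well).
  frame.allow = "display-capture";
  frame.src = src;
  return frame;
}
```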

@dontcallmedom
Member

@eladalon1983 fwiw, the IPR checker is flagging this because Google needs to rejoin the group - cf https://lists.w3.org/Archives/Public/public-webrtc/2020Oct/0005.html

@henbos
Contributor

henbos commented Oct 7, 2020

I filed an issue for this PR to be able to reference: #149

@eladalon1983
Member Author

@dontcallmedom: Thanks for explaining. Until Google rejoins, do I want to skip the check by marking it non-substantive? Or possibly there's another way to avoid waiting for Google to rejoin?

@henbos: Thanks.

@dontcallmedom
Member

@eladalon1983 do you know how long it might take for Google to rejoin? if this is only short term, I think the simplest approach is to wait until it happens; if there is a risk that it might take longer, I'll suggest an alternative approach.

@eladalon1983
Member Author

I don't know how long it's going to take. Could potentially be a few days, I hear through the grapevine.

@aboba aboba added the TPAC 2020 label Oct 8, 2020
@annevk
Member

annevk commented Oct 12, 2020

Does this allow for capturing the browsing context or the document/global within it? What if it's navigated across the origin boundary for instance?


@ame1234 ame1234 left a comment


Authoring name is arlyss engebretson

@eladalon1983
Member Author

@annevk, this allows capturing the browsing context, but I don't think navigation is an issue. If tab X captures itself, then navigates to another URL, then the app which captured tab X unloads and the capture ends. (I am not aware of a mechanism for ownership of the MediaStreams to be passed from the capturing application to another application/service-worker before it unloads.)

@annevk
Member

annevk commented Oct 26, 2020

In that case it doesn't capture the browsing context. The browsing context typically outlives a navigation.

@eladalon1983
Member Author

The browsing context outlives the navigation. It's the capturing entity which is unloaded when one navigates away, even though the captured entity remains.

@annevk
Member

annevk commented Oct 26, 2020

I don't think that's really true, since it seems to me you are capturing a rendered document. A browsing context is just a container for a sequence of documents. It doesn't really have the capacity to be captured.

@alvestrand
Contributor

Implementation influencing API: The method of capturing done by GetDisplayMedia (and this version) involves grabbing the framebuffer after the browsing context has been rendered, I believe. So the capturing operation doesn't see any DOM object within the document.

It's been suggested to put this API on the document object instead of on the navigator object, but I'm not sure that makes sense.

@annevk
Member

annevk commented Oct 26, 2020

Right, the theoretical model is such that documents get rendered. Don't really have a strong opinion on which object you put it on, but I don't think this qualifies for adding "browsing context" as a web developer-exposed term.

(To be clear, I understand @jan-ivar has other concerns with this API and if any of this is in conflict with him I'll happily defer.)

pull bot pushed a commit to Alan-love/chromium that referenced this pull request Nov 3, 2020
TL;DR:
* This is an API for capturing the current tab.
* This CL handles the Blink part.

Explainer:
https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI

Design doc:
go/get-current-browsing-context-media

Intent-to-Prototype:
https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ

PR against spec:
w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.
* Graduate this to an origin-trial.

Bug: 1136942
Change-Id: I81333274075cd56d7e628a8a0eb025b1ae08645a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2500841
Reviewed-by: Daniel Cheng <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Commit-Queue: Elad Alon <[email protected]>
Cr-Commit-Position: refs/heads/master@{#823498}
pull bot pushed a commit to Mu-L/chromium that referenced this pull request Nov 6, 2020
This is the second step in implementing getCurrentBrowsingContextMedia behind
a runtime flag.

TL;DR: This is an API for capturing the current tab.

Explainer:
https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI

Design doc:
go/get-current-browsing-context-media

Intent-to-Prototype:
https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ

PR against spec:
w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.

Bug: 1136942
Change-Id: I8b25baa85565999ec44ed2f1b0bd1e19d6f148c4
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2502628
Reviewed-by: Guido Urdaneta <[email protected]>
Reviewed-by: Yuri Wiitala <[email protected]>
Reviewed-by: Robert Kaplow <[email protected]>
Reviewed-by: Sergey Ulanov <[email protected]>
Commit-Queue: Sergey Ulanov <[email protected]>
Cr-Commit-Position: refs/heads/master@{#824534}
blueboxd pushed a commit to blueboxd/chromium-legacy that referenced this pull request Nov 26, 2020
…edia API."

This reverts commit 5aea604.

Reason for revert: This CL is likely the cause of build failure for Linux ChromiumOS MSan Tests and Linux Chromium OS ASan LSan Tests (1)
First occurance: https://ci.chromium.org/p/chromium/builders/ci/Linux%20ChromiumOS%20MSan%20Tests/21455 and https://ci.chromium.org/p/chromium/builders/ci/Linux%20Chromium%20OS%20ASan%20LSan%20Tests%20%281%29/38935
Failed tests: GetCurrentBrowsingContextMediaDialogTest.DefaultAudioSelection
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWhenWindowClosed
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWhenWindowClosedWithoutCheckboxTicked
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithAudioShare
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithAudioShareFalse
GetCurrentBrowsingContextMediaDialogTest.DoneCallbackCalledWithNoAudioShare
GetCurrentBrowsingContextMediaDialogTest.DoubleTapOnShare
GetCurrentBrowsingContextMediaDialogTest.ShareButtonAccepts

Original change's description:
> Implement the confirmation-box for getCurrentBrowsingContextMedia API.
>
> Rebased on top of https://chromium-review.googlesource.com/c/chromium/src/+/2502628
>
> UI without audio capture: https://drive.google.com/file/d/1SA9vuDOkQjnioBfmAaiOjqoXGBUmVw22/view?usp=sharing
> UI with audio capture: https://drive.google.com/file/d/1jcncgHsF6L_o3D5Jc3UJ6pkjwQAKtoVl/view?usp=sharing
>
> This change relates to the UI code that is added to support the new getCurrentBrowsingContextMedia API.
>
> This is an API for capturing the current tab.
>
> Design doc:
> go/get-current-browsing-context-media
>
> Intent-to-Prototype:
> https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ
>
> PR against spec:
> w3c/mediacapture-screen-share#148
>
>
> Bug: 1136942
>
> Change-Id: I8e72023d944df3d7e996ad3acea7527c34569868
> Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2489991
> Commit-Queue: Palak Agarwal <[email protected]>
> Reviewed-by: Guido Urdaneta <[email protected]>
> Reviewed-by: Peter Boström <[email protected]>
> Reviewed-by: Elad Alon <[email protected]>
> Reviewed-by: Elly Fong-Jones <[email protected]>
> Cr-Commit-Position: refs/heads/master@{#831017}

[email protected],[email protected],[email protected],[email protected],[email protected]

Change-Id: I25b9e79d7eb61b5e43961df61999fd8c20954c8f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 1136942
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2560358
Reviewed-by: Maggie Cai <[email protected]>
Commit-Queue: Maggie Cai <[email protected]>
Cr-Commit-Position: refs/heads/master@{#831228}
@jan-ivar
Member

I agree with @annevk here on naming. The document is the largest object being captured.

> If tab X captures itself, ...

@eladalon1983 When we say something "captures itself", that something is the document, not the tab. → getDocumentMedia.

> Implementation influencing API: The method of capturing done by GetDisplayMedia (and this version) involves grabbing the framebuffer after the browsing context has been rendered, I believe.

@alvestrand Ah, I couldn't figure out why the API allowed an iframe to capture its embedder, when none of the use cases require it. → getTopLevelDocumentMedia would also be a mouthful.

I'd humbly suggest an iframe only capture itself. A simpler story, and a useful cropping tool to boot.

For implementations, cutting out the relevant rectangle from the framebuffer is hopefully not too difficult.

@eladalon1983
Member Author

eladalon1983 commented Nov 30, 2020 via email

@annevk
Member

annevk commented Dec 1, 2020

Isn't the document of the frame being captured there? Or do you mean it would include nested documents? Again though, "browsing context" doesn't capture that at all. They're just an abstract holder of a sequence of documents, only one of which is currently active. (And depending on how history is revamped that model might change a bit still.)

@eladalon1983
Member Author

eladalon1983 commented Dec 1, 2020 via email

@jan-ivar
Member

jan-ivar commented Dec 1, 2020

An iframe only capturing itself would not cover some interesting use cases. For example, a document being presented to a VC, by embedding an iframe of the VC's application inside the document-editor application.

@eladalon1983 A security property I like about "capture itself", is requiring explicit code in the capture target. Easy to grasp, ensures buy-in of the target, and no-one has to worry about being captured by other origins (except through the existing hyper-user-driven getDisplayMedia picker API).

I was even wondering about enforcing this by checking current global object == relevant global object, to enforce this:

await iframe.contentWindow.navigator.mediaDevices.getDocumentMediaWhatever() // SecurityError

Or a game capturing footage of itself by embedding an iframe which also contains code and visible controls for managing the capture, annotating it, and uploading it to a remote server, possibly one which streams it to remote viewers.

I don't understand this use of "itself". Can't it put code in the parent as well, use postMessage etc? We seem to be making much stronger assumptions about the capture target's involvement elsewhere, e.g. "Since the app can take for granted that the captured content is of itself, it knows how to crop sensibly".

I feel strongly we shouldn't be over-capturing, only to create a need for inventing cropping APIs next.

I think we can lean on apps to use iframes to capture exactly what they want to send and no more. This is the web after all, so I'd aim for stronger integration than the existing modality.

@eladalon1983
Member Author

eladalon1983 commented Dec 1, 2020 via email

@youennf
Collaborator

youennf commented Dec 1, 2020

I am still unclear about the goal of the API, which makes it hard to discuss the API surface.
Either we are talking of a privileged API, thus there is a prompt somewhere. In that case, we should investigate how much different it would be from getDisplayMedia, how the UI would be more intuitive and so on.
If we are talking about a no-prompt approach, this is another story where API could be at element level for instance like fullscreen, and we could constrain the element properties.
Can somebody clarify which approach is actually envisioned?

@eladalon1983
Member Author

eladalon1983 commented Dec 1, 2020 via email

@jan-ivar
Member

jan-ivar commented Dec 2, 2020

I don't think we should introduce artificial reasons to crop, nor assume cropping will ever be added as a feature since it's a slippery slope to image processing, something this WG appears to be leaning more toward raw media access to solve.

Shelving it for the time being, let's please examine the scenario of a game running in the browser, and wanting to stream itself to a service like Twitch. The streaming service could "publish" an iframe or a script that can be embedded in a game, implementing that functionality. The alternatives I can think of are less preferable. They include:

(a) Each game re-implementing the streaming capability, probably by importing the streaming service's code into their own codebase. I don't think that's a reasonable alternative, as the copied code would be running same-origin, and would have to be scrutinized before being imported;

Exactly. To protect users from dubious information-harvesting JS libraries, I think I'd prefer this to receive the same level of scrutiny that a service provider performs to protect itself.

I don't think we should make it easier to export user trust to entities the service provider itself doesn't trust, because if a service provider doesn't trust a library then users probably shouldn't either.

@eladalon1983
Member Author

eladalon1983 commented Dec 2, 2020 via email

@eladalon1983
Member Author

eladalon1983 commented Dec 2, 2020 via email

@jan-ivar
Member

jan-ivar commented Dec 2, 2020

The service provider should not, IMHO, be forced to the decision of either not trusting the (specific) third party at all, or trusting it enough to allow it to run same-origin.

@eladalon1983 but this feature undermines many of the same-origin protections you get from iframing in the first place.

... Allows the application the knowledge of the contents of the capture, ...

Right, and this knowledge adds risk.

Whatabout getDisplayMedia?

While getDisplayMedia already has this problem IF the user chooses the same tab, at least that scope violation is 100% user-driven, and some form of social engineering is needed to make it a reliable exploit (it doesn't help that Chrome fails to warn about this when it should).

I think it's fair to say getDisplayMedia is flying close to the sun already and exists because it is backed by such a highly compelling use case (web conference presentations). If you were to ask me if I think it has too many security mitigations, I'd say no. That's why I don't find it compelling to remove any of them.

I do however find the idea of enabling apps to stream themselves into web conferences appealing, provided it can be integrated safely.

@domenic

domenic commented Feb 24, 2021

Where did the naming discussion on this land? The term "browsing context" has not been exposed to web developers in any web platform API so far, and it's a safe bet that web developers won't already know precisely what it is.

It's also worth pointing out that the definition of browsing context is somewhat in flux (precisely because it's not web-developer exposed); we're currently working on transitioning that single concept into three: browsing context, browsing session, and top-level navigable. You can learn more about that in whatwg/html#5767 and whatwg/html#6356.

@eladalon1983
Member Author

eladalon1983 commented Mar 16, 2021

Chrome Security is of the opinion that a confirmation-only flow for gCBCM (getCurrentBrowsingContextMedia) would require security measures similar to those which @jan-ivar has suggested. Namely, a combination of (a) site-isolation and (b) a new COEP-like header for opting in to capturability by embedder. I think we have the following issues to resolve, then:

Name

We are open-minded about this. I personally like navigator.mediaDevices.getCurrentTabMedia. I think it’s succinct and clear. I have also considered getThisTabMedia, but I find that it’s not as clear whether “this” refers to “tab” or to “media.” Wdyt?

Security Measures

I think we can continue this discussion in thread #155.

Behavior when security-measures do not hold

The application might try to call gCBCM from almost any context.

If gDM is not allowed to be called from that context, then gCBCM should also not be permitted. (For example, calling gCBCM from an iframe that does not have the prerequisite display-capture permission from its embedder.)

gCBCM introduces new requirements in addition to gDM’s - (a) site-isolation and (b) a new HTTP header (definition pending in separate thread). If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior. That is, display a dialog that does not limit the user’s selection to just this tab.

Rationale for Fallback
We expect the most common failure of gCBCM calls to occur due to a third-party iframe served without the new HTTP header. This will be especially true in the beginning, before the new header has had time to gain wide adoption, but even after adoption is widespread, it will be difficult for complex applications to ensure that all embedded content “plays nicely” at all times and includes the necessary header. Third-parties could be migrating between servers, falling back on misconfigured servers, etc. So failure to call gCBCM should be expected even in otherwise well-structured applications.

One possibility for applications is to inspect the failure reason of gCBCM calls, and if it’s due to the missing header, to call getDisplayMedia “manually” as a fallback. This is possible, but clunky. It’s also arguable that exposing the exact reason gCBCM was rejected is undesirable. (Normally, a top-level application would not be allowed to see what HTTP headers embedded content uses, let alone embedded content twice-removed. Normally, things either load or don’t.)

We believe that it will be helpful if we specify that the user agent SHOULD default to a gDM-like dialog (or possibly MAY). The only new problem introduced - uncertainty by the application over whether the user really chose the current tab - can be resolved in several ways. It may be left to the application (e.g., pixel test), or we could discuss more ergonomic solutions - using a returned value, using the label of MediaStreamTrack (Firefox currently uses windows’ titles as the label), etc.
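The label-based idea mentioned above could look roughly like the sketch below. It is a heuristic only: it assumes the user agent reports `displaySurface` through `getSettings()` and uses the page title as the track label, which the comment above describes for Firefox window titles only. `looksLikeCurrentTab` is a hypothetical helper, and another tab or window with the same title would defeat it.

```javascript
// Sketch: guess whether a captured video track shows the current tab.
// Assumptions: the label mirrors the page title, and displaySurface (if the
// UA reports it) distinguishes whole-monitor capture from other surfaces.
function looksLikeCurrentTab(track, pageTitle) {
  const settings = typeof track.getSettings === "function" ? track.getSettings() : {};
  // A whole-monitor capture is never just the current tab.
  if (settings.displaySurface === "monitor") {
    return false;
  }
  // Weak signal: same title does not prove same surface.
  return track.label === pageTitle;
}
```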

Audio Playback Suppression

I will be using “playback-suppress” as shorthand for “mute the audio from the speaker’s point-of-view, but still make this audio capturable.”

While a user is capturing audio from a tab, it’s sometimes useful to prevent that tab from performing audio playback through the speakers. This is useful, for example, for performing echo cancellation, which works better if the audio captured on the playback-suppressed tab is handled like other participants’ audio, so that the echo canceller sees it as just another remote-sourced audio stream played out over the speakers. I see some options here:

  1. Allow a parameter or a constraint to determine whether the captured tab is playback-suppressed or not. (Note that capturing is user-controlled, so malicious applications cannot just use this to mute other tabs willy-nilly.)
  2. Reinterpret an existing control, constraint or setting to mean that the captured tab should be playback-suppressed (e.g. echoCancellation).
  3. Specify that the behavior of capturing a tab playback-suppresses its audio, always, and that’s it.
  4. The most ambitious change (goes beyond the scope of this WG), but IMHO the most desirable - we should note that atm, it is not possible for an app to suppress its own speaker-playback. Or an iframe’s playback. That means that by embedding an iframe, the application accepts whatever audio-playback the embedded frame happens to perform. It would be good if documents could playback-suppress (a) themselves or (b) specific embedded iframes.
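Option 1 in the list above might be expressed as an audio constraint, as in this sketch. The constraint name `suppressLocalAudioPlayback` is an assumption used for illustration; the thread does not settle on a name, or on whether a constraint is the right mechanism. `buildCaptureOptions` is a hypothetical helper.

```javascript
// Sketch of option 1: ask for playback suppression through an audio constraint.
// The constraint name below is assumed, not agreed on in this discussion.
function buildCaptureOptions({ suppressPlayback }) {
  return {
    video: true,
    // Plain `true` requests audio with the UA's default playback behavior.
    audio: suppressPlayback ? { suppressLocalAudioPlayback: true } : true,
  };
}
```

The result would be passed to the capture call, for example `getDisplayMedia(buildCaptureOptions({ suppressPlayback: true }))`.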

@annevk
Member

annevk commented Mar 16, 2021

Name: I think the concerns raised with browsing context apply equally to tab. This is much more narrow-scoped than a tab. "Page" might be okay and has some precedent in CSS.

(Also, thanks for the update!)

@jan-ivar
Member

> Name: I think the concerns raised with browsing context apply equally to tab. This is much more narrow-scoped than a tab.

@annevk I think this is a case where "tab" is narrower-scoped than page, because a page may have a much larger surface area, and we only want to capture what the user sees at the moment. That is: the intersection of the top-level browsing context's viewport and the rendering boundaries of the requesting document, including any content overlaid by CSS (i.e. excluding any content 100% occluded by CSS at the moment):

image

...OR (to complicate matters) depending on the outcome of the above discussion on letting an iframe capture its parent:

image

In either case, this seems best for both privacy and efficiency (we don't want to have to re-render a page for this). If the user scrolls the page they may reveal more info.

So I've been calling this getTabMedia. Though perhaps getCurrentTabMedia is more precise? — In either case, the requester cannot outlive its target, so I wouldn't worry about "tab" implying capture past navigation.

@jan-ivar
Member

jan-ivar commented Mar 24, 2021

> If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior.

I'd rather keep this separate from gDM, even using a separate permissions policy, since the security properties are quite different.

The callsites may not even be the same always. E.g. this is the target being told to capture itself and beam into a meeting, vs the main window where gDM may be called to present today.

I see no way around apps needing to check for errors. Especially if we go with the model where capture may terminate on a non-opt-in iframe loading. Apps would need to catch that error too, and respond appropriately.
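One way an app could respond to a capture that terminates underneath it is sketched below. It assumes the user agent would signal termination through the standard `ended` event on each track; the thread does not specify the mechanism. `watchCaptureEnd` is a hypothetical helper.

```javascript
// Sketch: run `onEnded` once when any track of a capture stream ends on its own.
// Assumes terminated capture surfaces as the track's "ended" event.
function watchCaptureEnd(stream, onEnded) {
  let fired = false;
  for (const track of stream.getTracks()) {
    track.addEventListener("ended", () => {
      if (fired) return; // Report only the first track that ends.
      fired = true;
      onEnded(track);
    }, { once: true });
  }
}
```

An app could use the callback to tear down its peer connection or to offer the user a way to restart sharing.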

@annevk
Member

annevk commented Mar 24, 2021

That's a fair criticism of page, but I don't think tab really captures it either. This doesn't match the lifetime of a tab and tab is rather implementation-specific and might not exist on all platforms. (Also, tab in implementations is rather analogous to top-level browsing context (or browsing session, once we have that) and this clearly isn't that.)

@dontcallmedom
Member

re naming, wouldn't viewport be a better characterization of the target of the capture?

@annevk
Member

annevk commented Mar 24, 2021

Yeah, I think viewport would work, though with the caveat that in theory nested documents have their own viewport and I don't think there is a clearly defined term for the composition of them.

@jan-ivar
Member

The lifetime of a capture target >= lifetime of a capture.

Sites today can capture the display (with getDisplayMedia) and the user (with getUserMedia), both of which (hopefully!) outlive the page and its capture of them. I don't feel users or developers are confused by that.

getViewportMedia could work, but is it distinct enough from getDisplayMedia?

FWIW, "screen", "window", and "tab" are the layterms around screen-sharing UX in browsers today.

AFAIK, screen-sharing isn't available yet on mobile, but I believe the term "tab" exists there as well as an organizing container/layterm for browsing context.

My objection to "browsing context" wasn't its scope, but it being a technical under-the-hood term previously unexposed in the platform.

@annevk
Member

annevk commented Mar 24, 2021

My objection is the scope. 😊 Both screen (has precedent with window.screen too) and user are clear, but with browsing context it's not clear that capture is canceled if the user navigates; navigation is just a detail of a browsing context. (I hope we don't expose "tab" anywhere as a thing either.) That navigation cancels capture suggests it's a smaller unit, which would be document/page/viewport.

(To add, I wouldn't find it problematic to expose browsingContext or something equivalent as a term, if there was a concept that matched it.)

@jan-ivar
Member

Fair enough. Of those I think I'd pick viewport, to emphasize we're not necessarily capturing the whole document.

@eladalon1983
Member Author

I also like viewport. There are certain browser UI elements that are bound to specific tabs, but which do not get captured. The developer console, for instance. Using "tab" does not make immediately clear that those are not captured. So I assume it's getViewportMedia, then? Or some variation thereof?

@jan-ivar
Member

A precedent for the term "viewport" being exposed to the web is window.visualViewport.

I guess it remains to be seen whether we'd capture the layout viewport or the visual viewport on mobile.
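For readers comparing the two candidates, this sketch reads both sizes from a window-like object. It says nothing about which one a capture would use, since that question is left open above. `viewportSizes` is a hypothetical helper, and the window is passed in so the snippet can run outside a browser.

```javascript
// Sketch: report the two candidate capture sizes for a window-like object.
function viewportSizes(win) {
  const root = win.document.documentElement;
  const vv = win.visualViewport;
  return {
    // Layout viewport, excluding scrollbars.
    layout: { width: root.clientWidth, height: root.clientHeight },
    // The visual viewport shrinks under pinch-zoom; it may be absent in older UAs.
    visual: vv ? { width: vv.width, height: vv.height, scale: vv.scale } : null,
  };
}
```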

@eladalon1983
Member Author

> > If either of these conditions does not hold at the time when gCBCM is called, we would like to specify that the user agent SHOULD (or MAY) fall back to gDM-like behavior.

> I'd rather keep this separate from gDM, even using a separate permissions policy, since the security properties are quite different.

Requiring a separate permission policy is fine by me.

Let's assume that getViewportMedia is called from a context that has both the old display-capture permission that gates getDisplayMedia, as well as the new permission that we end up introducing. I think in this case, calls to getViewportMedia should still result in some gDM-like user-prompt if [either site-isolation or the new header] is missing.

Rationale - we expect this to happen >99% of the time, at least in the early days, and we don't believe the feature will be useful without it. The compromise that Chrome has reached internally between the demands from Security and the needs of potential feature-customers, is that a confirmation-only dialog is displayed if all of the new security requirements are satisfied, and an explicit-selection dialog is shown otherwise, which is generally gDM-like, but highlights in some spec-compliant way that the application would like to get the current tab. (For example, consider a UA that normally offers windows as the first option if gDM is called, but offers tabs as the first option if gVM-fallback-mode is used.)

At the bottom of my comment is an illustration of what Chrome thinks of using. I mention Chrome-specifics only so as to explain our motivation. Spec-wise, Chrome's specific dialog is of course out of scope. For the spec-change, I think the right way to go about it is to say that the user agent SHOULD/MAY fall back to any behavior that complies with the restrictions placed on gDM, but that this behavior MAY differ from the specific UA's usual gDM behavior. (Or maybe we can leave this "MAY differ..." part implicit.)

Lastly, we can make this fallback behavior temporary, giving sites time to adopt the security requirements that we introduce.

Here's what a gDM-like fallback can look like:
Screen Shot 2021-03-25 at 16 38 20

@jan-ivar
Member

> a confirmation-only dialog is displayed if all of the new security requirements are satisfied, and an explicit-selection dialog is shown otherwise, ... but offers tabs as the first option if gVM-fallback-mode is used.)

This would make getViewportMedia a weakened version of getDisplayMedia, which seems problematic.

I don't think we can infer that an app calling one API wants to fall back to calling the other in all circumstances. This seems app-specific, and a few lines of code:

let stream;
try {
  stream = await navigator.mediaDevices.getViewportMedia();
} catch (e) {
  if (e.name != "SecurityError") throw e;
  stream = await navigator.mediaDevices.getDisplayMedia(); // ¹
}

This would be well-tested, because as you say: "we expect this to happen >99% of the time, at least in the early days".


1) If Chrome wants to weaken the already strained security properties of getDisplayMedia, it can do so here without melding APIs together, by ignoring spec recommendations and detecting this situation.

@eladalon1983
Member Author

eladalon1983 commented Mar 25, 2021

  1. Why "weakened"? It presents a different choice that's equally unconstrained. It's not, IMHO, weaker. (And if called from the right kind of context, a completely different, confirmation-only choice.)
  2. We know of at least one feature-customer that wants this - Google Slides. The code you present that calls gDM if gVM fails, would not serve their needs, because it would still produce the usual gDM media-picker when gVM fails and gDM is called immediately after. Unless, that is, we implement it as "calls to gDM immediately after a gVM failure yield the new gDM behavior." But it would be tricky to define "immediately" in a way that could satisfy all customers. So:
  3. Another option is to add a parameter that controls this. Maybe something like any_surface_allowed. We could say that gVM with any_surface_allowed = true is allowed to be called from a context that does not satisfy [site-isolation and new-header], but MUST allow the user an unconstrained choice, and MAY present a different order of options compared to gDM. (The MAY part can be implicit.)

@jan-ivar
Member

jan-ivar commented Mar 25, 2021

> Why "weaker"?

From the spec: "User Agents are encouraged to warn users against sharing browser display devices as well as monitor display devices where browser windows are visible, or otherwise try to discourage their selection on the basis that these represent a significantly higher risk when shared.".

See also crbug 920752.

> it would be tricky to define "immediately" in a way that could satisfy all customers

I'm merely making the point that it's completely doable, from the same justifications made for wanting to standardize it. Except Chrome would (hopefully) be alone in making this convenience/safety tradeoff. So any definition would do, e.g. the same run of the event loop. If Chrome would rather not do it because it's non-standard, that would be understandable, and desirable from my point of view.

I'd be opposed to standardizing any parameter related to this, because I think it's bad for privacy for the reasons stated.

would not serve their needs,

We are standardizing gVM specifically to serve this need. I see no new information to revisit gDM.

@eladalon1983
Member Author

I'd be opposed to standardizing any parameter related to this, because I think it's bad for privacy for the reasons stated.

The attacks we have discussed so far all required a single frame to perform. A malicious application can preload occluded cross-origin iframes and flash them to the screen for the duration of a single frame immediately after the user approves screen-capture. As soon as the user approves, it becomes too late to hide anything from the app. Switching tabs, minimizing windows, etc. - such steps do not offer protection from a malicious app. The decisive moment is when the user accepts.

Currently, Safari offers only the entire screen; Chrome and Edge offer screen/window/tab, with the first option on offer being screen. Most users have a single screen, and it's showing the current tab at the moment capture starts. Any danger that exists with capturing the current tab, also exists when capturing the current screen - and more (e.g. see titles of other tabs).

Dialogs offering unconstrained choice to the user, but with focus moved away from current-screen towards current-tab, are more secure than dialogs that push towards sharing the entire screen. Helping browsers move to more secure options creates a more secure Web. For this to be implemented, it helps if the work on getViewportMedia in Chrome is motivated by a Google product that has an immediate need for it: a product that intends to use it and is therefore interested in funding that work.

If you can help me find a variation that satisfies everyone¹, or that can be an acceptable compromise for everyone, I would be very grateful. This can include any old/new idea, or any temporary compromise. I believe it will also be good for security and privacy on the Web.


¹ Including one customer for this feature which will only be able to adopt COOP+COEP in the mid-term future, and the new header only in the long-term future. And this customer is the motivation for our investment of headcount in this.

@dontcallmedom
Member

I don't think falling back to gDM when gVP fails is a good approach:

  • it assumes gDM will always be the right fallback; there may be cases when doing nothing or falling back to captureStream would be a better alternative
  • as @jan-ivar says, the fallback option of gVP can't be a path to something that has weaker security characteristics than both gDM & gVP, especially since it's an easy-to-activate path for a possible attacker

Regarding the point that sharing a full screen is as dangerous as, or more dangerous than, sharing a tab in a single-screen scenario: I think part of the reasoning is that users understand much better that sharing their entire screen is potentially scary, whereas they might think it is benign to share the tab that is asking for screen sharing. So it is not that it is safer, but that users will have a more accurate understanding of the risks. And conversely, because developers might expect that users will be offered a scary choice, it becomes a less attractive option for attackers.

Separately, we've heard several times that the current lack of ability for developers to guide the capture surface in gDM leads to suboptimal UX - I wonder if we should look into reinstating a way for developers to give a hint, which UAs could choose whether and how to take into account (e.g. based on previous interactions of the user with the site, maybe based on the cross-origin isolation status of the tab if a tab is being requested, …) - but that would need to be a separate discussion.

@youennf
Collaborator

youennf commented Mar 26, 2021

As for gDM as a back-up, I think we should first let websites experiment with it themselves. UAs can learn from it as well. It is easier to add additional parameters later if we think there is value based on that.

I see two requirements from that discussion:

  • We should gate gvp on transient activation like done for gdm
  • We should guarantee that even if gvp fails, transient activation stays valid in the reject handler so that gdm can be called right after
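To make the second requirement concrete, here is a toy model of time-based transient activation. It is a sketch under assumptions: the class `ActivationClock`, the helper `fallbackStillAllowed` and the window length passed in are all hypothetical, and real user agents pick their own activation duration.

```typescript
// Toy model of time-based transient activation. The window length is a
// hypothetical value supplied by the caller, not a number from any spec.
class ActivationClock {
  private lastActivationMs: number = Number.NEGATIVE_INFINITY;
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record a user gesture (click, key press) at time `nowMs`.
  notifyGesture(nowMs: number): void {
    this.lastActivationMs = nowMs;
  }

  // True while the most recent gesture is still inside the window.
  isActive(nowMs: number): boolean {
    return nowMs - this.lastActivationMs < this.windowMs;
  }
}

// Would a gdm call made from gvp's reject handler still pass the gate?
function fallbackStillAllowed(
  clock: ActivationClock,
  gestureAtMs: number,
  rejectHandlerAtMs: number,
): boolean {
  clock.notifyGesture(gestureAtMs);
  return clock.isActive(rejectHandlerAtMs);
}
```

In this model the fallback only works if the reject handler runs before the window closes, which is why a guarantee would have to come from the spec rather than from timing alone.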

I wonder if we should look into reinstating a way for developers to give a hint

Before going down that road, I think UA implementors should investigate what they can do on their own.
There is nothing in the spec preventing UAs from specialising prompts based on what users actually did in the past, even though this might be tricky from a usability and security point of view. Adding parameters given by the web site to the mix might make things more complex.

@jan-ivar
Member

If you can help me find a variation that satisfies everyone¹, or that can be an acceptable compromise for everyone, I would be very grateful.

@eladalon1983 I showed how this customer can accomplish their workflow with a few lines of JS, and how browsers that want to can detect and optimize the UX flow in that case, even though we don't recommend it. Can you help me understand why they'd need this behavior standardized for all apps?

As to browser UX, specs have a hard time mandating it. Where strong recommendations have had teeth in the past, they're paired with good privacy or security arguments, to convince (or shame) implementers with. So proposing a parameter to induce UX (going against strong privacy recommendations) seems like a dead-end.

We should gate gvp on transient activation like done for gdm

@youennf Agreed, among a list of other things.

if gvp fails, transient activation stays valid in the reject handler so that gdm can be called right after

Transient activation is time-based, so we may not need to say anything about it since gvp would fail immediately.

@youennf
Collaborator

youennf commented Mar 26, 2021

Transient activation is time-based, so we may not need to say anything about it since gvp would fail immediately.

I do not see how it can be guaranteed today with a time-based approach.

The fact that gvp fails immediately is an implementation decision. Asynchronous queries might be required to actually make it fail. Even if it fails synchronously, there might be some time spent in doing this computation.
Also, even if gvp fails immediately, the reject handler will not be called immediately but later on, once other JS is executed.
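The last point can be checked with plain promises, independently of any capture API. The sketch below uses an already-rejected promise as a stand-in for a gvp call that fails as early as possible; `observeRejectionOrder` is a name made up for this example.

```typescript
// Records the order in which synchronous code and a reject handler run.
function observeRejectionOrder(): Promise<string[]> {
  const order: string[] = [];

  // Stand-in for a gvp call that fails as early as it possibly can.
  const failed: Promise<never> = Promise.reject(new Error("gvp failed"));

  const done = failed.catch(() => {
    order.push("reject handler");
    return order;
  });

  // This still runs first: handlers are queued as microtasks and only run
  // after the currently executing script has finished.
  order.push("remaining synchronous script");

  return done;
}
```

The resolved array lists the synchronous entry before the handler entry, so any time spent in the remaining script counts against a time-based activation window.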

@eladalon1983
Member Author

eladalon1983 commented Mar 26, 2021

(One more suggestion, probably the last one on this topic. The main difference here is that all security restrictions now always apply.)

What if any_surface_allowed did not result in a gDM-like fall-back, but rather, if it specified something along the lines of:

  • ...all security restrictions to gVM apply. If any doesn't hold, abort. If they all hold, proceed.
  • In response to gVM, the user agent SHOULD use a confirmation-only dialog¹² (accept/reject).
  • If any_surface_allowed = true is specified, the user agent MAY present the user the ability to choose other sources. (The user agent still SHOULD start out with a confirmation-only dialog, but MAY display an option to "click here to select another source." Clicking that can then transition to a normal gDM dialog.)

With this, the standard does not allow gVM to become a different version of gDM. Rather, it allows gVM to become a normal version of gDM. It makes the process more user-driven. Wdyt?

Here is a mock where any_surface_allowed is unspecified or is set to false:
[image: base]

Here is a mock with any_surface_allowed set to true:
[image: any_source]

Important differences from previous suggestion:

  1. Fails if the security restrictions are not met, for either any_surface_allowed=false/true.
  2. Note that the user agent MAY do this.
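The revised proposal can be summarised as a small decision function. This is a sketch of the rules as worded above, not spec text: the type and field names (`PromptDecision`, `ViewportRequest`, `restrictionsHold`, `uaOffersOtherSources`) are invented for illustration, and the SHOULD/MAY latitude given to the user agent is reduced to a single boolean.

```typescript
// Outcome of a gVM call under the revised proposal. All names here are
// illustrative; none of them come from a specification.
type PromptDecision =
  | { kind: "abort" }
  | { kind: "confirm-only" }
  | { kind: "confirm-with-other-sources" };

interface ViewportRequest {
  // Whether every gVM security restriction holds for the calling context.
  restrictionsHold: boolean;
  // The proposed any_surface_allowed parameter; unspecified means false.
  anySurfaceAllowed?: boolean;
  // The user agent MAY offer other sources; this models that choice.
  uaOffersOtherSources: boolean;
}

function decidePrompt(req: ViewportRequest): PromptDecision {
  // The restrictions apply regardless of any_surface_allowed.
  if (!req.restrictionsHold) return { kind: "abort" };

  if (req.anySurfaceAllowed === true && req.uaOffersOtherSources) {
    // Starts as a confirmation dialog with a link to the normal gDM picker.
    return { kind: "confirm-with-other-sources" };
  }
  return { kind: "confirm-only" };
}
```

Checking the restrictions first means the flag can never widen access: it only changes what the dialog offers once the call is already permitted.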

¹ It essentially restricts the user-selection to only the current tab.
² Audio notwithstanding; this much the user should be able to refuse independently of video, as with gDM. E.g. using a checkbox.

@jan-ivar
Member

Let's close this and move getViewportMedia to https://github.com/w3c/mediacapture-screen-share/issues/155 since we've reached WG consensus (slide) to site-isolate the API.

This issue gets credit for birthing getViewportMedia, but has become a source of confusion. The (early) name "getCurrentBrowsingContextMedia" is both a Chrome origin trial (without site-isolation), and now also a different competing "hybrid" picker-based API proposal from Google without support from this WG.

@jan-ivar jan-ivar closed this May 13, 2021
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this pull request Oct 14, 2022
TL;DR:
* This is an API for capturing the current tab.
* This CL handles the Blink part.

Explainer:
https://docs.google.com/document/d/1CIQH2ygvw7eTGO__Pcds_D46Gcn-iPAqESQqOsHHfkI

Design doc:
go/get-current-browsing-context-media

Intent-to-Prototype:
https://groups.google.com/u/3/a/chromium.org/g/blink-dev/c/NYVbRRBlABI/m/MJEzcyEUCQAJ

PR against spec:
w3c/mediacapture-screen-share#148

Next steps:
* Implement the confirmation-box.
* Implement unit-tests that rely on the confirmation-box.
* Graduate this to an origin-trial.

Bug: 1136942
Change-Id: I81333274075cd56d7e628a8a0eb025b1ae08645a
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2500841
Reviewed-by: Daniel Cheng <[email protected]>
Reviewed-by: Guido Urdaneta <[email protected]>
Commit-Queue: Elad Alon <[email protected]>
Cr-Commit-Position: refs/heads/master@{#823498}
GitOrigin-RevId: bc949e9d94ea6496b15153f5486a12608db7152b