Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reinstate strong language on permission ending when tracks stop, lost by editorial mistake #387

Closed
jan-ivar opened this issue Aug 26, 2016 · 36 comments

Comments

@jan-ivar
Copy link
Member

In February, the spec said:

"When all tracks using a source have been stopped or ended by some other means, the source is stopped. Unless there is a stored permission for the source in question, the given permission is revoked and the User Agent SHOULD also remove the "permission granted" indicator for the source."

This predates editorial attempts at integrating with the permissions spec, and says that while removing the indicator is optional, revoking permission is not, unless it's been "stored".

Compare to today:

"When all tracks using a source have been stopped or ended by some other means, the source is stopped. If there is no stored permission for the source, the User Agent SHOULD also remove the "permission granted" indicator for the source. If the data is being generated from a live source (e.g., a microphone or camera), then the User Agent SHOULD remove any active "on-air" indicator for that source."

This no longer says anything should happen to the permission, and makes it sounds like stopping all tracks SHOULD revoke permisson. We should clarify that this is not the case with stronger language closer to the February consensus.

@jyasskin
Copy link
Member

jyasskin commented Aug 30, 2016

Dropping permission when all tracks are stopped means that a videochat site with a "mute video" button has to choose between

  1. Leaving the hardware camera light on while video is "muted", which means users have to blindly trust that the site's no longer transmitting, or
  2. Needing to re-prompt for permission when the user unmutes the video.

Neither of these is desirable. However, since:

  • the spec allows a "stored permission" for HTTPS sites,
  • such "storage" can live for just the lifetime of the realm if the UA wants, and
  • all videochat applications should be HTTPS anyway,

this doesn't affect much in practice.

@jan-ivar
Copy link
Member Author

(I think you'll find that "stored permission" meant "persistent permission to origin across sessions" in those days. I think there's evidence of that in the spec. I don't think anyone considered realm).

I agree staring at the camera light on mute is undesirable, and simultaneously I think we'd ideally want track.enabled to be viable here (because that's the right API).

The mediacapture spec actually says that "When all tracks connected to a source are muted or disabled, the "on-air" or "recording" indicator for that source can be turned off; when the track is no longer muted or disabled, it MUST be turned back on."

No browser seems to implement this today. For us, there's a UX challenge in showing that the microphone is off, yet the site still has permission to turn it back on (most cameras have lights that can be turned off while leaving the in-browser recording indicator intact, but mics have no light).

The purpose of "revoke-on-stop" was to guarantee that when the indicator went out, it couldn't come back on, as opposed to "blindly trusting" it wouldn't, so I'd still like to see that guarantee for that the spec.

I wouldn't rule out (2) as entirely undesirable in a privacy-obsessed browser/browsing mode, but I'd nonetheless like to see us converged on track.enabled as the API for this use-case.

@jan-ivar
Copy link
Member Author

I've filed bug 1299515 for turning off device(s) during mute in Firefox. If someone wants to file a similar bug on Chrome that'd be great!

@jyasskin
Copy link
Member

https://crbug.com/642785

@alvestrand
Copy link
Contributor

I filed spec bug #389 to get the camera light question resolved (triggered by the Chrome bug).

@alvestrand
Copy link
Contributor

As to the original intent: I think we changed the language to say that when you check whether you can open a device, you first check whether it is already open, and if it is, you skip the check against permission. This preserves the previous behavior, but avoids using the word "permission" both for what the permissions spec calls permission and for "whether opening the device will succeed".

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 2, 2016

@alvestrand I'm missing how any language on open can preserve that permission (unless persisted) must end after .stop().

@alvestrand
Copy link
Contributor

@jan-ivar that's because you persist in using the term "permission" both for permissions that are the result of the UA divining the user's intention (the thing the permissions spec talks about) and the ability to open the device.

As I said on another thread, I've tried to not do that, because it left me confused even when I wrote the text myself. So instead, I removed the word "permission" when I was not talking about the thing the permissions spec talks about.

@jyasskin
Copy link
Member

jyasskin commented Sep 2, 2016

I wonder if it would be reasonable to remove the distinction between "stored" and non-stored permission entirely. It's confusing that remembering a decision for the lifetime of a single realm/global object is considered "storing" it, even though it's only stored in memory in that case.

Instead, you could use the relatively new language in the permissions spec as follows:

If there is an active track from a device with id activeDeviceId, then the permission state for {name: name, deviceId: activeDeviceId}, where name is "camera" or "microphone", must be "granted".

This tells sites when a call won't show a prompt.

The UA MUST NOT use any information from the user's interaction with a non-secure context to affect its impression of the user's intent for a different environment settings object.

This provides the security guarantee that https://tools.ietf.org/html/draft-ietf-rtcweb-security-arch-12#section-5.2 wants.

This loses the requirement that permission state stop being "granted" when the last track is stopped, but that seems like it inappropriately enshrines Firefox's UI decisions in the spec anyway. You could get it back with:

In a non-secure context, if there is no active track from a device with id activeDeviceId, then the permission state for {name: name, deviceId: activeDeviceId}, where name is "camera" or "microphone", must not be "granted".

You can drive the "permission granted" or "device accessible" indicator directly from the permission state instead of changing it in getUserMedia() and stop().

@stefhak
Copy link
Contributor

stefhak commented Sep 5, 2016

In a non-secure context, if there is no active track from a device with id activeDeviceId, then the permission state for {name: name, deviceId: activeDeviceId}, where name is "camera" or "microphone", must not be "granted".

My understanding is that Firefox wants to do this also for secure contexts unless the user says it's OK to remember between sessions.

@alvestrand
Copy link
Contributor

alvestrand commented Sep 5, 2016

Re stefhak's comment: Firefox is free to do this. The sentence doesn't forbid it.
The spec can't require it, since there's no consensus that this is the right thing always.

@stefhak
Copy link
Contributor

stefhak commented Sep 5, 2016

@alvestrand yes, that's right (now that I read it again!)

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 6, 2016

As to the original intent: I think we changed the language to say that when you check whether you can open a device, you first check whether it is already open, and if it is, you skip the check against permission. This preserves the previous behavior,

@alvestrand It does not. Checking whether a device is already open affects overlapping gUM requests only (i.e. the app requests gUM again before calling stream.stop(), a case we agree on, but a separate issue). It does not preserve the February behavior that requesting gUM after stream.stop() MUST cause a prompt "unless there is a stored permission" (stored by the user).

If this edit was a mistake, it should be undone. If it wasn't, then I can't find the necessary group consensus that we meant to change this required behavior.

@jyasskin Thanks for the link, but I disagree http vulnerability is the spec's only concern here (the other being privacy). As I mention in #389 (comment), the spec establishes privacy as a concern when it suggests users should see a "device accessible" indicator (separate from "live recording"), and introduces concepts to imbue such an indicator with the necessary power, chiefly the "stored permission".

This is clearly echoed under Best Practices: Stored permissions where it says "the choice [to store permission] needs to be apparent to the user",

and

"When permission is not stored, permission will last only until such time as all MediaStreamTracks sourced from that device have been stopped."

Combined, this guarantees users that access cannot resume once the "device accessible" indicator goes away, unless they've made an "apparent" "choice" to allow such a thing.

This is consistent with the February language, but the edit is not.

@alvestrand
Copy link
Contributor

Good that we agree that the gUM behavior when a stream is open is present in the spec!
The February language referenced an undefined term ("stored permission") - linking the word "permission" strictly to the permissions spec makes it well defined.

If requesting access to the device doesn't imply changing the stored permissions, I think we have consistent specifications (and agreement on intended behavior).

@jyasskin should we change the name of the "request permission" operation to "request access"? I believe it should retain the langauge about "new information about the user's intent may be gathered here", but it shouldn't mandate that the decision to grant access results in a change in a stored permission's status.

@jyasskin
Copy link
Member

jyasskin commented Sep 6, 2016

@alvestrand re "request access", the operation's full name is "request permission to use a descriptor", and I think both "request permission" and "request access" are shorthands for that. (It'd be even more correct to say "request permission to use the feature described by a descriptor", if anyone really wants to spell it all out.)

Getting permission to access/use a thing does not imply that this permission will be "stored". More precisely, a site getting permission or access in one realm/tab/whatever should not imply that the site will have permission/access in a different realm/tab/whatever.

The Permissions spec used to assume that all permission was remembered across realms, and there might still be some remnants of that assumption, but I've been trying to excise it everywhere I've found it. Any permissions that want to force cross-realm grants have to say so explicitly, like https://w3c.github.io/permissions/#persistent-storage.

Definitely let me know about, or send PRs to fix places in the Permissions spec that imply that granting permission always stores that permission.

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 7, 2016

If requesting access to the device doesn't imply changing the stored permissions, I think we have consistent specifications (and agreement on intended behavior).

@alvestrand I think I see what you mean now, that by saying we skip the check for permission when a device is already open, we avoid talking about any access implied by a successful request, or the scope of such implied access.

But I worry we need to be explicit about this implied access, or people will get confused.

I've used your "access" term above, but I don't think people will get it. "Permission" and "access" are synonymous to most people, and the distinction you're making was too subtle for me until today.

It's confusing terminology anyway to say a request can succeed without any permission having been given.

Besides, according to @jyasskin :

Getting permission to access/use a thing does not imply that this permission will be "stored".

So he says permission is implicit in any successful request, even when never stored, which matches how people talk. What got lost from the February text ("Unless there is a stored permission for the source in question, the given permission is revoked") was the revocation of this implicit "given permission" we got from the request algorithm. I don't know the perfect wording, but without explicitly stating when it must end, it seems unclear - at least to me - whether e.g. Edge's https behavior is compliant.

@stefhak
Copy link
Contributor

stefhak commented Sep 7, 2016

Reading the Editor's draft I find the following blocks relevant:

In the sections about MediaStreamTrack:
A script can indicate that a MediaStreamTrack object no longer needs its source with the stop() method. When all tracks using a source have been stopped or ended by some other means, the source is stopped. If there is no stored permission for the source, the User Agent SHOULD also remove the "permission granted" indicator for the source. If the data is being generated from a live source (e.g., a microphone or camera), then the User Agent SHOULD remove any active "on-air" indicator for that source. An implementation may use a per-source reference count to keep track of source usage, but the specifics are out of scope for this specification.

and

A muted or disabled MediaStreamTrack renders either silence (audio), black frames (video), or a zero-information-content equivalent. For example, a video element sourced by a muted or disabled MediaStreamTrack (contained within a MediaStream ), is playing but the rendered content is the muted output. When all tracks connected to a source are muted or disabled, the "on-air" or "recording" indicator for that source can be turned off; when the track is no longer muted or disabled, it MUST be turned back on.

In the section about gUM:
Retrieve the permission state for all candidate devices in candidateSet that are not attached to a live MediaStreamTrack in the current browsing context. Remove from candidateSet any device for which the permission state is "denied".

and

For the origin identified by originIdentifier, request permission for use of the devices, while considering all devices attached to a live MediaStreamTrack in the current browsing context to have permission status "granted", resulting in a set of provided media.

and

If the result of the request is "granted", User Agents are encouraged to include a prominent indicator that the devices are "hot" (i.e. an "on-air" or "recording" indicator), as well as a "device accessible" indicator indicating that the page has been granted access to the source.

To me it seems pretty clear. "request permission" links to the permission spec, which (if I parse it correctly) makes the UA deny, prompt the user or grant the request depending on the situation regarding for example stored permissions.

If all tracks are stopped followed by a gUM call, there would be a prompt if there was no stored permission.

That said, perhaps there would be a gain in modernizing the language as @jyasskin proposes (at the very least we should decide if we say '"device accessible" indicator' or '"permission granted" indicator').

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 8, 2016

To me it seems pretty clear. "request permission" links to the permission spec, which (if I parse it correctly) makes the UA deny, prompt the user or grant the request depending on the situation regarding for example stored permissions.

If all tracks are stopped followed by a gUM call, there would be a prompt if there was no stored permission.

@stefhak I think you're parsing it incorrectly. permission state says:

"2. If there was a previous invocation of this algorithm with the same descriptor and current settings object, returning previousResult, and the UA has not received new information about the user’s intent since that invocation, return previousResult. "

In other words, the definition we're now leaning on is a permission state that inherently lingers within the current realm, making any "stored" distinction meaningless. While the definition of "user's intent" is quite broad, it's hard to see how it would cover a script deciding to stop all its tracks.

That this is unintentional seems clear, as much of the language in mediacapture, such as the tracking of whether devices are open, becomes largely redundant in this new model, as it now serves only to override "user intent" (because the only time it'd have an effect would be to counter a user's intent to actively change "granted" back to "prompt").

This is a clash of models, and preserving intent through a transition of models is tricky, especially when the meaning of language shifts.

To do so, we need to take a snapshot of behaviors "before" (February), and "after", and compare. Do we agree that in February, Edge's https behavior was non-compliant, whereas today it's the norm? That's a bug, or a change in outcome I think warrants new WG discussion.

@stefhak
Copy link
Contributor

stefhak commented Sep 8, 2016

@jyasskin could you comment? "same descriptor and current settings object" is not 100% clear to me. Does it mean for the current "session", or even beyond?

And even if it means only the current session, it would be a change compared to what we had earlier. Is this intended? It is clear that we want to be able to allow "only once" for camera and microphone even if the page is not reloaded, but would that not be the case for other things too? For example, if I allow geoLocation once, close the lid of my lap top, drive for a couple of hours, should it be able to get my new location when I stop and open the lid?

@jan-ivar I am not sure what you mean by "Edge's https behaviors". @ShijunS this discussion may be something you are interested in.

@alvestrand
Copy link
Contributor

Reading the above, I'm concluding that we (media capture) need to have a "request permission" algorithm (which we are currently referencing from the permissions spec) that checks stored permissions, and on having the return be "prompt", prompts for permission, and returens "granted" or "denied", but does NOT (necessarily) cause "check for permission" to change its result.

That is:
Request permission (5.2)
-> check permission according to 5.1 (returns "prompt") (A)
-> prompt and return what the user wanted this time (B)

(time passes)

Request permission
-> check permission (C)

At point (C), unless new information has been gathered, "check permission" should return the result that is consistent with point (A). What's returned at point (B) is irrelevant (unless the UA has concluded that the user has given permission for future access).

That is, "this algorithm" from the part Jan-Ivar quoted refers to section 5.1 "Check permission", which is invoked by "request permission". It does not refer to "request permission".

This seems like it needs clarification in the permissions spec, since Jan-Ivar and I read it differently.
(Or I might be reading it wrong, in which case it needs changing.)

@stefhak
Copy link
Contributor

stefhak commented Sep 8, 2016

@alvestrand If your interpretation of the permission spec is correct we should indeed be fine. Waiting for @jyasskin (or someone else) to confirm. And I agree to that the permission spec should be clarified in this respect.

@jyasskin
Copy link
Member

jyasskin commented Sep 8, 2016

The "current settings object" comes from HTML, and it'll be different on a different tab or iframe, or after a reload. Realms, settings objects, and globals are 1-1-1. There's a PR waiting for an LGTM to make the use of settings objects a little more explicit: w3c/permissions#114.

Request permission to use depends on permission state, but it doesn't update the permission state unless the UA thinks that reflects user intent. That allows Firefox's geolocation behavior, where you can get a location back, and then get a prompt on the next request. @stefhak's example of closing the lid, driving for a while, and then wanting geolocation to prompt again, would have to be handled through "new information about the user's intent".

As @jan-ivar said, the getUserMedia() behavior of allowing subsequent calls to succeed, until the script calls .stop(), at which point calls prompt again, doesn't automatically fall out of the permissions spec's wording. I think you handle that with wording like revoke() uses: you explicitly say that "calls to .stop() are considered to be new information that the user intends to revoke permission to use this camera/mic/whatever" (on the assumption that they're in response to interaction with webpage UI). w3c/permissions@4ed0e25#diff-ec9cfa5f3f35ec1f84feb2e59686c34dR582 has some updated wording for revoke() that you should also look at before wording this.


I think Edge's https behavior is allowed in the February version of the spec. They have a "stored permission" that happens to be auto-revoked when the realm goes away. Even if we treat Best Practice 2: Stored Permissions as normative, it says

It is a User Agent choice whether it offers functionality to store permission to each device separately, all devices of a given class, or all devices; the choice needs to be apparent to the user, and permission must have been granted for the entire set whose permission is being stored, …

That says that it needs to be apparent to the user whether the UA is going to store each device separately, a class at a time, vs all devices together, and doesn't apply to the part of Edge's https behavior that we're talking about.

@ShijunS
Copy link
Contributor

ShijunS commented Sep 8, 2016

@jan-ivar, @stefhak, Just want to clarify that the "Edge's https behaviors" is an implementation decision while we still don't have full support to "stored" permission. I don't see conflict with the current spec. As I mentioned in another comment, the Edge model "is still evolving". It is likely we will simplify the logic to largely match the Chrome behavior when we implement the persistent or stored permission. Don't want to side track the discussion here, but welcome feedback in private channel on Edge implementation decision.

@alvestrand
Copy link
Contributor

@jyasskin the "behavior of allowing subsequent calls to succeed, until the script calls .stop()" is covered by current text in the getusermedia spec (not asking for permission if the device is already open). It's what happens after that where the permissions spec gets invoked again.
Chrome's behavior (always storing on granted permission) needs to be permitted too, so I'm fine with a reading that after the .stop(), we fall back to whatever the UA decided was the user's desire.

@jyasskin
Copy link
Member

jyasskin commented Sep 8, 2016

@alvestrand Oh right. Yeah, that's fine too. You might be able to simplify it by leaning more heavily on the permissions spec, but then you'd need the wording about stop() being new information.

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 8, 2016

Request permission to use depends on permission state, but it doesn't update the permission state unless the UA thinks that reflects user intent.

@jyasskin Thanks, I missed that part. Might it be clearer if we add (in bold):

"4. If the user grants permission, return "granted"; otherwise return "denied". If the user’s interaction indicates they intend this choice to apply to this or other realms, then treat this as new information about the user’s intent for other realms with the same origin." ?

I sort of read "other realms" as an implication that this automatically affected the current realm, when it sounds like it only affects the current request.

That says that it needs to be apparent to the user whether the UA is going to store each device separately, a class at a time, vs all devices together, and doesn't apply to the part of Edge's https behavior that we're talking about.

Ah, I read it differently, as the paragraph starts with a choice as well, so there are two "choices" here (showing whole paragraph):

"When permission is requested for a device, the User Agent may _choose_ to store that permission, if granted, for later use by the same origin, so that the user does not need to grant permission again at a later time. [RTCWEB-SECURITY-ARCH] Section 5.2 requires that such storing MUST only be done when the page is secure (served over HTTPS and having no mixed content). It is a User Agent _choice_ whether it _offers functionality to store_ permission to each device separately, all devices of a given class, or all devices; the _choice_ needs to be apparent to the user, and permission must have been granted for the entire set whose permission is being stored, e.g., to store permission to use all cameras the user must have given permission to use all cameras and not just one."

On rereading it, it probably refers to the latter, though how odd to require a detail like what parts to store to be apparent, when whether it is being stored is not?

Then there's the wording of the latter choice itself: "[UA] offers functionality to store", which could be read as offering storage as a user option only, making the whole thing inherently apparent.

There's additional non-normative support under Privacy and Security Considerations which would disqualify Edge's https same-realm behavior:

"In the case of persistent authorization via a stored permission, it is important that it is easy to find the list of granted permissions and revoke permissions that the user wishes to revoke.

Once permission has been granted, the User Agent should make two things readily apparent to the user:

  1. That the page has access to the devices for which permission is given
  2. Whether or not any of the devices are presently recording ("on air") indicator"

I checked, and Edge has no permission indicator or list after streams have stopped, while the page remains open, so this "temporary stored permission" would not qualify.

Then again, this is all non-normative, and would disqualify Chrome (persistence is not user apparent) and Firefox ("Always Allow" extends to all same-kind devices) as well.

If Edge wants to pursue this instead of going "full-Chrome" I would probably support it. :)

@jyasskin
Copy link
Member

jyasskin commented Sep 8, 2016

Thanks, I'll update that part of the permissions spec: w3c/permissions#127.

In general, if the media spec disqualifies all the existing implementations, that's probably a bug in the spec rather than the implementations. 😉 Revoking is, at least, easy with Edge's current behavior: you just reload.

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 9, 2016

@alvestrand, @ShijunS A side-effect of handling permission for overlapping gUM calls in mediacapture is that navigator.permission.query() would yield "prompt" rather than "granted" during active gUM use by default in Firefox and with http in Edge. Is this desirable?

If we go with @jyasskin 's approach it would instead return "granted" in those cases and revert to "prompt" once all tracks stop.

@ShijunS
Copy link
Contributor

ShijunS commented Sep 9, 2016

return "granted" in those cases and revert to "prompt" once all tracks stop

@jan-ivar. That seems acceptable, although it could be a bit more complicated if any implementation has to consider the one-time permission is per device rather than per device type.

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 9, 2016

Right. I think only the state of e.g. navigator.permission.query({name: "camera", deviceId: relevantDeviceId}) might actually change in such an implementation, while navigator.permission.query({name: "camera"}) could still return "prompt" (would it depend on the user having more than one camera?) but that seems like an artifact of the model if we accept it.

@alvestrand
Copy link
Contributor

This matches an earlier draft of mine which I didn't like because it forced interaction with the permission store at every close as well as at every open.

It's also interesting if this sequence can exist:

getUserMedia(device=x) => track A returned, temporary permission to device X
getUserMedia(device=x) => track B returned, permanent permission to device X
track A.stop()
permissions.check(device=x) => what?

The permission status resulting from this should be "granted". But information about that can't be stored in the UA's information about track A. So the procedure to close has to access global information about the device, and permissions.check(device=x) isn't helpful.

Add more tracks and revocations for more complicated cases.

I got lost in complexity when trying to describe that, so I backed out of it.
I'll be happy to review proposed language to go back down that route.

@stefhak
Copy link
Contributor

stefhak commented Sep 9, 2016

What if we removed the text saying that if device x is currently being used permission is "granted" from mediacapture-main? We talked about this being natural since the app can anyway just clone the track, but maybe it's overly complex to implement in some cases (is that what you're saying @alvestrand) ?

UAs could still act this way (i.e. directly grant access at gUM call) since it can be said that "UA has received new information about the user’s intent since last invocation" (referring to step 3 of permission state) since the source is already used by another track.

@jan-ivar
Copy link
Member Author

jan-ivar commented Sep 9, 2016

@alvestrand I'm missing the complication. It's not every close, but the close of the last track from a source. permissions.check(device=x) would be "granted" in your case. UAs can't store this info in the tracks, never could.

@alvestrand
Copy link
Contributor

Part of the permissions display ("Indicator states") cluster.

@alvestrand
Copy link
Contributor

This should be ready to be closed once #411 is merged.

@alvestrand
Copy link
Contributor

Solved by #421

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants