What is the purpose of requiring a successful gUM call before enumerateDevices? #1019

Open
guidou opened this issue Oct 15, 2024 · 10 comments
Labels
privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response.

Comments

@guidou
Contributor

guidou commented Oct 15, 2024

In the past, the requirement for enumerateDevices was to have the corresponding permissions granted.
Now a successful gUM call is required, which is a stronger requirement.

I think the old version was better, since it still allowed implementations to get the gUM requirement by making permissions non-persistent.

Forcing the gUM requirement on implementations with persistent permissions makes persistent permissions largely useless, since one of the objectives of persistent permissions is to avoid constant prompts for frequent users of VC applications.

I agree that always prompting might have some privacy benefits, but this comes at the cost of making things more difficult or inconvenient for frequent users of VC applications. I think this is an area where it is better to let UAs decide the tradeoff that works better for their users.

The old version of the spec also makes it easier to achieve interoperability between UAs without breaking compatibility with existing applications.
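For readers less familiar with the two gating models, the ordering required by the current spec can be sketched as follows. This is only an illustration, not code from any application mentioned in this thread: `listDevicesAfterGum` is a hypothetical helper, and its `mediaDevices` parameter stands in for `navigator.mediaDevices` so the sketch can be exercised with a stub.

```javascript
// Hypothetical helper: obtain a stream first, then enumerate.
// `mediaDevices` is expected to look like navigator.mediaDevices.
async function listDevicesAfterGum(mediaDevices, constraints = { audio: true, video: true }) {
  // Under gUM-before-eD, this call is what unlocks detailed results.
  // With a persisted permission it is expected to resolve without a prompt.
  const stream = await mediaDevices.getUserMedia(constraints);
  try {
    // Enumerate while the stream is still live.
    return await mediaDevices.enumerateDevices();
  } finally {
    // Release the devices once the list has been read.
    for (const track of stream.getTracks()) track.stop();
  }
}
```

Under permissions-before-eD, an application with a granted permission could skip the `getUserMedia` step and call `enumerateDevices()` directly, which is the difference this issue is about.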

@jan-ivar
Member

jan-ivar commented Nov 19, 2024

I think there's some misunderstanding here.

Forcing the gUM requirement on implementations with persistent permissions makes persistent permissions largely useless, since one of the objectives of persistent permissions is to avoid constant prompts for frequent users of VC applications.

100% agree the objective of persistent permissions is to avoid constant prompts.

But users who have persisted permissions won't incur any prompt when a fresh gUM call is made.

So how does trimming enumerateDevices ahead of camera and microphone being turned on lead to "constant prompts"?

I agree that always prompting might have some privacy benefits, but this comes at the cost of making things more difficult or inconvenient for frequent users of VC applications.

100% agree.

I think this is an area where it is better to let UAs decide the tradeoff that works better for their users.

The spec allows UAs to implement persistent permissions, one-time permissions, even both.

The purpose of not returning detailed device information without a recent gUM call is twofold:

  1. prevent tracking users across websites
  2. ensure interop between permission models

A website that expects to list devices on pageload won't work in Firefox or Safari. This is not hypothetical. Sadly it seems it's not uncommon to test against a single browser.
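A site can at least detect that it received a trimmed list rather than assume a full one on pageload. A minimal sketch, assuming trimmed entries carry empty `label` strings; `hasDetailedDeviceInfo` is a hypothetical name, and its argument is whatever `enumerateDevices()` resolved with.

```javascript
// Hypothetical helper: true when at least one entry exposes a label,
// which suggests the list was not trimmed.
function hasDetailedDeviceInfo(devices) {
  return devices.some((device) => device.label !== "");
}
```

When this returns `false`, the site would need a successful gUM call before it can populate a device picker.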

Both Safari and Firefox let users grant one-time permissions to camera and microphone by default. Chrome is experimenting with them as well:

image

@guidou
Contributor Author

guidou commented Nov 20, 2024

I think there's some misunderstanding here.

Certainly. The issue is that the requirement of gUM before enumerateDevices is incompatible with the way many Web applications have been written for a long time, which was to assume enumerateDevices results (among other things) are gated by permissions.
Chrome tried to deploy gUM-before-enumerateDevices and had to roll it back due to incompatibility with existing applications, hence this issue.

Forcing the gUM requirement on implementations with persistent permissions makes persistent permissions largely useless, since one of the objectives of persistent permissions is to avoid constant prompts for frequent users of VC applications.

100% agree the objective of persistent permissions is to avoid constant prompts.

But users who have persisted permissions won't incur any prompt when a fresh gUM call is made.

The claim refers to the fact that the only practical way to implement the gUM-before-eD in a way that is fully compatible with existing applications is to do it with non-persistent permissions (or at least, making the browser always reply 'prompt' to permission queries before the first gUM, even if gUM won't actually prompt).

So how does trimming enumerateDevices ahead of camera and microphone being turned on lead to "constant prompts"?

Because the practical way to implement this in a way that doesn't break existing applications is to use non-persistent permissions, which would prompt often, like Safari does.
An alternative would be to implement the UA such that it always replies 'prompt' before the first gUM, and not actually prompt if the permission was actually persisted. But even this hack would still break some existing applications that would unnecessarily show app-level explanatory dialogs asking the user to authorize camera and mic via a gUM call specific for this purpose.

I agree that always prompting might have some privacy benefits, but this comes at the cost of making things more difficult or inconvenient for frequent users of VC applications.

100% agree.

I think this is an area where it is better to let UAs decide the tradeoff that works better for their users.

The spec allows UAs to implement persistent permissions, one-time permissions, even both.

The concern is not what is possible or not, but compatibility with existing real-world applications.

The purpose of not returning detailed device information without a recent gUM call is twofold:

  1. prevent tracking users across websites

The benefit of gUM over permissions here is quite limited, as trackers will need to call gUM anyway to get permissions. For cross-origin tracking, each top-level origin needs to call gUM successfully at least once to get permissions and successfully participate in cross-origin tracking.

  2. ensure interop between permission models

gUM-before-eD does not give any interoperability assurances over permissions-before-eD.

A website that expects to list devices on pageload won't work in Firefox or Safari. This is not hypothetical. Sadly it seems it's not uncommon to test against a single browser.

It won't work on Chrome either (unless they have permissions, which requires gUM). The reason Safari has not seen any compatibility problems due to gUM-before-eD is that their permissions are non-persistent, so applications know they have to call gUM.

Both Safari and Firefox let users grant one-time permissions to camera and microphone by default. Even Chrome is experimenting with them as well:

The problem is not the various permission models browsers implement and modify over time.
The problem is that existing applications break with gUM-before-eD because they were written against the previous spec which stated permissions-before-eD.

@jan-ivar
Member

jan-ivar commented Nov 20, 2024

TL;DR: please read my answer in web-platform-tests/interop#849 (comment).

Certainly. The issue is that the requirement of gUM before enumerateDevices is incompatible with the way many Web applications have been written for a long time, which was to assume enumerateDevices results (among other things) are gated by permissions.

This isn't accurate. AFAIK only labels were gated on permission in the old spec.

Chrome tried to deploy gUM-before-enumerateDevices and had to roll it back due to incompatibility with existing applications, hence this issue.

It might be time to try again.

Forcing the gUM requirement on implementations with persistent permissions makes persistent permissions largely useless, since one of the objectives of persistent permissions is to avoid constant prompts for frequent users of VC applications.

100% agree the objective of persistent permissions is to avoid constant prompts.
But users who have persisted permissions won't incur any prompt when a fresh gUM call is made.

The claim refers to the fact that the only practical way to implement the gUM-before-eD in a way that is fully compatible with existing applications is to do it with non-persistent permissions (or at least, making the browser always reply 'prompt' to permission queries before the first gUM, even if gUM won't actually prompt).

So how does trimming enumerateDevices ahead of camera and microphone being turned on lead to "constant prompts"?

Because the practical way to implement this in a way that doesn't break existing applications is to use non-persistent permissions, which would prompt often, like Safari does. An alternative would be to implement the UA such that it always replies 'prompt' before the first gUM, and not actually prompt if the permission was actually persisted. But even this hack would still break some existing applications that would unnecessarily show app-level explanatory dialogs asking the user to authorize camera and mic via a gUM call specific for this purpose.

What existing applications? There must be some misunderstanding as I don't follow this at all. Firefox supports persistent permission through the "✅ Remember this decision" checkbox, and I assure you we didn't need to do any of this, whatever this is.

The spec allows UAs to implement persistent permissions, one-time permissions, even both.

The concern is not what is possible or not, but compatibility with existing real-world applications.

The purpose of not returning detailed device information without a recent gUM call is twofold:

  1. prevent tracking users across websites

The benefit of gUM over permissions here is quite limited, as trackers will need to call gUM anyway to get permissions. For cross-origin tracking, each top-level origin needs to call gUM successfully at least once to get permissions and successfully participate in cross-origin tracking.

  2. ensure interop between permission models

gUM-before-eD does not give any interoperability assurances over permissions-before-eD.

Of course it does. It's the predominant video conferencing model that works in all browsers. Most users don't have multiple cameras facing them. So websites ask for the default camera (or remember the previously used one) and add a ⚙️ settings page for making changes.

A website that expects to list devices on pageload won't work in Firefox or Safari. This is not hypothetical. Sadly it seems it's not uncommon to test against a single browser.

It won't work on Chrome either (unless they have permissions, which requires gUM).

A website can prime for permission just once in Chrome, and assume device enumeration on pageload works forevermore.

The reason Safari has not seen any compatibility problems due to gUM-before-eD is that their permissions are non-persistent, so applications know they have to call gUM.

Why has Firefox not seen any compatibility issues?

Both Safari and Firefox let users grant one-time permissions to camera and microphone by default. Even Chrome is experimenting with them as well:

The problem is not the various permission models browsers implement and modify over time. The problem is that existing applications break with gUM-before-eD because they were written against the previous spec which stated permissions-before-eD.

I'd like to see these existing applications.

The previous spec never specified permissions-before-eD, it specified permissions-before-eD-with-labels. You didn't need permission for eD. So any compatibility issue tied to permission seems homemade.

@guidou
Contributor Author

guidou commented Nov 21, 2024

TL;DR: please read my answer in web-platform-tests/interop#849 (comment).

Certainly. The issue is that the requirement of gUM before enumerateDevices is incompatible with the way many Web applications have been written for a long time, which was to assume enumerateDevices results (among other things) are gated by permissions.

This isn't accurate. AFAIK only labels were gated on permission in the old spec.

Yes, when I say enumerateDevices, I actually mean full enumerateDevices results containing labels, useful for an in-app device picker.
VC applications did nothing with the results without labels, so when we changed those results to contain empty device IDs and no more than one device per class, no VC applications broke, as they did nothing with the pre-permissions eD results.

Chrome tried to deploy gUM-before-enumerateDevices and had to roll it back due to incompatibility with existing applications, hence this issue.

It might be time to try again.

Maybe it's time to change the spec instead.

What existing applications? There must be some misunderstanding as I don't follow this at all. Firefox supports persistent permission through the "✅ Remember this decision" checkbox, and I assure you we didn't need to do any of this, whatever this is.

In your own words, by 2023, this was "Most video conferencing sites".

More specifically, you literally said in 2023:

Most video conferencing sites offer a smoother user experience to returning Chrome users than to returning users in other browsers, because they basically ignore past non-persisted permissions entirely.

You even specifically mentioned whereby.com and indicated this code pattern:

const perm = await navigator.permissions.query({name: "camera"});
if (perm.state == "prompt") {
  nagTheUserAboutEnablingPermission();
}

This matches our observations as well, so we are in full agreement here.

Chrome certainly does not want to break the smoother user experience persistent permissions provide.

gUM-before-eD does not give any interoperability assurances over permissions-before-eD.

Of course it does. It's the predominant video conferencing model that works in all browsers. Most users don't have multiple cameras facing them. So websites ask for the default camera (or remember the previously used one) and add a ⚙️ settings page for making changes.

I meant conceptually. If Firefox switches to permissions-before-eD-with-labels it will be interoperable with Chrome and Safari. Safari does not need to change anything because in their case permissions-before-eD-with-labels is the same as gUM-before-eD-with-labels.

A website can prime for permission just once in Chrome, and assume device enumeration on pageload works forevermore.

This is not accurate. The user can revoke permissions at any time, so permissions are not "forevermore" in Chrome.
Chrome is also experimenting with optional ephemeral permissions.
Therefore, applications cannot assume eD on pageload will always return full results just because they got permission in the past.
What applications normally do is something similar to the code snippet you presented, which consists of a permission check to decide if a gUM call to request permissions is necessary or not.
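The permission check described here could look roughly like the following. It is a sketch, not any application's actual code: `chooseStartupStep` and the step names it returns are hypothetical, and `permissions` stands in for `navigator.permissions`.

```javascript
// Hypothetical decision helper. `permissions` is expected to look like
// navigator.permissions; the returned strings are illustrative only.
async function chooseStartupStep(permissions, name = "camera") {
  let state = "prompt";
  try {
    ({ state } = await permissions.query({ name }));
  } catch {
    // query() can reject for names a browser does not support;
    // in that case fall back to treating the state as "prompt".
  }
  if (state === "granted") return "call-gum-silently";
  if (state === "denied") return "show-blocked-help";
  return "explain-then-call-gum";
}
```

In this sketch a "granted" state still leads to a gUM call, just without an app-level explanatory dialog, so the same flow would work under either gating model.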

The reason Safari has not seen any compatibility problems due to gUM-before-eD is that their permissions are non-persistent, so applications know they have to call gUM.

Why has Firefox not seen any compatibility issues?

No idea. But I'm a bit confused.

Just yesterday you said:

  • Google Voice tripped over a related change to permissions.query in 132, for which an intervention is already in place, and they've promised a fix this week

  • Slack huddles broke before Firefox 132 shipped the measure. That website still requires Firefox users to persist permission to microphone and then refresh the page

In 2023 you said "Most video conferencing sites offer a smoother user experience to returning Chrome users", which means a worse experience for Firefox users. Have you confirmed that most video conferencing sites now offer the smoother experience for Firefox users as well?

And you also said yesterday that there is a "huge interoperability issue"?

But today you say Firefox has not seen any compatibility issues. If there are no such issues, then what is the "huge" part?

Note also that any application written against the new spec will work fine with Chrome, as permissions-before-eD-labels is forwards-compatible with gUM-before-eD-labels. So Chrome is not preventing applications from moving to the new spec, but it will not risk breaking compatibility to force such a move.

The problem is not the various permission models browsers implement and modify over time. The problem is that existing applications break with gUM-before-eD because they were written against the previous spec which stated permissions-before-eD.

I'd like to see these existing applications.

Last year you said they were "Most video conferencing sites" and you mentioned whereby.com.
That matches our observations.

Yesterday you said Slack huddles requires some workaround, which I presume is not necessary with Chrome and Safari (both of which implement permissions-before-eD-with-labels).

The previous spec never specified permissions-before-eD, it specified permissions-before-eD-with-labels. You didn't need permission for eD. So any compatibility issue tied to permission seems homemade.

When I said permissions-before-eD I meant permissions-before-eD-with-labels, which is indeed what the old spec said, is exactly what Chrome implements, and is what most applications are written to. eD has always returned some result when called without permissions, with both the old and new specs.

I said permissions-before-eD just to make it shorter, since the pre-permissions results without labels were unused by VC applications but were useful for trackers in the old spec. The PING changes Chrome implemented kept the pre-permissions results useless for VC applications but also made them useless for trackers.

@youennf
Contributor

youennf commented Nov 22, 2024

FWIW, I do not think we can go back and weaken the privacy story.
We can be creative in ways that keep the same level of privacy but mitigate breakage.
Interventions are also fine to make progress.
@guidou, have you considered these possibilities?

To make progress on mitigations, it would be good to qualify a few things:

  1. Which websites get broken
  2. How they are broken

Based on this, various ideas could be tested, for instance:

  1. Before a successful gum call, consider deviceId to be an ideal constraint even if marked as exact. This would mitigate the issue of web applications using exact deviceId constraints. And this would beef up our privacy story as well.
  2. Before a successful gum call, enumerateDevices could expose devices with deviceId equal to default instead of "". default given to getUserMedia would be treated as the empty string aka no deviceId constraint (like we might do for setSinkId).
  3. After a successful gum call, device labels can be exposed. Fire a devicechange event so that web applications update their picker/states.
  4. Before a successful gum call, enumerateDevices could expose devices with label equal to fixed values (say default or default camera...) instead of "". This might mitigate web applications not reacting to devicechange events.
  5. Add permission granted status to the mitigation heuristics.

These are only ideas though, the first thing is to understand the various breakages.
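Idea 3 relies on web applications reacting to `devicechange`. On the application side, that could be sketched as below. The helper name `keepPickerUpdated` and the `render` callback are hypothetical, and `mediaDevices` stands in for `navigator.mediaDevices`.

```javascript
// Hypothetical helper: keep a picker in sync with the device list.
// `mediaDevices` is expected to look like navigator.mediaDevices
// (an EventTarget with enumerateDevices()); `render` is app-defined.
function keepPickerUpdated(mediaDevices, render) {
  // Re-read the list and hand it to the app; fall back to an empty
  // list if enumeration fails.
  const refresh = () =>
    mediaDevices.enumerateDevices().then(render, () => render([]));
  mediaDevices.addEventListener("devicechange", refresh);
  refresh(); // initial population, possibly with trimmed entries
  // Return a function that stops listening.
  return () => mediaDevices.removeEventListener("devicechange", refresh);
}
```

An application written this way would pick up labels as soon as the UA fires `devicechange` after a successful gUM call, without needing a page reload.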

@guidou
Contributor Author

guidou commented Nov 23, 2024

FWIW, I do not think we can go back and weaken the privacy story.

Gating eD labels to permissions does not weaken the privacy story IMO.
For example, Safari's model is already this one since permissions are ephemeral, so a UA can implement the same guarantees if it wishes to do so.

I would argue that forcing a gUM call can be detrimental to user privacy in some cases.

Consider the following case:

  1. A frequent user of a VC application has given persistent permissions to avoid being prompted frequently.
  2. This user starts the browser, and instructs the VC to join a meeting with the camera and mic off for privacy reasons. But wants the ability to turn them on in the middle of the meeting.
  3. The user has several devices (e.g., two or more microphones or cameras)
  4. In the middle of the meeting the user decides to participate in the meeting, but wants to select the right microphone and/or camera. This should occur without the UA prompting the user because the user gave persistent permissions for all the media devices, so device selection occurs via an in-app device picker.

permissions-before-eD supports this use case without problems.

gUM-before-eD cannot support this use case correctly. The application has to choose between a broken UI or violating the user's privacy to produce a proper UI.

This use case is common for VC applications and it should be possible for UAs to support it if they want to support it.

We can be creative in ways that keep the same level of privacy but mitigate breakage. Interventions are also fine to make progress. @guidou, have you considered these possibilities?

To make progress on mitigations, it would be good to qualify a few things:

  1. Which websites get broken
  2. How they are broken

Some examples:

  • For Zoom I think it is the default for joining an existing meeting, and it is a prominent feature to start a new meeting with video off. For returning users with persistent permissions, device pickers are broken.
  • Whereby before joining a meeting presents a dialog to choose camera and microphone. On Firefox an extra click is required for returning users with persistent permissions to make an initial gUM call so that the device pickers can be initialized. On Chrome the user goes directly to the UI with the correct device pickers.
  • On Dialpad meetings, you can configure it to start with camera/mic off. For returning users with persistent permissions it opens the camera/mic and closes it quickly in order to provide a good UI. In this case the application has to go against the user setting in order to respect the user's wishes. They chose the text carefully in the settings "Turn off camera while joining a meeting?" to inform the user that they actually open the camera. While working as intended in a strict sense, they should be able to do it without turning the camera on if that is the user's preference, but gUM-before-eD makes it impossible.
  • Zoho.com has behavior similar to Whereby. An extra click is required on Firefox for returning users for the gUM call, presumably to be able to get device information. I didn't see pickers, though. This application can also remember a setting to have the camera off by default, but, like Dialpad, opens and closes the camera and mic quickly.
  • According to this documentation, scheduled meetings on Microsoft Teams start with video off. I have not tried this feature, so not sure if it breaks, but it might.

Based on this, various ideas could be tested, for instance:

  1. Before a successful gum call, consider deviceId to be an ideal constraint even if marked as exact. This would mitigate the issue of web applications using exact deviceId constraints. And this would beef up our privacy story as well.
  2. Before a successful gum call, enumerateDevices could expose devices with deviceId equal to default instead of "". default given to getUserMedia would be treated as the empty string aka no deviceId constraint (like we might do for setSinkId).
  3. After a successful gum call, device labels can be exposed. Fire a devicechange event so that web applications update their picker/states.
  4. Before a successful gum call, enumerateDevices could expose devices with label equal to fixed values (say default or default camera...) instead of "". This might mitigate web applications not reacting to devicechange events.
  5. Add permission granted status to the mitigation heuristics.

These are only ideas though, the first thing is to understand the various breakages.

I think these ideas (and others) are worth exploring, hence why this issue has been filed.
It is clear that gUM-before-eD breaks existing applications and, for some use cases, forces applications to choose between proper UI and properly respecting user's privacy settings.

@jan-ivar
Member

jan-ivar commented Dec 3, 2024

FWIW, I do not think we can go back and weaken the privacy story.

I agree with this.

Gating eD labels to permissions does not weaken the privacy story IMO.

It weakens the privacy story: a non-conferencing website can obtain camera/mic access a single time for a seemingly benign reason like scanning a QR code or snapping a listing photo. Users who have granted persistent permission to such websites (still the sole choice in Chrome release) unwittingly allow those websites to delegate permission to third parties to help fingerprint them. Having to call gUM deters this.

Also, it's not just labels anymore, but the number of cameras and microphones the user has as well.

For example, Safari's model is already this one since permissions are ephemeral, so a UA can implement the same guarantees if it wishes to do so.

Firefox shouldn't have to remove its persistent permission feature to provide the same guarantees as Safari. Spec agrees.

I would argue that forcing a gUM call can be detrimental to user privacy in some cases.

Consider the following case:

  1. A frequent user of a VC application has given persistent permissions to avoid being prompted frequently.

Use case error: we should solve for all users, not just those who have persisted permission.

  2. This user starts the browser, and instructs the VC to join a meeting with the camera and mic off for privacy reasons. But wants the ability to turn them on in the middle of the meeting.

  3. The user has several devices (e.g., two or more microphones or cameras)

  4. In the middle of the meeting the user decides to participate in the meeting, but wants to select the right microphone and/or camera. This should occur without the UA prompting the user because the user gave persistent permissions for all the media devices, so device selection occurs via an in-app device picker.

This is entirely solvable for all users, because a website can turn on camera or microphone ahead of transmission.

Some examples:

  • For Zoom I think it is the default for joining an existing meeting, and it is a prominent feature to start a new meeting with video off. For returning users with persistent permissions, device pickers are broken.

Said differently: For returning users, device pickers aren't populated. Why single out a pity-group? What you call "broken" is the behavior everyone else gets today.

We've discussed Zoom at length. Zoom is handling it, so calling it "breakage" seems a stretch. Preferential treatment is no longer given to a sub-group of users.

This can be a good thing, challenging websites to solve this without discriminating or weakening privacy.

Here's a demo page employing multiple strategies for some ideas (it caches devices and calls gUM on selection).
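The caching strategy mentioned above might be sketched as follows. This is not the demo page's code, just one possible reading of "caches devices and calls gUM on selection". All names are hypothetical, `storage` stands in for `localStorage`, and the sketch assumes device IDs stay valid across visits, which depends on the browser and on the user not clearing site data.

```javascript
// Hypothetical cache helpers. `storage` is expected to look like
// window.localStorage (getItem/setItem); the key name is made up.
const DEVICE_CACHE_KEY = "cachedMediaDevices";

function saveDeviceCache(storage, devices) {
  // Only keep input devices that carry real information (post-gUM results).
  const entries = devices
    .filter((d) => d.kind === "audioinput" || d.kind === "videoinput")
    .filter((d) => d.deviceId !== "" && d.label !== "")
    .map(({ kind, label, deviceId }) => ({ kind, label, deviceId }));
  storage.setItem(DEVICE_CACHE_KEY, JSON.stringify(entries));
}

function loadDeviceCache(storage) {
  try {
    const parsed = JSON.parse(storage.getItem(DEVICE_CACHE_KEY) ?? "[]");
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // corrupt or unreadable cache
  }
}

// On selection, request the chosen device. `ideal` lets gUM fall back
// to another device if the cached one is no longer attached.
function constraintsForSelection(entry) {
  const key = entry.kind === "videoinput" ? "video" : "audio";
  return { [key]: { deviceId: { ideal: entry.deviceId } } };
}
```

A picker built from `loadDeviceCache` can be shown before any gUM call on a return visit; the result of `constraintsForSelection` is then passed to `getUserMedia` when the user picks an entry, and the cache is refreshed from the next `enumerateDevices()` result.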

  • Whereby before joining a meeting presents a dialog to choose camera and microphone. On Firefox an extra click is required for returning users with persistent permissions to make an initial gUM call so that the device pickers can be initialized. On Chrome the user goes directly to the UI with the correct device pickers.

Invalid, as explained in web-platform-tests/interop#849 (comment). Whereby does not enumerate ahead of gUM.

  • On Dialpad meetings, you can configure it to start with camera/mic off. For returning users with persistent permissions it opens the camera/mic and closes it quickly in order to provide a good UI. In this case the application has to go against the user setting in order to respect the user's wishes. They chose the text carefully in the settings "Turn off camera while joining a meeting?" to inform the user that they actually open the camera. While working as intended in a strict sense, they should be able to do it without turning the camera on if that is the user's preference, but gUM-before-eD makes it impossible.

From this description it sounds unaffected? I can try it tomorrow.

  • Zoho.com has behavior similar to Whereby. An extra click is required on Firefox for returning users for the gUM call, presumably to be able to get device information. I didn't see pickers, though. This application can also remember a setting to have the camera off by default, but, like Dialpad, opens and closes the camera and mic quickly.

If it's similar to Whereby then it's invalid here.

  • According to this documentation, scheduled meetings on Microsoft Teams start with video off. I have not tried this feature, so not sure if it breaks, but it might.

I use MS Teams in Firefox regularly. No breakage observed. Though I confess, like probably 99% of users, I have just a single front-facing camera.

@fippo
Contributor

fippo commented Dec 4, 2024

Invalid, as explained in web-platform-tests/interop#849 (comment).

Your statement is in conflict with custom logging added to Chromium, similar to the one below.

I use MS Teams in Firefox regularly

Well, here is what Teams does in the modified Chromium:

[6147:1:1204/073335.016084:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073337.533984:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073337.599033:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073337.617704:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073338.145701:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073338.372615:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073338.468623:ERROR:media_devices.cc(420)] ENUMERATE
[6147:1:1204/073338.902420:ERROR:user_media_request.cc(393)] GUM

I have asked folks to take a look, because polling is not great; that might also yield a statement about why they do eD-before-gUM.

Said differently: For returning users, device pickers aren't populated. Why single out a pity-group? What you call "broken" is the behavior everyone else gets today.

That is a question for a product manager (in this case at Zoom), they are unlikely to respond in this venue.

@jan-ivar
Member

jan-ivar commented Dec 4, 2024

There are valid reasons to call eD before gUM: to detect users without a camera or microphone. So logging is meaningless.

The relevant question is: which websites present users with pickers ahead of gUM? Zoom does, Whereby does not.
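The eD-before-gUM use mentioned here, detecting a missing camera or microphone, needs only the `kind` of each entry. A minimal sketch, assuming the pre-gUM list still reports one entry per available kind as described earlier in this thread; `detectInputKinds` is a hypothetical name.

```javascript
// Hypothetical helper: which input kinds are present at all?
// Only `kind` is inspected, so empty labels and device IDs do not matter.
function detectInputKinds(devices) {
  return {
    hasCamera: devices.some((d) => d.kind === "videoinput"),
    hasMicrophone: devices.some((d) => d.kind === "audioinput"),
  };
}
```

A site could use the result to hide its camera button, or to request audio only, before ever calling gUM.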


6 participants