MSC3916: Authentication for media #3916

Merged on Jun 10, 2024 (25 commits). Changes shown from 17 commits.
Commits (25):

* 7606e53 MSC3916: Authentication for media (richvdh, Oct 23, 2022)
* 3076de0 minor edits (richvdh, Oct 23, 2022)
* 55303b5 fix some links (richvdh, Mar 10, 2023)
* d601637 Prevent further spread of unauthenticated media (turt2live, Apr 5, 2024)
* c2ae25e Address review feedback (turt2live, Apr 22, 2024)
* 8351ebe Drop federation thumbnails (turt2live, Apr 22, 2024)
* 106ce55 Add comparisons (turt2live, Apr 23, 2024)
* a14c4af Clarify that query string auth is forbidden (turt2live, May 14, 2024)
* 92eba2c Document `/create` changing namespace too (turt2live, May 21, 2024)
* a76d97f Clarify what is happening to `/upload` (turt2live, May 21, 2024)
* 8bb5159 Minor wording clarifications, primarily around using HTTPS for idp icons (turt2live, May 21, 2024)
* e1e8a6a Drop `serverName` in new federation download endpoint (turt2live, May 21, 2024)
* 71b8db4 Move `/create` to unmodified (turt2live, May 28, 2024)
* 1c864a3 Mention cookies (turt2live, May 28, 2024)
* e5c9316 Forgot a mention of `/create` (turt2live, May 28, 2024)
* 41d2aa2 Add `Location` header support to federation `/download` (turt2live, May 28, 2024)
* d73025b Update proposals/3916-authentication-for-media.md (turt2live, May 29, 2024)
* 656dfb8 Update proposals/3916-authentication-for-media.md (turt2live, May 29, 2024)
* 87c08e0 Move allow_redirect behaviour fully into dedicated point (turt2live, May 29, 2024)
* 1fe8d71 Clarify backwards compatibilty/freezing (turt2live, May 29, 2024)
* aac1909 Clarify that access token auth is permitted, but not recommended (turt2live, May 29, 2024)
* 0b4f2c9 Clarify header behaviour around Location (turt2live, May 29, 2024)
* f1777c2 Fix social sign-on icons (turt2live, May 29, 2024)
* 4fe23e5 Update proposals/3916-authentication-for-media.md (turt2live, May 30, 2024)
* db90b92 Reintroduce federation /thumnail (turt2live, Jun 3, 2024)

proposals/3916-authentication-for-media.md: 367 additions, 0 deletions

Reviewers: Please compare this MSC against #2461 , which is proposed for rejection/closure.

@turt2live I think we can resolve this since #2461 is closed?

it's still useful to compare the MSCs, given the ideas and concepts of 2461 are meant to be incorporated here.

@@ -0,0 +1,367 @@
# MSC3916: Authentication for media access, and new endpoint names

Currently, access to media in Matrix has a number of problems including the following:

* The only protection for media is the obscurity of the URL, and URLs are
easily leaked (e.g. accidental sharing, access
logs). [synapse#2150](https://github.com/matrix-org/synapse/issues/2150)
* Anybody (including non-matrix users) can cause a homeserver to copy media
into its local
store. [synapse#2133](https://github.com/matrix-org/synapse/issues/2133)
* When a media event is redacted, the media it used remains visible to all.
[synapse#1263](https://github.com/matrix-org/synapse/issues/1263)
* There is currently no way to delete
media. [matrix-spec#226](https://github.com/matrix-org/matrix-spec/issues/226)
* If a user requests GDPR erasure, their media remains visible to all.
* When all users leave a room, their media is not deleted from the server.

These problems are all difficult to address currently, because access to media
is entirely unauthenticated. The first step for a solution is to require user
authentication. Once that is done, it will be possible to impose authorization
requirements to address the problems mentioned above. (See, for example,
[MSC3911](https://github.com/matrix-org/matrix-spec-proposals/pull/3911) which
builds on top of this MSC.)

This proposal supersedes [MSC1902](https://github.com/matrix-org/matrix-spec-proposals/pull/1902).

## Proposal

1. New endpoints

The existing `/_matrix/media/v3/` endpoints become deprecated, and new
endpoints under the `/_matrix/client` and `/_matrix/federation`
hierarchies are introduced. Removal of the deprecated endpoints would be a
later MSC under [the deprecation policy](https://spec.matrix.org/v1.10/#deprecation-policy).

The following table maps each deprecated endpoint to its new equivalents:

| Deprecated | Client-Server | Federation |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------- |
| [`GET /_matrix/media/v3/preview_url`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3preview_url) | `GET /_matrix/client/v1/media/preview_url` | - |
| [`GET /_matrix/media/v3/config`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3config) | `GET /_matrix/client/v1/media/config` | - |
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3downloadservernamemediaid) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}` | `GET /_matrix/federation/v1/media/download/{mediaId}` |
| [`GET /_matrix/media/v3/download/{serverName}/{mediaId}/{fileName}`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3downloadservernamemediaidfilename) | `GET /_matrix/client/v1/media/download/{serverName}/{mediaId}/{fileName}` | - |
| [`GET /_matrix/media/v3/thumbnail/{serverName}/{mediaId}`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3thumbnailservernamemediaid) | `GET /_matrix/client/v1/media/thumbnail/{serverName}/{mediaId}` | - |
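
As a rough illustration of the client-server column, the move is a prefix swap for the listed endpoints. The Python sketch below is hypothetical (the helper name `client_path_for` is not part of any spec or SDK) and handles paths only, without query strings; it returns `None` for endpoints this MSC leaves alone, such as `/upload`.

```python
from typing import Optional

DEPRECATED_PREFIX = "/_matrix/media/v3/"
CLIENT_PREFIX = "/_matrix/client/v1/media/"


def client_path_for(deprecated_path: str) -> Optional[str]:
    """Map a deprecated media path (without query string) to its new client-server path."""
    if not deprecated_path.startswith(DEPRECATED_PREFIX):
        return None
    rest = deprecated_path[len(DEPRECATED_PREFIX):]
    # Only the endpoints listed in the table move; /upload is not modified by this MSC.
    if rest in ("preview_url", "config") or rest.startswith(("download/", "thumbnail/")):
        return CLIENT_PREFIX + rest
    return None


print(client_path_for("/_matrix/media/v3/download/example.org/abc123"))
# /_matrix/client/v1/media/download/example.org/abc123
print(client_path_for("/_matrix/media/v3/upload"))
# None
```
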

**Note**: [`POST /_matrix/media/v3/upload`](https://spec.matrix.org/v1.6/client-server-api/#post_matrixmediav3upload)
and [`POST /_matrix/media/v1/create`](https://spec.matrix.org/v1.10/client-server-api/#post_matrixmediav1create)
are **not** modified or deprecated by this MSC: it is intended that they be brought into line with the other
endpoints by a future MSC, such as [MSC3911](https://github.com/matrix-org/matrix-spec-proposals/pull/3911).

**Note**: `/thumbnail` does not have a federation endpoint. It appears as though
no servers request thumbnails over federation, and so it is not supported here.
A later MSC may introduce such an endpoint.

The new `/download` and `/thumbnail` endpoints additionally drop the `?allow_redirect`
query parameter. Instead, the endpoints behave as though `allow_redirect=true` were
set, regardless of any value supplied. See [this comment on MSC3860](https://github.com/matrix-org/matrix-spec-proposals/pull/3860/files#r1005176480)
for details.

After this proposal is released in a stable version of the specification, servers
which support the new `download` and `thumbnail` endpoints SHOULD cease to serve
newly uploaded media from the unauthenticated versions. This includes media
uploaded by local users and requests for not-yet-cached remote media. Such
requests receive a 404 `M_NOT_FOUND` error, as though the media did not exist.
Servers SHOULD consider the impact on their local ecosystem before freezing the
endpoints: for example, by ensuring that common bridges and clients will continue
to work, and by encouraging updates to affected projects as needed.
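
One possible way for a server to implement the freeze, assuming it records when each piece of media entered its store (upload time for local media, first-cache time for remote media). The `StoredMedia` type, the function name and the cut-over date below are all hypothetical, and a real server would stream the media bytes where this sketch returns `(200, None)`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class StoredMedia:
    # Hypothetical record: when the media entered this server's store
    # (upload time for local media, first-cache time for remote media).
    stored_at: datetime


def legacy_download_response(media: Optional[StoredMedia], freeze_at: datetime) -> tuple[int, Optional[dict]]:
    """Answer a request on the deprecated /_matrix/media/v3/download endpoint."""
    if media is None or media.stored_at >= freeze_at:
        # Unknown or not-yet-cached remote media (None), and anything stored after
        # the freeze, is reported as missing; it stays reachable via the new endpoints.
        return 404, {"errcode": "M_NOT_FOUND", "error": "Not found"}
    # Media stored before the freeze is still served (200 means "send the bytes").
    return 200, None


freeze = datetime(2024, 9, 1, tzinfo=timezone.utc)  # hypothetical cut-over date
old = StoredMedia(stored_at=datetime(2024, 6, 1, tzinfo=timezone.utc))
new = StoredMedia(stored_at=datetime(2024, 10, 1, tzinfo=timezone.utc))
print(legacy_download_response(old, freeze)[0])   # 200
print(legacy_download_response(new, freeze)[0])   # 404
print(legacy_download_response(None, freeze)[0])  # 404
```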

2. Removal of `allow_remote` parameter from `/download` and `/thumbnail`

The current
[`/download`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3downloadservernamemediaid)
and
[`/thumbnail`](https://spec.matrix.org/v1.6/client-server-api/#get_matrixmediav3thumbnailservernamemediaid)
endpoints take an `allow_remote` query parameter, indicating whether the
server should request remote media from other servers. This is redundant
with the new endpoints, and so it is not supported on them.

Servers MUST NOT return remote media from `GET /_matrix/federation/v1/media/download`. The
`serverName` is omitted from the endpoint's path to strongly enforce this: the `mediaId` in
a request is assumed to be scoped to the target server.

`/_matrix/client/v1/media/download` and
`/_matrix/client/v1/media/thumbnail` return remote media as normal.
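
A sketch of how a requesting server might derive the federation request from an `mxc://` URI, under the assumption that the origin server named in the URI is the destination. Server discovery (`.well-known` delegation, SRV records) and request signing are deliberately left out, and `parse_mxc` / `federation_download_target` are hypothetical helper names.

```python
from urllib.parse import quote


def parse_mxc(uri: str) -> tuple[str, str]:
    """Split an mxc://{serverName}/{mediaId} URI into its two components."""
    prefix = "mxc://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an mxc URI: {uri!r}")
    server_name, _, media_id = uri[len(prefix):].partition("/")
    if not server_name or not media_id or "/" in media_id:
        raise ValueError(f"malformed mxc URI: {uri!r}")
    return server_name, media_id


def federation_download_target(uri: str) -> tuple[str, str]:
    """Return (destination server, request path) for fetching remote media over federation."""
    server_name, media_id = parse_mxc(uri)
    # The path carries no serverName: the request goes to the origin server
    # itself, which only ever answers for media it owns.
    return server_name, "/_matrix/federation/v1/media/download/" + quote(media_id, safe="")


print(federation_download_target("mxc://example.org/abc123"))
# ('example.org', '/_matrix/federation/v1/media/download/abc123')
```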

3. Authentication on all endpoints

Currently, the `/download` and `/thumbnail` endpoints have no authentication
requirements. Under this proposal, the new endpoints are authenticated the same
way as other endpoints: they require an `Authorization` header, which must be
`Bearer {accessToken}` for `/_matrix/client`, or an `X-Matrix` request signature
for `/_matrix/federation`.

The query string `?access_token` approach is not supported on the new endpoints,
as it is [deprecated](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/4126-deprecate-query-string-auth.md)

it's easier for a reader to go from the PR to the doc than the other way around, so could we do:

Suggested change
as it is [deprecated](https://github.com/matrix-org/matrix-spec-proposals/blob/main/proposals/4126-deprecate-query-string-auth.md)
as it is [deprecated](https://github.com/matrix-org/matrix-spec-proposals/pull/4126)


The PR doesn't show any changes to accepted MSCs, unfortunately. Ideally this would be a link to the spec, but unstable has the same longevity problems.

and [pending removal](https://github.com/matrix-org/matrix-spec-proposals/pull/4127).
See those MSCs for details.

**Note**: This fixes [matrix-spec#313](https://github.com/matrix-org/matrix-spec/issues/313).
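
For a Python client using the standard library, attaching the access token might look like the following sketch (the function name and example token are hypothetical). It uses `urllib.request.Request.add_unredirected_header` so that, if the homeserver redirects, urllib does not forward the token to the redirect target; whether a given client needs that precaution depends on its HTTP stack.

```python
import urllib.request
from urllib.parse import quote


def client_download_request(homeserver: str, server_name: str, media_id: str,
                            access_token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the new client download endpoint."""
    url = (
        f"{homeserver.rstrip('/')}/_matrix/client/v1/media/download/"
        f"{quote(server_name, safe=':')}/{quote(media_id, safe='')}"
    )
    request = urllib.request.Request(url, method="GET")
    # The token travels in the Authorization header, never as ?access_token=.
    # Unredirected headers are not copied onto a follow-up request after a redirect.
    request.add_unredirected_header("Authorization", f"Bearer {access_token}")
    return request


req = client_download_request("https://matrix.example.org", "example.org", "abc123", "syt_example_token")
print(req.full_url)                      # https://matrix.example.org/_matrix/client/v1/media/download/example.org/abc123
print(req.get_header("Authorization"))   # Bearer syt_example_token
```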

4. Updated response format

* For the new `/_matrix/client` endpoints, the response format is the same as
the corresponding original endpoints.

* To enable future expansion, the response for the new `/_matrix/federation`
endpoints is
[`multipart/mixed`](https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html)
content with exactly two parts: the first MUST be a JSON object (and should have a
`Content-Type: application/json` header), and the second MUST be the media item
as per the original endpoints.

No properties are yet specified for the JSON object to be returned. One
possible use is described by [MSC3911](https://github.com/matrix-org/matrix-spec-proposals/pull/3911).

An example response:

```
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/json

{}

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: text/plain

This media is plain text. Maybe somebody used it as a paste bin.

--gc0p4Jq0M2Yt08jU534c0p--
```
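
On the requesting side, Python's standard `email` package can split such a body, provided the HTTP `Content-Type` header (which carries the boundary) is re-attached in front of it. This is a sketch under that assumption, not a reference parser: `split_media_response` is a hypothetical name, and the sample body uses CRLF line endings and a closing `--boundary--` delimiter as RFC 2046 requires.

```python
import json
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def split_media_response(content_type: str, body: bytes) -> tuple[dict, EmailMessage]:
    """Split a multipart/mixed federation /download body into (metadata, media part)."""
    # The email parser expects headers, so prepend the HTTP Content-Type header.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=policy.default).parsebytes(raw)
    if not message.is_multipart():
        raise ValueError("expected a multipart response")
    parts = list(message.iter_parts())
    if len(parts) != 2:
        raise ValueError(f"expected exactly two parts, got {len(parts)}")
    metadata_part, media_part = parts
    # The first part MUST be a JSON object; no properties are defined yet.
    metadata = json.loads(metadata_part.get_payload(decode=True))
    if not isinstance(metadata, dict):
        raise ValueError("first part must be a JSON object")
    return metadata, media_part


BOUNDARY = "gc0p4Jq0M2Yt08jU534c0p"
body = (
    f"--{BOUNDARY}\r\n"
    "Content-Type: application/json\r\n"
    "\r\n"
    "{}\r\n"
    f"--{BOUNDARY}\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "This media is plain text. Maybe somebody used it as a paste bin.\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

metadata, media = split_media_response(f"multipart/mixed; boundary={BOUNDARY}", body)
print(metadata)                        # {}
print(media.get_content_type())        # text/plain
print(media.get_payload(decode=True))  # b'This media is plain text. Maybe somebody used it as a paste bin.'
```
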

The second part (media item bytes) MAY include a [`Location` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Location)
pointing to the raw media object instead of containing the bytes itself. Servers
SHOULD NOT cache the `Location` header's value, as the responding server may
have applied time limits on its validity. Servers which don't immediately
download the media from the provided URL should re-request the media and
metadata from the `/download` endpoint when ready for the media bytes.

The `Location` header's URL does *not* require authentication, as it will
typically be served by a CDN or other non-Matrix server (which would be unable
to verify `X-Matrix` signatures, for example).

An example response with a `Location` redirect would be:

```
Content-Type: multipart/mixed; boundary=gc0p4Jq0M2Yt08jU534c0p

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: application/json

{}

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: text/plain
Location: https://cdn.example.org/ab/c1/2345.txt

--gc0p4Jq0M2Yt08jU534c0p--
```

If the server were to fetch `https://cdn.example.org/ab/c1/2345.txt` (for example,
with `curl`), it would get:

```
This media is plain text. Maybe somebody used it as a paste bin.
```
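
Handling the optional `Location` header could then look like this sketch, which takes the second part as parsed by Python's `email` package. The default fetcher uses `urllib.request.urlopen` with no Matrix credentials; the `fetch` parameter is a hypothetical hook added so the logic can be exercised without network access.

```python
import urllib.request
from email.message import EmailMessage
from typing import Callable


def fetch_without_auth(url: str) -> bytes:
    # The Location URL needs no Matrix authentication; it is typically a CDN.
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def media_bytes(part: EmailMessage, fetch: Callable[[str], bytes] = fetch_without_auth) -> bytes:
    """Return the media bytes from the second part of a federation /download response."""
    location = part.get("Location")
    if location is not None:
        # Fetch right away and do not store the URL: the responding server may
        # have limited how long it stays valid.
        return fetch(str(location))
    return part.get_payload(decode=True)


redirected = EmailMessage()
redirected["Content-Type"] = "text/plain"
redirected["Location"] = "https://cdn.example.org/ab/c1/2345.txt"

inline = EmailMessage()
inline["Content-Type"] = "text/plain"
inline.set_payload("This media is plain text.")


def fake_fetch(url: str) -> bytes:
    return b"fetched " + url.encode("ascii")


print(media_bytes(redirected, fake_fetch))  # b'fetched https://cdn.example.org/ab/c1/2345.txt'
print(media_bytes(inline, fake_fetch))      # b'This media is plain text.'
```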

5. Backwards compatibility mechanisms

a. Backwards compatibility with older servers: if a client or requesting server
receives a 404 error with `M_UNRECOGNIZED` error code in response to a request
using the new endpoints, they may retry the request using the deprecated
endpoint. Servers and clients should note the [`M_UNRECOGNIZED`](https://spec.matrix.org/v1.10/client-server-api/#common-error-codes)
error code semantics.

b. Backwards compatibility with older clients and federating servers: as mentioned
in Part 1 of this proposal, servers *may* freeze unauthenticated media access
once stable authenticated endpoints are available. This may lead to client and
server errors for new media. Both clients and servers are strongly encouraged
to update as soon as possible, before servers freeze unauthenticated media
access.
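
The fallback in (a) can be sketched as follows. The transport is abstracted as a hypothetical `send(path) -> (status, body)` callable so the decision logic stands alone; only a 404 whose JSON body carries `M_UNRECOGNIZED` triggers the retry, while other errors such as `M_NOT_FOUND` are returned unchanged.

```python
import json
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def is_unrecognized(status: int, body: bytes) -> bool:
    """True if the response is a 404 carrying the M_UNRECOGNIZED error code."""
    if status != 404:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return isinstance(payload, dict) and payload.get("errcode") == "M_UNRECOGNIZED"


def download_with_fallback(server_name: str, media_id: str,
                           send: Callable[[str], Response]) -> Response:
    """Try the new endpoint first, retrying on the deprecated one for older servers."""
    status, body = send(f"/_matrix/client/v1/media/download/{server_name}/{media_id}")
    if is_unrecognized(status, body):
        # The server predates this MSC: fall back to the deprecated endpoint.
        return send(f"/_matrix/media/v3/download/{server_name}/{media_id}")
    # Any other outcome, including 404 M_NOT_FOUND, is final.
    return status, body


def old_server(path: str) -> Response:
    # Simulates a homeserver that only knows the deprecated endpoints.
    if path.startswith("/_matrix/client/v1/"):
        return 404, b'{"errcode": "M_UNRECOGNIZED", "error": "Unrecognized request"}'
    return 200, b"media bytes"


print(download_with_fallback("example.org", "abc123", old_server))  # (200, b'media bytes')
```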

6. Removal of `allow_redirect` parameter from `/download` and `/thumbnail`

Clients MUST expect a 307 or 308 redirect when calling the new `/download`
and `/thumbnail` Client-Server API endpoints.

Servers MUST expect the `Location` header in the media part of the new Server-Server
API `/download` endpoint. Servers MUST NOT respond with a 307 or 308 redirect at
the top level of this endpoint: they may only redirect within the media part
itself.
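
A minimal sketch of these redirect rules from the requester's point of view. The function name and the `"client"` / `"federation"` labels are hypothetical, and `headers` is assumed to be a plain mapping with canonical capitalisation, whereas real HTTP libraries usually offer case-insensitive lookup.

```python
from typing import Mapping, Optional

REDIRECT_STATUSES = (307, 308)


def redirect_target(api: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to follow for a new /download or /thumbnail response, or None."""
    if status not in REDIRECT_STATUSES:
        return None  # the response carries the media (or an error) directly
    if api == "federation":
        # Federation /download must not redirect at the top level; a redirect may
        # only appear as a Location header inside the multipart media part.
        raise ValueError("unexpected top-level redirect from federation /download")
    location = headers.get("Location")
    if not location:
        raise ValueError("redirect response without a Location header")
    return location


print(redirect_target("client", 307, {"Location": "https://cdn.example.org/ab/c1/2345.txt"}))
# https://cdn.example.org/ab/c1/2345.txt
```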

### Effects on client applications

Naturally, implementations will be required to provide `Authorization` headers
when accessing the new endpoints. This will be simple in some cases, but rather
more involved in others. This section considers some of those cases.

#### IRC/XMPP bridges

Possibly the largest impact will be on IRC and XMPP bridges. Since IRC and
XMPP have no media repository of their own, these bridges currently transform
`mxc:` URIs into `https://<server>/_matrix/media/v3/download/` URIs and forward
those links to the remote platform. This will no longer be a viable option.

One potential solution is for the bridges to provide a proxy.

In this scenario, the bridge would have a secret HMAC key. When it
receives a Matrix event referencing a piece of media, it should create a new URI
referencing the media, including an HMAC to prevent tampering. For example:

```
https://<bridge_server>/media/{originServerName}/{mediaId}?mac={hmac}
```

When the bridge later receives a request to that URI, it checks the HMAC
and proxies the request to the homeserver, using its AS access
token in the `Authorization` header.

The bridge might also choose to embed information such as the room that
referenced the media, and the time that the link was generated, in the URL.
Such mechanisms would allow the bridge to impose controls such as:

* Limiting the time for which a media link is valid. Doing so would help prevent
the media from being visible to users who weren't participating in the chat.

* Rate-limiting the amount of media being shared in a particular room (in other
words, avoiding the use of Matrix as a Warez distribution system).
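
The proxy scheme above might be sketched as follows, signing an expiry time alongside the media identifiers so that links stop working after a while. The URL layout, the `expires` parameter, the use of HMAC-SHA256 and the helper names are all assumptions for illustration; a room ID could be added to the signed message in the same way. After `check_link` succeeds, the bridge would request `/_matrix/client/v1/media/download/{origin}/{mediaId}` with its AS token.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlsplit


def _signature(key: bytes, origin: str, media_id: str, expires: int) -> str:
    # Sign every field an attacker must not alter, including the expiry time.
    message = f"{origin}\n{media_id}\n{expires}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_link(base: str, key: bytes, origin: str, media_id: str,
              ttl: int = 3600, now: Optional[float] = None) -> str:
    """Build a bridge URL for mxc://{origin}/{media_id} that expires after ttl seconds."""
    expires = int(time.time() if now is None else now) + ttl
    mac = _signature(key, origin, media_id, expires)
    return (f"{base}/media/{quote(origin, safe='')}/{quote(media_id, safe='')}"
            f"?expires={expires}&mac={mac}")


def check_link(url: str, key: bytes, now: Optional[float] = None) -> Optional[tuple[str, str]]:
    """Return (origin, media_id) if the link is authentic and unexpired, else None."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path shape: /media/{origin}/{mediaId} (base URL without a path).
    if len(segments) != 4 or segments[1] != "media":
        return None
    origin, media_id = unquote(segments[2]), unquote(segments[3])
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        mac = query["mac"][0]
    except (KeyError, ValueError):
        return None
    expected = _signature(key, origin, media_id, expires)
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    if not hmac.compare_digest(expected.encode("ascii"), mac.encode("utf-8")):
        return None
    if (time.time() if now is None else now) > expires:
        return None  # the link has expired
    return origin, media_id


key = b"example-secret-key"  # hypothetical bridge secret
link = make_link("https://bridge.example.org", key, "example.org", "abc123", ttl=60, now=1000)
print(check_link(link, key, now=1030))  # ('example.org', 'abc123')
print(check_link(link, key, now=2000))  # None
```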

#### Icons for "social login" flows

When a server supports multiple login providers, it provides the client with
icons for the login providers as `mxc:` media URIs. These must be accessible
without authentication (because the client has no access token at the time the
icons are displayed).

This remains a somewhat unsolved problem. For now, clients could possibly continue
to call the legacy `/_matrix/media/v3/download` URI; ultimately, this
problem will be solved by the transition to OIDC. Alternatively, a dedicated
API could be provided, or clients could be permitted to use HTTP(S) URLs to access
the icons. Such support would come from a different MSC.

(This was previously discussed in
[MSC2858](https://github.com/matrix-org/matrix-spec-proposals/pull/2858#discussion_r543513811).)

## Potential issues

* Setting the `Authorization` header is particularly annoying for web clients.
Service workers appear to be the best option, though locally cached `blob:` URIs
are another. Clients should note that caching media can lead to significant
memory usage, particularly for large media; service workers, by comparison,
allow for proxy-like behaviour.

Cookies are a plausible mechanism for sharing session information between
requests without having to set headers, though would be a relatively bespoke
authentication method for Matrix. Additionally, many Matrix users have cookies
disabled due to the advertising and tracking use cases common across the web.

Is this really true? What data do you have to back this up? It seems like a fairly elegant solution.


Not hard data, but for the last 6 years it's felt like every third user for Element Web has disabled these things and more :p


Mmm, the latter seems spurious since cookies and localstorage will almost always be enabled/disabled as a pair, and Element has been relying on localstorage for years. The former is true in that setting a cookie for the homeserver would be a little weird. In my mind, the pros / cons are:

* Cookie would make all media requests magically work on that browser session, even if the user opened something (non-encrypted) in a tab manually.
* The above could cause them to think the link will work if sent to someone else, which it won't.
* It would be a slightly odd way of using cookies since this is a federated world and the HS is not necessarily the same endpoint as your client.
* It would cause every other C/S API request to also get the cookie sent, confusing auth.
* Simpler if all C/S API auth works the same way.

I'm not going to make this a concern as I don't think it's blocking, but it might be worth a quick sanity check to make sure we really are committing to the right thing.


* Users will be unable to copy links to media from web clients to share out of
band. This is considered a feature, not a bug.

* Over federation, the use of the `Range` request header on `/download` becomes
unclear as it could affect either or both parts of the response. There does not
appear to be formal guidance in [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110#field.range)
either, and there are equally good arguments for applying it to either part or to both. Typically,
such a header would be used to resume failed downloads, though servers are
already likely to discard received data and fail the associated client requests
when the federation request fails. Therefore, servers are unlikely to use `Range`
at all. As such, this proposal does not make a determination on how `Range`
should be handled, and leaves it as an HTTP specification interpretation problem
instead.

* The `Location` header support on the new `/download` endpoint could add a bit
of complexity to servers. However, the alternative way of supporting CDNs and
similar setups is to place that complexity into "edge workers" which mutate the
response. Although the Matrix spec would then be "simpler", edge worker setups
would be fragmented, whereas this proposal offers an opportunity for a common standard.

## Alternatives

* Allow clients to upload media which does not require authentication (for
example via a `public=true` query parameter). This might be particularly
useful for IRC/XMPP bridges, which could upload any media they encounter to
the homeserver's repository.

The danger with this is that there's little stopping clients from
continuing to upload media as "public", negating all of the benefits of this
MSC. It might be acceptable if uploading public media were restricted to certain
privileged users, or if the flag could only be applied after the fact by a server
administrator.

* We could simply require that `Authorization` headers be given when calling
the existing endpoints. However, doing so would make it harder to evaluate
the proportion of clients which have been updated, and it is a good
opportunity to bring these endpoints into line with the rest of the
client-server and federation APIs.

* There's no real need to rename `GET /_matrix/media/v3/preview_url` and `GET
/_matrix/media/v3/config` at present, and we could just leave them in
place. However, changing them at the same time makes the API more consistent.

Conversely, `POST /_matrix/media/v3/upload` and `POST /_matrix/media/v1/create`
should eventually be renamed as well. The reason to delay doing so is that MSC3911
will make more substantial changes to these endpoints, requiring another rename,
and it is expected that both proposals will be merged at around the same time
(so a double rename would be confusing and unnecessary). However, if MSC3911 is
delayed or rejected, we should reconsider this.

We are still planning to fast-follow with MSC3911, right?


Hopefully, yes. It won't be in the same spec release though.


* Rather than messing with multipart content, have a separate endpoint for
servers to get the metadata for a media item. That would mean two requests,
but might make more sense than `/download` providing the info directly.

This is a plausible approach with no significant upsides or downsides when
compared to multipart responses.

Similarly, custom headers could be used to carry the metadata on the response,
though again, there are no significant upsides or downsides to doing so.

Readers may wish to refer to [this thread](https://github.com/matrix-org/matrix-spec-proposals/pull/3916/files#r1586878787)
on the MSC which covers the majority of the pros and cons for all 3 approaches.

### Compared to MSC3796 (MSC701)

[MSC701/3796](https://github.com/matrix-org/matrix-spec-proposals/issues/3796)
introduces a concept of "content tokens" which have authentication tie-in to
prevent anonymous users from accessing media. This is a similar problem space
to this proposal, though deals more in the event-to-media linking space instead.
Although the MSC is an early sketch, it's unclear if the problems noted on the
MSC itself are feasibly resolvable.

### Compared to MSC2461

[MSC2461](https://github.com/matrix-org/matrix-spec-proposals/pull/2461) adds
authentication to the existing media endpoints, which as noted here in the
Alternatives is not likely to roll out quickly and leaves an inconsistency in
the spec. MSC2461 also introduces a client-visible flag for which kinds of media
may require authentication, making it similar to the alternative listed above
where on the federation side we could have two endpoints: one for information
and one for the media itself. MSC2461 simply makes the information client-visible
instead of server-visible.

## Unstable prefix

While this proposal is in development, the new endpoints should be named as follows:

* `GET /_matrix/client/unstable/org.matrix.msc3916/media/preview_url`
* `GET /_matrix/client/unstable/org.matrix.msc3916/media/config`
* `GET /_matrix/client/unstable/org.matrix.msc3916/media/download/{serverName}/{mediaId}`
* `GET /_matrix/client/unstable/org.matrix.msc3916/media/download/{serverName}/{mediaId}/{fileName}`
* `GET /_matrix/client/unstable/org.matrix.msc3916/media/thumbnail/{serverName}/{mediaId}`
* `GET /_matrix/federation/unstable/org.matrix.msc3916.v2/media/download/{mediaId}`
* **Note**: This endpoint has a `.v2` in its unstable identifier due to the MSC changing after
initial implementation. The original unstable endpoint has a `serverName` and may still be
supported by some servers: `GET /_matrix/federation/unstable/org.matrix.msc3916/media/download/{serverName}/{mediaId}`

The `serverName` was later dropped in favour of explicit scoping. See the
`allow_remote` section of this proposal for details.

In a prior version of this proposal, the federation API included a thumbnail endpoint.
It was removed due to lack of perceived usage. Servers which implemented the unstable
version will have done so under `GET /_matrix/federation/unstable/org.matrix.msc3916/media/thumbnail/{serverName}/{mediaId}`.
The client-server thumbnail endpoint is unaffected by this change.
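
Implementations that still target the unstable names could translate stable paths with a small prefix table, as in this hypothetical helper. It covers only the prefixes listed above (including the `.v2` federation download endpoint) and rejects anything else.

```python
# Stable prefix -> unstable prefix, as listed in this section.
UNSTABLE_PREFIXES = {
    "/_matrix/client/v1/media/": "/_matrix/client/unstable/org.matrix.msc3916/media/",
    "/_matrix/federation/v1/media/download/": "/_matrix/federation/unstable/org.matrix.msc3916.v2/media/download/",
}


def unstable_path(stable_path: str) -> str:
    """Translate a stable MSC3916 path into its unstable-prefixed equivalent."""
    for stable, unstable in UNSTABLE_PREFIXES.items():
        if stable_path.startswith(stable):
            return unstable + stable_path[len(stable):]
    raise ValueError(f"no unstable equivalent for {stable_path!r}")


print(unstable_path("/_matrix/client/v1/media/thumbnail/example.org/abc123"))
# /_matrix/client/unstable/org.matrix.msc3916/media/thumbnail/example.org/abc123
print(unstable_path("/_matrix/federation/v1/media/download/abc123"))
# /_matrix/federation/unstable/org.matrix.msc3916.v2/media/download/abc123
```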

## Dependencies

None.