MSC1597: Better spec for matrix identifiers #1597
# Grammars for identifiers in the Matrix protocol | ||||||
|
||||||
## Background | ||||||
|
||||||
Matrix uses client- or server-generated identifiers in a number of | ||||||
places. Historically the grammars for these have been underspecified, which | ||||||
leads to confusion about what is or is not a valid identifier with the | ||||||
possibility of incompatibility between implementations.
|
||||||
This proposal presents tightly-specified grammars for a number of | ||||||
identifiers. | ||||||
|
||||||
## Common Identifiers | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/appendices.html#common-identifier-format) | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> `localpart` may not include `:`. When parsing a Common Identifier, it should | ||||||
> be split at the *leftmost* `:`. | ||||||
|
||||||
Rationale: server names may contain multiple `:`s (think IPv6 literals), so the | ||||||
first colon is the only sane place to split them. This is a Known Thing, but I | ||||||
don't think we spell it out anywhere in the spec. | ||||||
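A minimal sketch of the leftmost-colon rule, in Python. The function name `split_common_identifier` is hypothetical, and the sketch deliberately does not validate the localpart or the server name beyond checking that both sides of the split exist.

```python
from typing import Tuple


def split_common_identifier(identifier: str) -> Tuple[str, str, str]:
    """Split a Common Identifier into (sigil, localpart, domain).

    The split happens at the leftmost ':' so that colons inside the
    server name (ports, IPv6 literals) stay with the domain.
    """
    if len(identifier) < 2:
        raise ValueError("identifier too short")
    sigil, rest = identifier[0], identifier[1:]
    # str.partition() splits at the first (leftmost) occurrence of ':'.
    localpart, sep, domain = rest.partition(":")
    if not sep or not domain:
        raise ValueError("identifier has no domain part")
    return sigil, localpart, domain


print(split_common_identifier("@alice:[2001:db8::1]:8448"))
# ('@', 'alice', '[2001:db8::1]:8448')
```

Splitting at the rightmost colon instead would cut the IPv6 literal and port apart, which is why the leftmost colon is the only workable choice.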
|
||||||
## User IDs | ||||||
|
||||||
User IDs are | ||||||
[well-specified](https://matrix.org/docs/spec/appendices.html#user-identifiers), | ||||||
however we should consider dropping `/` from the list of allowed characters,
because HTTP proxies might rewrite
`/_matrix/client/r0/profile/@foo%2Fbar:matrix.org/displayname` to
`/_matrix/client/r0/profile/@foo/bar:matrix.org/displayname`, which splits the
user ID across two path segments and breaks request routing.
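To illustrate the proxy problem, here is a small Python sketch using the standard `urllib.parse` module. It percent-encodes a user ID containing `/` (which encodes as `%2F`) and then decodes the whole path, as an over-eager proxy might. The proxy behaviour is simulated with `unquote`; it is an assumption for illustration, not a description of any particular proxy.

```python
from urllib.parse import quote, unquote

user_id = "@foo/bar:matrix.org"

# A client encodes the user ID as a single path segment.
encoded = quote(user_id, safe="")
path = f"/_matrix/client/r0/profile/{encoded}/displayname"
print(path)
# /_matrix/client/r0/profile/%40foo%2Fbar%3Amatrix.org/displayname

# A proxy that decodes the path before forwarding it changes its structure.
rewritten = unquote(path)
print(rewritten)
# /_matrix/client/r0/profile/@foo/bar:matrix.org/displayname

# The decoded path has one more segment than the client intended.
print(len(path.split("/")), len(rewritten.split("/")))
# 7 8
```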
|
||||||
History: `/` was introduced with the intention of acting as a hierarchical | ||||||
namespacing character, particularly with consideration to the gitter protocol | ||||||
which uses it as a hierarchical separator. However, this was not as effective | ||||||
as hoped because `@foo/bar:example.com` looks like the ID is partitioned into | ||||||
`@foo` and `bar:example.com`. | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> Remove `/` from the list of allowed characters in User IDs. | ||||||
|
||||||
`/` will of course be maintained under the grammar of "historical user | ||||||
IDs". Sorting out that mess is a longer-term project. | ||||||
|
||||||
## Room IDs and Event IDs | ||||||
|
||||||
[Issue](https://github.com/matrix-org/matrix-doc/issues/667) | ||||||
[Spec](https://matrix.org/docs/spec/appendices.html#room-ids-and-event-ids) | ||||||
|
||||||
These currently have similar formats, though it is likely that event IDs will
be replaced with something else due to | ||||||
[#1127](https://github.com/matrix-org/matrix-doc/issues/1127). | ||||||
|
||||||
Currently they are both specified as ``?opaque_id:domain``, without clues as to | ||||||
what the opaque_id should be. | ||||||
|
||||||
Synapse uses: `[A-Za-z]{18}`. | ||||||
[Dendrite](https://github.com/matrix-org/dendrite/blob/b71d922/src/github.com/matrix-org/dendrite/clientapi/routing/createroom.go#L125) | ||||||
uses (I think) `[A-Za-z0-9]{16}` via | ||||||
[json.go](https://github.com/matrix-org/util/blob/master/json.go#L185). However, | ||||||
some server implementations/forks are known to generate event IDs (and possibly | ||||||
room IDs) using a wide alphabet, which means that there exist rooms that | ||||||
include unusual event IDs. | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> The opaque_id part must not be empty, and must consist entirely of the | ||||||
> characters `[0-9a-zA-Z._~-]`. | ||||||
> | ||||||
> The total length (including sigil and domain) must not exceed 255 characters. | ||||||
> | ||||||
> This is only enforced for v2 rooms - servers and clients wishing to support | ||||||
> v1 rooms should be more tolerant. | ||||||
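A rough sketch of how a server might check the proposed rule for v2 rooms, in Python. The function name is hypothetical, the sigils `!` (room) and `$` (event) are taken from the existing spec, and the domain part is only checked for being non-empty; proper server-name validation is out of scope here.

```python
import re

# Proposed alphabet for the opaque_id part; '+' makes it non-empty.
OPAQUE_PART_RE = re.compile(r"[0-9a-zA-Z._~-]+")
MAX_ID_LENGTH = 255  # total length, including sigil and domain


def is_valid_v2_room_or_event_id(identifier: str) -> bool:
    """Check a room or event ID against the proposed v2 grammar."""
    if not identifier or len(identifier) > MAX_ID_LENGTH:
        return False
    if identifier[0] not in "!$":
        return False
    # Split at the leftmost ':'; the rest is the domain.
    opaque_id, sep, domain = identifier[1:].partition(":")
    if not sep or not domain:
        return False
    return OPAQUE_PART_RE.fullmatch(opaque_id) is not None


print(is_valid_v2_room_or_event_id("!AbCdEfGhIjKlMnOpQr:example.com"))  # True
print(is_valid_v2_room_or_event_id("$abc+def:example.com"))             # False
```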
|
||||||
|
||||||
## Key IDs (for federation, e2e, and identity servers) | ||||||
|
||||||
These are always of the form `<algorithm>:<tok>`. | ||||||
|
||||||
Valid algorithms are defined at | ||||||
https://matrix.org/docs/spec/client_server/unstable.html#key-algorithms, though | ||||||
we should define the alphabet for future algorithms. | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> Future algorithm identifiers will be assigned from the alphabet `[a-z0-9_.]` | ||||||
> and will be at most 31 characters in length. | ||||||
|
||||||
For federation keys, | ||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/config/key.py#L159) | ||||||
generates key ids as `ed25519:a_[A-Za-z]{4}`, though an HS admin can configure | ||||||
them manually to be anything without whitespace. | ||||||
|
||||||
Key IDs end up in an Authorization header which looks like `X-Matrix | ||||||
origin=origin.example.com,key="keyId",sig="ABCDEF..."`. The Synapse | ||||||
implementation splits on `,` and `=` without regard to quoting so this | ||||||
currently precludes the use of `,` or `=` in a key ID. | ||||||
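The following Python sketch shows why quoting-unaware splitting rules out `,` and `=` in key IDs. It is an illustration of the general approach described above, not Synapse's actual code, and the function name is hypothetical.

```python
def naive_parse_x_matrix(header: str) -> dict:
    """Parse an X-Matrix Authorization header by splitting on ',' and '='.

    Quoting is ignored on purpose, to mirror the limitation described above.
    """
    scheme, _, params = header.partition(" ")
    if scheme != "X-Matrix":
        raise ValueError("not an X-Matrix header")
    fields = {}
    for param in params.split(","):
        parts = param.split("=")
        if len(parts) != 2:
            # A ',' or '=' inside a quoted value ends up here.
            raise ValueError(f"malformed parameter: {param!r}")
        name, value = parts
        fields[name.strip()] = value.strip().strip('"')
    return fields


header = 'X-Matrix origin=origin.example.com,key="ed25519:a_AbCd",sig="ABCDEF"'
print(naive_parse_x_matrix(header))
# {'origin': 'origin.example.com', 'key': 'ed25519:a_AbCd', 'sig': 'ABCDEF'}

for bad_key in ("ed25519:a,b", "ed25519:a=b"):
    try:
        naive_parse_x_matrix(
            f'X-Matrix origin=origin.example.com,key="{bad_key}",sig="ABCDEF"'
        )
    except ValueError as err:
        print(f"key {bad_key!r} rejected: {err}")
```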
|
||||||
For e2e, device keys have a `tok` corresponding to the device id, whilst | ||||||
one-time keys are generated by libolm, which uses a base64-encoded 32-bit int, i.e.
`[A-Za-z0-9+/]{6}`. | ||||||
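To see where the six characters come from: a 32-bit integer is four bytes, which base64 encodes as eight characters including two `=` padding characters, leaving six once the padding is stripped. The Python sketch below reproduces that shape. The function name is hypothetical, and the big-endian byte order is an assumption made for illustration; this is not libolm's code.

```python
import base64
import re
import struct


def one_time_key_tok(counter: int) -> str:
    """Encode a 32-bit counter as unpadded base64 (6 characters)."""
    raw = struct.pack(">I", counter)  # 4 bytes; byte order is an assumption
    return base64.b64encode(raw).decode("ascii").rstrip("=")


print(one_time_key_tok(1))           # AAAAAQ
print(one_time_key_tok(0xFFFFFFFF))  # /////w

# Every result matches the pattern quoted above.
assert re.fullmatch(r"[A-Za-z0-9+/]{6}", one_time_key_tok(123456789))
```

The second example shows that `/` can and does appear in such identifiers.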
|
||||||
A key ID needs to be unique over the lifetime of the server (for federation) or | ||||||
the device (for e2e). However, they are used fairly widely, so making them long | ||||||
is unattractive as they could significantly increase the amount of data being | ||||||
transmitted. Let's limit the 'tok' part of the key to 31 characters too. | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> Key IDs use the following ABNF grammar:
> | ||||||
> ``` | ||||||
> key_id = algorithm ":" tok | ||||||
> | ||||||
> algorithm = 1*31 alg_chars | ||||||
> | ||||||
> tok = 1*31 tok_chars | ||||||
> | ||||||
> alg_chars = %x61-7a / %x30-39 / "_" / "."
> ; a-z 0-9 _ . | ||||||
> | ||||||
> tok_chars = ALPHA / DIGIT / "." / "~" / "_" / "-" | ||||||
> ; A-Z a-z 0-9 . ~ _ - | ||||||
> ``` | ||||||
> | ||||||
|
||||||
Note that enforcing this grammar will mean: | ||||||
|
||||||
* Making libolm not put + and / characters in key IDs (easy enough, but there | ||||||
will be a bunch of malformed unique keys out there in the wild. Possibly they | ||||||
would just get thrown away. Servers may need to continue to tolerate `+` and | ||||||
`/` in e2e keys for a while.) | ||||||
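The key ID grammar above translates fairly directly into a regular expression. This is a sketch in Python with a hypothetical function name; it checks only the proposed syntax, not whether the algorithm is one the spec actually defines.

```python
import re

# algorithm: 1-31 chars from [a-z0-9_.]; tok: 1-31 chars from [A-Za-z0-9._~-]
KEY_ID_RE = re.compile(r"[a-z0-9_.]{1,31}:[A-Za-z0-9._~-]{1,31}")


def is_valid_key_id(key_id: str) -> bool:
    """Return True if key_id matches the proposed key ID grammar."""
    return KEY_ID_RE.fullmatch(key_id) is not None


print(is_valid_key_id("ed25519:a_AbCd"))            # True
print(is_valid_key_id("signed_curve25519:AAAAAQ"))  # True
print(is_valid_key_id("signed_curve25519://///w"))  # False: '/' not allowed in tok
print(is_valid_key_id("ed25519:"))                  # False: empty tok
```

The third example is the kind of existing one-time key ID that servers may need to keep tolerating for a while.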
|
||||||
## Opaque IDs | ||||||
|
||||||
[Issue](https://github.com/matrix-org/matrix-doc/issues/666) | ||||||
|
||||||
This is a class of identifier types where nobody is really meant to parse any | ||||||
part of the ID - they are just unique identifiers (with varying scopes of | ||||||
uniqueness). See below for discussion on what is currently in use. | ||||||
|
||||||
I propose to specify almost the same grammar for all of these, for
simplicity and consistency. | ||||||
|
||||||
Proposal: | ||||||
|
||||||
> Opaque IDs must be strings consisting entirely of the characters | ||||||
> `[0-9a-zA-Z._~-]`. Their length must not exceed 255 characters and they must | ||||||
> not be empty. | ||||||
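As a sketch of what this means for implementations, the Python below pairs a validator for the proposed grammar with a generator that draws from the same alphabet. Both function names and the default length of 24 are hypothetical choices for illustration.

```python
import re
import secrets
import string

OPAQUE_ID_ALPHABET = string.ascii_letters + string.digits + "._~-"
OPAQUE_ID_RE = re.compile(r"[0-9a-zA-Z._~-]{1,255}")


def is_valid_opaque_id(value: str) -> bool:
    """Return True if value is non-empty, at most 255 chars, and in the alphabet."""
    return OPAQUE_ID_RE.fullmatch(value) is not None


def generate_opaque_id(length: int = 24) -> str:
    """Generate a random opaque ID of the given length."""
    if not 1 <= length <= 255:
        raise ValueError("length must be between 1 and 255")
    return "".join(secrets.choice(OPAQUE_ID_ALPHABET) for _ in range(length))


print(is_valid_opaque_id(generate_opaque_id()))                # True
print(is_valid_opaque_id("m1528374859201.0"))                  # True
print(is_valid_opaque_id("!room:example.com1528374859201"))    # False: '!' and ':'
```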
|
||||||
For almost all of the current implementations I have looked at (listed below), | ||||||
the grammar above is a superset of the generated identifiers, and a subset of | ||||||
the understood identifiers. There should therefore be no | ||||||
backwards-compatibility problems with its introduction. | ||||||
|
||||||
The exception is transaction IDs generated by some clients. I think that we'll | ||||||
just have to fix those clients and accept that old versions may not work with | ||||||
future servers. | ||||||
|
||||||
### Call IDs | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#m-call-invite) | ||||||
|
||||||
These are only used within the body of `m.call.*` events, as far as I am | ||||||
aware. They should be unique within the lifetime of a room. (Some | ||||||
implementations currently treat them as globally unique, but that is considered | ||||||
an implementation bug.) | ||||||
|
||||||
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/4d310cd4618db4e98a8e6b5eb812480102ee4dee/src/webrtc/call.js#L72) uses `c[0-9.]{32}`. | ||||||
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/5c6f785e53632e7b6fb3f3859a90c3d85b040e7f/matrix-sdk/src/main/java/org/matrix/androidsdk/call/MXWebRtcCall.java#L221) uses `c[0-9]{13}`. | ||||||
|
||||||
Additional proposal: | ||||||
|
||||||
> Call IDs should be long enough to make clashes unlikely. | ||||||
Review comment on lines +164 to +178: something similar has now been specced at https://spec.matrix.org/v1.7/client-server-api/#grammar-for-voip-ids.
||||||
|
||||||
### Media IDs | ||||||
Review comment: cross-linking matrix-org/matrix-spec#503 (comment) for updated state of the world.
||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#id67) | ||||||
|
||||||
These are generated by the server on upload, and then embedded in `mxc://` URIs | ||||||
and used in the C-S API and the S-S API. | ||||||
|
||||||
They must be URI-safe to be sensibly embedded in `mxc://` URIs. | ||||||
|
||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/media_repository.py#L153) | ||||||
uses `[A-Za-z]{24}`, though it also uses `[0-9A-Za-z_-]{27}` for | ||||||
[URL | ||||||
previews](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/rest/media/v1/preview_url_resource.py#L285). | ||||||
|
||||||
[matrix-media-repo](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/controllers/upload_controller/upload_controller.go#L50) | ||||||
uses `[A-Za-z0-9]{32}`, via [random.go](https://github.com/turt2live/matrix-media-repo/blob/539f25ee75ba6cdbb0410314b29978f4b8b1d7fe/src/github.com/turt2live/matrix-media-repo/util/random.go#L18-L27). | ||||||
|
||||||
### Filter IDs | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#post-matrix-client-r0-user-userid-filter) | ||||||
|
||||||
These are generated by the server and then used in the CS API. They are only | ||||||
required to be unique for a given user. `{` is already forbidden by the spec. | ||||||
|
||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/storage/filtering.py#L70-L73) | ||||||
uses a stringified int. | ||||||
|
||||||
### Auth Session IDs | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/r0.3.0.html#user-interactive-authentication-api) | ||||||
|
||||||
These are generated by the server during auth, and then used in the CS
API. They need to be unique for a given server.
|
||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/auth.py#L494) uses `[A-Za-z]{24}`. | ||||||
|
||||||
### Transaction IDs (for federation) | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/server_server/unstable.html#put-matrix-federation-v1-send-txnid) | ||||||
|
||||||
Generated by sending server. Needs to be unique for a given pair of servers. | ||||||
|
||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/federation/transaction_queue.py#L593) uses a stringified int and accepts pretty much anything. | ||||||
|
||||||
### Transaction IDs (for C-S API) | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#put-matrix-client-r0-rooms-roomid-send-eventtype-txnid) | ||||||
|
||||||
These are generated by the client. They only need to be unique within the | ||||||
context of a single access_token/device. | ||||||
|
||||||
Synapse doesn't appear to do any sanity-checking here currently. | ||||||
|
||||||
[matrix-js-sdk](https://github.com/matrix-org/matrix-js-sdk/blob/c6b500bc09994ab5924ef8aab9bd10fc7ded5dae/src/base-apis.js#L123) | ||||||
uses `m[0-9]{13}.[0-9]{1,}`. | ||||||
[matrix-android-sdk](https://github.com/matrix-org/matrix-android-sdk/blob/088414fb187cae341690c3a01493b87d97f0169f/matrix-sdk/src/main/java/org/matrix/androidsdk/rest/model/Event.java#L503) | ||||||
uses a room ID plus a timestamp, hence kinda could be anything, but certainly | ||||||
will include a `!`. | ||||||
|
||||||
### Device IDs | ||||||
|
||||||
[Spec](https://matrix.org/docs/spec/client_server/unstable.html#relationship-between-access-tokens-and-devices) | ||||||
|
||||||
These are normally generated by the server on login. It's possible for clients | ||||||
to present their own device_ids, but we're not aware of this feature being | ||||||
widely used. | ||||||
|
||||||
They are used between users and across federation for E2E and to-device | ||||||
messages. They need to be unique for a particular user. They also appear in key | ||||||
IDs and must therefore be a subset of that grammar. | ||||||
Review comment: I think this means that the Device ID would appear within the [...]
|
||||||
|
||||||
[Synapse](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/device.py#L89) | ||||||
generates device IDs with `[A-Z]{10}`. It appears to do little sanity-checking | ||||||
of client-generated device IDs currently. | ||||||
|
||||||
Additional proposal: | ||||||
|
||||||
> Device IDs must not exceed 31 characters in length. | ||||||
Review comment: I counter-propose that at least 36 characters should be allowed so that [...] Furthermore adopting a minimum length of 10 characters could be sensible.

Reply: why 10 and not 1 as a minimum?

Reply: My thinking was that it would encourage a degree of uniqueness.
||||||
|
||||||
### Message IDs | ||||||
|
||||||
These are used in the server-server API for | ||||||
[Send-to-device messaging](https://matrix.org/docs/spec/server_server/unstable.html#send-to-device-messaging). | ||||||
|
||||||
Synapse uses `[A-Za-z]{16}`, and accepts anything that fits in a postgres TEXT | ||||||
field. Ref: [devicemessage.py](https://github.com/matrix-org/synapse/blob/74854a97191191b08101821753c2672efc2a65fd/synapse/handlers/devicemessage.py#L102). | ||||||
|
||||||
|
||||||
## Room Aliases | ||||||
|
||||||
These are a complex topic and are discussed in [MSC | ||||||
1608](https://github.com/matrix-org/matrix-doc/issues/1608). |
Review comment: With exclusive appservice namespaces, it is currently possible for appservices that are registered to unknowingly overlap with existing user IDs, room aliases, (and possibly in the future, media IDs). Ideally there should be some chars or some sigil reserved solely for usage with appservices to prevent this issue in the future.
So perhaps this could be addressed here.