-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC2703: Media ID grammar #2703
Conversation
## Proposal | ||
|
||
Media IDs must be treated and represented as opaque identifiers. Only characters in the | ||
[RFC 2986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if media ids are not allowed to be .
or ..
. Those can lead to issues with local client media caches. While a client should never rely on a path being safe, I think those 2 should be explicitly disallowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would this be a problem for client caches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you cache the files on a traditional file system, a media id like .
and ..
would be a reserved filename on many systems mapping to current directory or parent directory respectively. Since in your proposal the media id can't contain /
, the use of that is limited, if you want to escape a directory and overwrite other files, but you still wouldn't be able to create files like that. So you would need to work around those being valid ids and need to find alternative mappings. I think all other character combinations should be fine though. That way you can just cache files like <server part>/<media id>
in the filesystem, making the client side caches much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a super bad idea to cache the media IDs as-is on a traditional file system regardless of this MSC - don't do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's effectively untrusted user input and file storage 101: don't use the name given to you when storing files. Collisions, directory issues, etc are all possible which can cause problems for people.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you validate, that the id only contains ALPHA / DIGIT / "-" / "." / "_" / "~"
and is not one of the special filenames, you only have collision issues now. If 2 media ids collide, you already have bigger issues, so I'm not sure, if that applies here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd hopefully not be storing just the bare media ID and would instead be accompanying it with a domain, which can theoretically have the same problem: we don't actually specify that the origin
needs to be a server, just heavily imply it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a super bad idea to cache the media IDs as-is on a traditional file system regardless of this MSC - don't do that.
Yeah. Synapse definitely wouldn't do that 🤦
## Proposal | ||
|
||
Media IDs must be treated and represented as opaque identifiers. Only characters in the | ||
[RFC 2986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[RFC 2986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used. | |
[RFC 3986 Unreserved Characters](https://tools.ietf.org/html/rfc3986#section-2.3) set can be used. |
Also, maybe relax the list to pchar
instead?
this looks like an attempt at fixing matrix-org/matrix-spec#503. An alternative proposal was made in MSC1597 (https://github.com/matrix-org/matrix-spec-proposals/blob/rav/proposals/id_grammar/proposals/1597-id-grammar.md#opaque-ids), though I think it says exactly the same thing. |
I haven't checked, but I can believe that. From memory, the plan here was to try and get something through a bit faster than the general case, but considering that clearly hasn't happened, let's just close this one. |
personally I'd say the reason that MSC1597 hasn't gone anywhere is because it is trying to fix everything at once. Splitting it up seems like a good idea to me... |
fair :D We can always ressurect this MSC or make a new one - I don't really have bandwidth in any case to properly give attention to this one (and honestly forgot it existed...) |
Rendered