-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Initial APIs for storing E2E room keys #2731
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, doc comments are a thing that you should do
synapse/api/errors.py
Outdated
"""A client has tried to upload to a non-current version of the room_keys store | ||
""" | ||
def __init__(self, code=403, msg="Wrong room_keys version", current_version=None, | ||
errcode=Codes.WRONG_ROOM_KEYS_VERSION): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want the code
, msg
and errcode
to be configurable?
Do we not want to force the caller to include a current_version
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have made code
, msg
& errcode
unconfigurable, and de-optionalised current_version.
synapse/handlers/e2e_room_keys.py
Outdated
|
||
|
||
class E2eRoomKeysHandler(object): | ||
def __init__(self, hs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a quick comment on the handler to say what its responsible for. A two line description would probably be good enough,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
synapse/handlers/e2e_room_keys.py
Outdated
|
||
@defer.inlineCallbacks | ||
def delete_room_keys(self, user_id, version, room_id, session_id): | ||
yield self.store.delete_e2e_room_keys(user_id, version, room_id, session_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops; added for consistency with uploads now
synapse/handlers/e2e_room_keys.py
Outdated
# go through the room_keys | ||
for room_id in room_keys['rooms']: | ||
for session_id in room_keys['rooms'][room_id]['sessions']: | ||
room_key = room_keys['rooms'][room_id]['sessions'][session_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd do the loop as something like:
for room_id, entry in room_keys['rooms'].iteritems():
for session_id in entry['sessions']:
etc...
So that we don't have to constantly look the key up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
synapse/handlers/e2e_room_keys.py
Outdated
|
||
yield self._upload_room_key( | ||
user_id, version, room_id, session_id, room_key | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be tempted to do some of the different rooms concurrently, given its in a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this is preemie optimisation; we don't even have a flow yet where the bulk upload is used. so have added a comment instead.
synapse/storage/e2e_room_keys.py
Outdated
|
||
sessions = AutoVivification() | ||
for row in rows: | ||
sessions['rooms'][row['room_id']]['sessions'][row['session_id']] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be written as:
sessions['rooms'].setdefault(row['room_id'], {"sessions":{}})['sessions'][row['session_id']] = { ... }
Or if you don't mind that we use an extra line:
room_entry = sessions['rooms'].setdefault(row['room_id'], {"sessions":{}})
room_entry['sessions'][row['session_id']] = { ... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks; done
synapse/storage/e2e_room_keys.py
Outdated
if room_id: | ||
keyvalues['room_id'] = room_id | ||
if session_id: | ||
keyvalues['session_id'] = session_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If room_id
and session_id
aren't required I think it would be polite to make them optional arguments, even if that isn't used. It makes it a lot clearer just from the signature line that the params are optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does it make sense to have a session_id without a room_id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
CREATE UNIQUE INDEX e2e_room_keys_user_idx ON e2e_room_keys(user_id); | ||
CREATE UNIQUE INDEX e2e_room_keys_room_idx ON e2e_room_keys(room_id); | ||
CREATE UNIQUE INDEX e2e_room_keys_session_idx ON e2e_room_keys(session_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ermh, are you sure that you want these to be unique indices? What if you have multiple users uploading keys for the same room?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code I think we can get away with a single index of (user_id, room_id, session_id)
, which will be much faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, fixed.
auth_data TEXT NOT NULL | ||
); | ||
|
||
CREATE UNIQUE INDEX e2e_room_key_user_idx ON e2e_room_keys(user_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not what you mean, I assume:
CREATE UNIQUE INDEX e2e_room_key_versions_idx ON e2e_room_key_versions(user_id);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, fixed
CREATE UNIQUE INDEX e2e_room_keys_session_idx ON e2e_room_keys(session_id); | ||
|
||
-- the metadata for each generation of encrypted e2e session backups | ||
CREATE TABLE e2e_room_key_versions ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Room keys? There isn't a room_id in the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed to be called e2e_room_keys_versions
which is more grammatically correct, given we are talking about versions of sets of backup of multiple room_keys here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we don't have a new line at the start of functions, which you seem to be doing a lot for some reason.
|
||
info = yield self.e2e_room_keys_handler.get_version_info( | ||
user_id, version | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't bother checking that version was supplied; i think it will explode fairly sensibly if it isn't?
My hunch is that it'd 500 in a fairly non obvious way tbh, or possibly succeed as the database ends up executing SELECT * FROM versions WHERE version = null
. Not hugely friendly.
Not sure what the point of splitting into POST and GET/DELETE would be?
So you have a GET and DELETE that only work if a version is supplied, and a POST that only works if you don't supply a version. To save having to do checks everywhere, and to make it clearer when a version is required or disallowed, having this as separate classes seems much clearer.
And shouldn't POST have the same path anyway?
They don't really share a path as it is. If you think that POST /room_keys/version
is clearer than POST /room_keys/create_version
then fair enough, twas but a thought.
And yes, turns out naming APIs is hard.
synapse/storage/e2e_room_keys.py
Outdated
|
||
# XXX: this isn't currently used and isn't tested anywhere | ||
# it could be used in future for bulk-uploading new versions of room_keys | ||
# for a user or something though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not have to review code that isn't used, and dead code just adds bloat to the code base and makes understanding things more annoying.
Also, I think I would ask you to rewrite.
synapse/storage/e2e_room_keys.py
Outdated
def get_e2e_room_keys( | ||
self, user_id, version, room_id=None, session_id=None | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its probably worth documenting what happens in the presence of a null room_id and null session_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
synapse/storage/e2e_room_keys.py
Outdated
def delete_e2e_room_keys( | ||
self, user_id, version, room_id=None, session_id=None | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto with docstrings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@erikjohnston I've:
In other news (cc @richvdh @erikjohnston), this felt really painful to write - and not just because I'm a bit rusty. In the end, this is an incredibly trivial API, for simple CRUD of some JSON objects. The only subtlety is the merge semantics when updating, but that's a whole 5 lines of code. But meanwhile, the overall PR has ended up being a 1,511 line diff (including tests), which feels insanely heavyweight for a relatively non-verbose language like Python. Stuff which felt particularly "gah we're doing this wrong" included:
Anyway, though I'd mention this, as if it's painful for me, I imagine it's also painful for other contributors... |
synapse/storage/e2e_room_keys.py
Outdated
"SELECT MAX(version) FROM e2e_room_keys_versions WHERE user_id=?", | ||
(user_id,) | ||
) | ||
this_version = txn.fetchone()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there isn't a version for this userID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
info = yield self.e2e_room_keys_handler.get_version_info( | ||
user_id, version | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, the thought of changing the API shape was simply a tangential suggestion. The actual suggestion was to change the code layout to better avoid having to manually handle null values of version
.
Having GET /version
that returns the most recent version works, but what does it return if there isn't a version for that user? What does DELETE /version
do? I note that we still don't seem to be checking if the version is null in that case.
keyvalues = { | ||
"user_id": user_id, | ||
"version": version, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a separate variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no idea. fixed.
|
||
requester = yield self.auth.get_user_by_req(request, allow_guest=False) | ||
user_id = requester.user.to_string() | ||
version = request.args.get("version")[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For query params its probably better to use parse_string
and co
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
TBH I wouldn't classify this as simple CRUD, given the merging semantics and linearization that is going on.
The separation has been really quite useful to split out the parsing of the request versus the actual functionality. Merging rest/handlers quickly become unwieldy in most cases, and it makes skimming the rest layer to figure out what the APIs are really easy, which is something I do a lot to get back up to speed with how some bit of synapse works.
This is a nuisance, yes, though its perhaps worth noting that the documentation on the rest layer is describing the APIs in detail, while the handler functions are describing exactly what each bit is doing under the hood. In general I've found that even without explicit layering you end up with duplicated documentation, unless you have a single mammoth function that does everything.
The structure of the glow database was quite complicated, iirc, as you need to somehow magically have all the correct indices in place. It also didn't support common queries, e.g. If you simply want to store JSON in the database then that's easy─and we do it all over the place─just use Other than not requiring you to define a table and what indices you want, I'm not sure that it would actually buy you very much here, since you'd still have all the merging and validation logic.
The flip side is that it makes the code so much nicer to work on. Most developers nowadays are so used to having linters that its second nature. |
Not much to add to your pain points beyond what erik has already written here, all of which I agree with completely. FWIW I see a 1366 line diff rather than 1511, of which a quarter is tests (for which yay), and, eyeballing it, it looks like fully half of it is either doc-comments (for which yay again) or copyright notices (meh). I think that one of the reasons that synapse has been so hard to work on in the past is that we have not done well with doc-comments, which leads to a lot of spelunking if you want to change existing code. The fact that this code is now clearly documented is something to be proud of, not be sad about. |
@erikjohnston ptal. |
Ironically, there are some problems which have turned up with this whilst implementing the clientside, so I'm going to avoid merging this until I'm sure it's on the right track. Need to:
|
let's reopen this when it's ready, then |
class EndToEndRoomKeyStore(SQLBaseStore): | ||
|
||
@defer.inlineCallbacks | ||
def get_e2e_room_key(self, user_id, version, room_id, session_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is only used when checking to see if we should overwrite a key. Would it be better to just convert that bit of code to use get_e2e_room_keys
instead, and get rid of this function?
Implementing the proposal in https://docs.google.com/document/d/1MOoIA9qEKIhUQ3UmKZG-loqA8e0BzgWKKlKRUGMynVc/edit#
Now ready for final review