This repository has been archived by the owner on Apr 12, 2022. It is now read-only.

Stuck in a loop generating and failing to upload one-time keys #1289

Open
richvdh opened this issue Jun 6, 2017 · 8 comments

Comments

@richvdh
Member

richvdh commented Jun 6, 2017

https://github.com/matrix-org/riot-android-rageshakes/issues/186 includes two instances where the user receives an incoming Olm session, but the session cannot be established due to BAD_MESSAGE_KEY_ID.

This means that the user's device didn't recognise the one-time-key the sender used to establish the session. That might be due to one of several things:

I guess the latter is more likely. It's very hard to tell from the logs, though.

(At least some of these failures seem to be the first message sent from the other side, which means it isn't due to Coffee's device receiving the first message, setting up the session, deleting the one time key, then forgetting the session.)

@richvdh
Member Author

richvdh commented Jun 6, 2017

https://github.com/matrix-org/riot-android-rageshakes/issues/194 shows another example (from the same device)

@richvdh
Member Author

richvdh commented Jun 6, 2017

I was able to reproduce this when initiating a new olm session with Coffee. It appears that his device had forgotten a load of the one-time-keys that had been published to the database.

On 14 May (08:53:41 UTC) his device appears to have published at least 100 new signed_curve25519 keys to the server, with key_ids ranging from AAAAWg (0x5a) to AAAAvQ (0xbd) - I'd like to investigate what might have caused that.
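
As a quick check on the arithmetic, here is a minimal sketch that decodes those key ids. It assumes the ids are unpadded base64 encodings of a big-endian counter, which is consistent with the hex values quoted above but is not confirmed anywhere in this thread; `key_id_to_int` is a hypothetical helper name.

```python
import base64


def key_id_to_int(key_id: str) -> int:
    """Decode an unpadded base64 key id into an integer (assumed big-endian counter)."""
    padded = key_id + "=" * (-len(key_id) % 4)  # restore the stripped padding
    return int.from_bytes(base64.b64decode(padded), "big")


first = key_id_to_int("AAAAWg")  # 0x5a
last = key_id_to_int("AAAAvQ")   # 0xbd
print(hex(first), hex(last), last - first + 1)
```

Under that assumption the range 0x5a to 0xbd covers exactly 100 consecutive ids, matching the "at least 100" keys seen in the upload.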

@richvdh
Member Author

richvdh commented Jun 6, 2017

Inspection of the server logs provides at least a partial answer.

On 11 May (11:16:15), Coffee's device decided to upload a new one-time key to the server; it added key id AAAAWA (0x58).

At 11:52:37, it decides to upload another, but gives the new key the same id, AAAAWA. The server rejects the request.

For the next few days, it gets stuck in a loop: every 60 seconds, it:

  • checks how many keys are on the server; let's say there are N
  • observes there are too few and generates new ones (50-N, with a limit of 5)
  • tries to upload all of the keys it has generated so far but not yet successfully uploaded, including the (new) AAAAWA.

Eventually, on 14th May, it has generated so many new keys which it hasn't uploaded (specifically, 100) that it starts forgetting about some, starting with AAAAWA - which means that the next upload request succeeds and uploads the 100 brand-new keys. Meanwhile there are still 40 or so unused one-time keys on the server, waiting to be claimed by other users.
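
The loop described above can be reproduced with a toy simulation. This is only a sketch of the behaviour as reported, not the client's or server's actual code: `ToyServer` and `run_until_upload_succeeds` are hypothetical names, keys are modelled as bare integer ids, and the starting state (40 ids on the server, one conflicting unuploaded key 0x58) is an assumption chosen to resemble the logs. The constants 50, 5 and 100 are the ones quoted in this issue.

```python
from collections import deque

TARGET_ON_SERVER = 50   # the client wants this many keys on the server
MAX_NEW_PER_CYCLE = 5   # at most this many new keys per 60-second cycle
LOCAL_LIMIT = 100       # local limit before the oldest keys are forgotten


class ToyServer:
    """Hypothetical server: an upload is all-or-nothing and fails on a known id."""

    def __init__(self, existing_ids):
        self.known_ids = set(existing_ids)
        self.unclaimed = len(self.known_ids)

    def upload(self, key_ids):
        if any(k in self.known_ids for k in key_ids):
            return False  # one conflicting id rejects the whole request
        self.known_ids.update(key_ids)
        self.unclaimed += len(key_ids)
        return True


def run_until_upload_succeeds(server, pending, next_id, max_cycles=1000):
    """Repeat the buggy cycle; return (cycle number, uploaded ids) on success."""
    pending = deque(pending)
    for cycle in range(1, max_cycles + 1):
        # Generate new keys without counting the ones already waiting locally.
        shortfall = TARGET_ON_SERVER - server.unclaimed
        for _ in range(max(0, min(shortfall, MAX_NEW_PER_CYCLE))):
            pending.append(next_id)
            next_id += 1
        # Hitting the local limit silently forgets the oldest keys.
        while len(pending) > LOCAL_LIMIT:
            pending.popleft()
        # Retry everything not yet uploaded, in a single request.
        if pending and server.upload(list(pending)):
            return cycle, list(pending)
    return None, list(pending)


if __name__ == "__main__":
    server = ToyServer(range(0x31, 0x59))  # 40 ids, including 0x58
    cycle, uploaded = run_until_upload_succeeds(server, [0x58], next_id=0x59)
    print(cycle, len(uploaded), server.unclaimed)
```

In this toy model the upload only succeeds once the conflicting key falls off the front of the local queue, at which point 100 keys are published in one go on top of the 40 already waiting on the server.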


The initial problem here is that the device tries to upload two (different) instances of AAAAWA. I'll look into what could have caused that, but what happened afterwards is an absolute catalogue of fail:

  • If the server doesn't let us upload a key, maybe we should throw said key away rather than trying again forever.
  • If we decide we want one more key on the server, and we already have one which we haven't uploaded, we should upload it, rather than generating another one and uploading two.
  • When we rotate out one-time keys because we hit the local limit of 100, we should really make sure that we don't leave the rotated keys sitting on the server waiting to be claimed. This is https://github.com/vector-im/riot-web/issues/3309.
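
A minimal sketch of how the first two suggestions might look, under stated assumptions: `top_up_once`, `generate` and `upload` are hypothetical names, `upload` returns `True` when the server accepts the batch and `False` when it rejects it, and network failures (where retrying would be reasonable) are out of scope. The third point, about rotated keys left on the server, is not addressed here.

```python
from typing import Callable, List


def top_up_once(
    server_count: int,
    pending: List[str],
    generate: Callable[[], str],
    upload: Callable[[List[str]], bool],
    target: int = 50,
    max_new: int = 5,
) -> List[str]:
    """Run one top-up cycle and return the keys still waiting locally."""
    pending = list(pending)  # do not mutate the caller's list
    wanted = min(max(0, target - server_count), max_new)
    # Use keys we already generated before generating any new ones.
    while len(pending) < wanted:
        pending.append(generate())
    batch = pending[:wanted]
    if not batch:
        return pending
    upload(batch)
    # Accepted keys are now published; rejected keys are thrown away
    # instead of being retried forever. Either way they leave the queue.
    return pending[wanted:]
```

With one key short on the server and one unuploaded key in hand, this uploads the existing key without generating another; if the server rejects it, the key is dropped and the next cycle starts from a fresh one.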

@richvdh
Member Author

richvdh commented Jun 7, 2017

I guess the double-upload of AAAAWA may have been a variant of element-hq/element-web#1209.

@richvdh changed the title from "BAD_MESSAGE_KEY_ID on olm session setup" to "Stuck in a loop generating and failing to upload one-time keys" on Jun 7, 2017
@ghost

ghost commented Jun 10, 2017

It's interesting that all keys fail because one key fails. Does it upload multiple keys in a single request, which is then rejected wholesale, or does it simply get stuck on the one key and never try to upload the others?

@richvdh
Member Author

richvdh commented Jun 10, 2017

it tries to upload all keys in one request.

@ghost

ghost commented Jun 12, 2017

Would it make sense to get a more granular response back from the server? ("These keys were accepted, these keys were rejected for reason x and these keys were rejected for reason y.")

@richvdh
Member Author

richvdh commented Jun 12, 2017

There wouldn't be any harm in having the server generate a more helpful response, but it's deliberately transactional currently - either all the keys are accepted or none of them are. It's probably easier to throw away all the keys in the request if it gets rejected than to go through picking and choosing.
