Fix an instance of failed to decrypt error when an in flight /keys/query fails. #3486

Merged: 6 commits into develop from erikj/avoid_waiting_download, Jul 4, 2023

erikjohnston
Member

@erikjohnston erikjohnston commented Jun 19, 2023

Specifically, when checking that the event sender matches whoever sent us the session keys, we skip waiting for pending device list updates if we already know who owns the session key.
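
A minimal sketch of that ordering, not the SDK's actual code. `DeviceListLike` and `lookupSenderKeyUser` are hypothetical names; only `downloadKeys` and `getUserByIdentityKey` appear in this PR, and their real signatures may differ from the simplified ones assumed here.

```typescript
// Hypothetical, simplified stand-in for the SDK's device list (an assumption
// for illustration; the real class has more methods and richer types).
interface DeviceListLike {
    // Returns the user ID that owns this identity key, if already known locally.
    getUserByIdentityKey(algorithm: string, senderKey: string): string | null;
    // Resolves once any pending device list update for these users has finished.
    downloadKeys(userIds: string[], forceDownload: boolean): Promise<unknown>;
}

const OLM_ALGORITHM = "m.olm.v1.curve25519-aes-sha2";

async function lookupSenderKeyUser(
    deviceList: DeviceListLike,
    sender: string,
    deviceKey: string,
): Promise<string | null> {
    // Fast path: the owner of the key is already known, so a slow or failing
    // in-flight /keys/query cannot block or break this check.
    const known = deviceList.getUserByIdentityKey(OLM_ALGORITHM, deviceKey);
    if (known !== null) {
        return known;
    }

    // Slow path: wait for pending device list updates, then look again.
    // A rejection here propagates to the caller.
    await deviceList.downloadKeys([sender], false);
    return deviceList.getUserByIdentityKey(OLM_ALGORITHM, deviceKey);
}
```

With this ordering, a failed request only affects senders whose key owner is not yet cached.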

This mitigates some of element-hq/element-web#24682.

I've also stuck a try/catch around the downloadKeys(..) call, on the assumption that it would make it easier to track this failure mode down. I've completely made up the new OLM_BAD_SENDER_CHECK_FAILED error code.

Ideally, I think that instead of doing it like this we'd specifically track who sent us the session keys (or at least, who we believe the session keys should be from), and match against those when we do the decryption. But that's a bigger job, and I think this should help a bunch until this gets fixed properly.


Here's what your changelog entry will look like:

🐛 Bug Fixes

  • Fix an instance of failed to decrypt error when an in flight /keys/query fails. (#3486).

Specifically, when checking the event sender matches who sent us the
session keys we skip waiting for pending device list updates if we
already know who owns the session key.

src/crypto/algorithms/olm.ts
Comment on lines +215 to +224
try {
    await this.crypto.deviceList.downloadKeys([event.getSender()!], false);
} catch (e) {
    throw new DecryptionError("OLM_BAD_SENDER_CHECK_FAILED", "Could not verify sender identity", {
        sender: deviceKey,
        err: e as Error,
    });
}

senderKeyUser = this.crypto.deviceList.getUserByIdentityKey(olmlib.OLM_ALGORITHM, deviceKey);

Member

It rather feels like the value of this is pretty limited, given that we're happy to accept messages from unknown devices.

The only time this might do something different is if a key-fetch for another user happens to complete before that for the sender, and it turns out that other user is the rightful owner of the device. That seems a pretty remote chance to me.

@BillCarsonFr wdyt?

Member Author

(I think that logic is correct, though I'm hesitant to remove it without understanding why we added it in the first place given this is a security thing)

Member

Regarding the logic: here we are receiving a megolm key by successfully decrypting an olm message.
So from that we have a megolm session that is:

  • owned by a curveKey
  • claimed to be owned by a mxid
  • claimed to be owned by an ed key

=> What we want is to validate the things that are claimed (and also sanity validate the homeserver controlled fields that are in the original event)

The only way to do that is to download the device (/keys/query). We only accept devices that are correctly signed (mxid|curve|ed|device_id are signed by the ed_key). This signature "binds" the curve/ed/mxid together.

So once we have the device, we can check whether the claimed values match what's in the /keys/query response.
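
As a rough illustration of that comparison (a sketch only; the type and function names are hypothetical, and it assumes the device's self-signature was already verified when the /keys/query response was processed):

```typescript
// Hypothetical shape of a device record taken from a /keys/query response,
// after its self-signature has been checked (assumed to happen elsewhere).
interface VerifiedDevice {
    userId: string;
    deviceId: string;
    curve25519Key: string;
    ed25519Key: string;
}

// What the received session tells us about its origin.
interface SessionClaims {
    senderCurveKey: string;    // established by the successful Olm decryption
    claimedUserId: string;     // only a claim until checked
    claimedEd25519Key: string; // only a claim until checked
}

type ClaimCheck = "match" | "mismatch" | "unknown_device";

function checkClaims(claims: SessionClaims, devices: VerifiedDevice[]): ClaimCheck {
    for (const device of devices) {
        if (device.curve25519Key !== claims.senderCurveKey) {
            continue;
        }
        // The signed record binds curve key, ed key and user ID together,
        // so both claims must agree with it.
        const matches =
            device.userId === claims.claimedUserId &&
            device.ed25519Key === claims.claimedEd25519Key;
        return matches ? "match" : "mismatch";
    }
    // No signed device carries this curve key: nothing to check against yet.
    return "unknown_device";
}
```

The `unknown_device` result is the case that cannot be resolved at the time the key is received.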

And typically, the first time someone adds you and sends you an encrypted to-device message, you won't yet have downloaded their keys. => That makes it impossible to link the key to an mxid at the time the key is received.

Regarding the solution:
We need for m.room.keys to delay these checks, so not download anything at all. Instead, do the check at decryption time, or when the keys are finally downloaded.
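
One possible shape for such a deferred check, purely as a sketch: `DeferredSenderChecks` and everything in it is hypothetical and does not exist in the SDK. It stores the claims when the key arrives, triggers no download, and only gives a verdict at decryption time.

```typescript
// Hypothetical record kept for each inbound session.
interface SessionRecord {
    senderCurveKey: string;       // established by the Olm decryption itself
    claimedUserId: string;        // a claim until a signed device confirms it
    verifiedOwner: string | null; // set once a signed device with this key is known
}

type DecryptCheck = "ok" | "unverified" | "bad_sender" | "unknown_session";

class DeferredSenderChecks {
    private sessions = new Map<string, SessionRecord>();

    // On receipt of the key: remember the claims, download nothing.
    public addSession(sessionId: string, senderCurveKey: string, claimedUserId: string): void {
        this.sessions.set(sessionId, { senderCurveKey, claimedUserId, verifiedOwner: null });
    }

    // Whenever a correctly signed device becomes known (e.g. after a later
    // /keys/query), bind its owner to every session received from that key.
    public onDeviceKnown(curveKey: string, userId: string): void {
        this.sessions.forEach((session) => {
            if (session.senderCurveKey === curveKey) {
                session.verifiedOwner = userId;
            }
        });
    }

    // At decryption time: compare the event sender with what we know by now.
    public checkAtDecryption(sessionId: string, eventSender: string): DecryptCheck {
        const session = this.sessions.get(sessionId);
        if (session === undefined) return "unknown_session";
        if (session.verifiedOwner === null) return "unverified";
        const consistent =
            session.verifiedOwner === eventSender &&
            session.verifiedOwner === session.claimedUserId;
        return consistent ? "ok" : "bad_sender";
    }
}
```

How an `unverified` result should be presented to the user is a separate decision that this sketch leaves open.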

Member

Also notice that currently we can't tell the difference between a not-yet-known device, a deleted device, and a very short-lived device (hence the red warning for messages sent from a deleted device).

Member

> The only way to do that is to download the device (/keys/query),

If we have previously stored a signed copy of the device, surely there is no need to download it again?

> Regarding the solution: We need for m.room.keys to delay these checks, so not download anything at all.

Yes, that's the ideal solution, but that's really not the point here: this PR is trying to make an incremental improvement to the existing implementation.

There are two proposed improvements here:

  • Erik's change as it currently stands: if we already have a signed copy of the device, there is no need to wait for any ongoing /keys/query requests to complete. Can we agree this is uncontroversial?
  • My later comment: given that - in this code - we treat an unknown device identically to a correctly-signed device, I don't really see the value in doing a /keys/query request anyway. Supposing the incoming message is from a bogus device, we're not going to get any extra information by requesting the device list from the claimed user: the bogus device isn't going to be in the list, so it is treated as "unknown".
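
The second point can be illustrated with a small sketch. The names are hypothetical, and the verdict rule is only the behaviour as described in this thread (an unknown device is treated like a correctly signed one), not a copy of the SDK's code.

```typescript
type Verdict = "accept" | "reject";

// ownerOfKey: the user a signed device record binds the sender key to,
// or null when no such device is known.
function senderCheckVerdict(ownerOfKey: string | null, eventSender: string): Verdict {
    if (ownerOfKey === null) {
        // Unknown device: treated the same as a correctly signed one.
        return "accept";
    }
    return ownerOfKey === eventSender ? "accept" : "reject";
}

// Hypothetical lookup over signed devices, keyed by curve25519 key.
function ownerOf(signedDevices: Map<string, string>, senderKey: string): string | null {
    const owner = signedDevices.get(senderKey);
    return owner === undefined ? null : owner;
}

// A bogus device never appears in the claimed user's device list, so the
// lookup result is the same before and after the /keys/query completes.
const devicesBeforeQuery = new Map<string, string>();
const devicesAfterQuery = new Map<string, string>([["curveAlice", "@alice:example.org"]]);

const verdictBefore = senderCheckVerdict(ownerOf(devicesBeforeQuery, "curveBogus"), "@alice:example.org");
const verdictAfter = senderCheckVerdict(ownerOf(devicesAfterQuery, "curveBogus"), "@alice:example.org");
console.log(verdictBefore, verdictAfter); // both are "accept"
```

Under this rule the request changes the outcome only in the remote case mentioned earlier in the thread, where the key turns out to belong to a different, already-known user.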

This code has been observed to cause significant numbers of UISIs. I don't believe the current implementation offers any value, and Erik is proposing a quick win here. I think it would be a mistake to refuse it because it's not the ideal solution.

Co-authored-by: Richard van der Hoff <[email protected]>

@richvdh richvdh left a comment

I have spoken to @BillCarsonFr and we think it's fine to merge this as-is. Per my earlier comments, I think we could go even further, but in the interests of expediency maybe we should just land it.

@richvdh richvdh added this pull request to the merge queue Jul 4, 2023
Merged via the queue into develop with commit 5be4548 Jul 4, 2023
@richvdh richvdh deleted the erikj/avoid_waiting_download branch July 4, 2023 11:46
SimonBrandner added a commit that referenced this pull request Jul 10, 2023
github-merge-queue bot pushed a commit that referenced this pull request Sep 12, 2023
su-ex added a commit to SchildiChat/matrix-js-sdk that referenced this pull request Feb 24, 2024
* Drop support for Node 16 (matrix-org#3533).
* Improve types around login, registration, UIA and identity servers (matrix-org#3537).
* **The Browserify artifact is being deprecated, scheduled for removal in the October 10th release cycle. (matrix-org#3189)**
* Simplify `MatrixClient::setPowerLevel` API (matrix-org#3570). Fixes element-hq/element-web#13900 and matrix-org#1844.
* Deprecate `VerificationRequest.getQRCodeBytes` and replace it with the asynchronous `generateQRCode`. (matrix-org#3562).
* Deprecate `VerificationRequest.beginKeyVerification()` in favour of `VerificationRequest.startVerification()`. (matrix-org#3528).
* Deprecate `Crypto.VerificationRequest` application event, replacing it with `Crypto.VerificationRequestReceived`. (matrix-org#3514).
* Throw saner error when peeking has its room pulled out from under it (matrix-org#3577). Fixes element-hq/element-web#18679.
* OIDC: Log in (matrix-org#3554). Contributed by @kerryarchibald.
* Prevent threads code from making identical simultaneous API hits (matrix-org#3541). Fixes element-hq/element-web#25395.
* Update IUnsigned type to be extensible (matrix-org#3547).
* add stop() api to BackupManager for clean shutdown (matrix-org#3553).
* Log the message ID of any undecryptable to-device messages (matrix-org#3543).
* Ignore thread relations on state events for consistency with edits (matrix-org#3540).
* OIDC: validate id token (matrix-org#3531). Contributed by @kerryarchibald.
* Fix read receipt sending behaviour around thread roots (matrix-org#3600).
* Fix `TypedEventEmitter::removeAllListeners(void)` not working (matrix-org#3561).
* Don't allow Olm unwedging rate-limiting to race (matrix-org#3549). Fixes element-hq/element-web#25716.
* Fix an instance of failed to decrypt error when an in flight `/keys/query` fails. (matrix-org#3486).
* Use the right anchor emoji for SAS verification (matrix-org#3534).
* fix a bug which caused the wrong emoji to be shown during SAS device verification. (matrix-org#3523).