Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending encrypted events is slow and takes ~60s in a room with 1000 devices #15476

Closed
turt2live opened this issue Oct 15, 2020 · 18 comments · Fixed by matrix-org/matrix-react-sdk#12630
Labels
A-E2EE A-Performance O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Team: Crypto Z-Awaiting-Element-R Issues in code which will be replaced by the port of Element's crypto layer to Rust Z-Chronic Z-Legacy-Crypto Issues affecting the legacy crypto stack

Comments

@turt2live
Copy link
Member

turt2live commented Oct 15, 2020

Note: this is distinct to #24612, in which events never get encrypted at all.

In relatively simple rooms (30 people from the company) it takes anywhere between 30 and 120 seconds extra to send a message due to device list changes.

@jryans
Copy link
Collaborator

jryans commented Oct 21, 2020

@turt2live Wouldn't this more likely be resolved on the server? Or what would you imagine the client would change here?

@jryans jryans added A-E2EE X-Needs-Info This issue is blocked awaiting information from the reporter labels Oct 21, 2020
@turt2live
Copy link
Member Author

It feels like it's just code execution time on the client side, and unrelated to the server. Just need to make the loops go faster :)

@jryans jryans added T-Defect P1 A-Performance S-Major Severely degrades major functionality or product features, with no satisfactory workaround and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Oct 22, 2020
@jryans
Copy link
Collaborator

jryans commented Dec 9, 2021

@turt2live Is this still an issue for you?

@kittykat kittykat added X-Needs-Info This issue is blocked awaiting information from the reporter and removed X-Needs-Community-Testing labels Dec 9, 2021
@turt2live
Copy link
Member Author

Yes. Messages in the company chat take 2-3 minutes to encrypt.

@SimonBrandner SimonBrandner added O-Occasional Affects or can be seen by some users regularly or most users rarely and removed P1 labels Dec 9, 2021
@davidak
Copy link

davidak commented Feb 23, 2022

I joined a room with 3 members and sent 2 messages. They are in state "Encrypting your message..." for an hour now!

When i open a group members profile on the right i see a loading icon under "security", even for myself.

Screenshot from 2022-02-23 21-05-29

This is not working at all.

Update: I suspended the computer over night and when i resumed today, the message was sent and i see my sessions in the profile.

The 4 users have 33 sessions in total (4+9+14+6) that need to get encrypted.

My browser is Google Chrome 97.0.4692.99 on a linux computer with an Intel i9-9900K CPU. Hardware performance should not be an issue. Actually the browser tab process used only 5% CPU when encrypting!

Screenshot from 2022-02-24 09-32-01

The "Today" is repeated 7 times in the chat. Can you fix that? The software feels broken and unreliable.

@richvdh richvdh changed the title Messages stay green (encrypting) for too long Encrypting events is slow Feb 14, 2023
@richvdh
Copy link
Member

richvdh commented Feb 14, 2023

There are two separate issues here. The original report is "it takes anywhere between 30 and 120 seconds extra to send a message." There is some investigation of this at #24153, but the TL;DR is that we don't plan to do much about it before #21972.

If it gets completely stuck - and stays green for many minutes - that is a separate issue. Please (a) ensure you are using the latest release of Element-Web, and if so open a new bug and file a new rageshake.

@richvdh richvdh changed the title Encrypting events is slow Sending encrypted events takes ~60s in a room with 1000 devices Feb 14, 2023
@germain-gg germain-gg added O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience and removed O-Occasional Affects or can be seen by some users regularly or most users rarely labels Feb 14, 2023
@kittykat kittykat added Z-Chronic and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Feb 16, 2023
@richvdh
Copy link
Member

richvdh commented Feb 17, 2023

I compared a recent rageshake from Matthew showing encryption taking 78 seconds for a room with 1288 devices, with one from October 2022 which shows @thibaultamartin sending a message in a room with 1132 devices in 49 seconds. Both the initial load of olm sessions (5s), and the sharing of the keys (44s), is much quicker.

On the other hand, a rageshake from my own device just now (https://github.com/matrix-org/element-web-rageshakes/issues/20175) shows the whole thing happening in 12 seconds.

So I think we're comparing apples and oranges by comparing these two rageshakes: the fact that Thib was able to encrypt in 49 seconds back in October doesn't indicate that Matthew would have been able to, or that there has been any sort of regression.

I'm a bit reluctant to speculate on why it is so slow for Matthew - or even why it took as long as 49 seconds for Thib - without proper profiling information, and again the whole thing seems rather moot since we're planning to throw all this code away.

@richvdh
Copy link
Member

richvdh commented Feb 22, 2023

If it gets completely stuck - and stays green for many minutes - that is a separate issue. Please (a) ensure you are using the latest release of Element-Web, and if so open a new bug and file a new rageshake.

See #24612 for this.

@richvdh richvdh added the Z-Awaiting-Element-R Issues in code which will be replaced by the port of Element's crypto layer to Rust label Mar 15, 2023
@richvdh
Copy link
Member

richvdh commented Mar 15, 2023

I don't think this affects "most users" so I am downgrading its severity occurrence.

@richvdh richvdh added O-Occasional Affects or can be seen by some users regularly or most users rarely and removed O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience labels Mar 15, 2023
@pmaier1
Copy link

pmaier1 commented Apr 4, 2023

We expect this to be improved a lot with the introduction of Element R which is just around the corner. Ref. for more background matrix-org/matrix-rust-sdk#170 (comment).

@ara4n
Copy link
Member

ara4n commented May 5, 2023

So I just got a really bad instance of this - 96s to encrypt a message in a smallish room (with only ~100 devices present). It looks like:

  • Encryption was happening in parallel with sharing a new megolm session for a room with 1000 devices - and seems to have got blocked behind it, despite the small room finishing first (and starting later).
  • I didn't do anything to trigger encrypting for the big room.

I think this might be the regression we've been chasing since ~Feb. I'm flagging it in case ER doesn't help (when it lands), which would make sense if the actual problem here is that we're incorrectly preemptively sharing megolm sessions for massive rooms, and blocking smaller rooms.

@ara4n
Copy link
Member

ara4n commented May 5, 2023

see rageshake for detailed logs. My hunch is that the act of switching to a room is incorrectly triggering "Preparing to encrypt events" (or perhaps presence or typing state is being picked up, triggering it - e.g. hitting the ⌘ key in preparation to ⌘K), which is obviously awful if all encryption elsewhere then gets blocked behind it.

@t3chguy
Copy link
Member

t3chguy commented May 9, 2023

Looks like any unhandled keydown event in the composer will trigger prepareToEncrypt - will change it to happen only when the model is changed as to ignore unrelated shortcuts.

@ara4n ara4n changed the title Sending encrypted events takes ~60s in a room with 1000 devices Sending encrypted events is slow and takes ~60s in a room with 1000 devices May 22, 2023
@ara4n
Copy link
Member

ara4n commented May 22, 2023

thanks. @richvdh will EWR also suffer from "setting up olm for the devices in one room blocks all other crypto" failure mode?

@richvdh
Copy link
Member

richvdh commented May 22, 2023

thanks. @richvdh will EWR also suffer from "setting up olm for the devices in one room blocks all other crypto" failure mode?

Unfortunately, yes it will. The rust crypto SDK requires that only one call to OlmMachine::get_missing_sessions be in flight at once, so we have a similar locking arrangement to the current Crypto implementation.

@richvdh
Copy link
Member

richvdh commented Jun 15, 2023

Anecdotally, I've noticed a few more people complaining about this. It has also started to bite me recently, meaning I can dig into it a bit more.

Essentially, the problem seems to be poor indexeddb performance. For each device, we are doing 4 transactions, each of which takes about 20ms. In a room with 1000 devices, that's 80 seconds.

My theory here is that, once the IndexedDB database gets above a certain size, performance drops off a cliff. That may be something that could be investigated more. We could also look at combining those 4000 transactions.

However... either approach is likely to be a significant time-sink, and I'd still much rather focus on ER (which does use indexeddb rather more sensibly).

@richvdh
Copy link
Member

richvdh commented Jun 21, 2023

Out of interest, I tried exporting the indexeddb to a JSON blob, and re-importing. It didn't help.

@ara4n
Copy link
Member

ara4n commented Jul 3, 2023

I dug into this again with a RS for a message taking 12 minutes(!) to share keys to 1200 devices. In this instance, each slice is taking 15s to share... however, there were also as many as 4 parallel megolm shares going on. I think these are different symptoms to #15476 (comment), as I don't see one share getting stuck behind another. Have rageshaked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-E2EE A-Performance O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Team: Crypto Z-Awaiting-Element-R Issues in code which will be replaced by the port of Element's crypto layer to Rust Z-Chronic Z-Legacy-Crypto Issues affecting the legacy crypto stack
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants