Sending encrypted events is slow and takes ~60s in a room with 1000 devices #15476
@turt2live Wouldn't this more likely be resolved on the server? Or what would you imagine the client would change here?
It feels like it's just code execution time on the client side, and unrelated to the server. Just need to make the loops go faster :)
@turt2live Is this still an issue for you?
Yes. Messages in the company chat take 2-3 minutes to encrypt.
I joined a room with 3 members and sent 2 messages. They have been in state "Encrypting your message..." for an hour now! When I open a group member's profile on the right, I see a loading icon under "security", even for myself. This is not working at all. Update: I suspended the computer overnight, and when I resumed today, the message was sent and I see my sessions in the profile. The 4 users have 33 sessions in total (4+9+14+6) that need to get encrypted. My browser is Google Chrome 97.0.4692.99 on a Linux computer with an Intel i9-9900K CPU. Hardware performance should not be an issue; in fact, the browser tab process used only 5% CPU when encrypting! The "Today" is repeated 7 times in the chat. Can you fix that? The software feels broken and unreliable.
There are two separate issues here. The original report is "it takes anywhere between 30 and 120 seconds extra to send a message." There is some investigation of this at #24153, but the TL;DR is that we don't plan to do much about it before #21972. If it gets completely stuck, and stays green for many minutes, that is a separate issue: please (a) ensure you are using the latest release of Element-Web, and (b) if so, open a new bug and file a new rageshake.
I compared a recent rageshake from Matthew showing encryption taking 78 seconds for a room with 1288 devices with one from October 2022, which shows @thibaultamartin sending a message in a room with 1132 devices in 49 seconds. In the October rageshake, both the initial load of olm sessions (5s) and the sharing of the keys (44s) are much quicker. On the other hand, a rageshake from my own device just now (https://github.com/matrix-org/element-web-rageshakes/issues/20175) shows the whole thing happening in 12 seconds. So I think we're comparing apples and oranges by comparing these two rageshakes: the fact that Thib was able to encrypt in 49 seconds back in October doesn't indicate that Matthew would have been able to, or that there has been any sort of regression. I'm a bit reluctant to speculate on why it is so slow for Matthew, or even why it took as long as 49 seconds for Thib, without proper profiling information; again, the whole thing seems rather moot since we're planning to throw all this code away.
See #24612 for this.
I don't think this affects "most users", so I am downgrading its
We expect this to be improved a lot with the introduction of Element R, which is just around the corner. For more background, see matrix-org/matrix-rust-sdk#170 (comment).
So I just got a really bad instance of this: 96s to encrypt a message in a smallish room (with only ~100 devices present).
I think this might be the regression we've been chasing since ~Feb. I'm flagging it in case ER doesn't help (when it lands), which would make sense if the actual problem here is that we're incorrectly preemptively sharing megolm sessions for massive rooms and blocking smaller rooms.
See rageshake for detailed logs. My hunch is that the act of switching to a room is incorrectly triggering "Preparing to encrypt events" (or perhaps presence or typing state is being picked up and triggering it, e.g. hitting the ⌘ key in preparation for ⌘K), which is obviously awful if all encryption elsewhere then gets blocked behind it.
Looks like any unhandled keydown event in the composer will trigger
Thanks. @richvdh, will EWR also suffer from the "setting up olm for the devices in one room blocks all other crypto" failure mode?
Unfortunately, yes, it will. The rust crypto SDK requires that only one call to
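A toy model of the failure mode asked about above: if key sharing for every room has to go through a single serial queue, a small room's message waits behind whatever large-room share was queued before it. This is a sketch only, not Element or matrix-rust-sdk code; the room IDs, the `simulateSerialQueue` helper and the job durations are hypothetical (80 s borrows the ~80 s estimate for a 1000-device room discussed in this thread, and 0.5 s is an arbitrary small-room figure).

```typescript
// Toy model (hypothetical, not Element or matrix-rust-sdk code): key-sharing
// jobs for different rooms go through one serial queue, so each job can only
// start once every job queued before it has finished.
interface ShareJob {
  roomId: string; // hypothetical room identifier
  durationMs: number; // assumed time to set up olm sessions and share keys
}

interface JobTiming {
  roomId: string;
  startMs: number;
  endMs: number;
}

// Compute start/end times for jobs run strictly one after another,
// in enqueue order, starting at t = 0.
function simulateSerialQueue(jobs: ShareJob[]): JobTiming[] {
  let clock = 0;
  return jobs.map((job) => {
    const startMs = clock;
    clock += job.durationMs;
    return { roomId: job.roomId, startMs, endMs: clock };
  });
}

// A large room is queued just before a small one.
const timings = simulateSerialQueue([
  { roomId: "!big:example.org", durationMs: 80000 },
  { roomId: "!small:example.org", durationMs: 500 },
]);

for (const t of timings) {
  console.log(`${t.roomId}: waited ${t.startMs} ms, done at ${t.endMs} ms`);
}
```

Under this model the small room has only 0.5 s of its own work but does not finish until 80.5 s, which is the "one room blocks all other crypto" behaviour described in the question.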
Anecdotally, I've noticed a few more people complaining about this. It has also started to bite me recently, meaning I can dig into it a bit more. Essentially, the problem seems to be poor IndexedDB performance. For each device, we are doing 4 transactions, each of which takes about 20ms. In a room with 1000 devices, that's 4000 transactions, or 80 seconds. My theory here is that, once the IndexedDB database gets above a certain size, performance drops off a cliff. That may be something that could be investigated more. We could also look at combining those 4000 transactions. However, either approach is likely to be a significant time-sink, and I'd still much rather focus on ER (which does use IndexedDB rather more sensibly).
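The arithmetic above, as a back-of-envelope sketch. It assumes, as the comment does, that transactions run one after another at a fixed ~20 ms each. The `estimateShareCost` helper and its `writesPerTxn` batching parameter are hypothetical, and the batched figure only holds if a larger transaction really costs no more than a small one, which is unlikely to be true in practice.

```typescript
// Back-of-envelope cost model (hypothetical helper, not Element code).
// Assumption: transactions are serial and each costs a fixed msPerTxn,
// regardless of how many writes it contains.
interface ShareCost {
  transactions: number;
  totalMs: number;
}

function estimateShareCost(
  devices: number,
  writesPerDevice: number,
  writesPerTxn: number,
  msPerTxn: number,
): ShareCost {
  const writes = devices * writesPerDevice;
  // Group the writes into transactions of up to writesPerTxn writes each.
  const transactions = Math.ceil(writes / writesPerTxn);
  return { transactions, totalMs: transactions * msPerTxn };
}

// As described above: 4 transactions (one write each) per device.
const today = estimateShareCost(1000, 4, 1, 20);
console.log(`${today.transactions} transactions, ${today.totalMs / 1000} s`); // 4000 transactions, 80 s

// Hypothetical batching of 100 writes per transaction, same fixed cost.
const batched = estimateShareCost(1000, 4, 100, 20);
console.log(`${batched.transactions} transactions, ${batched.totalMs / 1000} s`);
```

The model only shows why transaction count dominates under a fixed per-transaction cost; it says nothing about whether IndexedDB actually behaves that way once the database is large.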
Out of interest, I tried exporting the IndexedDB database to a JSON blob and re-importing it. It didn't help.
I dug into this again with a rageshake for a message taking 12 minutes(!) to share keys to 1200 devices. In this instance, each slice is taking 15s to share; however, there were also as many as 4 parallel megolm shares going on. I think these are different symptoms from #15476 (comment), as I don't see one share getting stuck behind another. Have rageshaked.
Note: this is distinct from #24612, in which events never get encrypted at all.
In relatively simple rooms (30 people from the company), it takes anywhere between 30 and 120 seconds extra to send a message due to device list changes.