Root causing performance issues #2114
I'm also seeing a lot of postgres errors for device uniqueness
I also notice the slowness with my homeserver. A lot of strange stuff happens: messages take way too long to send, and images and other media sit for minutes without finishing sending. Sometimes my friend will send me a message (from a matrix.org homeserver account) and I won't get it for another 30 minutes. I'm not sure what causes the problem.
please do run the queries mentioned at #1760. If you have busy rooms with large numbers of forward extremities, clearing them out will significantly reduce the amount of work synapse has to do for every single event received (locally or over federation) in those rooms, so that alone can make all the difference. Please also upgrade to synapse 0.20.0 if you're on an older version: it should improve a number of things related to performance.
Errors sending to offline homeservers are not a cause for concern. Synapse 0.20 should spend less time trying to contact homeservers which are known to be offline, thanks to #2050.
Those are also normal.
(If #1760 doesn't make sense, please comment there rather than forking new issues)
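For context, the check described in #1760 boils down to a query along the lines of the sketch below. This is a rough sketch against the standard synapse postgres schema; treat #1760 as the authoritative source for the exact query and the matching cleanup steps.

```sql
-- Rough sketch: list the rooms with the most forward extremities.
-- Rooms with large counts are the ones worth cleaning out (see #1760
-- for the authoritative query and how to safely delete the excess rows).
select room_id, count(*) as extremity_count
from event_forward_extremities
group by room_id
order by extremity_count desc
limit 20;
```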
Acknowledged and thank you. We will upgrade the server once the arch package is updated to v0.20.0, and I will run those queries now. I mainly was just seeing a variety of things and wasn't sure if they correlated to some issues that were closed/opened/etc. Feel free to close this issue if you'd prefer; otherwise I can update it once we upgrade as well.
I couldn't run the queries at the time, but I did leave all the big rooms I was in (Matrix HQ, etc) and that improved performance a ton. It's really snappy now, so I think this is the problem. I too will run the queries and see what happens.
@enckse: ok, thanks. Do let us know how you get on.
Just as a follow-up: currently (after upgrading to v0.20.0) performance generally seems to have improved a lot.
@enckse, @storrgie was just saying in #riot that the server is still struggling with lots of federation traffic timeouts in the logs. If you can share logs with me and/or @richvdh then we can dig in further to see what it's spasming on. Just to confirm: did you ever run the queries to clean out the extremities from #1760? 0.20 does not fix these retrospectively.
It's unfair to say "Riot is slow due to federation timeouts" based on the cursory glance I had while doing something else the other day, without drilling down into it - it would be misleading. Next time we have performance issues I will actually take a look, but switching contexts (for me), going "Yeah, this might be slow I guess", and seeing timeouts is not a reasonable analysis. I will keep an eye on this.
I'll add that my reason for saying it would be misleading ^ is
So, from the project's perspective, the only guidance I would ask for is: should I immediately circle back around to run the queries in #1760, or, at this point, wait and see if we run into future problems before doing something? How would you like me to proceed in providing some useful information here?
From my pov the best approach for getting useful insight into whether performance issues exist, and if so what's causing them, would be to get an instance of prometheus set up and collecting metrics from synapse, as per https://github.com/matrix-org/synapse/blob/master/docs/metrics-howto.rst. As for #1760 - it's certainly worth running the select query to see if it flags up any problems; 0.20 fixes one particular root cause of extremity proliferation, but as Matthew says, it won't fix existing instances, and more may continue to accumulate until room state on your server converges with that in the rest of the federation.
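For anyone setting this up, a prometheus scrape job for synapse looks roughly like the sketch below. This is a minimal sketch, not copied from the metrics-howto doc: the target port and metrics path are assumptions that depend on how the metrics listener is configured in homeserver.yaml, so follow the howto for the authoritative steps.

```yaml
# Minimal prometheus.yml sketch for scraping synapse metrics.
# The target port (9092) and metrics path are assumptions; use whatever
# metrics listener you actually configured in homeserver.yaml.
scrape_configs:
  - job_name: "synapse"
    metrics_path: "/_synapse/metrics"
    static_configs:
      - targets: ["localhost:9092"]
```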
Acknowledged on metrics; that may take a bit more work, so we will get to it when we can. Re-ran the select from #1760 and only had one room > 10 (29 - the IRC bridge), which I cleared out. It was certainly worse before 0.20 (we had something like 5-10 rooms in the > 10 range).
Metrics are set up. If there is something I should be looking for, please let me know and I can start tracking things.
Happy to open another issue if needed, but during federation we're also seeing these errors for some servers, in case it matters and/or is unrelated and/or is another issue.
It seems like a bunch of the issues raised here were fixed, and those that remain may no longer be relevant. I don't think there's much more here that is worth spending time on.
We're noticing some slowness* on our synapse server. I'm trying to track it down but am uncertain where the best place to start looking is. I'll supply the initial things that sort of "concern" me, though they may not matter at all.
Given those, and the fact that there is slowness, is there a good way for us to track this down? I would love to chat in the matrix rooms about this, but the slowness is a pretty good reason for me to open this as a ticket. I know this is a pretty vague issue, but I'd like to track down our performance issue(s), and maybe that would help others as well.
*When I reference slowness I mean rooms take time to load and messages sit before sending; I've stood over someone and watched them send a message and it took several seconds (30+ in some cases) to go out. This is on https://riot.im/app/ and the Riot Android client.