Pusher stops pushing when it falls behind, with lots of DNSLookupFailed errors #7113
Comments
(it's worth pointing out that the matrix-org-hotfixes branch has this substitution to make it talk over http: synapse/synapse/push/httppusher.py, lines 106 to 109 in bc936b5)
so here's what's going on
and yes, we're doing DNS lookups despite the fact it's an IP address. It appears that we're using Twisted's GAIResolver, which works via
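To illustrate the point about resolving an IP address: a host that is already an IP literal can be detected up front so that no resolver is involved at all. This is only a minimal sketch using Python's standard `ipaddress` module. The helper names `parse_ip_literal` and `needs_dns_lookup` are hypothetical and are not part of Synapse or Twisted.

```python
import ipaddress
from typing import Optional


def parse_ip_literal(host: str) -> Optional[str]:
    """Return the normalised address if `host` is an IP literal, else None.

    Hypothetical helper for illustration only.
    """
    # URLs wrap IPv6 literals in brackets, e.g. "[::1]"; strip them first.
    if host.startswith("[") and host.endswith("]"):
        candidate = host[1:-1]
    else:
        candidate = host
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Not an IPv4/IPv6 literal, so it has to be treated as a hostname.
        return None


def needs_dns_lookup(host: str) -> bool:
    """True only when `host` is a name that must be resolved."""
    return parse_ip_literal(host) is None
```

A caller could check `needs_dns_lookup(host)` before handing the host to a resolver and connect directly when it returns `False`.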
Some thoughts on what we could do about this:
Isn't this embarrassingly parallelisable though? Each request does not depend on any other, so can't we just spin more processes up depending on load? Even if we did do the proposed fixes, how did it get so far behind in the first place? Slow push is also a huge problem.
This is a bit simplistic. If you have multiple processes doing the pushing, you have to agree a way to make sure that each push gets processed by exactly one pusher. Obviously there are ways to solve that problem, but just spinning up more processes isn't likely to work currently.
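One common way to get "exactly one pusher per push" is to assign each pusher to a worker with a stable hash, so every process can decide independently whether a pusher is its own. The following is a rough sketch under that assumption, not a description of how Synapse does or will do it. The names `owner_for_pusher` and `should_handle`, the worker names, and the key format are all hypothetical.

```python
import hashlib
from typing import Sequence


def owner_for_pusher(pusher_key: str, workers: Sequence[str]) -> str:
    """Pick the single worker responsible for `pusher_key`.

    Every process must be configured with the same `workers` list,
    in the same order, for the assignment to agree everywhere.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    # Use a stable digest: Python's built-in hash() is randomised per
    # process, so it would give different answers on different workers.
    digest = hashlib.sha256(pusher_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(workers)
    return workers[index]


def should_handle(pusher_key: str, this_worker: str, workers: Sequence[str]) -> bool:
    """True if this process is the one that should send for `pusher_key`."""
    return owner_for_pusher(pusher_key, workers) == this_worker
```

The catch is the one raised above: the processes still have to agree on the worker list, and changing that list reassigns pushers, so this is not something that works just by starting more processes.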
Since you ask, #7075 originally (we got two weeks behind). But actually this can happen whenever we have a brief period of a high volume of pushing to do, which the pusher can't keep up with. Once we start failing, the pusher drowns itself in retries and failures.
It is, but I'd rather have pushes happening 10 seconds late than a flurry of pushes making the entire process fall over so that pushes don't happen at all until someone manually restarts the thing.
this is definitely a thing. One of the things that can lead us into a death spiral is that, if the push gateway starts responding slowly for any reason, we can stack up thousands of concurrent connections, each of which takes a long time to service. Indeed, they can take so long to service that the push gateway starts timing out the requests, which leads to us retrying them, which leads to more load, etc etc.
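The usual guard against this kind of pile-up is to cap the number of requests in flight, so a slow gateway makes pushes queue up instead of opening thousands of connections. Below is a minimal sketch of that idea. It uses `asyncio` purely to stay self-contained (Synapse itself is built on Twisted), and `send_all_bounded`, `max_in_flight` and `timeout_s` are made-up names and values for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def send_all_bounded(
    pushes: Iterable[str],
    send: Callable[[str], Awaitable[None]],
    max_in_flight: int = 10,
    timeout_s: float = 5.0,
) -> List[str]:
    """Send every push with at most `max_in_flight` running concurrently.

    Returns the pushes that failed or timed out, so the caller can
    decide when (and whether) to retry them.
    """
    semaphore = asyncio.Semaphore(max_in_flight)
    failed: List[str] = []

    async def send_one(push: str) -> None:
        # Wait here, without opening a connection, until a slot is free.
        async with semaphore:
            try:
                # wait_for cancels the send if it exceeds the timeout.
                await asyncio.wait_for(send(push), timeout=timeout_s)
            except Exception:
                # Record the failure rather than retrying immediately,
                # which would add load to an already slow gateway.
                failed.append(push)

    await asyncio.gather(*(send_one(p) for p in pushes))
    return failed
```

With a cap like this, a slow gateway increases latency but the number of open connections stays bounded. The failed list would still need a retry policy with some delay to avoid recreating the same load.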
I believe this is affecting my home server as well. Is there anything I can do to rectify the situation?
I believe my issue is related to this issue
I'm going to write this down here because this seems to be the most likely place future me will find it.
This comment was marked as off-topic.
In my case, I got the same error when the IP of the push server had been added to the ip_range_blacklist config. Removing the IP from this list fixed the problem.
Same error message, completely unrelated problem. Again: if your push gateway never worked at all then it is not due to this bug.
Our pusher process got behind. A couple of hours after it started to catch up, almost all pushes started failing:
Inspection of logs showed that they were failing with DNSLookupFailed errors: