Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Rate Limiter: RecursionError: maximum recursion depth exceeded #14480

Closed
richvdh opened this issue Nov 17, 2022 · 9 comments · Fixed by #14812
Closed

Rate Limiter: RecursionError: maximum recursion depth exceeded #14480

richvdh opened this issue Nov 17, 2022 · 9 comments · Fixed by #14812
Assignees
Labels
A-Federation O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Nov 17, 2022

This can't be good (observed in the federation-reader logs on libera.chat):

2022-11-17 10:48:27,448 - twisted - 274 - CRITICAL - sentinel- Unhandled error in Deferred:
2022-11-17 10:48:27,474 - twisted - 274 - CRITICAL - sentinel- 
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 307, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 513, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.9/site-packages/synapse/federation/transport/server/_base.py", line 342, in new_func
    await d
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 993, in send
    raise result.value
synapse.api.errors.LimitExceededError: 429: Too Many Requests

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/util/ratelimitutils.py", line 249, in ratelimit
    yield ret
  File "/usr/local/lib/python3.9/site-packages/synapse/federation/transport/server/_base.py", line 344, in new_func
    logger.warning(
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1458, in warning
    self._log(WARNING, msg, args, **kwargs)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1577, in _log
    fn, lno, func, sinfo = self.findCaller(stack_info, stacklevel)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1519, in findCaller
    f = currentframe()
  File "/usr/local/lib/python3.9/logging/__init__.py", line 160, in <lambda>
    currentframe = lambda: sys._getframe(3)
RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 307, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 513, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.9/site-packages/synapse/federation/transport/server/_base.py", line 349, in new_func
    response = await func(
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/synapse/util/ratelimitutils.py", line 251, in ratelimit
    self._on_exit(request_id)
  File "/usr/local/lib/python3.9/site-packages/synapse/util/ratelimitutils.py", line 366, in _on_exit
    logger.debug("Ratelimit(%s) [%s]: Processed req", self.host, id(request_id))
RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/site.py", line 309, in processing
    yield
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 229, in wrapped_async_request_handler
    await h(self, request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 317, in _async_render_wrapper
    self._send_error_response(f, request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 525, in _send_error_response
    return_json_error(f, request, self.hs.config)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 124, in return_json_error
    logger.error(
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1475, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1577, in _log
    fn, lno, func, sinfo = self.findCaller(stack_info, stacklevel)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1519, in findCaller
    f = currentframe()
  File "/usr/local/lib/python3.9/logging/__init__.py", line 160, in <lambda>
    currentframe = lambda: sys._getframe(3)
RecursionError: maximum recursion depth exceeded while calling a Python object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/site.py", line 312, in processing
    logger.exception(
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1481, in exception
    self.error(msg, *args, exc_info=exc_info, **kwargs)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1475, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1577, in _log
    fn, lno, func, sinfo = self.findCaller(stack_info, stacklevel)
  File "/usr/local/lib/python3.9/logging/__init__.py", line 1519, in findCaller
    f = currentframe()
RecursionError: maximum recursion depth exceeded

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/twisted/internet/defer.py", line 1696, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 229, in wrapped_async_request_handler
    await h(self, request)
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/site.py", line 328, in processing
    self._finished_processing()
  File "/usr/local/lib/python3.9/site-packages/synapse/http/site.py", line 428, in _finished_processing
    usage = self.logcontext.get_resource_usage()
  File "/usr/local/lib/python3.9/site-packages/synapse/logging/context.py", line 481, in get_resource_usage
    res = self._resource_usage.copy()
  File "/usr/local/lib/python3.9/site-packages/synapse/logging/context.py", line 136, in copy
    return ContextResourceUsage(copy_from=self)
RecursionError: maximum recursion depth exceeded
@richvdh
Copy link
Member Author

richvdh commented Nov 17, 2022

There's some very weird stuff going on with the ratelimiter on this process.

It seems like lots of requests stack up, and then all suddenly get released at once:

2022-11-17 12:02:10,269 - synapse.federation.transport.server._base - 344 - WARNING - POST-653982- client disconnected before we started processing request
2022-11-17 12:02:10,269 - synapse.federation.transport.server._base - 344 - WARNING - POST-653983- client disconnected before we started processing request
2022-11-17 12:02:10,270 - synapse.federation.transport.server._base - 344 - WARNING - GET-653984- client disconnected before we started processing request
2022-11-17 12:02:10,270 - synapse.federation.transport.server._base - 344 - WARNING - POST-653991- client disconnected before we started processing request
2022-11-17 12:02:10,270 - synapse.federation.transport.server._base - 344 - WARNING - POST-653993- client disconnected before we started processing request
2022-11-17 12:02:10,271 - synapse.federation.transport.server._base - 344 - WARNING - GET-653994- client disconnected before we started processing request
2022-11-17 12:02:10,271 - synapse.federation.transport.server._base - 344 - WARNING - GET-654002- client disconnected before we started processing request
2022-11-17 12:02:10,271 - synapse.federation.transport.server._base - 344 - WARNING - GET-654004- client disconnected before we started processing request
2022-11-17 12:02:10,271 - synapse.federation.transport.server._base - 344 - WARNING - GET-654005- client disconnected before we started processing request
2022-11-17 12:02:10,272 - synapse.federation.transport.server._base - 344 - WARNING - GET-654007- client disconnected before we started processing request
2022-11-17 12:02:10,272 - synapse.federation.transport.server._base - 344 - WARNING - POST-654011- client disconnected before we started processing request
2022-11-17 12:02:10,272 - synapse.federation.transport.server._base - 344 - WARNING - GET-654020- client disconnected before we started processing request
2022-11-17 12:02:10,272 - synapse.federation.transport.server._base - 344 - WARNING - GET-654022- client disconnected before we started processing request
2022-11-17 12:02:10,273 - synapse.federation.transport.server._base - 344 - WARNING - GET-654023- client disconnected before we started processing request
2022-11-17 12:02:10,273 - synapse.federation.transport.server._base - 344 - WARNING - POST-654028- client disconnected before we started processing request
2022-11-17 12:02:10,273 - synapse.federation.transport.server._base - 344 - WARNING - GET-654030- client disconnected before we started processing request
2022-11-17 12:02:10,273 - synapse.federation.transport.server._base - 344 - WARNING - GET-654032- client disconnected before we started processing request
2022-11-17 12:02:10,274 - synapse.federation.transport.server._base - 344 - WARNING - GET-654033- client disconnected before we started processing request
2022-11-17 12:02:10,274 - synapse.federation.transport.server._base - 344 - WARNING - GET-654038- client disconnected before we started processing request
2022-11-17 12:02:10,274 - synapse.federation.transport.server._base - 344 - WARNING - POST-654043- client disconnected before we started processing request
2022-11-17 12:02:10,274 - synapse.federation.transport.server._base - 344 - WARNING - POST-654044- client disconnected before we started processing request
2022-11-17 12:02:10,275 - synapse.federation.transport.server._base - 344 - WARNING - GET-654047- client disconnected before we started processing request
2022-11-17 12:02:10,275 - synapse.federation.transport.server._base - 344 - WARNING - POST-654051- client disconnected before we started processing request
2022-11-17 12:02:10,275 - synapse.federation.transport.server._base - 344 - WARNING - GET-654063- client disconnected before we started processing request
2022-11-17 12:02:10,275 - synapse.federation.transport.server._base - 344 - WARNING - GET-654066- client disconnected before we started processing request
2022-11-17 12:02:10,276 - synapse.federation.transport.server._base - 344 - WARNING - GET-654067- client disconnected before we started processing request
2022-11-17 12:02:10,276 - synapse.federation.transport.server._base - 344 - WARNING - GET-654070- client disconnected before we started processing request
2022-11-17 12:02:10,276 - synapse.federation.transport.server._base - 344 - WARNING - GET-654079- client disconnected before we started processing request
2022-11-17 12:02:10,277 - synapse.federation.transport.server._base - 344 - WARNING - POST-654086- client disconnected before we started processing request
2022-11-17 12:02:10,277 - synapse.federation.transport.server._base - 344 - WARNING - POST-654087- client disconnected before we started processing request
2022-11-17 12:02:10,277 - synapse.federation.transport.server._base - 344 - WARNING - GET-654089- client disconnected before we started processing request
2022-11-17 12:02:10,277 - synapse.federation.transport.server._base - 344 - WARNING - GET-654100- client disconnected before we started processing request
2022-11-17 12:02:10,278 - synapse.federation.transport.server._base - 344 - WARNING - GET-654102- client disconnected before we started processing request
2022-11-17 12:02:10,278 - synapse.federation.transport.server._base - 344 - WARNING - GET-654103- client disconnected before we started processing request
2022-11-17 12:02:10,278 - synapse.federation.transport.server._base - 344 - WARNING - POST-654109- client disconnected before we started processing request
2022-11-17 12:02:10,278 - synapse.federation.transport.server._base - 344 - WARNING - GET-654110- client disconnected before we started processing request
2022-11-17 12:02:10,279 - synapse.federation.transport.server._base - 344 - WARNING - GET-654114- client disconnected before we started processing request
2022-11-17 12:02:10,279 - synapse.federation.transport.server._base - 344 - WARNING - GET-654115- client disconnected before we started processing request
2022-11-17 12:02:10,279 - synapse.federation.transport.server._base - 344 - WARNING - POST-654117- client disconnected before we started processing request
2022-11-17 12:02:10,279 - synapse.federation.transport.server._base - 344 - WARNING - POST-654118- client disconnected before we started processing request
2022-11-17 12:02:10,280 - synapse.federation.transport.server._base - 344 - WARNING - GET-654120- client disconnected before we started processing request
2022-11-17 12:02:10,280 - synapse.federation.transport.server._base - 344 - WARNING - GET-654123- client disconnected before we started processing request
2022-11-17 12:02:10,280 - synapse.federation.transport.server._base - 344 - WARNING - GET-654126- client disconnected before we started processing request
2022-11-17 12:02:10,280 - synapse.federation.transport.server._base - 344 - WARNING - POST-654128- client disconnected before we started processing request
2022-11-17 12:02:10,281 - synapse.federation.transport.server._base - 344 - WARNING - GET-654129- client disconnected before we started processing request
2022-11-17 12:02:10,281 - synapse.federation.transport.server._base - 344 - WARNING - POST-654131- client disconnected before we started processing request

Some requests are going missing altogether:

$ grep POST-653322 2022-11-17-synapse-federation-reader-1.log
2022-11-17 11:09:02,062 - synapse.util.ratelimitutils - 299 - INFO - POST-653322- Ratelimiter(matrix.org): queueing request (queue now 47 items)
2022-11-17 11:10:01,936 - synapse.http.site - 372 - INFO - POST-653322- Connection from client lost before response was sent

(note that it is never responded to)

Also, er:

image

(that's actually a different worker. federation-reader-1, from which the above logs are taken, also shows an increase, though less extreme)

@turt2live
Copy link
Member

re: the graph: a few servers in the wild have been reporting backlogs in matrix.org->them traffic (their server falls behind) - is it possible something happened in the wild, or synapse changed something to do with transaction handling?

@DMRobertson DMRobertson self-assigned this Nov 22, 2022
@DMRobertson DMRobertson added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Federation S-Major Major functionality / product severely impaired, no satisfactory workaround. O-Occasional Affects or can be seen by some users regularly or most users rarely labels Nov 22, 2022
@progval
Copy link
Contributor

progval commented Nov 25, 2022

Sorry if this is off-topic, but could https://status.matrix.org/ be updated to reflect this issue?

@DMRobertson
Copy link
Contributor

Sorry if this is off-topic, but could status.matrix.org be updated to reflect this issue?

I hadn't realised we also tracked the status of the libera bridge there. Status page updated.

@DMRobertson
Copy link
Contributor

Libera seems happier these days. I'll leave this issue open in case we see this failure mode again.

@DMRobertson DMRobertson removed their assignment Dec 1, 2022
@MadLittleMods MadLittleMods changed the title RecursionError: maximum recursion depth exceeded Rate Limiter: RecursionError: maximum recursion depth exceeded Dec 13, 2022
@squahtx
Copy link
Contributor

squahtx commented Dec 20, 2022

This is basically #2532 but for the per-host rate limiters.

The federation rate limiter is per-host and its usage looks like

with ratelimiter.ratelimit(origin) as d:
await d
if request._disconnected:
logger.warning(
"client disconnected before we started processing "
"request"
)
return None
response = await func(
origin, content, request.args, *args, **kwargs
)
else:

await d blocks until the request is allowed to start. When the requesting homeserver gives up before the rate limiter allows a request to start, client disconnected before we started processing request is printed, and the rate limiter context manager is exited synchronously.

When the rate limiter context manager is __exit__ed, the next request is allowed to continue execution out of await d with the previous request still above it on the stack.

def _on_exit(self, request_id: object) -> None:
logger.debug("Ratelimit(%s) [%s]: Processed req", self.host, id(request_id))
self.current_processing.discard(request_id)
try:
# start processing the next item on the queue.
_, deferred = self.ready_request_queue.popitem(last=False)
with PreserveLoggingContext():
deferred.callback(None)
except KeyError:
pass

If the requesting homeserver has also given up on the next request, the next request after that is resumed, and so on. Each level of abandoned request adds 17 stack frames*. At around 50 abandoned requests, that comes out to 850 stack frames, which is close enough to Python's default limit that RecursionErrors are likely.

* determined through experiment

Experiment
  1. Change the warning to log stack depth.
                                logger.warning(
                                    "client disconnected before we started processing "
                                    "request [stack depth: %d]",
                                    len(inspect.stack(0))
                                )
  1. Disable sleeping in the rate limiter and set concurrency to 1.
rc_federation:
  window_size: 1000
  sleep_limit: 10000
  sleep_delay: 500
  reject_limit: 50
  concurrent: 1
  1. Use scripts-dev/federation_client.py to sign a request
scripts-dev/federation_client.py --destination "test.homeserver" "/_matrix/federation/v1/query/profile?user_id=%40test%3Atest.homeserver&field=displayname"
  1. Run lots of requests and disconnect before receiving a response
#!/usr/bin/sh

while true; do
    for i in `seq 100`; do
        curl "https://test.homeserver/_matrix/federation/v1/query/profile?user_id=%40test%3Atest.homeserver&field=displayname" -H 'Authorization: X-Matrix origin=...' &
    done

    pkill curl
done
  1. Observe log output.
2022-12-20 12:39:24,361 - synapse.federation.transport.server._base - 345 - WARNING - GET-2322 - client disconnected before we started processing request [stack depth: 56]
2022-12-20 12:39:24,363 - synapse.federation.transport.server._base - 345 - WARNING - GET-2323 - client disconnected before we started processing request [stack depth: 73]
2022-12-20 12:39:24,365 - synapse.federation.transport.server._base - 345 - WARNING - GET-2324 - client disconnected before we started processing request [stack depth: 90]
2022-12-20 12:39:24,367 - synapse.federation.transport.server._base - 345 - WARNING - GET-2325 - client disconnected before we started processing request [stack depth: 107]
2022-12-20 12:39:24,369 - synapse.federation.transport.server._base - 345 - WARNING - GET-2326 - client disconnected before we started processing request [stack depth: 124]
2022-12-20 12:39:24,371 - synapse.federation.transport.server._base - 345 - WARNING - GET-2327 - client disconnected before we started processing request [stack depth: 141]
2022-12-20 12:39:24,374 - synapse.federation.transport.server._base - 345 - WARNING - GET-2328 - client disconnected before we started processing request [stack depth: 158]
2022-12-20 12:39:24,377 - synapse.federation.transport.server._base - 345 - WARNING - GET-2329 - client disconnected before we started processing request [stack depth: 175]
2022-12-20 12:39:24,385 - synapse.federation.transport.server._base - 345 - WARNING - GET-2330 - client disconnected before we started processing request [stack depth: 192]
2022-12-20 12:39:24,390 - synapse.federation.transport.server._base - 345 - WARNING - GET-2331 - client disconnected before we started processing request [stack depth: 209]
2022-12-20 12:39:24,394 - synapse.federation.transport.server._base - 345 - WARNING - GET-2332 - client disconnected before we started processing request [stack depth: 226]
2022-12-20 12:39:24,398 - synapse.federation.transport.server._base - 345 - WARNING - GET-2333 - client disconnected before we started processing request [stack depth: 243]
2022-12-20 12:39:24,402 - synapse.federation.transport.server._base - 345 - WARNING - GET-2334 - client disconnected before we started processing request [stack depth: 260]
2022-12-20 12:39:24,407 - synapse.federation.transport.server._base - 345 - WARNING - GET-2335 - client disconnected before we started processing request [stack depth: 277]
2022-12-20 12:39:24,414 - synapse.federation.transport.server._base - 345 - WARNING - GET-2336 - client disconnected before we started processing request [stack depth: 294]
2022-12-20 12:39:24,418 - synapse.federation.transport.server._base - 345 - WARNING - GET-2337 - client disconnected before we started processing request [stack depth: 311]
2022-12-20 12:39:24,421 - synapse.federation.transport.server._base - 345 - WARNING - GET-2338 - client disconnected before we started processing request [stack depth: 328]
2022-12-20 12:39:24,424 - synapse.federation.transport.server._base - 345 - WARNING - GET-2339 - client disconnected before we started processing request [stack depth: 345]
2022-12-20 12:39:24,427 - synapse.federation.transport.server._base - 345 - WARNING - GET-2340 - client disconnected before we started processing request [stack depth: 362]
2022-12-20 12:39:24,431 - synapse.federation.transport.server._base - 345 - WARNING - GET-2341 - client disconnected before we started processing request [stack depth: 379]
2022-12-20 12:39:24,435 - synapse.federation.transport.server._base - 345 - WARNING - GET-2342 - client disconnected before we started processing request [stack depth: 396]
2022-12-20 12:39:24,440 - synapse.federation.transport.server._base - 345 - WARNING - GET-2343 - client disconnected before we started processing request [stack depth: 413]
2022-12-20 12:39:24,447 - synapse.federation.transport.server._base - 345 - WARNING - GET-2344 - client disconnected before we started processing request [stack depth: 430]
2022-12-20 12:39:24,451 - synapse.federation.transport.server._base - 345 - WARNING - GET-2345 - client disconnected before we started processing request [stack depth: 447]
2022-12-20 12:39:24,457 - synapse.federation.transport.server._base - 345 - WARNING - GET-2346 - client disconnected before we started processing request [stack depth: 464]
2022-12-20 12:39:24,465 - synapse.federation.transport.server._base - 345 - WARNING - GET-2347 - client disconnected before we started processing request [stack depth: 481]
2022-12-20 12:39:24,473 - synapse.federation.transport.server._base - 345 - WARNING - GET-2348 - client disconnected before we started processing request [stack depth: 498]
2022-12-20 12:39:24,482 - synapse.federation.transport.server._base - 345 - WARNING - GET-2349 - client disconnected before we started processing request [stack depth: 515]
2022-12-20 12:39:24,491 - synapse.federation.transport.server._base - 345 - WARNING - GET-2350 - client disconnected before we started processing request [stack depth: 532]
2022-12-20 12:39:24,500 - synapse.federation.transport.server._base - 345 - WARNING - GET-2351 - client disconnected before we started processing request [stack depth: 549]
2022-12-20 12:39:24,509 - synapse.federation.transport.server._base - 345 - WARNING - GET-2352 - client disconnected before we started processing request [stack depth: 566]
2022-12-20 12:39:24,520 - synapse.federation.transport.server._base - 345 - WARNING - GET-2353 - client disconnected before we started processing request [stack depth: 583]
2022-12-20 12:39:24,538 - synapse.federation.transport.server._base - 345 - WARNING - GET-2354 - client disconnected before we started processing request [stack depth: 600]
2022-12-20 12:39:24,552 - synapse.federation.transport.server._base - 345 - WARNING - GET-2355 - client disconnected before we started processing request [stack depth: 617]
2022-12-20 12:39:24,588 - synapse.federation.transport.server._base - 345 - WARNING - GET-2356 - client disconnected before we started processing request [stack depth: 634]
2022-12-20 12:39:24,610 - synapse.federation.transport.server._base - 345 - WARNING - GET-2357 - client disconnected before we started processing request [stack depth: 651]
2022-12-20 12:39:24,622 - synapse.federation.transport.server._base - 345 - WARNING - GET-2358 - client disconnected before we started processing request [stack depth: 668]
2022-12-20 12:39:24,634 - synapse.federation.transport.server._base - 345 - WARNING - GET-2359 - client disconnected before we started processing request [stack depth: 685]
2022-12-20 12:39:24,647 - synapse.federation.transport.server._base - 345 - WARNING - GET-2360 - client disconnected before we started processing request [stack depth: 702]
2022-12-20 12:39:24,659 - synapse.federation.transport.server._base - 345 - WARNING - GET-2361 - client disconnected before we started processing request [stack depth: 719]
2022-12-20 12:39:24,672 - synapse.federation.transport.server._base - 345 - WARNING - GET-2362 - client disconnected before we started processing request [stack depth: 736]
2022-12-20 12:39:24,685 - synapse.federation.transport.server._base - 345 - WARNING - GET-2363 - client disconnected before we started processing request [stack depth: 753]
2022-12-20 12:39:24,698 - synapse.federation.transport.server._base - 345 - WARNING - GET-2364 - client disconnected before we started processing request [stack depth: 770]
2022-12-20 12:39:24,708 - synapse.federation.transport.server._base - 345 - WARNING - GET-2365 - client disconnected before we started processing request [stack depth: 787]
2022-12-20 12:39:24,721 - synapse.federation.transport.server._base - 345 - WARNING - GET-2366 - client disconnected before we started processing request [stack depth: 804]
2022-12-20 12:39:24,733 - synapse.federation.transport.server._base - 345 - WARNING - GET-2367 - client disconnected before we started processing request [stack depth: 821]
2022-12-20 12:39:24,743 - synapse.federation.transport.server._base - 345 - WARNING - GET-2368 - client disconnected before we started processing request [stack depth: 838]
2022-12-20 12:39:24,754 - synapse.federation.transport.server._base - 345 - WARNING - GET-2372 - client disconnected before we started processing request [stack depth: 871]

@squahtx
Copy link
Contributor

squahtx commented Dec 20, 2022

In short, this failure mode happens when close to 50 (all) of the requests in the rate limiter queue have been abandoned.

@H-Shay
Copy link
Contributor

H-Shay commented Jan 9, 2023

When the rate limiter context manager is __exit__ed, the next request is allowed to continue execution out of await d with the previous request still above it on the stack.

Is there a way to clear the abandoned requests from the stack?

@squahtx
Copy link
Contributor

squahtx commented Jan 9, 2023

Is there a way to clear the abandoned requests from the stack?

Yep, we can either

  • do a 0-duration sleep, just before the with PreserveLoggingContext(): ...
  • or hs.get_reactor().callFromThread() the with PreserveLoggingContext(): ...

both of which resume the next request later.

The first option adds another reactor tick to the response time, which I don't like.
The second option needs a tiny bit of refactoring to thread hs or the reactor through into PerHostRateLimiter.

squahtx pushed a commit that referenced this issue Jan 10, 2023
When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <[email protected]>
@squahtx squahtx self-assigned this Jan 11, 2023
squahtx added a commit that referenced this issue Jan 13, 2023
…ts (#14812)

When there are many synchronous requests waiting on a
`_PerHostRatelimiter`, each request will be started recursively just
after the previous request has completed. Under the right conditions,
this leads to stack exhaustion.

A common way for requests to become synchronous is when the remote
client disconnects early, because the homeserver is overloaded and slow
to respond.

Avoid stack exhaustion under these conditions by deferring subsequent
requests until the next reactor tick.

Fixes #14480.

Signed-off-by: Sean Quah <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Federation O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants