Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

log_kv within _handle_new_device_update_async is not associated with an opentracing span, causing sentry logspam and large memory usage #12552

Closed
DMRobertson opened this issue Apr 26, 2022 · 5 comments
Labels
A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release X-Release-Blocker Must be resolved before making a release

Comments

@DMRobertson
Copy link
Contributor

This caused rapid memory usage rises on matrix.org.

We suspect it's only noticable if you're using sentry to report errors.

2022-04-26 13:34:43,569 - synapse.logging.opentracing - 347 - ERROR - _handle_new_device_update_async-0 - There was no active span when trying to log. Did you forget to start one or did a context slip?
Stack (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/synapse/src/synapse/app/homeserver.py", line 460, in <module>
    main()
  File "/home/synapse/src/synapse/app/homeserver.py", line 456, in main
    run(hs)
  File "/home/synapse/src/synapse/app/homeserver.py", line 442, in run
    logger=logger,
  File "/home/synapse/src/synapse/app/_base.py", line 180, in start_reactor
    run()
  File "/home/synapse/src/synapse/app/_base.py", line 162, in run
    run_command()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1315, in run
    self.mainLoop()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1325, in mainLoop
    reactorBaseSelf.runUntilCurrent()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 964, in runUntilCurrent
    f(*a, **kw)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result, *args, **kwargs
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1750, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1663, in _inlineCallbacks
    status.deferred.callback(getattr(e, "value", None))
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result, *args, **kwargs
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 661, in callback
    self._startRunCallbacks(result)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 763, in _startRunCallbacks
    self._runCallbacks()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 858, in _runCallbacks
    current.result, *args, **kwargs
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1750, in gotResult
    current_context.run(_inlineCallbacks, r, gen, status)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/home/synapse/src/synapse/metrics/background_process_metrics.py", line 243, in run
    return await func(*args, **kwargs)
  File "/home/synapse/src/synapse/handlers/device.py", line 687, in _handle_new_device_update_async
    #    {"message": "sent device update to host", "host": host}
  File "/home/synapse/src/synapse/logging/opentracing.py", line 347, in ensure_active_span_inner_2
    stack_info=True,

@DMRobertson
Copy link
Contributor Author

DMRobertson commented Apr 26, 2022

Related: #12321 #12365. In particular we suspect the latter is responsible.

@DMRobertson DMRobertson added A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release X-Release-Blocker Must be resolved before making a release labels Apr 26, 2022
@DMRobertson
Copy link
Contributor Author

@erikjohnston says:

select min(stream_id), max(stream_id) - min(stream_id), count(*) from matrix.device_lists_changes_in_room where NOT converted_to_destinations is the query I'm using to track progress

DMRobertson pushed a commit that referenced this issue Apr 26, 2022
@anoadragon453
Copy link
Member

What's happening here is that log_kv is trying to log to the active OpenTracing span, but there was no active span. Hence the error. Note spans are started automatically for requests, but we don't do it automatically for background jobs.

OpenTracing spans are propagated by log contexts (which is how they're somewhat closely related). The shown error can happen if you didn't start a span, or the log context slipped. In this case it's the former.

We can either:

  1. Create a big span for this whole background job, but long-lived spans are against the philosophy of OpenTracing, and we'd only report it once it's finished (and thus you get a disconnected parent).

  2. Otherwise we can do a span per loop, but the issue there is that the log_kv is logging the hosts we're sending a device update to. We'd like that to be logged against a span that includes the device update itself - otherwise the span is pretty much useless.

    So what we could do is start a span based on the opentracing_context given back to us by get_uncoverted_output_room_pokes. Though... now we've noticed that a type error has caused opentracing_context to be broken - so that needs to be fixed first.

@DMRobertson
Copy link
Contributor Author

DMRobertson commented Apr 26, 2022

There is also a problem propagating opentracing spans between whitelisted homeservers. Details of the underlying cause here: #12485 (comment)

(Oh, Andrew says just as much above.)

#12554 mitigates the logging pain.
#12555 fixes the opentracing_context not being parsed.

I'm not sure if we ought to keep this issue open to track fixing the opencontext span?

@DMRobertson DMRobertson changed the title log_kv from _handle_new_device_update_async missing log context and causing log spam log_kv within _handle_new_device_update_async is not associated with an opentracing span Apr 26, 2022
@DMRobertson DMRobertson changed the title log_kv within _handle_new_device_update_async is not associated with an opentracing span log_kv within _handle_new_device_update_async is not associated with an opentracing span, causing sentry logspam and large memory usage Apr 26, 2022
@DMRobertson
Copy link
Contributor Author

I'm not sure if we ought to keep this issue open to track fixing the opencontext span?

After discussion, will raise a new issue to track fixing the logging.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Regression Something broke which worked on a previous release X-Release-Blocker Must be resolved before making a release
Projects
None yet
Development

No branches or pull requests

2 participants