Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryAnalysisLayer holds spans open through tokio::sync::Mutex when tracing feature enabled #4074

Closed
xuorig opened this issue Oct 22, 2023 · 4 comments

Comments

@xuorig
Copy link
Contributor

xuorig commented Oct 22, 2023

A tokio::sync::Mutex with the tracing feature enabled holds a tracing::Span.

This span holds a "reference" (in the sense that it increments the ref count, through clone_span) to a parent span. In the router, an Arc<Mutex<Compiler>> is stored in QueryAnalysisLayer's cache, which is only cleared on router reload. When the mutex is created, its span increments the ref count of the router span.

This leads to the router span never exiting since it is never fully dropped (refs > 1).

I am really not sure about the fix here. An alternative is Mutex::const_new which not create a span. Maybe a std::sync::Mutex is fine here too. I was surprised by this behavior. Caching mutexes across requests within a request trace seems dangerous when the tracing feature is enabled.

Note that this only affects the first request for a particular QueryAnalysisKey.

@garypen
Copy link
Contributor

garypen commented Oct 22, 2023

I have been investigating memory issues in the router over the last week and I think there may be some kind of issue related to tracing or telemetry which I haven't pinned down yet. I was excited when I saw this, but then realised that we don't enable the tracing feature in tokio (unless we are doing a special build for tokio console).

I agree that this seems like an under-notified feature (e.g.: It would be good to have the doc comment on const_new on the new fn as well to really call attention to it), but I don't think this is an issue for the default router source build.

Using const_new would be fine if you are building the router from source with tokio's tracing feature enabled, but it would limit the observability of that mutex in tokio console. std Mutex (or parking_lot Mutex) can't be used here since it definitely crosses over a .await point.

@xuorig
Copy link
Contributor Author

xuorig commented Oct 22, 2023

Yeah the alternatives don't seem to work here, besides for disabling console : tracing for now.

I'll bring this up to Tokio folks, I'm not sure how this use case can even work.

@xuorig
Copy link
Contributor Author

xuorig commented Oct 22, 2023

Testing with const_new, it indeed fixes the mutex issue, but another one pops up, with a hyper IdleTask trace holding the router span open. (again through tokio tasks being instrumented using tracing feature).

I need to think about this more but it seems like any traces that can potentially be held open across multiple requests would cause this issue 🤔

@xuorig
Copy link
Contributor Author

xuorig commented Oct 22, 2023

I'm going to close this as this is more of a general tracing thing, but would love some thoughts on this general problem if anyone has any 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants