Switch pipeline trace to DiagnosticSource #1169
Conversation
…ns and conflicting styles.
… DiagnosticSource.
Remove IsTopLevelOperation and ITracingIdentifier
A few extra things did sneak in here:
I didn't run the perf metrics against it to see if things changed, though I can't imagine it's significant enough to make a huge impact (I've been wrong before). One thing that is different is that the diagnostic source is shared from the container down: you attach at the container level, not at any arbitrary lifetime scope level. I'm not sure if this reduction in functionality matters much, but it's worth being aware of, since it's different from the pipeline tracer approach we had before.
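To make the container-level attachment concrete, here's a minimal sketch of a raw subscriber. This is not code from the PR: the observer, the `Attach` helper, and the assumption that the container hands you its shared `DiagnosticListener` are all illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Illustrative subscriber: DiagnosticListener events arrive as
// KeyValuePair<string, object> (event name, payload).
internal sealed class ConsoleDiagnosticObserver : IObserver<KeyValuePair<string, object>>
{
    public void OnNext(KeyValuePair<string, object> evt) =>
        Console.WriteLine($"{evt.Key}: {evt.Value}");

    public void OnError(Exception error)
    {
    }

    public void OnCompleted()
    {
    }
}

internal static class ContainerLevelAttachment
{
    // containerDiagnostics would be the container's shared listener; in the
    // design described above, child lifetime scopes reuse it rather than
    // exposing their own attachment point.
    public static IDisposable Attach(DiagnosticListener containerDiagnostics) =>
        containerDiagnostics.Subscribe(new ConsoleDiagnosticObserver());
}
```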
Functionally this looks great; I can see the benefits of using the DiagnosticSource system.
Unfortunately, there is a performance regression from v6. I've run the DeepGraph and ChildScope benchmarks comparing v6 and issue-788:
DeepGraphResolve - ~1.2μs / 13% slower
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| v6 | 9.513 μs | 0.0358 μs | 0.0299 μs | 2.0599 | 0.0458 | - | 16.92 KB |
| issue-788 | 10.78 μs | 0.092 μs | 0.087 μs | 2.0599 | 0.0458 | - | 16.91 KB |
ChildScopeResolve - ~2μs / 7% slower
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| v6 | 30.47 μs | 0.176 μs | 0.156 μs | 3.9063 | 1.9531 | - | 32.09 KB |
| issue-788 | 32.70 μs | 0.456 μs | 0.427 μs | 4.0894 | 2.0142 | - | 33.88 KB |
There are some thoughts and proposed changes in the individual comments, including on the performance; let me know what you think!
```diff
- // Optimise the path depending on whether a tracer is attached.
- if (ctxt.TracingEnabled)
+ // Optimise the path depending on whether diagnostics are enabled.
+ if (ctxt.DiagnosticSource.MiddlewareDiagnosticsEnabled())
```
I think this line may be at least partly responsible for the performance regression when there are no listeners; this line of code is possibly the hottest path in the entire pipeline.
In the original version of tracing I had to add a boolean cache of this 'Enabled' flag and store it in the Request Context. Can we do that again, and cache the result of this method when the Request begins?
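For illustration, a rough sketch of the kind of per-request boolean cache being suggested; the class and member names below are invented, not the PR's actual types.

```csharp
using System.Diagnostics;

// Hypothetical sketch only: evaluate the listener once when the resolve
// request begins, then let the hot path read a cached bool instead of
// re-querying the DiagnosticListener for every middleware invocation.
internal sealed class RequestDiagnosticsCache
{
    public bool MiddlewareDiagnosticsEnabled { get; private set; }

    public void Capture(DiagnosticListener source)
    {
        // IsEnabled() with no arguments is just "does anyone subscribe?";
        // the PR's MiddlewareDiagnosticsEnabled() aggregate could be cached
        // in the same way.
        MiddlewareDiagnosticsEnabled = source.IsEnabled();
    }
}
```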
This is tough because the thing being checked is looking to see if any subscribers have this enabled, right? And there could be multiple subscribers, which could be added at any time.
I could simplify the checks - check for literally the exact event being subscribed instead of one of a logical set. Right now I check for start or success or fail to determine if any subscribers exist, but in the majority case I'm guessing that's three "false" checks every time. I could switch it to check only for the specific event being issued. It'll change the flow a bit since you could subscribe to a failure event without subscribing to start or success, but maybe that's okay.
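For concreteness, a hedged sketch of the two shapes being compared; the "Autofac.Middleware.*" event names are assumptions based on this thread, not necessarily the PR's exact strings.

```csharp
using System.Diagnostics;

internal static class EnabledChecks
{
    // Current approach as described: probe the whole logical set, which is
    // typically three "false" answers when nothing is subscribed.
    public static bool AnyMiddlewareSubscriber(DiagnosticListener source) =>
        source.IsEnabled("Autofac.Middleware.Start") ||
        source.IsEnabled("Autofac.Middleware.Success") ||
        source.IsEnabled("Autofac.Middleware.Failure");

    // Proposed simplification: ask only about the event being issued, so a
    // failure-only subscriber works without also enabling start/success.
    public static bool StartSubscribed(DiagnosticListener source) =>
        source.IsEnabled("Autofac.Middleware.Start");
}
```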
I think it's probably justified for performance reasons to say "If you change diagnostic subscriptions to middleware events, that change may not take effect until the next resolve operation."
Especially since the 99% use case is going to be turning it on once, for all events, and never changing it.
So evaluate all the subscriptions at the operation level and pass that all down through the stack? I think that might add some complexity. Let me see about making the checks more precise and see where that leaves us. I also have a not-totally-formed idea about using flags internally to do bitwise checks and speed up comparison.
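That bitwise idea isn't spelled out anywhere in the PR; purely as a hypothetical sketch, with every name below invented for illustration:

```csharp
using System;
using System.Diagnostics;

// Hypothetical: capture which event kinds have subscribers once per
// operation, then test membership with a cheap bitwise AND on the hot path.
[Flags]
internal enum DiagnosticEventKinds
{
    None = 0,
    MiddlewareStart = 1 << 0,
    MiddlewareSuccess = 1 << 1,
    MiddlewareFailure = 1 << 2,
}

internal static class DiagnosticFlagCache
{
    public static DiagnosticEventKinds Capture(DiagnosticListener source)
    {
        var kinds = DiagnosticEventKinds.None;
        if (source.IsEnabled("Autofac.Middleware.Start")) kinds |= DiagnosticEventKinds.MiddlewareStart;
        if (source.IsEnabled("Autofac.Middleware.Success")) kinds |= DiagnosticEventKinds.MiddlewareSuccess;
        if (source.IsEnabled("Autofac.Middleware.Failure")) kinds |= DiagnosticEventKinds.MiddlewareFailure;
        return kinds;
    }

    public static bool Has(DiagnosticEventKinds kinds, DiagnosticEventKinds flag) =>
        (kinds & flag) != 0;
}
```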
Honestly I was just thinking that if someone changes the set of subscribers or subscribed events such that our checking method returns a different value, we... might not raise some middleware events during that transition. But then if someone did change that set during a request, they would have missed related operation and request messages anyway, which could introduce different problems.
Is this sort of thing what the DiagnosticSource `Activity` functionality is for?
Fundamentally it's whatever gets us the perf saving with no subscribers, so whatever approach works.
Are we anticipating encouraging the use of diagnostics in production? If so, do we need to bench the diagnostic source capture?
I thought about using Activity but it didn't seem worth the allocations.
I'm less concerned about changing subscriptions in the middle of the request than I am about trying to be too smart in the caching mechanism such that we stop responding correctly to observables or it becomes hard to add a new event in here. I could see wanting diagnostic events for all the regular lifetime events, too.
I don't think diagnostics in prod would be recommended. More like "turn this on if you find a problem and turn it back off" or something.
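For context on the allocation trade-off, here's a sketch of what an `Activity`-based approach could look like; it is not what the PR does, and the "Autofac.Operation" name is a placeholder.

```csharp
using System.Diagnostics;

// Note the Activity instance allocated per traced operation; that per-resolve
// allocation is the cost being weighed against in the comment above.
internal static class ActivityApproach
{
    public static void TraceOperation(DiagnosticSource source, object operationData)
    {
        var activity = new Activity("Autofac.Operation");

        // Writes an "Autofac.Operation.Start" event and starts the activity.
        source.StartActivity(activity, operationData);
        try
        {
            // ... run the resolve operation ...
        }
        finally
        {
            // Writes an "Autofac.Operation.Stop" event and stops the activity.
            source.StopActivity(activity, operationData);
        }
    }
}
```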
Realized now that I was basically checking enabled twice for any given write: first in `if (source.MiddlewareDiagnosticsEnabled())`, and later in the actual call that writes the event. Same for operation and request.
I've removed the aggregate checks and just leave it to the call that writes the event. This should remove some of the overhead. Let's see where this puts us perf-wise, and I can try to tweak some more if it looks like it still needs help.
I'm trying to not cart too much additional context around if possible, and while a single Boolean isn't that big a deal, the orchestration of keeping the cache up to date with the actual settings seems like it could be a challenge long-term, similar to how container rebuilding got to be a bit sticky and unpredictable.
I can see where there is a use case for success or failure instrumentation, like perf counters or something, to track stats. That'd mean subscribing to just success/failure events but maybe not start events. The ability to have that granularity seems interesting to me, rather than just "all on" or "all off" for a whole request.
I can also see that we may want to support (at some point) the ability to conditionally determine whether you're "enabled" based on the data itself, which wouldn't be possible if we try to pre-cache things too much. For example, "I want to see events that involve resolving a certain type" while diagnosing an issue would be a use case for being enabled based on certain data.
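For illustration of that kind of data-based filtering, a sketch using `DiagnosticListener`'s predicate subscription overload; the event-name prefix and the idea that the argument carries the service type are assumptions, not the PR's behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

internal static class FilteredSubscription
{
    // The predicate overload lets a subscriber say "enabled" based on data,
    // not just the event name. For arg1 to carry anything, the producing side
    // has to call IsEnabled(eventName, data) before writing; that is exactly
    // the part that fights aggressive pre-caching.
    public static IDisposable SubscribeForServiceType(
        DiagnosticListener listener,
        IObserver<KeyValuePair<string, object>> observer,
        Type serviceTypeOfInterest)
    {
        return listener.Subscribe(
            observer,
            (name, arg1, arg2) =>
                name.StartsWith("Autofac.Middleware", StringComparison.Ordinal)
                && arg1 is Type serviceType
                && serviceType == serviceTypeOfInterest);
    }
}
```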
Just pushed an update to have the event source `DiagnosticListener` check `IsEnabled()` before checking whether the specific named subscription is enabled. Benchmark the crap out of it; it should still basically boil down to a check for `subscriptions != null` before doing anything, which should be just as fast as what we had before. Granted, it's going to do that null check for each event rather than just once, but that should still be pretty fast.
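A compact sketch of that gate; the helper name, payload, and event string are illustrative assumptions rather than the PR's exact members.

```csharp
using System.Diagnostics;

internal static class MiddlewareDiagnosticWrites
{
    // Parameterless IsEnabled() on DiagnosticListener is effectively
    // "are there any subscribers at all?" (a null check on the subscription
    // list), so with no listeners each event costs one cheap check.
    public static void MiddlewareStart(DiagnosticListener source, object payload)
    {
        if (source.IsEnabled() && source.IsEnabled("Autofac.Middleware.Start"))
        {
            source.Write("Autofac.Middleware.Start", payload);
        }
    }
}
```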
src/Autofac/Core/Resolving/Pipeline/ResolveRequestContextBase.cs
First push today fixes the …
…cription handling mid-operation or on specific lifetime scopes.
A few more pushes address …
Pushed an update to the way things get checked for subscription enablement to try to speed that up. I think it's ready for another review and benchmarking.
Just ran the benchmarks on Linux, hope that's fine.
- Increase around ~4%
- Increase around ~0.9%
All the functionality changes look good, API surface on the tracer is nice and compact now.
Perf-wise, some improvement, although not back to v6 levels. It's probably worth pointing out, though, that we still show a slight gain over develop even after these changes (I've included results from develop here).
DeepGraphResolve - ~0.9μs / 8% slower (before was ~1.2μs / 13% slower)
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| develop | 10.82 μs | 0.056 μs | 0.052 μs | 2.2430 | 0.0458 | - | 18.45 KB |
| v6 | 9.569 μs | 0.0553 μs | 0.0517 μs | 2.0599 | 0.0458 | - | 16.92 KB |
| issue-788 | 10.42 μs | 0.080 μs | 0.071 μs | 2.0599 | 0.0458 | - | 16.91 KB |
ChildScopeResolve - ~1.5μs / 5% slower (before was ~2μs / 7% slower)
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| develop | 37.33 μs | 0.229 μs | 0.214 μs | 6.4697 | 0.4272 | - | 52.86 KB |
| v6 | 30.92 μs | 0.152 μs | 0.142 μs | 3.9063 | 1.9531 | - | 32.09 KB |
| issue-788 | 32.48 μs | 0.141 μs | 0.132 μs | 4.0894 | 2.0142 | - | 33.88 KB |
Regardless, I'm going to approve the changes, since we're down to refining it now. If you wanted to merge and address any remaining perf investigation separately (or under #1127) that's fine with me.
🍻
I appreciate the patience on this - it was definitely a long and winding road to get here, but I think it's good overall. Now let me see about getting that DOT graph tracer updated and PR'd based on these changes. 🎉
This PR is part of #788 and is the first step in getting the Autofac.Diagnostics.DotGraph package out.
It updates the diagnostic mechanism to use `System.Diagnostics.DiagnosticSource` to emit events and moves the diagnostics classes into an `Autofac.Diagnostics` namespace.

Subscribing to events in this system means creating a `System.Diagnostics.DiagnosticListener` and attaching it to the container's `DiagnosticSource`. Internally this follows an observer pattern that receives a stream of events, keyed by string IDs, with which the listener may do... whatever they want. There are convenience methods for subscribing to the container's diagnostics, like the one sketched below.

The default diagnostic tracer was updated to be a diagnostic listener in this format. Rather than paste an example in here, check that out.
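As a hedged sketch of that kind of convenience subscription: `SubscribeToDiagnostics`, `DefaultDiagnosticTracer`, and the `OperationCompleted`/`TraceContent` members reflect where the Autofac 6 diagnostics API ended up, but treat the exact names as assumptions here rather than code quoted from this PR.

```csharp
using System;
using Autofac;
using Autofac.Diagnostics;

internal static class TracerSetup
{
    public static void Attach(IContainer container)
    {
        // The default tracer observes the container's diagnostic events and
        // raises OperationCompleted with the full trace for each root resolve.
        var tracer = new DefaultDiagnosticTracer();
        tracer.OperationCompleted += (sender, args) =>
            Console.WriteLine(args.TraceContent);

        // Convenience extension that wires the tracer up to the container's
        // DiagnosticSource.
        container.SubscribeToDiagnostics(tracer);
    }
}
```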
I tried to follow patterns I'd seen in areas like ASP.NET Core with respect to things like:
That latter one was sort of interesting. I wondered why it was internal but realized a couple of things:
The history is pretty messy because I started with the DOT graph tracer in this PR and then it moved out to a separate package. I tried to squash/rebase it but that, too, got kinda dicey since I'd been pulling changes from the v6 branch and... yeah.