-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add OpenTelemetry Instrumentation #124
Conversation
return builder; | ||
} | ||
|
||
internal class Constants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to agree on constant values. These probably should not change after release, b/c they could break monitoring rules & alerts users have set up.
@Cooksauce thank you for the contribution! |
@Cooksauce btw if you're not already on https://slack.nats.io |
A few updates:
|
Rebased this on top of the
I may be missing understanding you here, but this is how the trace is currently propagated in the implementation. Specifically, here after rebase and changes. Are there other headers you're thinking to add?
Aspire OTEL is a direct implementation of the standard, so this plugs right with the standard OpenTelemetry exporter. Aspire just sets the
I think OpenTelemetry metrics export is a good idea. That said, the implementation for logging, tracing, and metrics is all pretty orthogonal in dotnet, so my opinion would be that be a separate, follow-up PR. Aspire Trace View![]() |
I was referring to passing activity down in headers instead of We also have an effort to minimize
yep, that makes sense.
excellent! good to know. just wanted to bring that to your attention. the PR is here dotnet/aspire#1175 |
@Cooksauce I also noticed @stebet is working on rabbitmq implementation rabbitmq/rabbitmq-dotnet-client#1261 |
Yeah, getting close to being done, although there's some questions around Baggage propagation since the baggage functionality for .NET's |
Just been pointed here, love the work... Noticed a point on dependencies. We're not planning on taking anything in Contrib for .NET unless there's a big reason to in terms of extra functionality needed. We advise that you create a .OpenTelemetry package in your own namespace for that. Be careful taking a dependency on the semantic conventions package, we're introducing some change there are some point. Not to say you shouldn't, it just might be better not to take that dependency if you don't need to. I'm more than happy to help with any questions you have around the spec and how to implement in the best way like I did with a few other OSS projects like Rabbit. |
Had a quick look... couple of small tweaks you can make... check ActivitySource.HasListeners instead of getting the Activity and checking for null. Also, the comment around propagation is yes, it should propagate the Activity.Current properties, regardless of whether your instrumentation is active. Finally, when creating/starting activities, add as many tags as you can inside the create method, it better perf (upto 8 and it uses a different faster approach) and it also provides people with the ability to Sample better as the parameters in the create/start are all you get access to. I do open office hours, feel free to book some time if I can help move it along! |
Thank you @martinjt we're eager to learn from you and we'd love to take you up on your kind offer. I'm still learning, @Cooksauce is our expert here. btw I noticed we're a bit behind. I'll get the branch up to date as soon as I can. Edit: just merged main. should be up to date now. |
Just created |
Reworked this quite a bit to address the discussion here, and to make it play more nicely with jetstream & the higher level features built on top of that. I think it's ready for a real review now with deeper scrutiny.
@martinjt I'm curious on this one, is there a reason you mention this other than the allocation of parameters such as
@martinjt Thanks for this tip! The source definitely looks more optimized. @mtmk Decisions to make before moving forward:
|
# Conflicts: # NATS.Client.sln
I think we can change the NatsMsg.ctor to take in a concrete NatsConnection since we do that pretty much with everything else (e.g. contexts, NatsJSMsg). For unit testing purposes we have NatsMsg-INatsMsg cast as an option. I don't think anyone will be passing interfaces to NatsMsg for unit testing if so I think it'd be reasonable to ask them to convert to accepting INatsMsg. So I'd say yes we can change NatsMsg to take NatsConnection (rather than INatsConnection) unless someone hollers otherwise cc @caleblloyd @sspates @to11mtm @paulomf
No idea TBH. If it's good enough for Rabbit good for us too I'm guessing 😅 |
Just my 2c, I think it's fine to leave it for another PR. The cooperative story behind OTEL's Baggage.* and Activity.Baggage.* APIs is vague at best, and frankly I could never get them to work reliably in tandem. Microsoft's legacy uses Trace-Context as a carrier, and OTEL use "baggage." Hopefully Microsoft will soon clear up the confusion caused by hiding their OTEL support inside of legacy APIs. |
Thanks for the tag here. Thanks to a tangentially related conversation, I actually have two important-ish points/questions. 1: Can we get benchmarks for this before/after? To be clear we don't have to have telemetry actually 'wired up', moreso want to make sure perf impact is minimized for minimal cases. 2: Agree OTEL could be a separate PR if folks really need it (AFAIR it's the 'old thing' anyway) but no strong feelings either way. |
I'm not sure what you mean by OTEL being the old way and leaving that to
another PR?
…On Tue, 30 Jan 2024, 04:37 Drew, ***@***.***> wrote:
@mtmk <https://github.com/mtmk> Decisions to make before moving forward:
1. Are the Nats folks open to tweaking NatsMsg<T> to expose
NatsConnection so that these couple things can be addressed?
https://github.com/Cooksauce/nats.net.v2/blob/611093cf416d6fde1360d94191e806b2360a0843/src/NATS.Client.Core/NatsMsg.cs#L174
I think we can change the NatsMsg.ctor to take in a concrete
NatsConnection since we do that pretty much with everything else (e.g.
contexts, NatsJSMsg). For unit testing purposes we have NatsMsg-INatsMsg
cast as an option. I don't think anyone will be passing interfaces to
NatsMsg for unit testing if so I think it'd be reasonable to ask them to
convert to accepting INatsMsg. So I'd say yes we can change NatsMsg to take
NatsConnection (rather than INatsConnection) unless someone hollers
otherwise cc @caleblloyd <https://github.com/caleblloyd> @sspates
<https://github.com/sspates> @to11mtm <https://github.com/to11mtm>
@paulomf <https://github.com/paulomf>
1. Are we okay leaving out OTEL Baggage support similar to RabbitMQ? -
github.com/rabbitmq/rabbitmq-dotnet-client/pull/1261#issuecomment-1875667625
No idea TBH. If it's good enough for Rabbit good for us too I'm guessing
😅 @stebet <https://github.com/stebet> do you have an objection?
Thanks for the tag here. Thanks to a tangentially related conversation, I
actually have two important-ish points/questions.
1: Can we get benchmarks for this before/after? To be clear we don't have
to have telemetry actually 'wired up', moreso want to make sure perf impact
is minimized for minimal cases.
2: Agree OTEL could be a separate PR if folks really need it (AFAIR it's
the 'old thing' anyway) but no strong feelings either way.
—
Reply to this email directly, view it on GitHub
<#124 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAM66A67SMRIEHVSTUFKCEDYRB2HNAVCNFSM6AAAAAA4HC2SQCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJWGA3DANZZHA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Also the plan is not to leave baggage propagation out for the RabbitMQ client. It'll be added very soon, we just wanted to get the core ActivitySource implementation PR done first. |
Thank you! I appreciate everyone's help here since my OTEL knowledge is quite limited until I can find some time to get up to speed with it all.
My quick publish only benchmarks didn't show anything alarming but I plan to run more comprehensive benchmarks especially to make sure we're not affecting applications not using OTEL. Could you expand a little on what you meant by "we don't have to have it wired up" please?
👍 |
I think they mean "OTEL baggage," though I don't think that changes your sentiment. Baggage isn't an old thing, and is a first-class and orthogonal feature to Tracing. |
/// <param name="builder"><see cref="TracerProviderBuilder"/> being configured.</param> | ||
/// <param name="includeInternal">Include traces from internal messaging used for control flow and lower level usage during high level abstractions (e.g. JetStream, KvStore, etc)</param> | ||
/// <returns>The same instance of <see cref="TracerProviderBuilder"/> to chain the calls.</returns> | ||
public static TracerProviderBuilder AddNatsInstrumentation(this TracerProviderBuilder builder, bool includeInternal = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this library needed? It's wrapper around calling a method with these 2 strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty standard. Providing a second package that provides a strongly typed extension to add the instrumentation.
What we've found is that people don't want the magic strings for AddSource()
and they want the semantic meaning of "Adding instrumentation". It's also providing the semantics and documentation around what internal vs standard telemetry is.
Adding a second package means that you don't need pollute the main packages with OpenTelemetry. That said, that will be required for baggage to work properly at some point soon though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get why but IMO, I wouldn't use this package because it does so little. I'm hoping we can get to a model where this is fully configuration based like logging and metrics (in .NET 8). Hopefully we can pull this off for tracing in later versions.
Unless Nats.Net.Core is somehow pulling in the `OpenTelemetry` package by
some other virtue it makes sense to me despite the low line count, as it
avoids unnecessary library churn for those who don't need it and/or those
who are hoping for constrained/IOT deploy scenarios.
Alas I'm semi out of pocket so can't easily check if there's a better spot
in existing project structure, if there is that may be a better option.
…On Sat, Feb 24, 2024, 9:53 PM David Fowler ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/NATS.Net.OpenTelemetry/OpenTelemetryTracingExtensions.cs
<#124 (comment)>:
> @@ -0,0 +1,23 @@
+using NATS.Client.Core;
+
+// ReSharper disable once CheckNamespace - Place in OTEL namespace for discoverability
+namespace OpenTelemetry.Trace;
+
+public static class OpenTelemetryTracingExtensions
+{
+ /// <summary>
+ /// Enables trace collection on activity sources from the NATS.Client nuget package.
+ /// </summary>
+ /// <param name="builder"><see cref="TracerProviderBuilder"/> being configured.</param>
+ /// <param name="includeInternal">Include traces from internal messaging used for control flow and lower level usage during high level abstractions (e.g. JetStream, KvStore, etc)</param>
+ /// <returns>The same instance of <see cref="TracerProviderBuilder"/> to chain the calls.</returns>
+ public static TracerProviderBuilder AddNatsInstrumentation(this TracerProviderBuilder builder, bool includeInternal = false)
Is this library needed? It's wrapper around calling a method with these 2
strings.
—
Reply to this email directly, view it on GitHub
<#124 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AC7UYVLG3VARC37XFGVCV7TYVKRR7AVCNFSM6AAAAAA4HC2SQCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQOJZGU4TMMJXGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
The audience for the package is not really you or I.
There is a big audience for strongly typed abstractions like that, even if it was for a single method.
From talking to people about this kind of thing, they feel adding the string is a hack, as it could change etc.
It's lightweight, and shouldn't really change, so the maintainence overhead being minimal means the advantage around adoption outweighs it in our opinion on the Otel teams.
|
<rant> Well .NET did all of this work to make sure that library authors could use core primitives to emit traces, metrics and logs without taking on extra dependencies. Except these core libraries still need an extra companion package to call an extension method to turn on the telemetry. The story is incomplete, and we'll get to a point where that isn't needed. I hope these glue packages go away. </rant> |
There's lots of ways .NET could solve this, I'm surprised that the research already done by the .NET hasn't already flagged this gap.
Until there is a better way, this is rhe recommended approach from OpenTelemetry, if the .NET team has other recommendations then it would be great to hear them as part of the OpenTelemetry SIGs which are the ones guiding, you're welcome to join sometime and discuss it.
The only solution I've heard so far is to move the magic strings to app config (like the idea for meters), which doesn't solve the problem that developers want strongly typed extensions.
I suppose its upto the team here whether they want to follow the conventions, or stick with the advice of using the strings.
|
# Conflicts: # sandbox/BlazorWasm/Server/Program.cs # src/NATS.Client.Core/NatsConnection.cs # src/NATS.Client.JetStream/NatsJSContext.Consumers.cs # src/NATS.Client.JetStream/NatsJSContext.cs # src/NATS.Client.ObjectStore/NatsObjStore.cs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @Cooksauce and everyone for your input.
We will merge this into a staging branch to tidy up examples etc.
Are there any updates in this space? The target branch contains a few nice-to-haves? I don't think these changes have made into the main branch. |
@iby-dev thanks for your interest. please feel free to open a new issue or PR with features you want to see implemented. |
Implements: #53
Background for OpenTelemetry in a messaging context: https://youtu.be/WsTYClGrOVI
As of now, this PR contains a rough implementation approach which is working in the sandbox. See below for how to run this.
I have this set as a draft at the moment b/c I would like a bit of feedback on the general approach before moving forward and cleaning everything up well. Specifically, I'd like to know if the maintainers here approve of adding an
Activity
object to all of the commands. This was the best approach I came up with due to the pipelining nature of the actual msg publishing.Note that the objects should be null if no tracing listeners have been wired up.
Todo:
Activity
) on pubActivity
) on subSurface "metrics"... counters, etc(tabled for follow up work)Open Questions
kv put
, etc) be surfaced as such or as their underlying pub / sub commands?How to run in sandbox:
docker-compose up
from./sandbox
directoryExample.Core.SubscribeRaw
projectExample.Core.PublishModel
projecthttp://localhost:16686
Example trace at current state: