[FEATURE] 🔭 OpenTelemetry support #187
Comments
This is a sample of the result we will have soon, in this case via the Jaeger UI. This example uses a lot of different FusionCache features all at once, so that I can see what it would be like to have one of the most complex scenarios rendered out:
Looks good? |
I am more interested in metrics for the cache, as they provide more value: they help you use the cache efficiently and adjust your caching strategy.
The above metrics are a good start, cache hit/miss in particular. I do not care much about global vs per-instance metrics as long as I can see the usage ratios, but global seems a good place to begin. Traces are very handy when debugging production. My employer uses Dynatrace, and I know from experience that |
Maybe I've expressed myself badly here, what I meant was:
Maybe "general" would've been a better word for it, I'm updating the wording, thanks. |
I would prefer the above because, in my experience, stale is an edge case, which makes the distinction you outlined valuable |
Yes, I'm planning to add all 3 of them, so you can pick the one you are more interested in. Basically if the cache had 10 cache hits, 3 of which were stale and 7 of which were not, you'll have 3 counters with:
Sounds good? |
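The three-counter idea above can be sketched with the BCL metrics types alone. This is only an illustration under stated assumptions: the meter name `Sample.Cache` and the instrument names `cache.hit`, `cache.hit.fresh` and `cache.hit.stale` are hypothetical, not the names FusionCache uses, and the `MeterListener` is there just to make the totals visible without any exporter. It assumes .NET 6 or later (or the System.Diagnostics.DiagnosticSource package).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public static class HitCountersSketch
{
    // Hypothetical meter and instrument names, chosen for this example only.
    private static readonly Meter CacheMeter = new Meter("Sample.Cache");
    private static readonly Counter<long> Hits = CacheMeter.CreateCounter<long>("cache.hit");
    private static readonly Counter<long> FreshHits = CacheMeter.CreateCounter<long>("cache.hit.fresh");
    private static readonly Counter<long> StaleHits = CacheMeter.CreateCounter<long>("cache.hit.stale");

    // Every hit bumps the general counter plus exactly one of fresh/stale.
    public static void RecordHit(bool isStale)
    {
        Hits.Add(1);
        if (isStale) StaleHits.Add(1); else FreshHits.Add(1);
    }

    // Simulates 10 hits (3 stale, 7 fresh) and returns the total per instrument.
    public static Dictionary<string, long> Run()
    {
        var totals = new Dictionary<string, long>();

        using var listener = new MeterListener();
        listener.InstrumentPublished = (instrument, l) =>
        {
            // Subscribe only to the instruments of our sample meter.
            if (instrument.Meter.Name == "Sample.Cache")
                l.EnableMeasurementEvents(instrument);
        };
        listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
        {
            totals[instrument.Name] = totals.GetValueOrDefault(instrument.Name) + value;
        });
        listener.Start();

        for (var i = 0; i < 10; i++)
            RecordHit(isStale: i < 3);

        return totals; // cache.hit = 10, cache.hit.fresh = 7, cache.hit.stale = 3
    }
}
```

An alternative design, also mentioned in this thread, would be a single hit counter with a tag indicating stale or fresh; separate counters are used here only because they map directly onto the 10/3/7 example.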
I'm not sure about that TBH. We prefer to use ELK stack for logging. However, some users might desire an all-in-one solution encompassing logging, tracing, and metrics using OpenTelemetry. This could be implemented based on user demand.
Using boolean flags is a very common approach.
I believe the concept of TraceLevel doesn't currently exist. A discussion sheds light on why it's absent.
Necessary metrics for adjusting the cache strategy could be recorded by default, while others can be managed by boolean flags.
It depends on the performance impact. Granular selection could be a better option.
I agree. |
It looks amazing @jodydonetti! |
Based on what I've read, the story about logging + OpenTelemetry in .NET is just this: use the existing shared abstractions (
builder.Logging.ClearProviders()
    .AddOpenTelemetry(loggerOptions =>
    {
        loggerOptions
            // define the resource
            .SetResourceBuilder(resourceBuilder)
            // add custom processor
            .AddProcessor(new CustomLogProcessor())
            // send logs to the console using exporter
            .AddConsoleExporter();
        loggerOptions.IncludeFormattedMessage = true;
        loggerOptions.IncludeScopes = true;
        loggerOptions.ParseStateValues = true;
    });
Ah, thanks for linking this, I'll have a read about that!
Good point, will experiment with this approach. |
Thanks! I'm really happy about how it is already looking 😬 |
If already using
If this is referring to Logs, then using appropriate Logger Category is good, as users can filter/set-log-level etc. based on LoggerCategory. If this is referring to traces (as in distributed traces), then you can consider different |
Hi all, I just released v0.25.0-preview1 🎉 This is a pre-release version which already contains basically all the observability work done, with both traces and metrics. Please try it and let me know what you think. ps: many thanks to @martinjt for the invaluable support in better understanding the whole OpenTelemetry game, including some nuanced details about best practices, usages and so on. Thanks! |
Has anybody been able to try this out? Any opinion would be greatly appreciated! |
Hi @jodydonetti Sorry I wasn't able to get back to you sooner. I've been quite occupied in the past few weeks. I've recently explored the tracing feature, and it's exactly what I needed. Thanks again for all your efforts! |
Thanks @akoken , I'm very glad it's working well for you! Btw, sneak peek: |
Awesome 🚀 Looking forward to the final version. |
Just now getting a chance to look. Looking at the OnEviction and comparing it to my plugin's behavior. In my plugin I chose to have two counters (instruments): one for "expire" and one for "capacity". You can see the logic in the plugin's OnExpire handler. Or you could follow the OnHit pattern by adding a tag to indicate the eviction reason. Assuming I find more time this week to test this out, my goal is to no longer depend on my ZiggyCreatures.FusionCache.Metrics package. The labor will mostly lie in mapping naming conventions. For example, in my environment all apps have a prefix for all meter names. But this should not be a problem. For example, I can call WithMetrics again and add a metric view via the If I don't find a way, then I would come back to the technique I mentioned before: create an ISemanticConventions interface and a default implementation that holds all the const values for meter names and tags. Then the consumer can pass in its own implementation of ISemanticConvention to control the naming convention.
So far things are looking good. I am very happy OpenTelemetry is arriving in FusionCache. |
Interesting conversation on naming there. Having the consumer of a library manage the naming of meters, instruments and attributes is more of an anti-pattern that a library shouldn't support. I can understand your use case (i.e. you have existing dashboards etc.), but this is more of a use case for the collector (which you mentioned not wanting to use, which is interesting). The Semantic Conventions are set by OTel; everything else is naming conventions specific to the repo. So if FusionCache were to support that, I wouldn't recommend using the term "semantic", as it implies you'll be overriding the ratified conventions (when they are ratified). From a naming perspective, the hierarchy should be.
For application code, you should use your own prefix, but that shouldn't influence the names given to signal data from libraries. |
Thanks @martinjt for the comments. I almost didn't even mention how I used my own |
Thanks, these insights are very useful in better understanding the nuances that sometimes get lost. |
Hi all, I just released v0.25.0 🥳 |
Problem
Observability is a key feature of modern software systems that allows us to clearly see what is going on at any given time.
It is composed of 3 main parts: logs, metrics and traces.
FusionCache already has rich logging support via the standard ILogger interface, with various options available to granularly configure it.
Metrics are not currently available in FusionCache natively, even though there's some support via third-party plugins.
Traces, instead, are currently not available at all.
It would be nice to add native support for the missing parts.
Solution
For some time now OpenTelemetry has been a fundamental staple of modern software development, bringing complete observability to our applications.
FusionCache should adopt it.
Thankfully, .NET adopted an approach in which library authors don't need to take a dependency on the OpenTelemetry packages: they can simply use existing concepts in the BCL, like Activity, ActivitySource, Meter, etc., and have the OpenTelemetry consuming side interact with them, which is nice.
Alternatives
Even though metrics are kinda obtainable via plugins (as it already is, see the link above), it would be better to have native support for it.
Traces instead are not achievable unless there's native support for it inside of FusionCache internals.
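As a rough illustration of the BCL-only approach described in the Solution section, the sketch below shows a toy cache that emits a trace span and a counter without referencing any OpenTelemetry package. Everything named here is an assumption made for the example: `SampleCache`, the source/meter name `Sample.Cache`, the operation name `get-or-set` and the tag `cache.key` are hypothetical and do not reflect FusionCache internals. The `ActivityListener` stands in for whatever the consuming side would register.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Library side: only BCL types, no OpenTelemetry dependency.
public sealed class SampleCache
{
    public const string TelemetryName = "Sample.Cache"; // hypothetical name

    private static readonly ActivitySource Source = new ActivitySource(TelemetryName);
    private static readonly Meter CacheMeter = new Meter(TelemetryName);
    private static readonly Counter<long> Misses = CacheMeter.CreateCounter<long>("cache.miss");

    private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    public string GetOrSet(string key, Func<string> factory)
    {
        // StartActivity returns null when nobody is listening, so the cost is minimal.
        using var activity = Source.StartActivity("get-or-set");
        activity?.SetTag("cache.key", key);

        if (_entries.TryGetValue(key, out var cached))
            return cached;

        Misses.Add(1);
        var value = factory();
        _entries[key] = value;
        return value;
    }
}

// Consuming side: a plain listener playing the role of a telemetry pipeline.
public static class SampleCacheDemo
{
    public static List<string> Run()
    {
        var stoppedOperations = new List<string>();

        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SampleCache.TelemetryName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
            ActivityStopped = activity => stoppedOperations.Add(activity.OperationName),
        };
        ActivitySource.AddActivityListener(listener);

        var cache = new SampleCache();
        cache.GetOrSet("answer", () => "42"); // miss: the factory runs
        cache.GetOrSet("answer", () => "43"); // hit: the cached "42" is returned

        return stoppedOperations; // two "get-or-set" entries
    }
}
```

With the OpenTelemetry SDK on the consuming side, the same source and meter would typically be picked up by name (for example via `AddSource` and `AddMeter` on the respective provider builders) instead of a hand-written listener.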
Questions
I still have some open questions, which I'll list here: if anyone has a suggestion about it please chip in!
Thanks.
Logging
Having already rich support for logging via the ILogger interface, is there something else that should be done regarding OpenTelemetry integration? Maybe I'm missing something.
Traces: minimum level?
Regarding traces, I've yet to understand how to fine-tune their "verbosity": with logging we have the common concept of a "minimum log level" to avoid bloating our logs in production, and I'm still trying to understand how to achieve the same level of control with traces.
One idea may be to have different bool flags to enable/disable traces for different parts of the system like memory cache, distributed cache, backplane, etc., with options like bool EnableDistributedCacheTraces or similar.
Another approach may be to re-use the LogLevel type with an option named something like TracesMinLevel, with which a user can choose the level of detail of the traces. I don't like that it would be mixing different concepts (logs/traces), so if there's something better I'd like that.
Traces: instance granularity?
Another question is about instance granularity: with logging I can create 2 different instances of FusionCache and pass a logger to one but not the other: how can we achieve something like this with traces?
I was thinking about an instance-level bool switch (or the aforementioned LogLevel one) via instance-specific options, so that it would be possible to achieve that.
Metrics: which ones?
What metrics should be collected?
I'm thinking probably the ones related to already existing events like cache hit (general/fresh/stale), cache miss, factory success, factory error, etc. Anything else?
Since metrics include histograms on top of counters, would it make sense to also record things like factory durations, distributed cache operation durations and so on? Maybe it's overkill, maybe not.
Also with metrics, should we always record them all or allow a granular selection via some options?
Metrics: instance granularity?
Just like with traces, the question is if there's a way to enable/disable metrics for a specific cache instance?
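The instance-level switch raised in the questions above could look roughly like the sketch below. It is only one possible shape, under stated assumptions: `TracedCacheOptions`, its `EnableTraces` flag, `TracedCache` and the source name `Sample.TracedCache` are all hypothetical names invented for this example, not FusionCache options. The same gating idea would apply to metrics by checking a flag before calling `Add` on a counter.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical per-instance options, for illustration only.
public sealed class TracedCacheOptions
{
    public bool EnableTraces { get; set; } = true;
}

public sealed class TracedCache
{
    // One shared source; the per-instance flag decides whether it is used.
    private static readonly ActivitySource Source = new ActivitySource("Sample.TracedCache");

    private readonly TracedCacheOptions _options;
    private readonly Dictionary<string, string> _entries = new Dictionary<string, string>();

    public TracedCache(TracedCacheOptions options) => _options = options;

    public void Set(string key, string value)
    {
        // Skip activity creation entirely when this instance has traces disabled.
        using var activity = _options.EnableTraces ? Source.StartActivity("set") : null;
        activity?.SetTag("cache.key", key);
        _entries[key] = value;
    }
}

public static class TracedCacheDemo
{
    // Runs one traced and one untraced instance and counts the activities observed.
    public static int CountActivities()
    {
        var stopped = 0;

        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Sample.TracedCache",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllData,
            ActivityStopped = activity => stopped++,
        };
        ActivitySource.AddActivityListener(listener);

        new TracedCache(new TracedCacheOptions { EnableTraces = true }).Set("a", "1");
        new TracedCache(new TracedCacheOptions { EnableTraces = false }).Set("b", "2");

        return stopped; // 1: only the first instance produced an activity
    }
}
```

The trade-off of this design is that the listener still sees a single source name for all instances; telling instances apart on the consuming side would need something extra, such as a tag carrying an instance name.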
Statistics
Finally, on a related note: with native support for metrics, is there any point in also adding some sort of direct access to locally collected statistics, like the ones recently added to MemoryCache? On one hand it seems a nice addition, but on the other hand it would be like collecting metrics twice, and for what? Also, metrics with OpenTelemetry are long-lived, whereas simple stats stored only in memory would be wiped out at each restart of the application, so with real metrics support what would be the point?
My current position is not to add them, but if you have any idea for why I should, please let me know.