-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The approach to telemetry (using custom ITelemetryPublisher
) seems overengineered and counter to modern OpenTelemetry abstractions
#412
Comments
There can be an open telemetry implementation of ITelemetryPublisher. I think that should alleviate these concerns as that implementation could use Having |
@jimmyca15 My argument here is more for using the native If you take all other major frameworks (from Microsoft itself), they use Once again, I'd like for the team to reconsider this design and create something simpler and more consistent with the rest of the framework. |
@julealgon Thanks for the feedback! It's much valuable and appreciated! Here is a scenario you can help me understand: An existing application is using AzureAppConfiguration for Configuration, Feature Management (no telemetry yet) and AppInsights for metric and custom application events (no Open Telemetry anywhere). What would be the path for this application to adopt Feature Mangement telemetry, assuming Feature Management only integrates with OpenTelemetry? |
The bolded is a bit misleading, since tecnically, you could interface with any library or framework by leveraging the native Assuming you want to keep interfacing directly with AppInsights, you'd likely add the following packages:
Then on your services.AddOpenTelemetry()
.ConfigureResource(c => c.AddService("YourServiceNameHere"))
.WithTracing(c => c
.AddAspNetCoreInstrumentation()
.AddAzureMonitorTraceExporter(...); Then, every time a request is made to the API, a new root span is generated automatically. At any point during the request, if a feature flag is checked, an event would be added to the current span, and propagated into AppInsights via the configured exporter. This of course assumes that Let me know if this covers your question properly. |
@julealgon thank you for the feedback. You have a good point for us to use |
Here's what I'm thinking as far as what the activity would look like for feature management.
|
By transitioning to use Activity, we can remove the The ApplicationInsightsTelemetryPublisher would instead be based off activity listener.
Instantiating the |
Ideally it should be easy for a user to use this with DI. The common usage of feature management is
We will want something like
|
@jimmyca15 I think there is a slight misunderstanding here. When I made the original suggestion/post, I was thinking about FeatureManagement NOT creating an activity, but adding an activity event to the current activity. This is how the OpenTelemetry specification defines feature toggle checks: as events, not as spans. In this case, if there is no ambient activity when a toggle is called, the logic wouldn't add any event either. But if the check happens in the scope of an activity (again, using the example above, as part of an AspNetCore request), then the "feature flag check" event would be added to that ambient activity. I think creating an activity for the feature check itself doesn't make much sense because a feature evaluation is an "instant" occurrence, and not a "window" like what |
Yes. I think it's what I said though, maybe you got confused on "activity event"? We are adding an "activity event" to the current "activity". |
Yes, deleted my comment, but looks like you replied. I had read 'activity event' as just 'activity'. |
@julealgon with such a design if a developer wanted to just have a callback to be able to handle feature evaluation events (like ITelemetryPublisher.PublishEvent(EvaluationEvent)) would you expect them to implement an activity listener that listens to all activities and only processes activities that have an event named "FeatureEvaluation"? |
If that's the case, I don't believe this will be a good fit for all expected scenarios for consumption of flag events. While I do agree that the library should have an answer for OpenTelemetry, I fear the path that has been led to would take away from those who publish to a monitoring system that has not fully adopted OpenTelemetry. With that, I would suggest that we keep ITelemetryPublisher to allow fully customizable handling of feature evaluation events and simply have an OpenTelemetryPublisher implementation that adds to the current activity.
|
Can you consider another option where the Most frameworks don't need to have This would allow the standard flow to just call services.AddFeatureManagement(); And it would propagate the events as per the telemetry setting. EDIT: Also, keep in mind that |
If we were to add it by default do you have any opinion on the makeup of the event being added? Or is your suggestion purely on the mechanics?
|
I would strongly recommend following the OpenTelemetry Semantic Conventions for the attribute values in the event:
For the provider name I assume it's totally up to you to define. Additionally, I believe you can introduce extra attributes as needed for your implementation too, but I'll defer that to you guys to check with the OTEL team on what the best course of action is. The only caveat with these attributes is that they are still in "Experimental" stage in the OTEL spec, so they could potentially change later on (or have additional attributes added, or some removed, etc). And lastly... I would also suggest reaching out to the OTEL group to propose new attributes that you think could be general enough. I see from your example that we have some stuff that might make sense being a default (probably optional) attribute for other feature flagging systems to follow. I did notice this section which also seemed interesting to me:
It might make sense for the team to consider logging instead of an |
@julealgon @jimmyca15 I noticed this thread and thought you might be interested in joining the feature flag semantic conventions OpenTelemetry project. It's a joint collaboration between OpenFeature and OpenTelemetry that aims to extend and harden the existing semantics to support more use cases. We're looking for input from end users and feature flag vendors. |
Just want to call out: Can we be confident about that the number of feature evaluation events is always modest? When application starts processing an operation e.g. HTTP request or task from queue, it creates an |
Considering the telemetry toggle in
I believe it would still be a good practice to check the Another thing which would impact this from a performance standpoint would be trace sampling strategy: IIRC, the sampler will impact whether or not a given trace is being considered and should be setting that property above (I'm not 100% sure about that however). EDIT: Example usage from the
By redundant, do you mean when folks check the same flag multiple times during the same operation? Or are you referring to common attributes across multiple flag check events (such as potentially context info, library/SDK attributes, etc that are shared across all flags)? |
Yes. Imagine that different components may refer the same feature flag and IsEnabled method may be called multiple times during the same operation. |
This is a very fair concern... I wonder if this should be accounted for in the OpenTelemetry spec itself though: like a way to "reference" common data in multiple places when it happens more than once in the same span/trace/grouping etc. It wouldn't apply only to feature flags. Imagine for example all the potential duplication in logs that have the same "contextual information". Perhaps there should be a similar mechanism than what exists today for resources, which can be shared and pushed with many "child spans" in the payload while not having to replicate all resource attributes for each individual span. I personally wouldn't want for the trace itself to NOT show some of the feature flag checks, so a way to just point to a shared data section would make sense: each event would just reference the same piece of data and there would be no duplication or overhead. |
Even with references, tracing can easily become overwhelming. I can picture devs codding loops with IsEnabled in a single operation context or calling IsEnabled in scaffolding UI scenarios. Checking feature flag is supposed to be lightweight. |
I understand this concern. The open telemetry feature flag spec defines feature flag evaluations within a transaction as span events. But my concern is that the concept in the open telemetry spec is language neutral. You can interpret "transaction" as Maybe I am wrong, but I think we can also interpret "transaction" as the parent The best practice mentioned in this doc says "Each library or library subcomponent can (and often should) create its own source. " So, my thought is to let the feature manager have an ActivitySource and when a feature flag evaluation happens (it could be either a call of IsEnabled or GetVariant), it will start an activity. BTW, if we purely think from the side of supporting OpenTelemetry: From this doc, I can not tell whether the OpenTelemetry .NET SDK will look at |
I'm a bit confused by this though. They are essentially the same to me: the call triggers an evaluation. If a call is happening in a loop, then surely we would want to surface all of those. If you call some SQL procedure in a redundant way multiple times, you'd want to see all those represented in spans. I don't see how feature evaluation as events are any different. If somebody want's to only call it once, they should move it out of the loop. |
You surely could start an activity to represent the overall flow, yes, but I think it wouldn't be useful since the check is almost instantaneous, and any actual external activity would already be covered by things such as HTTP instrumentation or SQL instrumentation for actually fetching values from the configuration (or somewhere else a custom provider decides to fetch from). Additionally, such activity would be mapping a span that is not part of the OTEL spec too... so I think you'd be adding a bit of complication to the whole flow by doing it. Since the OTEL spec only defines feature evaluations, I would personally focus on that only and maybe start a discussion around representing more about the feature management as an actual full-blown span.
I think the highlighted part is a stretch. An activity clearly represents a 1:1 mapping with a "span" in OpenTelemetry. Span events are their own abstraction and model an instantaneous action instead of an operation window like spans do. Hierarchy is kept between parent and child, sure, but this doesn't change the semantics of each entity: spans are one thing, events are another. If the team would want to dispute this, I think it would need to happen with the OTEL Spec team. From my personal perspective, it does make a ton more sense to model feature evaluations as events though, instead of spans. It just feels more correct from a conceptual standpoint to me.
I would assume that only applies to when it makes sense for the library to create an
We've been using the I would assume any other event would be handled similarly although we don't currently have any on our applications. |
IsEnabled and evaluation are not necessary the same thing. The evaluation is always the same for the same context. Sure, evaluation can be triggered on the first call to IsEnabled, but that's implementation detail. Do we surface every time when the app calls to read key-value setting from IConfiguration? |
Agree. |
Totally fair. But I think the spec is really talking about the immediate value of the toggle regardless of any caching (so the
We don't but that's not an apples-to-apples comparison IMHO. General configuration is not part of the OTEL spec (it could end up being, in the future... that would be interesting) so there is nothing defined for it. And while one can compare settings to feature flags, feature flags is like a higher-level abstraction that can work on top of |
Feature Management is often seen as extension/feature of the configuration system. That's also from customers and providers prospective. With dynamic configuration already a norm, variant-based feature management, etc., these are much closer that it seems, in some cases even interchangeable. |
Sure, I'm aware of that in the case of Keep in mind that OTEL tries to be framework-agnostic as much as possible and focus on semantics in the conventions. If you want to push for a standardized view of "configuration in general" and "feature flags" in the OTEL spec, I'd probably support you there but that's a much broader topic than what we are talking about here. I don't see the current spec as a blocker for the implementation on FeatureManagement. If configuration and feature flags are later unified in some sort of new spec, then it would be a matter of adapting to that breaking change in here. Besides, if you want to make the argument that strongly, I'd then challenge you with the question of "why do we even need a separate feature toggling abstraction/interface, if I can just inject an Again... I could agree with you if you went on that direction, but that's a totally separate can of worms IMHO. |
Definitely not unique for Microsoft.FeatureManagement. If I remember correctly, Split and LaunchDarkly support complex (multi-value) variant feature flags.
Not at all. Just commenting on "feature flags" as a target for OTEL spec, while configuration is not under consideration.
Indeed, we don't need separate abstraction in principle. But unfortunately, most config systems are not context aware to allow for targeting. |
Oh to be clear, I wasn't referring to variant support there, but to the concept that the feature management system is built on top of a generic configuration system. Apologies if I misunderstood your original statement there. No disagreement from me in terms of variant support for sure. |
I didn't mean to relate that Feature Management is linked to configuration because it can be implemented on top of generic configuration system. It's rather the use case. Feature Variants make it a bit clearer, that's why I pointed to them. A variant feature allows for non-boolean (complex) value type, even arbitrary json. The concepts of enabled/disabled fades away, replaced by default treatment and targeting. These are mostly used as context-aware configuration settings, and less as if-else control flow. |
@julealgon You are right! OTel .NET SDK can recognize Another question is if we only want feature manager to add event to the existing activity (which is most likely the Besides, if we want to convert the "feature_flag" activity event to Application Insights EventTelemetry. Without depending on OpenTelemetry, if the feature manager doesn't create any activity, we have to listen all activities and find whether there is any activity contains a "feature_flag" activity. Our current pattern is that as long as the user uses our built-in App Insights telemetry publisher, a "FeatureEvaluation" custom event will be sent automatically. If feature manager doesn't create any activity, when there is no activity, no feature flag telemetry will be sent to App Insights. |
Yeah this is a good question. If it was up to me, I'd probably NOT create an activity, but I would maybe keep your custom notification hook in parallel so you can do other types of integrations even when no
Yeah I understand this is a bit tricky, and is why I mentioned above that maybe keeping some sort of custom hook there is still a good idea, even if the library interacts with an optional
Yeah that makes sense. Not sure how you'd code this in a way that it still pushes in a custom way to App Insights but you might be able to just detect the presence of Azure Insights integration in the library itself and then if that is turned on, employ the custom publisher as well. I will say though that the fact that we even have this custom integration path with Azure telemetry seems like a gap to me in either the spec or in the Azure implementation side. It shouldn't be the case IMHO that app insights needs anything custom moving forward if the OTEL spec already models that interaction. But of course that's up to you guys on how you want to move Azure telemetry forward. As a consumer, I'd certainly try to be as far away as possible from these custom integrations so that my applications are as vendor-agnostic as it can get. The whole |
Hey @julealgon, I just want to thank you for calling this out and helping educate us on OpenTelemetry / Activity based telemetry. We're now moving in this direction. If you're interested, here's a quick summary of where we landed:
PR is out for review now- so some of these things might change. But figured it was worth sharing! 😄 |
I appreciate the update @rossgrambo . Your plan seems sensible to me. I look forward to integrating this in our project as we get rid of our custom feature toggling abstractions and switch into Microsoft FeatureManagement and our new OpenTelemetry pipeline. Will keep an eye out for when this becomes available in the upcoming releases 😉 |
I just saw the latest release notes, and with that, I got to know a bit more on how this project is raising telemetry information on feature toggle changes through the use of a custom
ITelemetryPublisher
abstraction.I'm raising this issue here to discuss on what I believe to be a poor approach to the problem. The OpenTelemetry spec already defines how feature toggles should be emitted in telemetry for both traces and logs here:
The way
ITelemetryPublisher
was devised appears to ignore well-known observability abstractions such asActivity
in favor of a completely bespoke channel that requires custom implementers to communicate with external observability platforms (like the AppInsights implementation that was added).Instead, what I believe should be happening is that, when telemetry is enabled in the library, a span event is registered in the current activity (as per the OTEL spec). This would naturally flow the information outwards if any
Activity
is in scope, and subsequently using OpenTelemetry, an exporter would pick it up (AppInsights would come into play at this point).I strongly dislike the current design because it creates a completely separate custom flow for telemetry while we already have a very well established mechanism for capturing it with OpenTelemetry.
Please keep me honest if I'm missing something completely obvious here.
The text was updated successfully, but these errors were encountered: