Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The approach to telemetry (using custom ITelemetryPublisher) seems overengineered and counter to modern OpenTelemetry abstractions #412

Closed
julealgon opened this issue Mar 29, 2024 · 37 comments
Assignees

Comments

@julealgon
Copy link
Contributor

I just saw the latest release notes, and with that, I got to know a bit more on how this project is raising telemetry information on feature toggle changes through the use of a custom ITelemetryPublisher abstraction.

I'm raising this issue here to discuss on what I believe to be a poor approach to the problem. The OpenTelemetry spec already defines how feature toggles should be emitted in telemetry for both traces and logs here:

The way ITelemetryPublisher was devised appears to ignore well-known observability abstractions such as Activity in favor of a completely bespoke channel that requires custom implementers to communicate with external observability platforms (like the AppInsights implementation that was added).

Instead, what I believe should be happening is that, when telemetry is enabled in the library, a span event is registered in the current activity (as per the OTEL spec). This would naturally flow the information outwards if any Activity is in scope, and subsequently using OpenTelemetry, an exporter would pick it up (AppInsights would come into play at this point).

I strongly dislike the current design because it creates a completely separate custom flow for telemetry while we already have a very well established mechanism for capturing it with OpenTelemetry.

Please keep me honest if I'm missing something completely obvious here.

@jimmyca15
Copy link
Member

jimmyca15 commented Mar 30, 2024

@julealgon

There can be an open telemetry implementation of ITelemetryPublisher. I think that should alleviate these concerns as that implementation could use Activity as you mentioned.

Having ITelemetryPublisher allows anyone using the feature management library to have an extension point into feature management telemetry without having to migrate to open telemetry.

@julealgon
Copy link
Contributor Author

@julealgon

There can be an open telemetry implementation of ITelemetryPublisher. I think that should alleviate these concerns as that implementation could use Activity as you mentioned.

Having ITelemetryPublisher allows anyone using the feature management library to have an extension point into feature management telemetry without having to migrate to open telemetry.

@jimmyca15 Activity is not OpenTelemetry-specific, it's a native .NET type that can integrate with OpenTelemetry.

My argument here is more for using the native Activity as the telemetry abstraction instead of devising a completely custom abstraction, than it is for OpenTelemetry per se.

If you take all other major frameworks (from Microsoft itself), they use Activity and Meters for telemetry internally, and then a separate OpenTelemetry instrumentation module is created to listen to those and expose them via OTEL. You are just introducing an additional (IMHO completely redundant and dangerous in the sense that it leads people away from the common abstraction) mechanism in this library.

Once again, I'd like for the team to reconsider this design and create something simpler and more consistent with the rest of the framework.

@drago-draganov
Copy link

drago-draganov commented Apr 1, 2024

@julealgon Thanks for the feedback! It's much valuable and appreciated!

Here is a scenario you can help me understand:

An existing application is using AzureAppConfiguration for Configuration, Feature Management (no telemetry yet) and AppInsights for metric and custom application events (no Open Telemetry anywhere). What would be the path for this application to adopt Feature Mangement telemetry, assuming Feature Management only integrates with OpenTelemetry?

@julealgon
Copy link
Contributor Author

@drago-draganov

Here is a scenario you can help me understand:

An existing application is using AzureAppConfiguration for Configuration, Feature Management (no telemetry yet) and AppInsights for metric and custom application events (no Open Telemetry anywhere). What would be the path for this application to adopt Feature Mangement telemetry, assuming Feature Management only integrates with OpenTelemetry?

The bolded is a bit misleading, since tecnically, you could interface with any library or framework by leveraging the native Activity, but let`s ignore that for this.

Assuming you want to keep interfacing directly with AppInsights, you'd likely add the following packages:

  • OpenTelemetry.Extensions.Hosting: for most apps, this is the simplest way to integrate OTEL as it has a DI-friendly API similar to other services that rely on Microsoft.Extensions.Hosting abstractions
  • OpenTelemetry.Instrumentation.AspNetCore: assuming an AspNetCore project here, this gives you the basic "root" activity
  • Azure.Monitor.OpenTelemetry.Exporter: allows exporting telemetry directly into Azure AppInsights

Then on your program.cs, this is all that is needed:

services.AddOpenTelemetry()
    .ConfigureResource(c => c.AddService("YourServiceNameHere"))
    .WithTracing(c => c
        .AddAspNetCoreInstrumentation()
        .AddAzureMonitorTraceExporter(...);

Then, every time a request is made to the API, a new root span is generated automatically. At any point during the request, if a feature flag is checked, an event would be added to the current span, and propagated into AppInsights via the configured exporter.

This of course assumes that Microsoft.FeatureManagement would add the code to enrich the current Activity with an event when the "Telemetry: Enabled" is present in configuration for that feature flag (following the existing design created for the ITelemetryPublisher here).

Let me know if this covers your question properly.

@jimmyca15
Copy link
Member

@julealgon thank you for the feedback. You have a good point for us to use Activity given its role as the standard method for telemetry transmission between components. We will put together a bit of a design of what it would look like to use Activity instead of ITelemetryPublisher and provide a summary here for review.

@jimmyca15 jimmyca15 self-assigned this Apr 1, 2024
@jimmyca15
Copy link
Member

jimmyca15 commented Apr 10, 2024

Here's what I'm thinking as far as what the activity would look like for feature management.

ActivitySource:
  Name: "Microsoft.FeatureManagement"
  Version: "1.0.0"
  Activities:
    - Name: "EvaluateFeature"
      Events:
        - Name: "FeatureEvaluation"
          Tags: 
            - key: "FeatureName"
              value: "Alpha" # The name of the feature
            - key: "Enabled"
              value: "true" | "false" # Whether the flag is enabled or not
            - key: "TargetingId"
              value: "Fred" # The targeting id, if present, used during targeting evaluation
            - key: "Variant"
              value: "On" # The name of the assigned variant
            - key: "VariantAssignmentReason"
              value: "Percentile" # The reason the variant has been assigned

@jimmyca15
Copy link
Member

jimmyca15 commented Apr 10, 2024

By transitioning to use Activity, we can remove the ITelemetryPublisher interface.

The ApplicationInsightsTelemetryPublisher would instead be based off activity listener.

public class ApplicationInsightsTelemetryPublisher : IDisposable
{
    private readonly TelemetryClient _client;
    private readonly ActivityListener _listener;

    //
    // Creates an instance of ApplicationInsightsTelemetryPublisher that will immediately begin publishing feature management events to Application Insights
    public ApplicationInsightsTelemetryPublisher (TelemetryClient client);

    //
    // Disposes the ApplicationInsightsTelemetryPublisher and underlying activity listener. Once disposed, the publisher will immediately stop sending events to Application Insights.
    public void Dispose();
}

Instantiating the ApplicationInsightsTelemetryPublisher sets everything up for feature evaluation events to be published to Application Insights.

@jimmyca15
Copy link
Member

Ideally it should be easy for a user to use this with DI.

The common usage of feature management is

services.AddFeatureManagement();

We will want something like

services.AddFeatureManagement()
    .WithApplicationInsightsTelemetryPublisher();

@julealgon
Copy link
Contributor Author

Here's what I'm thinking as far as what the activity would look like for feature management.

@jimmyca15 I think there is a slight misunderstanding here. When I made the original suggestion/post, I was thinking about FeatureManagement NOT creating an activity, but adding an activity event to the current activity. This is how the OpenTelemetry specification defines feature toggle checks: as events, not as spans.

In this case, if there is no ambient activity when a toggle is called, the logic wouldn't add any event either. But if the check happens in the scope of an activity (again, using the example above, as part of an AspNetCore request), then the "feature flag check" event would be added to that ambient activity.

I think creating an activity for the feature check itself doesn't make much sense because a feature evaluation is an "instant" occurrence, and not a "window" like what Activity models.

@julealgon
Copy link
Contributor Author

NOT creating an activity, but adding an activity event to the current activity

Did you mean to say add an event to the current activity?

Yes. I think it's what I said though, maybe you got confused on "activity event"? We are adding an "activity event" to the current "activity".

@jimmyca15
Copy link
Member

Yes, deleted my comment, but looks like you replied. I had read 'activity event' as just 'activity'.

@jimmyca15
Copy link
Member

@julealgon with such a design if a developer wanted to just have a callback to be able to handle feature evaluation events (like ITelemetryPublisher.PublishEvent(EvaluationEvent)) would you expect them to implement an activity listener that listens to all activities and only processes activities that have an event named "FeatureEvaluation"?

@jimmyca15
Copy link
Member

If that's the case, I don't believe this will be a good fit for all expected scenarios for consumption of flag events. While I do agree that the library should have an answer for OpenTelemetry, I fear the path that has been led to would take away from those who publish to a monitoring system that has not fully adopted OpenTelemetry.

With that, I would suggest that we keep ITelemetryPublisher to allow fully customizable handling of feature evaluation events and simply have an OpenTelemetryPublisher implementation that adds to the current activity.

services.AddFeatureManagement().
    .WithOpenTelemetry();

@julealgon
Copy link
Contributor Author

julealgon commented Apr 10, 2024

With that, I would suggest that we keep ITelemetryPublisher to allow fully customizable handling of feature evaluation events and simply have an OpenTelemetryPublisher implementation that adds to the current activity.

Can you consider another option where the Activity enrichment happens by default, and people can hook into the publisher mechanism for special cases?

Most frameworks don't need to have Activity "enabled": they are built into the standard implementation.

This would allow the standard flow to just call

services.AddFeatureManagement();

And it would propagate the events as per the telemetry setting.

EDIT:

Also, keep in mind that Activity by itself is not an OpenTelemetry type: it just implements the OpenTelemetry spec in the .NET space without needing a separate type to represent spans. So calling that method WithOpenTelemetry would be fairly misleading as one could use Activity type for many other use cases other than OTEL.

@jimmyca15
Copy link
Member

If we were to add it by default do you have any opinion on the makeup of the event being added? Or is your suggestion purely on the mechanics?

Name: "FeatureEvaluation"
Tags: 
  - key: "FeatureName"
    value: "Alpha" # The name of the feature
  - key: "Enabled"
    value: "true" | "false" # Whether the flag is enabled or not
  - key: "TargetingId"
    value: "Fred" # The targeting id, if present, used during targeting evaluation
  - key: "Variant"
    value: "On" # The name of the assigned variant
  - key: "VariantAssignmentReason"
    value: "Percentile" # The reason the variant has been assigned

@julealgon
Copy link
Contributor Author

If we were to add it by default do you have any opinion on the makeup of the event being added? Or is your suggestion purely on the mechanics?

I would strongly recommend following the OpenTelemetry Semantic Conventions for the attribute values in the event:

Convention

A flag evaluation SHOULD be recorded as an Event on the span during which it occurred.

The event name MUST be feature_flag.

Attribute Description Recommendation
feature_flag.key The unique identifier of the feature flag. Required
feature_flag.provider_name The name of the service provider that performs the flag evaluation. Recommended
feature_flag.variant SHOULD be a semantic identifier for a value. If one is unavailable, a stringified version of the value can be used. Recommended

For the provider name I assume it's totally up to you to define.

Additionally, I believe you can introduce extra attributes as needed for your implementation too, but I'll defer that to you guys to check with the OTEL team on what the best course of action is.

The only caveat with these attributes is that they are still in "Experimental" stage in the OTEL spec, so they could potentially change later on (or have additional attributes added, or some removed, etc).

And lastly... I would also suggest reaching out to the OTEL group to propose new attributes that you think could be general enough. I see from your example that we have some stuff that might make sense being a default (probably optional) attribute for other feature flagging systems to follow.

I did notice this section which also seemed interesting to me:

To record an evaluation outside of a transaction context, consider recording it as a log record.

It might make sense for the team to consider logging instead of an Activity event when there is no ambient Activity to add the event to. Just some food for thought here. Again, this could also be standard behavior and not necessarily toggleable (although I think that could be up to you too).

@beeme1mr
Copy link

@julealgon @jimmyca15 I noticed this thread and thought you might be interested in joining the feature flag semantic conventions OpenTelemetry project. It's a joint collaboration between OpenFeature and OpenTelemetry that aims to extend and harden the existing semantics to support more use cases. We're looking for input from end users and feature flag vendors.

open-telemetry/community#2042

@zhiyuanliang-ms
Copy link
Contributor

zhiyuanliang-ms commented May 6, 2024

Just want to call out:
"Events are stored in an in-memory list until they can be transmitted which makes this mechanism only suitable for recording a modest number of events." ref

Can we be confident about that the number of feature evaluation events is always modest?
We may need to think about whether there is any mechanism to limit the event number during the feature flag evaluation.

When application starts processing an operation e.g. HTTP request or task from queue, it creates an Activity.
Currently, whenever IsEnabled or GetVariant is called, the PublishTelemetry will be called. I can imagine there could be some redundant/verbose telemetry/event.

@julealgon
Copy link
Contributor Author

julealgon commented May 6, 2024

Just want to call out: "Events are stored in an in-memory list until they can be transmitted which makes this mechanism only suitable for recording a modest number of events." ref

Can we be confident about that the number of feature evaluation events is always modest?

Considering the telemetry toggle in Microsoft.FeatureManagement is both opt-in, and per-flag, it would be up to the consumer to understand their own scenario and test the viability of enabling these events on a per-flag basis. I think this question would be more critical if the mechanism was always-on by default.

We may need to think about whether there is any mechanism to limit the event number during the feature flag evaluation.

I believe it would still be a good practice to check the Activity.IsAllDataRequested property before adding the event so it only happens when there are actual listeners for that specific source.

Another thing which would impact this from a performance standpoint would be trace sampling strategy: IIRC, the sampler will impact whether or not a given trace is being considered and should be setting that property above (I'm not 100% sure about that however).

EDIT:
Yeah, sampling will indeed have an effect on Activity properties such as IsAllDataRequested, so the implementation should definitely inspect that property before propagating the event.

Example usage from the AlwaysOff sampler:

image

When application starts processing an operation e.g. HTTP request or task from queue, it creates an Activity. Currently, whenever IsEnabled or GetVariant is called, the PublishTelemetry will be called. I can imagine there could be some redundant/verbose telemetry/event.

By redundant, do you mean when folks check the same flag multiple times during the same operation? Or are you referring to common attributes across multiple flag check events (such as potentially context info, library/SDK attributes, etc that are shared across all flags)?

@zhiyuanliang-ms
Copy link
Contributor

By redundant, do you mean when folks check the same flag multiple times during the same operation?

Yes. Imagine that different components may refer the same feature flag and IsEnabled method may be called multiple times during the same operation.

@julealgon
Copy link
Contributor Author

By redundant, do you mean when folks check the same flag multiple times during the same operation?

Yes. Imagine that different components may refer the same feature flag and IsEnabled method may be called multiple times during the same operation.

This is a very fair concern... I wonder if this should be accounted for in the OpenTelemetry spec itself though: like a way to "reference" common data in multiple places when it happens more than once in the same span/trace/grouping etc.

It wouldn't apply only to feature flags. Imagine for example all the potential duplication in logs that have the same "contextual information".

Perhaps there should be a similar mechanism than what exists today for resources, which can be shared and pushed with many "child spans" in the payload while not having to replicate all resource attributes for each individual span.

I personally wouldn't want for the trace itself to NOT show some of the feature flag checks, so a way to just point to a shared data section would make sense: each event would just reference the same piece of data and there would be no duplication or overhead.

@drago-draganov
Copy link

I personally wouldn't want for the trace itself to NOT show some of the feature flag checks, so a way to just point to a shared data section would make sense: each event would just reference the same piece of data and there would be no duplication or overhead.

Even with references, tracing can easily become overwhelming. I can picture devs codding loops with IsEnabled in a single operation context or calling IsEnabled in scaffolding UI scenarios. Checking feature flag is supposed to be lightweight.
I think the important part for tracing is the feature flag evaluation, not the flag check call.

@zhiyuanliang-ms
Copy link
Contributor

I was thinking about FeatureManagement NOT creating an activity, but adding an activity event to the current activity. This is how the OpenTelemetry specification defines feature toggle checks: as events, not as spans.

I understand this concern. The open telemetry feature flag spec defines feature flag evaluations within a transaction as span events. But my concern is that the concept in the open telemetry spec is language neutral. You can interpret "transaction" as Activity and "span event" is an AcitivityEvent of it. Or we can start an activity called "feature_management" when any public of feature manager is called and it will have some "feature_flag" events as some feature flags are evaluated.

Maybe I am wrong, but I think we can also interpret "transaction" as the parent Activity and the "span event" is its child Activity. When starting a new Activity, .NET will set the parent id automatically, so it keeps the hierarchy information. ref

The best practice mentioned in this doc says "Each library or library subcomponent can (and often should) create its own source. "

So, my thought is to let the feature manager have an ActivitySource and when a feature flag evaluation happens (it could be either a call of IsEnabled or GetVariant), it will start an activity.

BTW, if we purely think from the side of supporting OpenTelemetry:
image

From this doc, I can not tell whether the OpenTelemetry .NET SDK will look at Activity's event. We need to do more investigation about it.

@julealgon
Copy link
Contributor Author

@drago-draganov

I think the important part for tracing is the feature flag evaluation, not the flag check call.

I'm a bit confused by this though. They are essentially the same to me: the call triggers an evaluation. If a call is happening in a loop, then surely we would want to surface all of those. If you call some SQL procedure in a redundant way multiple times, you'd want to see all those represented in spans.

I don't see how feature evaluation as events are any different. If somebody want's to only call it once, they should move it out of the loop.

@julealgon
Copy link
Contributor Author

@zhiyuanliang-ms

I was thinking about FeatureManagement NOT creating an activity, but adding an activity event to the current activity. This is how the OpenTelemetry specification defines feature toggle checks: as events, not as spans.

I understand this concern. The open telemetry feature flag spec defines feature flag evaluations within a transaction as span events. But my concern is that the concept in the open telemetry spec is language neutral. You can interpret "transaction" as Activity and "span event" is an AcitivityEvent of it. Or we can start an activity called "feature_management" when any public of feature manager is called and it will have some "feature_flag" events as some feature flags are evaluated.

You surely could start an activity to represent the overall flow, yes, but I think it wouldn't be useful since the check is almost instantaneous, and any actual external activity would already be covered by things such as HTTP instrumentation or SQL instrumentation for actually fetching values from the configuration (or somewhere else a custom provider decides to fetch from). Additionally, such activity would be mapping a span that is not part of the OTEL spec too... so I think you'd be adding a bit of complication to the whole flow by doing it.

Since the OTEL spec only defines feature evaluations, I would personally focus on that only and maybe start a discussion around representing more about the feature management as an actual full-blown span.

Maybe I am wrong, but I think we can also interpret "transaction" as the parent Activity and the "span event" is its child Activity. When starting a new Activity, .NET will set the parent id automatically, so it keeps the hierarchy information. ref

I think the highlighted part is a stretch. An activity clearly represents a 1:1 mapping with a "span" in OpenTelemetry. Span events are their own abstraction and model an instantaneous action instead of an operation window like spans do.

Hierarchy is kept between parent and child, sure, but this doesn't change the semantics of each entity: spans are one thing, events are another.

If the team would want to dispute this, I think it would need to happen with the OTEL Spec team. From my personal perspective, it does make a ton more sense to model feature evaluations as events though, instead of spans. It just feels more correct from a conceptual standpoint to me.

The best practice mentioned in this doc says "Each library or library subcomponent can (and often should) create its own source. "

So, my thought is to let the feature manager have an ActivitySource and when a feature flag evaluation happens (it could be either a call of IsEnabled or GetVariant), it will start an activity.

I would assume that only applies to when it makes sense for the library to create an Activity in the first place, but again, will leave it up to you to discuss with the OTEL team on whether or not that is a sensible approach. I don't think we need it, personally.

BTW, if we purely think from the side of supporting OpenTelemetry: image

From this doc, I can not tell whether the OpenTelemetry .NET SDK will look at Activity's event. We need to do more investigation about it.

We've been using the RecordException capability for AspNetCore, SQL and HTTP spans for quite a bit now and they are properly propagated and surfaced in Datadog. Note that these are implemented using activity events.

I would assume any other event would be handled similarly although we don't currently have any on our applications.

@drago-draganov
Copy link

IsEnabled and evaluation are not necessary the same thing. The evaluation is always the same for the same context. Sure, evaluation can be triggered on the first call to IsEnabled, but that's implementation detail.
IsEnabled is supposed to be lightweight, almost like property call. It's often used in binding and scaffolding, similar to configuration. The transaction length is much different than expensive SQL call to be compared.

Do we surface every time when the app calls to read key-value setting from IConfiguration?

@zhiyuanliang-ms
Copy link
Contributor

zhiyuanliang-ms commented May 7, 2024

it does make a ton more sense to model feature evaluations as events though, instead of spans. It just feels more correct from a conceptual standpoint to me.

Agree.

@julealgon
Copy link
Contributor Author

@drago-draganov

IsEnabled and evaluation are not necessary the same thing. The evaluation is always the same for the same context. Sure, evaluation can be triggered on the first call to IsEnabled, but that's implementation detail. IsEnabled is supposed to be lightweight, almost like property call. It's often used in binding and scaffolding, similar to configuration. The transaction length is much different than expensive SQL call to be compared.

Totally fair. But I think the spec is really talking about the immediate value of the toggle regardless of any caching (so the IsEnabled in your example). Again, I could be wrong here, I was not part of the feature flag spec definition.

Do we surface every time when the app calls to read key-value setting from IConfiguration?

We don't but that's not an apples-to-apples comparison IMHO. General configuration is not part of the OTEL spec (it could end up being, in the future... that would be interesting) so there is nothing defined for it. And while one can compare settings to feature flags, feature flags is like a higher-level abstraction that can work on top of IConfiguration. They aren't directly comparable.

@drago-draganov
Copy link

drago-draganov commented May 7, 2024

@julealgon,

Feature Management is often seen as extension/feature of the configuration system. That's also from customers and providers prospective. With dynamic configuration already a norm, variant-based feature management, etc., these are much closer that it seems, in some cases even interchangeable.

@julealgon
Copy link
Contributor Author

@drago-draganov

Feature Management is often seen as extension/feature of the configuration system. That's also from customers and providers prospective. With dynamic configuration already a norm, variant-based feature management, etc., these are much closer that it seems, in some cases even interchangeable.

Sure, I'm aware of that in the case of Microsoft.FeatureManagement, but that's not necessarily true for other feature flag frameworks, or even feature flagging as a concept.

Keep in mind that OTEL tries to be framework-agnostic as much as possible and focus on semantics in the conventions. If you want to push for a standardized view of "configuration in general" and "feature flags" in the OTEL spec, I'd probably support you there but that's a much broader topic than what we are talking about here. I don't see the current spec as a blocker for the implementation on FeatureManagement. If configuration and feature flags are later unified in some sort of new spec, then it would be a matter of adapting to that breaking change in here.

Besides, if you want to make the argument that strongly, I'd then challenge you with the question of "why do we even need a separate feature toggling abstraction/interface, if I can just inject an IOptions<SomeModelWithABoolean> and test that boolean internally like it was a feature toggle?" If you follow it to its conclusion, the entire feature toggling abstraction would fall apart and instead what we'd end up with is proper multi-tenant IConfiguration, with support for "random per context/percentage" values etc.

Again... I could agree with you if you went on that direction, but that's a totally separate can of worms IMHO.

@drago-draganov
Copy link

drago-draganov commented May 7, 2024

Sure, I'm aware of that in the case of Microsoft.FeatureManagement, but that's not necessarily true for other feature flag frameworks, or even feature flagging as a concept.

Definitely not unique for Microsoft.FeatureManagement. If I remember correctly, Split and LaunchDarkly support complex (multi-value) variant feature flags.

If you want to push for a standardized view of "configuration in general" and "feature flags" in the OTEL spec...

Not at all. Just commenting on "feature flags" as a target for OTEL spec, while configuration is not under consideration.

"why do we even need a separate feature toggling abstraction/interface, if I can just inject an IOptions and test that boolean internally like it was a feature toggle?"

Indeed, we don't need separate abstraction in principle. But unfortunately, most config systems are not context aware to allow for targeting.

@julealgon
Copy link
Contributor Author

Sure, I'm aware of that in the case of Microsoft.FeatureManagement, but that's not necessarily true for other feature flag frameworks, or even feature flagging as a concept.

Definitely not unique for Microsoft.FeatureManagement. If I remember correctly, Split and LaunchDarkly support complex (multi-value) variant feature flags.

Oh to be clear, I wasn't referring to variant support there, but to the concept that the feature management system is built on top of a generic configuration system. Apologies if I misunderstood your original statement there. No disagreement from me in terms of variant support for sure.

@drago-draganov
Copy link

I didn't mean to relate that Feature Management is linked to configuration because it can be implemented on top of generic configuration system. It's rather the use case. Feature Variants make it a bit clearer, that's why I pointed to them. A variant feature allows for non-boolean (complex) value type, even arbitrary json. The concepts of enabled/disabled fades away, replaced by default treatment and targeting. These are mostly used as context-aware configuration settings, and less as if-else control flow.

@zhiyuanliang-ms
Copy link
Contributor

zhiyuanliang-ms commented May 8, 2024

I would assume any other event would be handled similarly although we don't currently have any on our applications.

@julealgon You are right! OTel .NET SDK can recognize Activity's event.
image

Another question is if we only want feature manager to add event to the existing activity (which is most likely the Activity.Current), what do you think should be the behavior of feature manager when there is no existing activity? Should it do nothing or create an activity by itself?

Besides, if we want to convert the "feature_flag" activity event to Application Insights EventTelemetry. Without depending on OpenTelemetry, if the feature manager doesn't create any activity, we have to listen all activities and find whether there is any activity contains a "feature_flag" activity.

Our current pattern is that as long as the user uses our built-in App Insights telemetry publisher, a "FeatureEvaluation" custom event will be sent automatically.

If feature manager doesn't create any activity, when there is no activity, no feature flag telemetry will be sent to App Insights.

@julealgon
Copy link
Contributor Author

@zhiyuanliang-ms

Another question is if we only want feature manager to add event to the existing activity (which is most likely the Activity.Current), what do you think should be the behavior of feature manager when there is no existing activity? Should it do nothing or create an activity by itself?

Yeah this is a good question. If it was up to me, I'd probably NOT create an activity, but I would maybe keep your custom notification hook in parallel so you can do other types of integrations even when no Activity is present at all.

Besides, if we want to convert the "feature_flag" activity event to Application Insights EventTelemetry. Without depending on OpenTelemetry, if the feature manager doesn't create any activity, we have to listen all activities and find whether there is any activity contains a "feature_flag" activity.

Yeah I understand this is a bit tricky, and is why I mentioned above that maybe keeping some sort of custom hook there is still a good idea, even if the library interacts with an optional Activity by default. I wouldn't think that would be bad in any way, as long as the default Activity-based behavior is still there for those that don't need the fully custom pipeline.

Our current pattern is that as long as the user uses our built-in App Insights telemetry publisher, a "FeatureEvaluation" custom event will be sent automatically.

If feature manager doesn't create any activity, when there is no activity, no feature flag telemetry will be sent to App Insights.

Yeah that makes sense. Not sure how you'd code this in a way that it still pushes in a custom way to App Insights but you might be able to just detect the presence of Azure Insights integration in the library itself and then if that is turned on, employ the custom publisher as well.

I will say though that the fact that we even have this custom integration path with Azure telemetry seems like a gap to me in either the spec or in the Azure implementation side. It shouldn't be the case IMHO that app insights needs anything custom moving forward if the OTEL spec already models that interaction. But of course that's up to you guys on how you want to move Azure telemetry forward. As a consumer, I'd certainly try to be as far away as possible from these custom integrations so that my applications are as vendor-agnostic as it can get. The whole Activity-based flow, plus OTEL and the OTLP exporter achieves that nicely for us and if it was up to me I'd push people in this direction as much as possible.

@rossgrambo
Copy link
Contributor

Hey @julealgon,

I just want to thank you for calling this out and helping educate us on OpenTelemetry / Activity based telemetry. We're now moving in this direction. If you're interested, here's a quick summary of where we landed:

  1. We will emit an ActivityEvent. OTel spec defines either a SpanEvent or Log is viable, and we landed on SpanEvent making the most sense for now.
  2. We landed on our own ActivitySource, as it makes listening concise, ensures an activity exists, and makes more sense if developers implement their own components and end up hitting dbs or any other slow operations during the request. We only create an Activity & ActivityEvent when the feature flag explicitly has telemetry enabled so it shouldn't be too noisy unless explicitly enabled.
  3. We're not taking a direct dependency on OTel- but instead using the .NET distributed tracing, which can easy be added as a source to a developer's OpenTelemetry.
  4. We're not adhering to the OTel Feature Flag schema yet- as we already have our schema in use and OTel's isn't complete yet. Once their schema is stable we'll work on moving over.

PR is out for review now- so some of these things might change. But figured it was worth sharing! 😄

@julealgon
Copy link
Contributor Author

I appreciate the update @rossgrambo . Your plan seems sensible to me. I look forward to integrating this in our project as we get rid of our custom feature toggling abstractions and switch into Microsoft FeatureManagement and our new OpenTelemetry pipeline.

Will keep an eye out for when this becomes available in the upcoming releases 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants