Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(AspNet.TelemetryHttpModule) Activity.Current not restored before all IIS LifeCycle events #3286

Closed
chris-smith-zocdoc opened this issue May 17, 2022 · 6 comments · Fixed by open-telemetry/opentelemetry-dotnet-contrib#761
Assignees
Labels
bug Something isn't working

Comments

@chris-smith-zocdoc
Copy link
Contributor

Bug Report

Activity.Current lost in some IIS lifecycle events

We have a custom HttpApplication with handlers for most IIS lifecycle events. These events can start child spans (mostly sql) that should be attached to the current request trace. For a percentage of the IIS lifecycle events, we are seeing traces that don't appear to be properly restoring Activity.Current, causing child spans to start their own trace.

OpenTelemetry = 1.2.0
OpenTelemetry.Api = 1.2.0
OpenTelemetry.Exporter.Zipkin = 1.2.0
OpenTelemetry.Instrumentation.AspNet = 1.0.0-rc9.2
OpenTelemetry.Instrumentation.AspNet.TelemetryHttpModule = 1.0.0-rc9.2
OpenTelemetry.Instrumentation.SqlClient = 1.0.0-rc9.2

Runtime version net48

Symptom

I believe Activity.Current is null

What is the expected behavior?

Activities started in IIS lifecycle methods should be attached to the request trace to which they belong

What is the actual behavior?

The Activities are orphaned, creating single span traces.

Additional Context

We aren't seeing this in all lifecycle events, only some of them and only a certain percentage of the time.

Here is the approximate percentage of time we're seeing this occur by lifecycle method. It happens most frequently in PreSendRequestHeaders (nearly every time)

Unaffected

IIS Life Cycle Method   % parent span exists
PostAuthenticateRequest	      1
PostMapRequestHandler	      1
OnAuthenticate	              1
BeginRequest	              1
RequestHandlerExecute	      0.99999 (assuming this is probably not a real issue)

Affected by the bug

PreRequestHandlerExecute	0.98452
PostRequestHandlerExecute	0.98104
PostLogRequest	                0.83814
PreSendRequestHeaders	        0.0085

How are we tracking this? We have a custom span processor that adds the current lifecycle method to each span by reading it from HttpContext.Items

@chris-smith-zocdoc chris-smith-zocdoc added the bug Something isn't working label May 17, 2022
@chris-smith-zocdoc
Copy link
Contributor Author

chris-smith-zocdoc commented May 17, 2022

I believe this could be resolved by changing this line

if (context.CurrentNotification == RequestNotification.ExecuteRequestHandler && !context.IsPostNotification)
{
ActivityHelper.RestoreContextIfNeeded(context.ApplicationInstance.Context);
}

to

if (!context.IsPostNotification)
{
    ActivityHelper.RestoreContextIfNeeded(context.ApplicationInstance.Context);
}

@CodeBlanch
Copy link
Member

@chris-smith-zocdoc

So that rule is just something we carried over from the original version here: https://github.com/aspnet/Microsoft.AspNet.TelemetryCorrelation/blob/50f0acb7a0cb164d834eb91b839959a841577302/src/Microsoft.AspNet.TelemetryCorrelation/TelemetryCorrelationHttpModule.cs#L84

I don't think anyone is opposed to changing it.

Could you possibly share a small repo that reproduces the problem so I can take it for a spin?

@CodeBlanch CodeBlanch self-assigned this Jun 15, 2022
chris-smith-zocdoc added a commit to chris-smith-zocdoc/OtelSample that referenced this issue Jun 17, 2022
@chris-smith-zocdoc
Copy link
Contributor Author

Thanks @CodeBlanch I've setup a minimum reproduction here https://github.com/chris-smith-zocdoc/OtelSample

You should be able to open the solution and hit F5 to run under IIS Express to see this easily.

Set a breakpoint on https://github.com/chris-smith-zocdoc/OtelSample/blob/master/Global.asax.cs#L39

@CodeBlanch
Copy link
Member

@chris-smith-zocdoc Thanks for that code, very helpful! I took it for a spin, here's my findings...

It is a bit more complicated than simply removing context.CurrentNotification == RequestNotification.ExecuteRequestHandler.

The order of events firing in OnExecuteRequestStep looks like this:

  • BeginRequest (Activity started)
  • AuthenticateRequest
  • AuthorizeRequest
  • ResolveRequestCache
  • MapRequestHandler
  • AcquireRequestState
  • PreExecuteRequestHandler
  • ExecuteRequestHandler
  • ReleaseRequestState
  • UpdateRequestCache
  • LogRequest
  • EndRequest (Activity is stopped)
  • SendResponse (Application_PreSendRequestHeaders/PreSendRequestContent called at this point)

That last line is the important one. Even if we call the restore code, the Activity has been stopped so there is nothing to restore 🤣

We could move the stop to the SendResponse event. I need to test this IIS URL Rewrite logic in there to make sure that works. BUT, let me ask first, @cijothomas @alanwest @reyang how do we feel about changing how this instrumentation works so that some of the time spent sending the response starts to be included in the span duration?

@chris-smith-zocdoc
Copy link
Contributor Author

@CodeBlanch that may be true for PreSendRequestHeaders but not the other events which were showing smaller amounts of loss (~2% to 16%)

PreRequestHandlerExecute, PostRequestHandlerExecute, PostLogRequest	  

So I think there are two decisions here:

  • can we restore before all events? I'd be happy to send a pr for this
  • should we move the Activity stop call

@chris-smith-zocdoc
Copy link
Contributor Author

@CodeBlanch I opened open-telemetry/opentelemetry-dotnet-contrib#761 to restore the context before all events

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants