This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
[3.1] Support large EventSource filter args #27522
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Port #27500 to 3.1
Customer Impact
Today APM tools such as ApplicationInsights ask customers to reference NuGet packages and load in-proc assemblies in order to make distributed tracing work. When an engineer needs to diagnose a production issue and the application developer didn't have the foresight to adopt an appropriate NuGet package already, it is extremely high friction to retroactively add it. It requires source access, rebuilding the application, redeploy to production, and restarting the running instance where the problem is reproing. DiagnosticSourceEventSource and EventPipe provide an alternate telemetry channel that can be enabled on-demand with no restart and no redeploy to solve the same class of issues. The fix here removes a rough edge allowing the alternative EventPipe scenario to work much more effectively for this use case.
How the fix works
Going one level deeper, DiagnosticSourceEventSource requires the telemetry consuming tool to specify a list of runtime events to gather as well as every desired property on those events. This list could easily include dozens of events with multiple properties on each event when trying to use it for distributed tracing. Currently there is a limitation that the complete textual encoding of this list is less than 1KB and it is very easy to exceed this limit for this use case. If the limit is exceeded the runtime ignores the request and logs nothing. The fix relaxes this limit.
Testing
I have privately debugged and tested the fix using a prototype distributed tracing tool.
Risk
Low - ETW blocks tools from entering this codepath with more than 1KB of data so it can only be hit with new EventPipe based tooling. The modified code path is also wrapped in a blanket try/catch region such that even the previous bad behavior of throwing a NullReferenceException was silently ignored and the application continued as if the problematic logging request had never been made.
Original commit text
Fix NullReferenceException. When the filter args exceeded the
hard-coded size limit GetDataFromController would return data = null.
The code previously asserted that data was not null and triggered NRE
when it was. The fix correctly null checks the value instead of
asserting and uses an empty args dictionary in this case, the same as if
filter args had been empty to begin with.
ETW has always limited filter args to 1024 bytes but EventPipe has no
such restriction. When using DiagnosticSourceEventSource it can be
useful to specify a larger filter arg blob. I can't do anything about
ETW's restriction but there is no need for the runtime to force
EventPipe to be equally limited. The larger size also reduces the chance
that we need to hit the fallback path above causing filter args to be
ignored.