Strings with embedded nulls cause issues with event payload decoding #52025
Comments
Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
|
We should probably also document this behavior in the EventSource docs? |
Another possible fix would be to replace null characters with U+2400 SYMBOL FOR NULL (␀). |
For the self-describing format, DataCollector.AddNullTerminatedString already truncates the string at the first null character: Lines 155 to 162 in d1fed28
This came from dotnet/coreclr#16672; it used counted strings before that. |
As an FYI, this is a long-standing known issue for EventSource. We've done nothing in the past to solve it, and have just allowed it to live on "by design". If we were so inclined to fix it, we could switch to counted strings, though we'd need to add support to EventPipe for this, and then investigate how to address LTTng, and EventListener deserialization for both EventSource produced events and native events that are plumbed up to EventListener implementations. |
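To make the counted-string idea concrete, here is a minimal sketch of a length-prefixed layout, modeled in Python purely to show the bytes. The 2-byte byte-count prefix, the function names, and the trailing int32 field are assumptions for illustration; this is not EventSource's or EventPipe's actual serialization code.

```python
import struct

def encode_counted(text: str, value: int) -> bytes:
    """Hypothetical writer: 2-byte byte count, UTF-16LE data, then an int32."""
    data = text.encode("utf-16-le")
    return struct.pack("<H", len(data)) + data + struct.pack("<i", value)

def decode_counted(payload: bytes):
    """Hypothetical reader: trust the length prefix instead of scanning for a null."""
    (nbytes,) = struct.unpack_from("<H", payload, 0)
    text = payload[2:2 + nbytes].decode("utf-16-le")
    (value,) = struct.unpack_from("<i", payload, 2 + nbytes)
    return text, value

# An embedded null survives the round trip and the following field stays aligned.
print(decode_counted(encode_counted("foo\0bar", 42)))
```

Because the reader never scans for a terminator, embedded nulls are preserved. The cost is that every reader and writer has to agree on the new layout, which is the format break described above.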
Due to how long this behavior has existed, I think it may be beneficial to just document it. Are there known, common use cases where a null would be present in the middle of a C# string? I can't think of any off the top of my head, but they might exist. |
We just hit this issue in .NET Framework because we were tracing some HTTP request bodies using ETW. In our case, one of our APIs was being sent a binary payload that contained multiple '\0' characters. This completely messed up our tracing due to the issue you described, and it was quite a journey to understand what was going on.
I think the problem with this resolution is that there are many cases where event data does not originate from the event emitter (like the case above). So even if you document it, consumers are likely to hit this unless they are explicitly sanitizing all of their event data for null terminators. I'm not sure how complicated the internals are here, but your earlier suggestions of truncation and/or some kind of warning being logged sound appropriate. |
Out of curiosity, were you logging this payload as a string or was one of the in-box providers logging a binary payload as a string? I don't believe this should be affecting arbitrary binary data that is logged. Only strings.
It's not so much that it's complicated, but that it would be a breaking change for some providers (this only affects manifest-based providers; self-described providers are unaffected since they truncate to the first null). Similarly, we would be introducing an O(N) or O(log N) (I don't know if
As I said, self-describing providers already truncate to the first null. I'm not opposed to making the change, but I haven't done the due diligence that comes with making a breaking change. @brianrob it sounds like there was a history to making this change. Any context you can share on why we haven't made this change in the past?
I would be on board with truncating to the first null and optionally logging an out-of-band message that a string was truncated. With the caveat that I don't want to introduce allocations or O(N) work for every string put through |
This is a binary payload that was converted to a string as part of the logging process. I should have specified that.
By breaking change, are you referring to the change in behavior regarding null terminators, or some other consequence of the proposed change? |
The behavior change. Currently, the data is all there, but the reader doesn't know how to parse it correctly. In theory, so long as the string is the only variably sized payload parameter, the parser could be changed to get all the data. If we change the behavior, that could have unintended consequences for readers or any writers that have made changes to get around this limitation. Making the change is a matter of doing due diligence and getting enough signal that the change is worth making. |
This is one of those things that has been around for a long time, and there hasn't been much will to address it, mostly because there was a strong belief that strings are null-terminated. The amount of change involved here also isn't small, because if you switch all strings to be counted strings by default, then this is a breaking change to the serialization format, but it won't be obvious. There's also likely to be a bug tail because there are a handful of places where we serialize and de-serialize. My recommendation here is to just log a
If you really do want to change this, it might be good to consider building it as an opt-in, which would allow you to not break any existing implementations, but folks could decorate the parameter as a counted string in the |
* embedded nulls cause issues with event payload decoding. Fixes [52025](dotnet/runtime#52025)
* update eventsource to not use strings containing nulls. Co-authored-by: David Mason
* Fixing whitespace. Co-authored-by: David Mason
Manifest-driven `EventSource`s mark all strings as `win:UnicodeString` in the manifest. This is interpreted by TraceEvent and other pieces of tracing infrastructure as "a null-terminated string". There is a separate type for "counted strings" that isn't used by `EventSource`.
C#, however, allows for embedded nulls in strings, and `String.Length` will give the full length of the string including the embedded nulls. `EventSource` uses `String.Length` when it is encoding strings, e.g., runtime/src/libraries/System.Private.CoreLib/src/System/Diagnostics/Tracing/EventSource.cs, lines 1089 to 1091 in c37257f.
This can lead to problems for the parser if there are things in the payload after the string. For example:
will result in the following at the reader end:

The event contains the full data:
TraceEvent attempts to read a null-terminated string, and then interprets the next 4 bytes as the `baz` parameter, which is incorrect.
A potential solution would be to truncate input strings to the first null encountered. I imagine this is a very rare occurrence and the overhead of running `String.IndexOf((char)0)` should be minimal for the common use case. We could also push an out-of-band message that an event payload string was truncated.
CC @sywhang @noahfalk
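The misparse and the proposed truncation can be modeled with a few lines of Python that mimic the byte layout. This is only a sketch under assumptions: the payload is taken to be a UTF-16LE string written with its full length plus a terminator, followed by a 4-byte `baz` integer. The function names and sample values are hypothetical, and the real EventSource and TraceEvent code paths are not reproduced here.

```python
import struct

def encode_event(text: str, baz: int) -> bytes:
    """Writer side: emit every UTF-16 code unit of text, a terminator, then an int32."""
    return (text + "\0").encode("utf-16-le") + struct.pack("<i", baz)

def decode_event(payload: bytes):
    """Reader side: stop at the first null code unit, then read the next 4 bytes as int32."""
    offset = 0
    while offset + 2 <= len(payload) and payload[offset:offset + 2] != b"\x00\x00":
        offset += 2
    text = payload[:offset].decode("utf-16-le")
    (baz,) = struct.unpack_from("<i", payload, offset + 2)
    return text, baz

def truncate_at_null(text: str) -> str:
    """Proposed mitigation: keep only the characters before the first embedded null."""
    idx = text.find("\0")
    return text if idx < 0 else text[:idx]

# Without truncation the reader decodes the bytes of "ba" as the integer.
print(decode_event(encode_event("foo\0bar", 42)))                    # ('foo', 6357090)
# With truncation the integer that follows the string is read correctly.
print(decode_event(encode_event(truncate_at_null("foo\0bar"), 42)))  # ('foo', 42)
```

In both cases the reader sees only `foo`; truncating on the writer side changes nothing about the visible string, but it keeps the fields after it aligned.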