-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request]: Better ILogger sampling and filtering #82465
Comments
Tagging subscribers to this area: @dotnet/area-extensions-logging Issue DetailsProblem:Large scale .NET customers are constantly complaining that logging is too expensive to turn on in production, perf, or stress scenarios. Their external log storage supports sampling of logs to limit storage costs, but that doesn't address the perf impact of collecting and exporting the data from the app. Existing partial solutions:
Better potential solutions:There are several things we might want to look into and they aren't mutually exclusive:
Ideally any/all of the above options would be configurable in code or via reloadable configuration and they are generic enough to work with any log source and sink. Customers may want to use config settings to mitigate an active issue where a production is logging too much or they may want to proactively configure these options to curate their log output.
|
Added some additional comments |
Ideally all of these would be configurable by category like log level today. E.g. for a random request I want to see the routing, auth, and MVC debug level and higher logs. |
Does this imply that the SRE should be able to get a full list of logs that are available? (e.g. discover all the logger categories/instances) |
More likely they're told which logs to enable for a given feature area or situation. |
Discovery of telemetry is a general problem, without good solutions. The same problem exists for Activities, Counters, EventSource, & Logging. Its further complicated by conditional code in the app that means that log messages will only be sent if that log provider and level are supported. There is no good way to statically analyze the application and produce a list of log entries, events etc. The main way to discover them seems to be through reading the code, documentation or a tool that will capture all of them and enumerate the results. I wrote this for eventsource that will listen to all the events while running, and then provide details of them. But using it requires modifying the app code. I theory an SDK like open-telemetry could include a similar feature in it - as a form of collection mode that can be run for a period, that will then produce a report with all the telemetry that could be collected. Its semi-expensive to run in that all the code paths for generating all the telemetry will be run, but its not egressing much data out of the application. For that reason it would probably a good feature for dotnet-counters, dotnet-trace and dotnet-monitor. Does the app-insights SDK have any command/control capability from the backend to the app? As this is something I'd expect a n SRE to run occasionally when examining a problem, rather than something the app is written to send? Maybe this is a case where you'd use one of the dotnet-* tools from a console to interrogate the running app to determine what its emitting. |
I have different understanding. I think ActivitySource, Meter and EventSource allow folks to inject a "listener", and that listener will get notified for all the existing sources and the new sources that will be created in the future. Which means there is a central way of discovering all the sources. ILogger has a different philosophy that there is no such central place. |
With the above, it will tell you the sources, but not the events that occur - which is half the value so that you can know what settings to use to see the events fired so you can pick the appropriate settings. With ILogger, you could create a new provider, and configure it to enumerate the events that are fired, rather than log each of them individually. |
Would you elaborate a bit? With the listener/source model, it is possible to build a listener that catches all the events, although it might not be the most efficient way, it has provided a mechanism for folks to get what they want if they are willing to pay the cost. I don't know a way how to get a holistic view of "given an application which uses several 3rd party libs, find all the loggers and hook all the logs/records" unless I use a debugger. |
Following the code in https://learn.microsoft.com/en-us/dotnet/core/extensions/custom-logging-provider
It would collect information on each log entry passed through it. It would build up a dictionary with each log message, including:
Then when It would need to be enabled via program.cs, in addition to whatever other providers the app wants to use: var builder = WebApplication.CreateBuilder(args);
builder.Logging.AddLogEntryEnumeratorLogger(); And the configured in AppSettings.json, as to whether to collect the info, and how to output it. {
"Logging" : {
"LogEntryEnumeratorLogger" : {
"Enabled" : true,
"OutputTo" : "File",
"FilePath" : "./log_entries.json"
}
}
} There would be a cost to operate the logger, as it would be building up a structure of all the unique log messages. It wouldn't want to format the messages, just collect the format string itself. If the log entry has an ID, then for efficiency it can assume that all messages with that ID would be the same, so it can shortcut having to compare them all. For messages without an ID, it would compare the format string. When 3rd party libraries are logging, they should be retrieving the ILogger from DI, and so as long as this was initialized before they are called, then it will be able to capture their entries too. |
So it is Friday, and it was a good opportunity to write some code to delve more into Logging implementation, I hacked up a quick demo. Using https://gist.github.com/samsp-msft/37a5aba3ef7abb5bbadf01ed48477886 and builder.Logging.AddLogEnumeratorLogger();
...
app.MapGet("/Logs", () => LogEnumeratorLoggerProvider.dumpLogInfo()); and a configuration of {
"Logging": {
"LogLevel": {
"Default": "Information",
"Microsoft.AspNetCore": "Warning",
"System.Net.Http": "Warning"
},
"Console": {
"IncludeScopes": true,
"LogLevel": {
"Microsoft": "Warning",
"Default": "Information"
}
},
"LogEnumerator": {
"LogLevel": {
"Default" : "Trace"
}
}
}
} Category Level ID Message Format Count
System.Net.Http.HttpClient.Default.ClientHandler Information 100 Sending HTTP request {HttpMethod} {Uri} 120
System.Net.Http.HttpClient.Default.ClientHandler Information 101 Received HTTP response headers after {ElapsedMilliseconds}ms - {StatusCode} 119
System.Net.Http.HttpClient.Default.ClientHandler Trace 102 120
System.Net.Http.HttpClient.Default.ClientHandler Trace 103 119
Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware Trace 2 All hosts are allowed. 4
Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware Debug -- Wildcard detected, all requests with hosts will be allowed. 1
Microsoft.AspNetCore.Routing.EndpointMiddleware Information 1 Executed endpoint '{EndpointName}' 3
Microsoft.AspNetCore.Routing.EndpointMiddleware Information -- Executing endpoint '{EndpointName}' 4
Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware Error 1 An unhandled exception has occurred while executing the request. 1
Microsoft.AspNetCore.Diagnostics.DeveloperExceptionPageMiddleware Warning 2 The response has already started, the error page middleware will not be executed. 1
Microsoft.Hosting.Lifetime Information 14 Now listening on: {address} 2
Microsoft.Hosting.Lifetime Information -- Hosting environment: {EnvName} 1
Microsoft.Hosting.Lifetime Information -- Content root path: {ContentRoot} 1
Microsoft.Hosting.Lifetime Information -- Application started. Press Ctrl+C to shut down. 1
Microsoft.Extensions.Hosting.Internal.Host Debug 1 Hosting starting 1
Microsoft.Extensions.Hosting.Internal.Host Debug 2 Hosting started 1
Microsoft.AspNetCore.Routing.Matching.DfaMatcher Debug 1001 {CandidateCount} candidate(s) found for the request path '{Path}' 4
Microsoft.AspNetCore.Routing.Matching.DfaMatcher Debug 1005 Endpoint '{Endpoint}' with route pattern '{RoutePattern}' is valid for the request path '{Path}' 2
Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets Debug 6 Connection id "{ConnectionId}" received FIN. 2
Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets Debug 7 Connection id "{ConnectionId}" sending FIN because: "{Reason}" 2
Microsoft.AspNetCore.Hosting.Diagnostics Information 1 4
Microsoft.AspNetCore.Hosting.Diagnostics Information 2 3
Microsoft.AspNetCore.Hosting.Diagnostics Debug 13 Loaded hosting startup assembly {assemblyName} 3
System.Net.Http.HttpClient.Default.LogicalHandler Information 100 Start processing HTTP request {HttpMethod} {Uri} 120
System.Net.Http.HttpClient.Default.LogicalHandler Information 101 End processing HTTP request after {ElapsedMilliseconds}ms - {StatusCode} 119
System.Net.Http.HttpClient.Default.LogicalHandler Trace 102 120
System.Net.Http.HttpClient.Default.LogicalHandler Trace 103 119
Microsoft.AspNetCore.Server.Kestrel.Http2 Debug 35 Trace id "{TraceIdentifier}": HTTP/2 stream error "{error}". A Reset is being sent to the stream. 1
Microsoft.AspNetCore.Server.Kestrel.Http2 Trace 37 Connection id "{ConnectionId}" received {type} frame for stream ID {id} with length {length} and flags {flags}. 9
Microsoft.AspNetCore.Server.Kestrel.Http2 Trace 49 Connection id "{ConnectionId}" sending {type} frame for stream ID {id} with length {length} and flags {flags}. 91
Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware Debug 1 Failed to authenticate HTTPS connection. 2
Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware Debug 3 Connection {connectionId} established using the following protocol: {protocol} 1
Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware Debug 1 Request matched endpoint '{EndpointName}' 4
Microsoft.AspNetCore.Server.Kestrel.Connections Debug 1 Connection id "{ConnectionId}" started. 3
Microsoft.AspNetCore.Server.Kestrel.Connections Debug 2 Connection id "{ConnectionId}" stopped. 2
Microsoft.AspNetCore.Server.Kestrel.Connections Debug 39 Connection id "{ConnectionId}" accepted. 3
LoadTest Information -- using service url {url} 2
LoadTest Trace -- Thread {threadID} making request {count} 120
LoadTest Information -- Loading service #{index} for {duration} with {threads} 2
Microsoft.AspNetCore.Server.Kestrel Error 13 Connection id "{ConnectionId}", Request id "{TraceIdentifier}": An unhandled exception was thrown by the application. 1 |
This also makes me question whether we need to have filtering built in - or whether it could be another Logging provider that you would nest other loggers under. builder.Logging.AddLogFilter( logs =>
{
logs.AddConsoleLogger();
Logs.AddXXXLogger();
} Which would be all managable through config as to what filtering is desired. |
@samsp-msft You'll be to collect information those logs only which were made using the |
More examples of wanting to emit a specific log from a category: microsoft/reverse-proxy#2202 |
#49924 appears to be a special case of this issue. |
I'd like to propose a narrow and more concrete solution to see if we can move this forward. We would add a new API similar to this (I'm not too concerned with exact naming or signature, the proposal is primarily about the general concept and scope): public static class LoggingFilter
{
public static void AddDynamicFilter(
this ILoggingBuilder builder,
string category,
Func<LogLevel, EventId, TState, Exception?, Func<TState,Exception?,string>, bool> filter);
} Every time ILogger.Log() gets called on an ILogger produced from this ILoggerFactory it would do:
Some possible usage examples: // sample a randomly selected 5% of logs
builder.Logging.AddDynamicFilter(null,
(level, eventId, state, exception, formatter) => Random.Shared.NextDouble() < 0.05);
// only log messages if there is an active Activity which is marked as Recorded
// (distributed tracing sampling rules determine this)
builder.Logging.AddDynamicFilter(null,
(level, eventId, state, exception, formatter) => Activity.Current?.IsRecorded);
// eliminate a known noisy log message
builder.Logging.AddDynamicFilter("FooCorp.SomeCategoryName",
(level, eventId, state, exception, formatter) => eventId != UndesiredEventId);
// rate limiting
// assuming some helper rateLimiter object has already been written that keeps track
// of an incrementing count and resets it on a timer
builder.Logging.AddDynamicFilter(null,
(level, eventId, state, exception, formatter) => rateLimiter.IsNewMessageWithinLimit()); My immediate focus is less on the details and more on getting some agreement about the scope of solution. In particular I am suggesting these things are out of scope for this feature:
|
public static class LoggingFilter
{
public static void AddDynamicFilter(
this ILoggingBuilder builder,
string category,
Func<LogLevel, EventId, TState, Exception?, Func<TState,Exception?,string>, bool> filter);
} I'm not sure Being able to access the state is useful. It would require casting it to We don't want to preemptively cast state to a list because that could cause boxing, and state doesn't have to implement the list interface. Update: Would changing the signature to |
|
Through system.configuration, the settings for categories is a prefix match, that would probably be best here. Its not general wildcards, or regex, so should be about as fast as a normal string match. How will this work with a hierarchy. If one filter is filtering out "foo.bar" and another is trying to filter in "foo.bar.baz" won't the former deny the latter, and so break what the developer is trying to achieve? Maybe they need to be called from least to most specific, and have the current on/off state included in the parameters. That way more specific filters will still be called, and can make the call based on previous filters in the chain. |
So far all the feedback sounds like it is refining the design so I'm taking it as people are on-board with the general premise/scope so far. I'm not trying to dive too far into the details yet, but off-the-cuff thoughts for various points raised:
You are right a delegate is insufficient. We probably need an interface definition where one of the methods has a generic type parameter on it or we could exclude TState from the filtering parameter list. ILoggerDynamicFilter
{
bool Filter<TState>(...);
}
I was imagining it would be the latter but omitted this for brevity. I expect all the perf cost is paid at startup when determining which filters apply to which ILogger instances. Once that is determined the category string no longer plays a role and the Logger is just enumerating and invoking a list of filters that were determined to match.
Yeah we might want those. I'd consider any form of Add/Remove/Clear or any other API that manages the list to be in the scope of the proposal. We'd just have to narrow in on exactly which ones we want.
My initial thought (which I am not at all tied to) was that we'd enumerate every filter that matched and as soon as one returned false then we short circuit. I think this is an area of the design that would need more investigation and/or feedback. |
Getting into the weeds here a bit but I agree with @JamesNK. Working with Imagine a bunch of filters doing... if (state is IReadOnlyList<KeyValuePair<string, object>> list)
{
...
} That is going to cause a lot of boxing. One thing I would like to see which might help here is for FormattedLogValues to be made if (typeof(TState) == typeof(FormattedLogValues))
{
var typedState = (FormattedLogValues)(object)state;
...
}
else if (state is IReadOnlyList<KeyValuePair<string, object>> list)
{
...
} The idea being we could avoid boxing (for these filtering cases but also normal log handling) for the common |
If you are careful about the exact code pattern you use, the JIT can already optimize away the boxing operation without needing to special case the type in code (Example). In the short term you can change the code to this and avoid the boxing: if (state is IReadOnlyList<KeyValuePair<string, object>>)
{
((IReadOnlyList<KeyValuePair<string, object>>)state).WhateverFunction();
} In the longer term the JIT is still improving their analysis to recognize the box removal optimization in more code patterns. I wouldn't be surprised if your |
Interesting thanks for sharing @noahfalk! That sharplab example is 🤯 Improving the JIT to handle this would be 💯 🚀 |
This proposal looks good to me. Thank you. |
I tried to create a sampler for logs via OpenTelemetry using the OTel processor extension, but it turns out that there is no way to suppress entries with that API. I tried setting the log level of the discarded messages to The semantics of what I would want the sampler to do is captured in https://gist.github.com/samsp-msft/e37120be7322d17500dcc797a840faef. The API proposed above would work perfectly for this scenario. |
Problem:
Large scale .NET customers are constantly complaining that logging is too expensive to turn on in production, perf, or stress scenarios. Their external log storage supports sampling of logs to limit storage costs, but that doesn't address the perf impact of collecting and exporting the data from the app.
Each log level adds an order of magnitude to the number of records produced, making collection prohibitive, but not collecting them reduces the chances of detecting problems earlier, or further optimizing performance.
Existing partial solutions:
if(Activity.Current?.IsRecorded) { _logger.LogInfo(...); }
. This solution doesn't work for log statements in other libraries and it feels awkward to tie logging code from all over to a specific log sampling strategy.Better potential solutions:
UPDATE 1/12: I have added a more specific proposed solution in the discussion below
There are several things we might want to look into and they aren't mutually exclusive:
Ideally any/all of the above options would be configurable in code or via reloadable configuration and they are generic enough to work with any log source and sink. SRE may want to use config settings to mitigate an active issue where a production is logging too much or they may want to proactively configure these options to curate their log output. Being able to change configuration on the fly enables rapid exploration of problem hypotheses without requiring taking the service offline to update.
The text was updated successfully, but these errors were encountered: