Potential improvement for ActivitySource name matching #708
Comments
The runtime should invoke the ShouldListenTo(ActivitySource) callback only once for each (ActivityListener, ActivitySource) pair in the process and then cache the result. This means any performance change from this suggestion probably shows up as a small difference in process startup time. Were you concerned about process startup time, or was there perhaps a misunderstanding that the performance of this callback would influence steady-state throughput?

A small optimization that might be useful regardless is creating the HashSet with the constructor that takes a custom IEqualityComparer<T> and passing StringComparer.OrdinalIgnoreCase. This lets you avoid allocating new uppercase strings.

I ran the benchmark below to show how the different options compare. It's possible I made mistakes, or that alternate variations would show more nuanced differences. I was surprised that the Regex lookup was as slow as it was, but I am not surprised overall that the case-insensitive hash lookup won the matchup.
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    namespace ConsoleApp24
    {
        class Program
        {
            static void Main(string[] args)
            {
                var summary = BenchmarkRunner.Run<ActivitySourceNameBenchmark>();
            }
        }

        [MemoryDiagnoser]
        public class ActivitySourceNameBenchmark
        {
            private Random r = new Random();
            HashSet<string> hs = new HashSet<string>();
            HashSet<string> hsIgnoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            string componentName = "SomeCompany.SomeComponent.MakeTheNameABitLonger";
            Regex regex;

            public ActivitySourceNameBenchmark()
            {
                // Store the upper-cased name in the plain set and the original name in the case-insensitive set.
                hs.Add(componentName.ToUpperInvariant());
                hsIgnoreCase.Add(componentName);
                regex = new Regex(Regex.Escape(componentName), RegexOptions.Compiled | RegexOptions.IgnoreCase);
            }

            // Cost of constructing (and compiling) the regex.
            [Benchmark]
            public Regex CreateRegex() => new Regex(Regex.Escape(componentName), RegexOptions.Compiled | RegexOptions.IgnoreCase);

            // Allocates an upper-cased copy of the name on every lookup.
            [Benchmark]
            public bool UpperCaseLookup() => hs.Contains(componentName.ToUpperInvariant());

            // Case-insensitive comparer; no extra string allocation.
            [Benchmark]
            public bool IgnoreCaseLookup() => hsIgnoreCase.Contains(componentName);

            [Benchmark]
            public bool RegexLookup() => regex.IsMatch(componentName);
        }
    }
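To make the suggestion concrete, here is a minimal sketch of a listener whose ShouldListenTo callback uses a case-insensitive HashSet. The class name `ListenerSetup` and the source name are hypothetical, and this is not the code in the OpenTelemetry SDK; it only illustrates the comparer-based lookup using the `System.Diagnostics` listener API.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ListenerSetup
{
    // Hypothetical source names, for illustration only.
    private static readonly HashSet<string> EnabledSources =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "SomeCompany.SomeComponent",
        };

    public static ActivityListener CreateListener()
    {
        var listener = new ActivityListener
        {
            // The comparer handles case differences, so no upper-cased copy of the name is allocated.
            ShouldListenTo = source => EnabledSources.Contains(source.Name),
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };

        // Registering the listener makes it visible to existing and future sources.
        ActivitySource.AddActivityListener(listener);
        return listener;
    }
}
```

Because the callback result is cached per (listener, source) pair, the lookup cost here matters mainly at the point where a source or listener is first registered.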
+1 on considering the startup overhead of Regex. This might not be a problem for services, but I've definitely seen it matter a lot in device/app scenarios. Looks like we're looking at this from different angles:
@noahfalk looks like you have a more powerful machine than me. In C Runtime we used to give developers the low-end machines so they would write fast code 🤣.
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    namespace ConsoleApp24
    {
        class Program
        {
            static void Main(string[] args)
            {
                var summary = BenchmarkRunner.Run<ActivitySourceNameBenchmark>();
            }
        }

        [MemoryDiagnoser]
        public class ActivitySourceNameBenchmark
        {
            private Random r = new Random();
            HashSet<string> hs = new HashSet<string>();
            // UseRandomizedStringHashAlgorithm = 1
            HashSet<string> hsIgnoreCase = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            private string componentName = "SomeCompany.SomeComponent.MakeTheNameABitLonger";
            Regex regex;

            public ActivitySourceNameBenchmark()
            {
                var componentNames = new List<string> {
                    // if all the sources are controlled by the developer, there shouldn't be a concern of hash collision attack
                    // collision should be rare, maybe we can do a custom perfect hash?
                    "Microsoft.Azure.Source1",
                    "Microsoft.Azure.Source2",
                    "Microsoft.Azure.Source3",
                    "Microsoft.Azure.Source4",
                    "Microsoft.Azure.Source5",
                    "Microsoft.Azure.Source6",
                    "Microsoft.Azure.Source7",
                    "Microsoft.Azure.Source8",
                    "Microsoft.Azure.Source9",
                };

                var patterns = new List<string>();
                foreach (var name in componentNames)
                {
                    hs.Add(name.ToUpperInvariant());
                    hsIgnoreCase.Add(name);
                    patterns.Add(Regex.Escape(name));
                }

                // One alternation over all escaped source names.
                var pattern = String.Join('|', patterns);
                regex = new Regex(pattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);
            }

            [Benchmark]
            public Regex CreateRegex() => new Regex(Regex.Escape(componentName), RegexOptions.Compiled | RegexOptions.IgnoreCase);

            [Benchmark]
            public bool UpperCaseLookup() => hs.Contains(componentName.ToUpperInvariant());

            [Benchmark]
            public bool IgnoreCaseLookup() => hsIgnoreCase.Contains(componentName);

            [Benchmark]
            public bool RegexLookup() => regex.IsMatch(componentName);
        }
    }
This is good to know, I wasn't aware of this caching. It has a small implication: the listener has to give a consistent result for a given name/version (so if we want to change the listener behavior, we should create a new listener and discard the old one rather than modify it in place). That seems to be a very good choice considering the outcome vs. the implication. In this case, looks like
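The "create a new listener and discard the old one" idea could look roughly like the sketch below. `ListenerSwitcher` and its `Use` method are hypothetical names introduced for illustration; the sketch assumes that disposing an ActivityListener detaches it from the sources it was attached to.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public sealed class ListenerSwitcher : IDisposable
{
    private ActivityListener current;

    // Registers a fresh listener for the given names, then disposes the previous one.
    // The old listener is never mutated, so cached ShouldListenTo results stay consistent.
    public void Use(IEnumerable<string> sourceNames)
    {
        var names = new HashSet<string>(sourceNames, StringComparer.OrdinalIgnoreCase);
        var next = new ActivityListener
        {
            ShouldListenTo = source => names.Contains(source.Name),
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };

        ActivitySource.AddActivityListener(next);
        current?.Dispose();
        current = next;
    }

    public void Dispose() => current?.Dispose();
}
```

Each call to `Use` captures its own immutable set of names, so a given listener instance always answers the same way for the same source.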
Yes, using StringComparer.OrdinalIgnoreCase is the preferred method here, as it shouldn't allocate extra objects and it should perform fast comparisons too. The other question I have is: why use HashSet and not something like Dictionary? I am asking because HashSet on the full framework has some perf issues which we fixed on .NET Core only.
We haven't finalized what the default listening model would be. It could be "listen to all except those explicitly turned off", "listen to only sources explicitly enabled", or something else. For now, we can change the comparison to ignore case to avoid the string allocation.
I would say try to measure the perf when using HashSet and Dictionary on the full framework, and then we can decide which to use. Just make sure you use OrdinalIgnoreCase with both when you measure it. Let me know if you want any help from me on that.
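A measurement along those lines might start from a sketch like this one, which reuses the component name from the earlier benchmarks. The class name is hypothetical, no results are implied, and the job configuration needed to target the full framework is left out.

```csharp
using System;
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class SetVersusDictionaryBenchmark
{
    private const string ComponentName = "SomeCompany.SomeComponent.MakeTheNameABitLonger";

    // Both collections use the same case-insensitive comparer so the comparison is fair.
    private readonly HashSet<string> set =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ComponentName };

    // The bool value is a placeholder; only the keys matter for this lookup.
    private readonly Dictionary<string, bool> map =
        new Dictionary<string, bool>(StringComparer.OrdinalIgnoreCase) { [ComponentName] = true };

    [Benchmark(Baseline = true)]
    public bool HashSetLookup() => set.Contains(ComponentName);

    [Benchmark]
    public bool DictionaryLookup() => map.ContainsKey(ComponentName);
}
```

It can be run with `BenchmarkRunner.Run<SetVersusDictionaryBenchmark>()`, the same way as the benchmarks above.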
The
As (we can still fix it if needed, but not a priority) |
The following code holds a hash set of ActivitySource names - converted to uppercase invariant:
opentelemetry-dotnet/src/OpenTelemetry/Trace/Configuration/OpenTelemetryBuilder.cs
Line 73 in 577ae6c
The following code converts the incoming ActivitySource name to uppercase invariant and does a hash lookup. This involves a string allocation/conversion. The time complexity is X * O(n), where n is the length of the input name and X is the number of comparisons in the hash lookup, which ranges from 1 to m (the number of activity sources) depending on hash collisions.
opentelemetry-dotnet/src/OpenTelemetry/Trace/Configuration/OpenTelemetrySdk.cs
Line 69 in ace469d
The potential improvement is to change the API from AddActivitySource to SetActivitySources, with the pattern = new regex("|".join(map(sources, s => regex.escape(s))), options=COMPILED | CASE_INSENSITIVE).
This would result in a compiled DFA which has the best performance, O(n) (where n is the length of the input string), and yet avoids the string uppercase conversion / allocation during each match.
In the future, this also opens up the possibility of supporting patterns such as wildcards.
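In C#, the pseudocode above might be sketched as follows. `SourceNameMatcher.Build` is a hypothetical helper, not an existing SDK API, and treating `*` as a wildcard is an assumption made only to illustrate the wildcard idea. The pattern is anchored because an unanchored alternation would also accept any name that merely contains one of the listed names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SourceNameMatcher
{
    // Builds one case-insensitive regex that matches any of the given names in full.
    // Assumption: a '*' in a name means "any sequence of characters".
    public static Regex Build(IEnumerable<string> sources)
    {
        // Escape each name, then turn the escaped '*' back into a wildcard.
        var alternatives = sources.Select(s => Regex.Escape(s).Replace("\\*", ".*"));

        // Anchor the alternation so only whole names match.
        var pattern = "^(?:" + string.Join("|", alternatives) + ")$";

        return new Regex(
            pattern,
            RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, `SourceNameMatcher.Build(new[] { "Microsoft.Azure.Source1", "MyCompany.*" })` accepts `microsoft.azure.source1` and `MyCompany.Foo` but rejects `Microsoft.Azure.Source10`.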