-
Notifications
You must be signed in to change notification settings - Fork 780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize unnecessarily allocation when initializing MetricProvider #1685
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1685 +/- ##
==========================================
+ Coverage 83.26% 83.28% +0.02%
==========================================
Files 250 250
Lines 6870 6874 +4
==========================================
+ Hits 5720 5725 +5
+ Misses 1150 1149 -1
|
@@ -59,7 +59,7 @@ internal BoundCounterMetric<T> Bind(LabelSet labelset, bool isShortLived) | |||
lock (this.bindUnbindLock) | |||
{ | |||
var recStatus = isShortLived ? RecordStatus.UpdatePending : RecordStatus.Bound; | |||
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus)); | |||
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, (_) => this.CreateMetric(recStatus)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this still allocates a delegate for the lambda. The ones below aren't using a closure so we could just pass a static ref to avoid that. For the one here we could try calling TryGetValue first and then TryAdd if needed. Not sure which perf hit is worse, double-call or allocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your comments. However, the initial concern was more on the logical side as it relates to calling this.CreateMetric() (and new *MetricSdk()) unnecessarily, rather than the allocation for the delegate and/or closure. This may become a concern as these "constructors" become more complex and requires additional resources to complete.
As to your concerns regarding lambdas/delegates, I can try doing some profiling of delegates vs methods. (i.e. TryGetValue/TryAdd/etc...) In my initial google search, it seems the debate is leaning toward use of delegates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After doing some research and running some perf tests on my own, I've come to the following conclusions…
-
Current PR does not 100% solve the issue of over-allocating a TValue when inserting into a ConcurrentDictionary. The PR will reduce the probability of this “leakage”.
-
Use of Lambda expressions with GetOrAdd is more concise (IMHO) vs using TryGetValue/TryAdd/etc… However, there may be a small performance penalty if lambda expression is “declared”/allocated constantly (as in the case of closures, with older .NET).
-
I don’t have enough context (of this project) to know what priorities and tradeoffs are valued, and thus, I look forward to feedback from owners and members here.
In more details… in the context of using ConcurrentDictionary<int,int> and my test program inserting and reading 40 Million entries…
-
GetOrAdd(TKey,TValue) seems to be equivalent of CompareAndExchange operator. Given current code base, we need to pre-allocate a TValue before calling GetOrAdd and can result in over allocation when race condition inserts TKey before us. This is also true if we use the delegate/Func/Lambda versions of GetOrAdd, as per remarks in the official documentation.
-
Performance (CPU) is equivalent when comparing Func<> vs TryGetValue/TryAdd/GetOrAdd. But, only if we pay special attention to make sure Func<> declarations are minimized (i.e. static Func<>, although, this really should be relegated to the Compiler/Optimizer to handle. Unfortunately, we need to explicitly help it).
-
For Func<> requiring closures, static Func<> declaration is possible in newer .NET when using the new GetOrAdd(TKey,TArg,TValue) form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but related to ConcurrentDictionary, we've seen several cases where folks hit OOM due to the 2GB limit of ConcurrentDictionary on 64bit machines.
There are other caveats - e.g. dotnet/runtime#41840 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t have enough context (of this project) to know what priorities and tradeoffs are valued, and thus, I look forward to feedback from owners and members here.
My perspective on this is that anyone using OpenTelemetry (ipso facto) is sensitive to performance so the library should be as low-impact to the hosting process as possible.
Check out this benchmark:
[MemoryDiagnoser]
public class ConcurrentDictionaryBenchmarks
{
private ConcurrentDictionary<LabelSet, object> instruments;
[IterationSetup]
public void IterationSetup()
{
this.instruments = new ConcurrentDictionary<LabelSet, object>();
}
[Benchmark]
public void GetOrAdd()
{
int closure = 0;
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (labelSet) => new object[closure]);
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (labelSet) => new object[closure]);
}
[Benchmark]
public void TryAdd()
{
int closure = 0;
if (!this.instruments.TryGetValue(LabelSet.BlankLabelSet, out object value))
{
this.instruments.TryAdd(LabelSet.BlankLabelSet, new object[closure]);
}
if (!this.instruments.TryGetValue(LabelSet.BlankLabelSet, out value))
{
this.instruments.TryAdd(LabelSet.BlankLabelSet, new object[closure]);
}
}
}
Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
GetOrAdd | 2.732 us | 0.2642 us | 0.7537 us | 2.600 us | - | - | - | 224 B |
TryAdd | 2.408 us | 0.2731 us | 0.7658 us | 2.100 us | - | - | - | 72 B |
According to that, doing TryGetValue
+TryAdd
is faster and uses less memory. So for me, it's a no-brainer 😄
I do agree with you the lambda syntax is much nicer, the automatic allocation of a delegate in C# is a tragedy!
To @reyang's point, the code is in a lock
...
lock (this.bindUnbindLock)
{
var recStatus = isShortLived ? RecordStatus.UpdatePending : RecordStatus.Bound;
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus));
}
...so why the heck are we even using a ConcurrentDictionary? 🤣 I haven't spent much time in the metrics code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CodeBlanch Thanks for the tip on BenchmarkDotNet! That is an awesome library.
Your benchmark confirmed my findings as well. So, if you do this...
[Benchmark]
public void GetOrAdd()
{
int closure = 0;
Func<string, int, object> func = (labelSet, closure) => new object[closure];
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, func, closure);
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, func, closure);
}
You get this...
Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
GetOrAdd | 2.323 us | 0.2878 us | 0.8258 us | 2.000 us | - | - | - | 72 B |
TryAdd | 2.577 us | 0.3651 us | 1.0476 us | 2.000 us | - | - | - | 72 B |
Unfortunately, the GetOrAdd(TKey,TArg,TValue) overload exists only on newer .NET. Without this form, we would still require allocation for the closure; in which case; TryAdd version is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I did not necessarily have to pull out func<> separately. The Compiler is suppose to optimize that portion. So...
[Benchmark]
public void GetOrAdd()
{
int closure = 0;
//Func<string, int, object> func = (labelSet, closure) => new object[closure];
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (l, c) => new object[c], closure);
this.instruments.GetOrAdd(LabelSet.BlankLabelSet, (l, c) => new object[c], closure);
}
results in...
Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
GetOrAdd | 2.597 us | 0.3755 us | 1.101 us | 2.100 us | - | - | - | 72 B |
TryAdd | 2.591 us | 0.3495 us | 1.003 us | 2.000 us | - | - | - | 72 B |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we need to stay focus on the original goal. Which is to not over-allocate the object we're inserting into the ConcurrentDictionary. Of which, unfortunately, we only partially addressed. I have strategies for this issue if we remain focus on this.
Also, I understand the whole Metric spec is WIP, and thus, maybe my effort here is premature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given concern for use of Func<> and support-ability in older .NET, I refactor to using TryGetValue/GetOrAdd instead. This should optimize (although it does not 100% solve) our over-allocation of dictionary objects.
…erns over performance.
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus)); | ||
if (!this.counterBoundInstruments.TryGetValue(labelset, out boundInstrument)) | ||
{ | ||
boundInstrument = this.counterBoundInstruments.GetOrAdd(labelset, this.CreateMetric(recStatus)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this to be:
if (!this.counterBoundInstruments.TryGetValue(labelset, out boundInstrument))
{
boundInstrument = this.CreateMetric(recStatus);
this.counterBoundInstruments.TryAdd(labelset, boundInstrument);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the TryAdd fail, we will return the wrong boundInstrument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetOrAdd
doesn't guarantee anything different, does it? From the doc:
Since a key/value can be inserted by another thread while valueFactory is generating a value, you cannot trust that just because valueFactory executed, its produced value will be inserted into the dictionary and returned. If you call GetOrAdd simultaneously on different threads, valueFactory may be called multiple times, but only one key/value pair will be added to the dictionary.
If we need a guarantee why not use a Dictionary
in a lock?
DoubleObserverMetricHandleSdk boundInstrument; | ||
if (!this.observerHandles.TryGetValue(labelset, out boundInstrument)) | ||
{ | ||
boundInstrument = this.observerHandles.GetOrAdd(labelset, new DoubleObserverMetricHandleSdk()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These other ones that don't need a closure you could use a static delegate:
private static readonly Func<LabelSet, DoubleObserverMetricHandleSdk> CreateMetricFunc = (labelSet) => new DoubleObserverMetricHandleSdk();
public override void Observe(double value, LabelSet labelset)
{
// TODO cleanup of handle/aggregator. Issue #530
var boundInstrument =
this.observerHandles.GetOrAdd(labelset, CreateMetricFunc);
boundInstrument.Observe(value);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is necessary as we already called TryGetValue prior. Thus, the expectation is that we will need to alloc the new item. Thus, converting to a static delegate is just adding 1 extra layer.
If your concern is about atomicity, then, both these are equivalent. In the case of a race condition, the GetOrAdd(Func<>) form does not guarantee atomicity either. Thus, the functionally is no different than current code. In both cases, we will wind up "leaking" the new object regardless.
IMHO, In this code base, I would prefer using the GetOrAdd(Func<>) syntax rather than break into TryGetValue/TryGetOrAdd pattern. Simply because the code is more readable and understandable. Plus, the compiler has the ability to optimize to "static" delegates as it sees fit. Also, going forward in newer .NET, we have the GetOrAdd syntax with passing in TArg, which addresses the performance with closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is necessary as we already called TryGetValue prior. Thus, the expectation is that we will need to alloc the new item. Thus, converting to a static delegate is just adding 1 extra layer.
Don't call TryGetValue for the case where a static delegate will work, just call TryGetValue passing the delegate ref (most places). Only call TryGetValue/TryAdd when you need the closure (the one place).
IMHO, In this code base, I would prefer using the GetOrAdd(Func<>) syntax rather than break into TryGetValue/TryGetOrAdd pattern.
Let's go with the allocation-free patterns.
Ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Changes
Optimize unnecessarily allocation when initializing MetricProvider.
We allocate new object unnecessarily. We use the pattern collection.GetOrAdd(name, new Object()). The new Object() is not used if item already exists in the collection. Using a different form of GetOrAdd() we convert the new Object(0 into a Func(). Thus, only called if needed.
// Old Pattern
return this.longCounters.GetOrAdd(name, new Int64CounterMetricSdk(name));
// New Pattern
return this.longCounters.GetOrAdd(name, (k) => new Int64CounterMetricSdk(k));
Optimize SortAndDedup() method to remove double allocation of List object. Using a SortedDictionary<> to help with sorting and deduping.