Address Collect/Shutdown thread safety for MetricReader #2506

Merged — 2 commits merged into open-telemetry:main from the metricreader-tcs branch on Oct 25, 2021

Conversation

CodeBlanch (Member)

@reyang For comparison I took the code from #2499 (MetricReader task version) and switched it to use TaskCompletionSource<T> instead of Task<T> directly. What that does is take away the need to actually use a second thread to do the work while still keeping the same signaling/result-storage mechanics. That makes the performance good like #2505 (wait handle version) but maybe a bit easier to read/maintain?
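A minimal sketch of the idea for readers following along (field/method names like collectionTcs, newTaskLock, and OnCollect are taken from the diff context below; the exact shape here is illustrative, not a verbatim copy of the PR): the first caller creates the TaskCompletionSource<bool> and runs the collect cycle on its own thread, while concurrent callers simply wait on the same Task, so no dedicated worker thread is needed for the signaling or result storage.

using System;
using System.Threading;
using System.Threading.Tasks;

private TaskCompletionSource<bool> collectionTcs;
private readonly object newTaskLock = new object();

public bool Collect(int timeoutMilliseconds = Timeout.Infinite)
{
    var tcs = this.collectionTcs;
    var shouldRunCollect = false;

    if (tcs == null)
    {
        lock (this.newTaskLock)
        {
            tcs = this.collectionTcs;
            if (tcs == null)
            {
                // This thread "wins" and will run the actual collect cycle.
                shouldRunCollect = true;
                tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
                this.collectionTcs = tcs;
            }
        }
    }

    if (!shouldRunCollect)
    {
        // Another thread is already collecting; wait for its result (or time out).
        return tcs.Task.Wait(timeoutMilliseconds) && tcs.Task.Result;
    }

    var result = false;
    try
    {
        result = this.OnCollect(timeoutMilliseconds); // the real collect work
    }
    catch (Exception)
    {
        // Swallow/log; result stays false so waiting threads are still released.
    }
    finally
    {
        this.collectionTcs = null;
        tcs.TrySetResult(result);
    }

    return result;
}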

CodeBlanch requested a review from a team on October 23, 2021 18:19
codecov bot commented Oct 23, 2021

Codecov Report

Merging #2506 (e43e5f6) into main (6f3f7e5) will increase coverage by 0.01%.
The diff coverage is 93.10%.

❗ Current head e43e5f6 differs from pull request most recent head 2d159b4. Consider uploading reports for the commit 2d159b4 to get more accurate results.

@@            Coverage Diff             @@
##             main    #2506      +/-   ##
==========================================
+ Coverage   80.51%   80.52%   +0.01%     
==========================================
  Files         253      253              
  Lines        8503     8529      +26     
==========================================
+ Hits         6846     6868      +22     
- Misses       1657     1661       +4     
Impacted Files                                          Coverage Δ
src/OpenTelemetry/Metrics/MetricReader.cs               91.54% <93.10%> (+0.43%) ⬆️
...Zipkin/Implementation/ZipkinExporterEventSource.cs   63.63% <0.00%> (-9.10%) ⬇️
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs   73.83% <0.00%> (-0.94%) ⬇️

reyang changed the title from "[Do Not Merge] Address Collect/Shutdown thread safety for MetricReader" to "Address Collect/Shutdown thread safety for MetricReader" on Oct 24, 2021
@@ -71,13 +76,48 @@ public bool Collect(int timeoutMilliseconds = Timeout.Infinite)
{
Guard.InvalidTimeout(timeoutMilliseconds, nameof(timeoutMilliseconds));

var kickoff = false;
Member

I realized that kickoff might be a confusing name, maybe shouldRunCollect is better?

Member

Or even kickoffCollect


if (tcs == null)
{
lock (this.newTaskLock)
Member

I guess making this a spin lock might give better performance as the actual wait time should be short in normal cases.

Member

I think threads will usually not wait here, and the cost of an uncontended lock is very cheap. If threads do end up waiting here, something is likely wrong and the wait times may not be short anyway. I suggest postponing this change.
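For readers unfamiliar with the alternative being discussed: swapping the monitor lock for a SpinLock would look roughly like the sketch below (the GetOrCreateCollectionTcs helper is purely illustrative; this change was ultimately postponed). A SpinLock busy-waits instead of blocking, which can pay off when the critical section is only a few instructions, but it burns CPU if contention lasts longer than expected.

// Hypothetical SpinLock variant of the TCS-creation critical section.
// Note: SpinLock is a mutable struct, so the field must not be readonly.
private SpinLock newTaskSpinLock = new SpinLock(enableThreadOwnerTracking: false);
private TaskCompletionSource<bool> collectionTcs;

private TaskCompletionSource<bool> GetOrCreateCollectionTcs(out bool shouldRunCollect)
{
    shouldRunCollect = false;
    var lockTaken = false;
    try
    {
        this.newTaskSpinLock.Enter(ref lockTaken);

        if (this.collectionTcs == null)
        {
            shouldRunCollect = true;
            this.collectionTcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        }

        return this.collectionTcs;
    }
    finally
    {
        if (lockTaken)
        {
            this.newTaskSpinLock.Exit();
        }
    }
}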

try
{
return this.OnCollect(timeoutMilliseconds);
if (kickoff)
reyang (Member) commented Oct 24, 2021
Consider simplifying the flow (I realized this when I rewrote the logic in #2505):

if (!shouldRunCollect)
{
    // this shouldn't throw an exception since tcs has its value explicitly set
    return Task.WaitAny(tcs.Task, this.shutdownTcs.Task, Task.Delay(timeoutMilliseconds)) == 0 ? tcs.Task.Result : false;
}

var result = false;

try
{
    lock (this.onCollectLock)
    {
        this.collectionTcs = null;
        result = this.OnCollect(timeoutMilliseconds);
    }
}
catch (Exception)
{
    // TODO: OpenTelemetrySdkEventSource.Log.SpanProcessorException(nameof(this.Shutdown), ex);
}

tcs.TrySetResult(result);
return result;
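An aside on the suggestion above (not part of the original comment): Task.WaitAny returns the zero-based index of the first task to complete, so comparing the result against 0 is what distinguishes "the in-flight collect finished" from "shutdown was signaled" or "the delay elapsed":

// index 0 => the other thread's collect finished, so its result can be used;
// index 1 => shutdown was signaled; index 2 => timeoutMilliseconds elapsed.
var index = Task.WaitAny(tcs.Task, this.shutdownTcs.Task, Task.Delay(timeoutMilliseconds));
var succeeded = index == 0 && tcs.Task.Result;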

private AggregationTemporality preferredAggregationTemporality = CumulativeAndDelta;
private AggregationTemporality supportedAggregationTemporality = CumulativeAndDelta;
private int shutdownCount;
private int shutdownTimeout = int.MinValue;
reyang (Member) commented Oct 24, 2021
We don't need this anymore?
shutdownCount might be better?

if (Interlocked.CompareExchange(ref this.shutdownCount, 1, 0) != 0)
{
    return false; // shutdown already called
}

Up to you.
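For context, the Interlocked.CompareExchange guard suggested above is the usual run-once pattern; a minimal sketch of a Shutdown built on it (the overall method shape and the OnShutdown call are illustrative, not necessarily the merged code):

private int shutdownCount;

public bool Shutdown(int timeoutMilliseconds = Timeout.Infinite)
{
    Guard.InvalidTimeout(timeoutMilliseconds, nameof(timeoutMilliseconds));

    // Atomically flip shutdownCount from 0 to 1. Only the first caller sees 0,
    // so the shutdown work runs at most once no matter how many threads race here.
    if (Interlocked.CompareExchange(ref this.shutdownCount, 1, 0) != 0)
    {
        return false; // shutdown already called
    }

    return this.OnShutdown(timeoutMilliseconds);
}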

{
return false; // shutdown already called
}

this.shutdownTimeout = timeoutMilliseconds;
reyang (Member) commented Oct 24, 2021
Remove this line (it's already implied by CompareExchange).
(No idea why I added it; most likely I was sleepy.)

CodeBlanch merged commit 884b224 into open-telemetry:main on Oct 25, 2021
CodeBlanch deleted the metricreader-tcs branch on October 25, 2021 at 16:59
CodeBlanch (Member, Author)

FYI: I merged the PR, and @reyang is going to open a new one to clean up some of the feedback.

reyang mentioned this pull request on Oct 25, 2021

reyang (Member) commented Oct 25, 2021

(Quoting the above) FYI: I merged the PR, and @reyang is going to open a new one to clean up some of the feedback.

#2509
