[Sampler.AWS] Tested and Updated X-Ray Sampler #1887

Merged
merged 12 commits into from
Jun 17, 2024
11 changes: 7 additions & 4 deletions src/OpenTelemetry.Sampler.AWS/RulesCache.cs
@@ -60,11 +60,14 @@ public void UpdateRules(List<SamplingRule> newRules)
         List<SamplingRuleApplier> newRuleAppliers = new List<SamplingRuleApplier>();
         foreach (var rule in newRules)
         {
-            var currentStatistics = this.RuleAppliers
-                .FirstOrDefault(currentApplier => currentApplier.RuleName == rule.RuleName)
-                ?.Statistics ?? new Statistics();
+            // If the ruleApplier already exists in the current list of appliers, then we reuse it.
+            var ruleApplier = this.RuleAppliers
+                .FirstOrDefault(currentApplier => currentApplier.RuleName == rule.RuleName) ??
+                new SamplingRuleApplier(this.ClientId, this.Clock, rule, new Statistics());
+
+            // update the rule in the applier in case rule attributes have changed
+            ruleApplier.Rule = rule;

-            var ruleApplier = new SamplingRuleApplier(this.ClientId, this.Clock, rule, currentStatistics);
Contributor

I recall this was done to "sort of" implement immutability of the ruleApplier instances (which is generally a good idea). But as we discussed offline, this leads to some race conditions. Can you briefly explain, either as a comment in the code or in this PR, why we need to break the immutability and mutate the instance now?

Contributor Author

This change was made mainly to address one race condition. With the previous implementation, whenever the RulesCache was periodically updated, it pulled in the list of available rules, and if a rule already existed, its Statistics were carried over into a new instance of the rule applier. This caused a race condition.

Say the rule has a reservoir of 500 requests/sec and a FixedRateSampler of 1/sec once the reservoir is depleted. When a new instance of the rule applier is created, the constructor creates a new RateLimitingSampler with a reservoir of 1/sec, and a FixedRateSampler with 1/sec as well, since that's in the rule definition. The reservoir stays at 1/sec until UpdateTargets is called to set it according to how the rule is configured in CloudWatch (500/sec). Between creating the new instance and calling UpdateTargets, if we get a burst of 200 requests/sec that match the rule, only 2 will be sampled: 1 by the reservoir sampler and the other by falling back to the fixed rate sampler. If the sampler polling is set to refresh every second, this becomes a problem, since the reservoir will mostly be 1 and not the intended 500.

With this change, the applier retains the RateLimitingSampler with the correct reservoir, while only the rule and its definition get updated.
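The difference between the two refresh strategies described above can be sketched in isolation. This is a simplified illustration, not the sampler's actual code: `SketchRule`, `SketchApplier`, and `RulesCacheSketch` are hypothetical stand-ins, and the starting reservoir of 1 per second is an assumption taken from the explanation in this comment.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for a sampling rule.
public sealed class SketchRule
{
    public SketchRule(string name, double fixedRate)
    {
        this.Name = name;
        this.FixedRate = fixedRate;
    }

    public string Name { get; }

    public double FixedRate { get; }
}

// Hypothetical, simplified stand-in for a rule applier.
public sealed class SketchApplier
{
    public SketchApplier(SketchRule rule)
    {
        this.Rule = rule;

        // Assumption: a fresh applier starts with a reservoir of 1 per second
        // and keeps it until a target update arrives.
        this.ReservoirPerSecond = 1;
    }

    public SketchRule Rule { get; set; }

    public int ReservoirPerSecond { get; private set; }

    // Stands in for the target update that applies the configured quota.
    public void UpdateTarget(int quotaPerSecond) => this.ReservoirPerSecond = quotaPerSecond;
}

public static class RulesCacheSketch
{
    // Previous behavior: every refresh builds new appliers, so the reservoir resets to 1.
    public static List<SketchApplier> Recreate(List<SketchRule> newRules)
        => newRules.Select(rule => new SketchApplier(rule)).ToList();

    // Behavior in this PR: reuse the applier with the same rule name and only swap in the rule.
    public static List<SketchApplier> Reuse(List<SketchApplier> current, List<SketchRule> newRules)
    {
        var result = new List<SketchApplier>();
        foreach (var rule in newRules)
        {
            var applier = current.FirstOrDefault(a => a.Rule.Name == rule.Name) ?? new SketchApplier(rule);
            applier.Rule = rule;
            result.Add(applier);
        }

        return result;
    }
}
```

In this sketch, an applier whose target was set to 500 keeps that reservoir across `Reuse`, while `Recreate` hands back an applier that is at 1 again until the next target update.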

Contributor

Makes sense and lgtm.
Another thing we could have done to preserve immutability was to also pass in the current reservoir size here and create a new RateLimitingSampler with this reservoir size for a new SamplingRuleApplier instance. Just a suggestion, not required right away.
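A rough sketch of that immutable alternative, under the assumption that the current reservoir size can be read from the existing applier and passed to a constructor. `ImmutableApplierSketch` and `ImmutableRefreshSketch` are hypothetical names, not types in this repository, and the default reservoir of 1 for a brand-new rule is likewise an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable applier: all state is fixed at construction time.
public sealed class ImmutableApplierSketch
{
    public ImmutableApplierSketch(string ruleName, double fixedRate, int reservoirPerSecond)
    {
        this.RuleName = ruleName;
        this.FixedRate = fixedRate;
        this.ReservoirPerSecond = reservoirPerSecond;
    }

    public string RuleName { get; }

    public double FixedRate { get; }

    public int ReservoirPerSecond { get; }

    // Returns a new instance with the updated rule attribute,
    // carrying over the current reservoir size instead of resetting it.
    public ImmutableApplierSketch WithFixedRate(double fixedRate)
        => new ImmutableApplierSketch(this.RuleName, fixedRate, this.ReservoirPerSecond);
}

public static class ImmutableRefreshSketch
{
    public static List<ImmutableApplierSketch> Refresh(
        IReadOnlyList<ImmutableApplierSketch> current,
        IReadOnlyList<(string Name, double FixedRate)> newRules)
    {
        return newRules
            .Select(rule =>
            {
                var existing = current.FirstOrDefault(a => a.RuleName == rule.Name);

                // Known rule: copy with the existing reservoir. New rule: assumed default of 1.
                return existing != null
                    ? existing.WithFixedRate(rule.FixedRate)
                    : new ImmutableApplierSketch(rule.Name, rule.FixedRate, 1);
            })
            .ToList();
    }
}
```

The trade-off is an extra allocation per rule on each refresh in exchange for instances that never change after construction.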

             newRuleAppliers.Add(ruleApplier);
         }

8 changes: 4 additions & 4 deletions src/OpenTelemetry.Sampler.AWS/SamplingRuleApplier.cs
@@ -36,7 +36,7 @@ public SamplingRuleApplier(string clientId, Clock clock, SamplingRule rule, Stat
         this.FixedRateSampler = new ParentBasedSampler(new TraceIdRatioBasedSampler(rule.FixedRate));

         // We either have no reservoir sampling or borrow until we get a quota so have no end time.
-        this.ReservoirEndTime = DateTime.MaxValue;
+        this.ReservoirEndTime = DateTimeOffset.MaxValue;

         // We don't have a SamplingTarget so are ready to report a snapshot right away.
         this.NextSnapshotTime = this.Clock.Now();
@@ -97,15 +97,15 @@ public bool Matches(SamplingParameters samplingParameters, Resource resource)
         {
             foreach (var tag in samplingParameters.Tags)
             {
-                if (tag.Key.Equals(SemanticConventions.AttributeHttpTarget, StringComparison.Ordinal))
+                if (tag.Key.Equals(SemanticConventions.AttributeUrlPath, StringComparison.Ordinal))
                 {
                     httpTarget = (string?)tag.Value;
                 }
-                else if (tag.Key.Equals(SemanticConventions.AttributeHttpUrl, StringComparison.Ordinal))
+                else if (tag.Key.Equals(SemanticConventions.AttributeUrlFull, StringComparison.Ordinal))
                 {
                     httpUrl = (string?)tag.Value;
                 }
-                else if (tag.Key.Equals(SemanticConventions.AttributeHttpMethod, StringComparison.Ordinal))
+                else if (tag.Key.Equals(SemanticConventions.AttributeHttpRequestMethod, StringComparison.Ordinal))
Member

Are there any unit tests in OpenTelemetry.Sampler.AWS.Tests that need to be updated as part of this change?

I'm not hyper familiar with this code base, but this looks like a potential candidate: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/test/OpenTelemetry.Sampler.AWS.Tests/TestSamplingRuleApplier.cs#L33

Contributor Author

good callout. Let me run the unit tests to make sure they are passing.

Contributor Author

Updated the unit tests and they are passing now.

asakem@88665a24d661 OpenTelemetry.Sampler.AWS.Tests % dotnet test
  Determining projects to restore...
  All projects are up-to-date for restore.
  OpenTelemetry.Sampler.AWS -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/src/OpenTelemetry.Sampler.AWS/bin/Debug/net6.0/OpenTelemetry.Sampler.AWS.dll
  OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net8.0/OpenTelemetry.Sampler.AWS.Tests.dll
  OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net6.0/OpenTelemetry.Sampler.AWS.Tests.dll
  OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net7.0/OpenTelemetry.Sampler.AWS.Tests.dll
Test run for /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net8.0/OpenTelemetry.Sampler.AWS.Tests.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.9.0 (x64)
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.29]     OpenTelemetry.Sampler.AWS.Tests.TestAWSXRayRemoteSampler.TestSamplerUpdateAndSample [SKIP]
  Skipped OpenTelemetry.Sampler.AWS.Tests.TestAWSXRayRemoteSampler.TestSamplerUpdateAndSample [1 ms]

Passed!  - Failed:     0, Passed:    46, Skipped:     1, Total:    47, Duration: 313 ms - OpenTelemetry.Sampler.AWS.Tests.dll (net8.0)

                 {
                     httpMethod = (string?)tag.Value;
                 }

@@ -30,8 +30,8 @@ public void TestRuleMatchesWithAllAttributes()
         var activityTags = new Dictionary<string, string>
         {
             { "http.host", "localhost" },
-            { "http.method", "GET" },
-            { "http.url", @"http://127.0.0.1:5000/helloworld" },
+            { "http.request.method", "GET" },
+            { "url.full", @"http://127.0.0.1:5000/helloworld" },
             { "faas.id", "arn:aws:lambda:us-west-2:123456789012:function:my-function" },
         };

@@ -59,8 +59,8 @@ public void TestRuleMatchesWithWildcardAttributes()
         var activityTags = new Dictionary<string, string>
         {
             { "http.host", "localhost" },
-            { "http.method", "GET" },
-            { "http.url", @"http://127.0.0.1:5000/helloworld" },
+            { "http.request.method", "GET" },
+            { "url.full", @"http://127.0.0.1:5000/helloworld" },
         };

         var applier = new SamplingRuleApplier("clientId", new TestClock(), rule, new Statistics());
@@ -132,7 +132,7 @@ public void TestRuleMatchesWithHttpTarget()

         var activityTags = new Dictionary<string, string>
         {
-            { "http.target", "/helloworld" },
+            { "url.path", "/helloworld" },
         };

         var applier = new SamplingRuleApplier("clientId", new TestClock(), rule, new Statistics());
@@ -164,7 +164,7 @@ public void TestAttributeMatching()

         var activityTags = new Dictionary<string, string>
         {
-            { "http.target", "/helloworld" },
+            { "url.path", "/helloworld" },
             { "dog", "bark" },
             { "cat", "meow" },
         };
@@ -198,7 +198,7 @@ public void TestAttributeMatchingWithLessActivityTags()

         var activityTags = new Dictionary<string, string>
         {
-            { "http.target", "/helloworld" },
+            { "url.path", "/helloworld" },
             { "dog", "bark" },
         };
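
The key renames exercised by these test changes can be summarized in a small standalone sketch. The string values are the ones shown in the test diffs; `HttpTagSketch` and its members are hypothetical names used only for illustration, and `Extract` only mirrors the general shape of the tag loop in `Matches` rather than reproducing it.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public static class HttpTagSketch
{
    // Old key -> new key, as they appear in the test changes.
    public static readonly IReadOnlyDictionary<string, string> RenamedKeys = new Dictionary<string, string>
    {
        { "http.target", "url.path" },
        { "http.url", "url.full" },
        { "http.method", "http.request.method" },
    };

    // Picks out path, full URL, and method using the new key names only.
    // Tags that still use the old keys are ignored by this sketch.
    public static (string? Path, string? Url, string? Method) Extract(
        IEnumerable<KeyValuePair<string, object?>> tags)
    {
        string? path = null;
        string? url = null;
        string? method = null;

        foreach (var tag in tags)
        {
            if (tag.Key.Equals("url.path", StringComparison.Ordinal))
            {
                path = tag.Value as string;
            }
            else if (tag.Key.Equals("url.full", StringComparison.Ordinal))
            {
                url = tag.Value as string;
            }
            else if (tag.Key.Equals("http.request.method", StringComparison.Ordinal))
            {
                method = tag.Value as string;
            }
        }

        return (path, url, method);
    }
}
```

One consequence visible in the sketch: a span that only carries the old keys yields no path, URL, or method, which is why the test fixtures had to move to the new names.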
