-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sampler.AWS] Tested and Updated X-Ray Sampler #1887
Conversation
{ | ||
httpUrl = (string?)tag.Value; | ||
} | ||
else if (tag.Key.Equals(SemanticConventions.AttributeHttpMethod, StringComparison.Ordinal)) | ||
else if (tag.Key.Equals(SemanticConventions.AttributeHttpRequestMethod, StringComparison.Ordinal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any unit tests in OpenTelemetry.Sampler.AWS.Tests that need to be updated as part of this change?
I'm not hyper familiar with this code base, but this looks like a potential candidate: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/test/OpenTelemetry.Sampler.AWS.Tests/TestSamplingRuleApplier.cs#L33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good callout. Let me run the unit tests to make sure they are passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the unit tests and they are passing now.
asakem@88665a24d661 OpenTelemetry.Sampler.AWS.Tests % dotnet test
Determining projects to restore...
All projects are up-to-date for restore.
OpenTelemetry.Sampler.AWS -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/src/OpenTelemetry.Sampler.AWS/bin/Debug/net6.0/OpenTelemetry.Sampler.AWS.dll
OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net8.0/OpenTelemetry.Sampler.AWS.Tests.dll
OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net6.0/OpenTelemetry.Sampler.AWS.Tests.dll
OpenTelemetry.Sampler.AWS.Tests -> /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net7.0/OpenTelemetry.Sampler.AWS.Tests.dll
Test run for /Volumes/workplace/otel-dotnet-contrib/opentelemetry-dotnet-contrib/test/OpenTelemetry.Sampler.AWS.Tests/bin/Debug/net8.0/OpenTelemetry.Sampler.AWS.Tests.dll (.NETCoreApp,Version=v8.0)
Microsoft (R) Test Execution Command Line Tool Version 17.9.0 (x64)
Copyright (c) Microsoft Corporation. All rights reserved.
Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:00.29] OpenTelemetry.Sampler.AWS.Tests.TestAWSXRayRemoteSampler.TestSamplerUpdateAndSample [SKIP]
Skipped OpenTelemetry.Sampler.AWS.Tests.TestAWSXRayRemoteSampler.TestSamplerUpdateAndSample [1 ms]
Passed! - Failed: 0, Passed: 46, Skipped: 1, Total: 47, Duration: 313 ms - OpenTelemetry.Sampler.AWS.Tests.dll (net8.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. just a small comment.
|
||
var ruleApplier = new SamplingRuleApplier(this.ClientId, this.Clock, rule, currentStatistics); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall this was done to "sort of" implement immutability of the ruleApplier
instances (which is generally a good idea). But as we discussed offline this leads to some race conditions. Can you briefly explain either as a comment in the code or in this PR on why we need to break the immutability and mutate the instance now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change was made mainly to address one race condition. Basically, with the previous implementation, whenever the RulesCache is periodically updated, it would pull in the list of rules available and if the rule already existed, the Statistics were carried over into a new instance of the rule applier. This caused a race condition. Say the rule had a reservoir of 500 requests/sec and FixedRateSampler with 1/sec after the reservoir is depleted. When creating a new instance of the rule, the constructor creates a new RateLimittingSampler with a reservoir of 1/s and fixedRateSampler with with 1/sec as well since that's in the rule definitio until UpdateTargets is called to set the reservoir according to how the rule is set in cloudwatch (500/s). Between creating a new instance and calling UpdateTargets, if we get a bunch of 200 requests/sec that match the rule, only 2 will be sampled, 1 due to reservoir sampler and the other due to falling back to the fixed rate sampler. If the sampler poling is set to refresh each second, then this will be a problem since it can be the case that the reservoir will mostly only be 1 and not the intended 500. With this change, it will retain the RateLimitingSampler with the correct reservoir while only the rule and it's definition will get updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense and lgtm.
Another thing we could have done to preserve immutability was to also pass in the current reservoir size here and create a new RateLimitingSampler
with this reservoir size for a new SamplingRuleApplier
instance. Just a suggestion, not required right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes looks good.
I have a question about test suite. Is there any possibility to include it in pipeline? Maybe executing it in dockerized env? It will be great to ensure that we do not introduce any breaking changes during maintenance.
Package was never released, probably do not need to extend CHANGELOG..
We can look into that. Also, I'm trying to merge but the build-test workflow seems like it's bugged and doesn't want to start. Any ideas? |
It is your first contribution to this repo. It means ci needs approval to be executed. |
I see. Looks like the build is passing now but it’s still pending your approval although i already have the one approval from you😅 |
Changes
Updated the Http attributes that were used and replaced them with URL attributes. This is required since some http attributes are no longer used by instrumentation libraries. More info can be found here: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/blob/main/src/Shared/SemanticConventions.cs#L99-L108
Also applied some small bug fixes to get the sampler to work.
Testing
We have our own test bed for the X-Ray sampler that we use to test the sampling functionality regardless of the language. https://github.com/aws-observability/aws-otel-community/tree/master/centralized-sampling-tests. I followed those testing instructions there and manually instrumented a sample application and the tests are passing which verifies the functionality of the sampler.
Please provide a brief description of the changes here.
For significant contributions please make sure you have completed the following items:
CHANGELOG.md
updated for non-trivial changes