[Sampler.AWS] Part-2: Add rules cache and rule matching logic #1124

Merged
merged 9 commits on Apr 26, 2023

Contributor

@srprash srprash commented Mar 30, 2023

This PR is a continuation of building the AWS X-Ray Remote Sampler.
See part 1 PR: #1091

Changes

  • Adding a Sampling Rules Cache.

    • This cache holds the sampling rules which are periodically fetched from the X-Ray service.
    • When a new set of sampling rules is fetched, stateful properties such as the Reservoir and the Statistics are carried over from the old set of rules. The Reservoir is used in making sampling decisions and the Statistics in keeping sampling records for the lifetime of the application. Note: The Reservoir and Statistics classes are bare-bones right now and will be developed in further PRs.
    • The cache is also responsible for finding the sampling rule that matches a particular request. This sampling rule is then used to make the final sampling decision.
  • Adding the sampling rule matching logic.

    • When an activity/span is created, the sampler tries to match a sampling rule based on attributes such as service name, HTTP method, host, etc.
    • A rule definition can specify matching of these attributes using wildcard characters.
  • The sampler now requires an OpenTelemetry Resource object.

    • The sampler allows users to specify service name and service type (for example AWS::EC2::Instance) in their sampling rule to control sampling on a service level. These attributes are currently obtainable from the OTel SDK Resource configured for the application.
    • There is an open ask to make the OTel SDK Resource available to samplers. In the meantime, the users will have to provide the Resource instance to the sampler themselves.
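
The state-preserving refresh described in the first bullet group can be sketched roughly as follows. This is a minimal illustration, not the PR's RulesCache: the SketchRule, SketchReservoir, SketchStatistics and SketchRulesCache names and their members are hypothetical, rules are assumed to be identified by a unique name, and locking is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the sampler's stateful types.
public sealed class SketchReservoir
{
    public int Borrowed { get; set; }
}

public sealed class SketchStatistics
{
    public long RequestCount { get; set; }
}

// A fetched rule plus the state that must outlive a refresh.
public sealed class SketchRule
{
    public SketchRule(string name)
    {
        this.Name = name;
    }

    public string Name { get; }

    public SketchReservoir Reservoir { get; set; } = new SketchReservoir();

    public SketchStatistics Statistics { get; set; } = new SketchStatistics();
}

public sealed class SketchRulesCache
{
    private List<SketchRule> rules = new List<SketchRule>();

    // Replaces the cached rules with a freshly fetched set. A rule whose
    // name already exists in the cache inherits the old Reservoir and
    // Statistics instances; a brand-new rule keeps its empty ones.
    // Assumes rule names are unique within one fetched set.
    public void UpdateRules(IEnumerable<SketchRule> fetched)
    {
        Dictionary<string, SketchRule> old = this.rules.ToDictionary(r => r.Name);
        var next = new List<SketchRule>();

        foreach (SketchRule rule in fetched)
        {
            if (old.TryGetValue(rule.Name, out SketchRule previous))
            {
                rule.Reservoir = previous.Reservoir;
                rule.Statistics = previous.Statistics;
            }

            next.Add(rule);
        }

        // Rules missing from the fetched set are dropped with their state.
        this.rules = next;
    }

    // Returns the first cached rule accepted by the predicate, or null.
    public SketchRule FindFirst(Func<SketchRule, bool> matches)
    {
        return this.rules.FirstOrDefault(matches);
    }
}
```

Reusing the old instances, rather than copying their values, means counters updated by in-flight requests during a refresh are not lost.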

@srprash srprash force-pushed the xray_sampler_pr_2 branch from 12ec9b6 to c775460 Compare March 30, 2023 18:27
@srprash srprash marked this pull request as ready for review March 30, 2023 20:00
@srprash srprash requested a review from a team March 30, 2023 20:00
@utpilla utpilla added the comp:sampler.aws Things related to OpenTelemetry.Samplers.AWS label Mar 30, 2023
@codecov

codecov bot commented Mar 30, 2023

Codecov Report

Merging #1124 (76a7c1b) into main (4cc8429) will increase coverage by 0.27%.
The diff coverage is 84.82%.

❗ Current head 76a7c1b differs from pull request most recent head 9289865. Consider uploading reports for the commit 9289865 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1124      +/-   ##
==========================================
+ Coverage   72.53%   72.80%   +0.27%     
==========================================
  Files         236      243       +7     
  Lines        8570     8781     +211     
==========================================
+ Hits         6216     6393     +177     
- Misses       2354     2388      +34     
Impacted Files | Coverage Δ
.../OpenTelemetry.Sampler.AWS/AWSXRaySamplerClient.cs | 88.09% <ø> (ø)
src/OpenTelemetry.Sampler.AWS/SystemClock.cs | 46.15% <46.15%> (ø)
...lemetry.Sampler.AWS/AWSXRayRemoteSamplerBuilder.cs | 73.68% <50.00%> (-18.63%) ⬇️
src/OpenTelemetry.Sampler.AWS/RulesCache.cs | 72.54% <72.54%> (ø)
.../OpenTelemetry.Sampler.AWS/AWSXRayRemoteSampler.cs | 73.33% <88.23%> (+11.42%) ⬆️
...c/OpenTelemetry.Sampler.AWS/SamplingRuleApplier.cs | 92.98% <92.98%> (ø)
src/OpenTelemetry.Sampler.AWS/Matcher.cs | 97.91% <97.91%> (ø)
src/OpenTelemetry.Sampler.AWS/Clock.cs | 100.00% <100.00%> (ø)
src/OpenTelemetry.Sampler.AWS/FallbackSampler.cs | 100.00% <100.00%> (ø)
src/OpenTelemetry.Sampler.AWS/SamplingRule.cs | 90.47% <100.00%> (+0.73%) ⬆️
... and 1 more

Contributor

@Kielek Kielek left a comment

It would be great to have the code reviewed by people with AWS experience as well.

Maybe @Oberon00?

src/OpenTelemetry.Sampler.AWS/CHANGELOG.md Outdated Show resolved Hide resolved
namespace OpenTelemetry.Sampler.AWS;

// A time keeper for the purpose of this sampler.
internal sealed class Clock
Contributor

What is the benefit of this additional abstraction layer, instead of just using DateTime.UtcNow directly?

Contributor Author

The sampler does quite a few things based on time, like checking the freshness of the rules cache and rate-limited sampling (which will come in the next PR). The DateTime APIs can be used for such logic, but to test the functionality I need a mocked clock that can be passed in as a Clock type.
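
To illustrate the testing argument, here is a minimal sketch of a clock abstraction with a manually advanced test double. All type and member names (SketchClock, SketchSystemClock, ManualClock, FreshnessTracker) are hypothetical and are not taken from the PR's Clock.cs or SystemClock.cs.

```csharp
using System;

// Hypothetical time source that production code depends on.
public abstract class SketchClock
{
    public abstract DateTime Now();
}

// Production implementation: defers to the system UTC time.
public sealed class SketchSystemClock : SketchClock
{
    public override DateTime Now() => DateTime.UtcNow;
}

// Test double: time only moves when the test says so.
public sealed class ManualClock : SketchClock
{
    private DateTime now;

    public ManualClock(DateTime start)
    {
        this.now = start;
    }

    public override DateTime Now() => this.now;

    public void Advance(TimeSpan delta)
    {
        this.now += delta;
    }
}

// Example consumer: tracks whether cached data is older than a TTL.
public sealed class FreshnessTracker
{
    private readonly SketchClock clock;
    private readonly TimeSpan ttl;
    private DateTime updatedAt;

    public FreshnessTracker(SketchClock clock, TimeSpan ttl)
    {
        this.clock = clock;
        this.ttl = ttl;
        this.updatedAt = clock.Now();
    }

    public void MarkUpdated()
    {
        this.updatedAt = this.clock.Now();
    }

    public bool Expired()
    {
        return this.clock.Now() > this.updatedAt + this.ttl;
    }
}
```

With ManualClock, a test can cross the expiry boundary deterministically instead of sleeping, which is hard to do when the code calls DateTime.UtcNow directly.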

src/OpenTelemetry.Sampler.AWS/Matcher.cs Outdated Show resolved Hide resolved
src/OpenTelemetry.Sampler.AWS/AWSXRayRemoteSampler.cs Outdated Show resolved Hide resolved
src/OpenTelemetry.Sampler.AWS/README.md Outdated Show resolved Hide resolved
src/OpenTelemetry.Sampler.AWS/Statistics.cs Outdated Show resolved Hide resolved
test/OpenTelemetry.Sampler.AWS.Tests/TestMatcher.cs Outdated Show resolved Hide resolved
[Fact]
public void TestWildcardMatching()
{
    Assert.True(Matcher.WildcardMatch(null, "*"));
Contributor

I am not sure that null should match anything. I think it needs some documentation for this case.

Contributor Author

A null input here means that the particular attribute is not set on the span, so * should match any value of an attribute, whether it is set or not.

@Oberon00
Member

Oberon00 commented Apr 5, 2023

Sorry, I don't have any experience with X-Ray, so this PR is out of my area of expertise.

    Assert.True(Matcher.WildcardMatch("HelloWorld", "HelloWorld"));
    Assert.True(Matcher.WildcardMatch("HelloWorld", "Hello*"));
    Assert.True(Matcher.WildcardMatch("HelloWorld", "*World"));
    Assert.True(Matcher.WildcardMatch("HelloWorld", "?ello*"));

Could we add two more tests with '*' and '?' in the middle of the string in the second argument, some with other regex special characters such as '.', and a few more complex ones as well?

I want to make sure we test ToRegexPattern sufficiently. If we have a bug here, then a customer might accidentally depend on it. It would be painful for them if we fix the bug later.

Contributor Author

We don't really need complex testing here, since the sampling rule options only allow * and ? for matching.
https://docs.aws.amazon.com/xray/latest/devguide/xray-console-sampling.html#xray-console-sampling-options
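
For reference, one way a glob-to-regex conversion can treat everything except * and ? as a literal is sketched below. It is not the PR's Matcher: the GlobSketch class is hypothetical, the comparison is assumed to be case-sensitive, and the null handling follows the behavior described earlier in this thread (a lone * matches even an unset attribute).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class GlobSketch
{
    public static bool WildcardMatch(string text, string globPattern)
    {
        // A lone "*" matches everything, including an unset (null) attribute.
        if (globPattern == "*")
        {
            return true;
        }

        if (text == null || globPattern == null)
        {
            return false;
        }

        // No wildcards: plain string comparison, no regex needed.
        if (globPattern.IndexOfAny(new[] { '*', '?' }) < 0)
        {
            return string.Equals(text, globPattern, StringComparison.Ordinal);
        }

        // Singleline lets "." (and therefore "*" and "?") match newlines too.
        return Regex.IsMatch(
            text,
            ToRegexPattern(globPattern),
            RegexOptions.Singleline | RegexOptions.CultureInvariant);
    }

    // Translates "*" to ".*" and "?" to "."; every other run of characters
    // is passed through Regex.Escape so that ".", "+", "(" etc. stay literal.
    public static string ToRegexPattern(string globPattern)
    {
        var regex = new StringBuilder("\\A");
        var literal = new StringBuilder();

        foreach (char c in globPattern)
        {
            if (c == '*' || c == '?')
            {
                regex.Append(Regex.Escape(literal.ToString()));
                literal.Clear();
                regex.Append(c == '*' ? ".*" : ".");
            }
            else
            {
                literal.Append(c);
            }
        }

        regex.Append(Regex.Escape(literal.ToString()));
        regex.Append("\\z");
        return regex.ToString();
    }
}
```

Under these assumptions, a pattern such as "a.b*" matches "a.bc" but not "axbc", which is the kind of case the reviewer's request about '.' would exercise.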

}

public bool Expired()
{

Do we need to lock here? We are not changing state and UpdatedAt cannot be changed back to null as far as I can tell.

Contributor Author

@srprash srprash Apr 17, 2023

I think we do need a lock here. The background thread can be writing the UpdatedAt while the main thread could be reading the value when trying to sample a request. We don't want these two to happen concurrently.
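
The reader/writer pattern being discussed can be sketched as below. GuardedTimestamp is a hypothetical stand-in, not the PR's RulesCache. One reason a lock can matter even for a plain read: DateTime is a 64-bit struct, and C# only guarantees atomic reads and writes for reference types and a set of primitive types of 32 bits or fewer.

```csharp
using System;
using System.Threading;

// Hypothetical holder for a timestamp shared between a poller thread
// (writer) and request threads (readers).
public sealed class GuardedTimestamp
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private DateTime updatedAt = DateTime.MinValue;

    // Called by the background poller after a successful fetch.
    public void Set(DateTime value)
    {
        this.rwLock.EnterWriteLock();
        try
        {
            this.updatedAt = value;
        }
        finally
        {
            this.rwLock.ExitWriteLock();
        }
    }

    // Many readers may hold the read lock at once; writers are exclusive.
    public DateTime Get()
    {
        this.rwLock.EnterReadLock();
        try
        {
            return this.updatedAt;
        }
        finally
        {
            this.rwLock.ExitReadLock();
        }
    }

    // True when the last update is older than the given time-to-live.
    public bool Expired(DateTime now, TimeSpan ttl)
    {
        return now > this.Get() + ttl;
    }
}
```

The read lock is shared, so concurrent sampling decisions do not block each other; they only wait while the poller holds the write lock.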

src/OpenTelemetry.Sampler.AWS/SamplingRule.cs Outdated Show resolved Hide resolved
src/OpenTelemetry.Sampler.AWS/SamplingRule.cs Outdated Show resolved Hide resolved
test/OpenTelemetry.Sampler.AWS.Tests/Utils.cs Show resolved Hide resolved
@github-actions
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 15, 2023
@github-actions github-actions bot removed the Stale label Apr 17, 2023
{
    if (c == '*' || c == '?')
    {
        return Regex.IsMatch(text, ToRegexPattern(globPattern));

@atshaw43 atshaw43 Apr 18, 2023

I do have concerns here about regex being slow, especially since we are using a simplified version. I will leave it up to you whether we should keep it. It might be worth adding a performance test.

Contributor Author

Yes, a performance benchmark would be a good idea. I can look into adding it once the sampler implementation is complete.
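
If a benchmark did show the regex path to be a bottleneck, one possible alternative is a regex-free matcher. The sketch below uses the common greedy two-pointer technique with backtracking to the last *. It is not what this PR implements; the IterativeGlob name and the null handling are assumptions made for illustration.

```csharp
public static class IterativeGlob
{
    // Matches text against a pattern where '*' means any run of characters
    // (including none) and '?' means exactly one character.
    public static bool Matches(string text, string pattern)
    {
        if (pattern == "*")
        {
            return true; // also covers a null (unset) text
        }

        if (text == null || pattern == null)
        {
            return false;
        }

        int t = 0;          // position in text
        int p = 0;          // position in pattern
        int starP = -1;     // index of the most recent '*' in pattern
        int starT = 0;      // text position that '*' is currently covering up to

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Remember the star; initially let it match nothing.
                starP = p++;
                starT = t;
            }
            else if (p < pattern.Length && (pattern[p] == '?' || pattern[p] == text[t]))
            {
                t++;
                p++;
            }
            else if (starP >= 0)
            {
                // Mismatch: let the last star absorb one more character.
                p = starP + 1;
                t = ++starT;
            }
            else
            {
                return false;
            }
        }

        // Only trailing stars may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*')
        {
            p++;
        }

        return p == pattern.Length;
    }
}
```

The star check comes before the literal comparison so that a '*' in the pattern is never consumed as a literal when the text happens to contain an asterisk at the same position.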

Contributor

@Kielek Kielek left a comment

I still have some doubts about the public API. @utpilla, could you please review it?

Other parts LGTM.

.AddSource(serviceName)
.SetResourceBuilder(resourceBuilder)
.AddConsoleExporter()
.SetSampler(AWSXRayRemoteSampler.Builder(resourceBuilder.Build()) // you must provide a resource
Contributor

Just a suggestion:

This setup looks quite verbose. I think you could extract this into an extension method, something like: AddAWSXRayRemoteSampler(this TracerProviderBuilder, Action<AWSXRaySamplerOptions> configure).

That way you would only be exposing the required options, and you could reduce the public API surface by marking these methods as internal: AWSXRayRemoteSampler.Builder, SetPollingInterval, SetEndpoint, Build.

Contributor Author

I had earlier thought of adding an extension method similar to the AddXRayTraceId() we have for the custom trace ID generator, but I think that is counterintuitive for adding a Sampler. Since AWSXRayRemoteSampler is just another implementation of Sampler, it should follow the standard procedure of adding a sampler via the SetSampler method.

Regarding the sampler setup being verbose, I agree. However, customers of OTel Java and Go already configure the remote sampler in a similar way, so I would like to keep the same behavior here as well.

this.rwLock.EnterReadLock();
try
{
    return this.UpdatedAt;

Is locking around returning a variable needed?

Also, I don't think GetUpdatedAt is used.

Contributor Author

Yes. Since UpdatedAt will be written by the rule poller and read by the target poller, we need this lock for safe reads.

The GetUpdatedAt will be used in my next PR by the target poller: https://github.com/srprash/opentelemetry-dotnet-contrib/blob/xray_sampler_pr_3/src/OpenTelemetry.Sampler.AWS/AWSXRayRemoteSampler.cs#L173

@Kielek Kielek merged commit 1a58431 into open-telemetry:main Apr 26, 2023
@srprash srprash deleted the xray_sampler_pr_2 branch April 26, 2023 17:18
Labels
comp:sampler.aws Things related to OpenTelemetry.Samplers.AWS
5 participants