Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Observability APIs to the SDK and add telemetry provider reference to AWSConfigs and ClientConfig #3344

Conversation

muhammad-othman
Copy link
Member

Description

Added Observability APIs to the SDK along with NoOp implementation that will be used by default when the user doesn't provide a custom implementation.
Added TelemetryProvider to AWSConfigs and ClientConfig, to allow the users to set it globally or per client.

Motivation and Context

Testing

Compared the added Observability APIs with the SRA and OpenTelemetry APIs.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

/// </summary>
/// <param name="scope">The name of the instrumentation scope that uniquely identifies the tracer.</param>
/// <returns>A <see cref="Tracer"/> instance for the specified scope.</returns>
public abstract Tracer GetTracer(string scope);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SRA contained an optional attributes for getTracer, however OTel.GetTracer doesn't accept attributes as a parameter.
I checked c++ implementation and they take this parameter and ignore it when for the OTel wrapper
image
And Kotlin only allows scope for this API
image

I chose to only allow scope rather than allowing an additional attributes parameter that we will be ignored by OTel implementation.

/// This global telemetry provider is used to collect and report telemetry data
/// (such as traces and metrics) for all AWS SDK operations by default.
/// By default, this property is set to an instance of <see cref="DefaultTelemetryProvider"/>,
/// which has no-op implementation by default.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability I would scratch some of the by default language after operations and no-op implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I will remove the no-op implementation mention here since we changed the plan and we are going to add the full implementation for .net standard.

/// <returns>An IEnumerable of key-value pairs.</returns>
public IEnumerable<KeyValuePair<string, object>> GetAll()
{
return _attributes.AsReadOnly();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is AsReadOnly needed here? IEnumerable already prevents you from modifying the underlying list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed AsReadOnly and used Getter instead.

var index = _attributes.FindIndex(kvp => kvp.Key == key);
if (index != -1)
{
// Replace the existing key-value pair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't replacing the KVP. Comment is incorrect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

/// <param name="attributes">An initial collection of key-value pairs to populate the attributes.</param>
public Attributes(IEnumerable<KeyValuePair<string, object>> attributes)
{
_attributes = new List<KeyValuePair<string, object>>(attributes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows for duplicate keys:

var attributes = new Attributes(new List<KeyValuePair<string, object>>
{
    new KeyValuePair<string, object>("name", "John"),
    new KeyValuePair<string, object>("name", "Fred"),
    new KeyValuePair<string, object>("age", 30),
    new KeyValuePair<string, object>("isStudent", true)
});

Which in turn will cause a problem with Set/Get only using the first one found.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After updating the internal attributes to be a dictionary, the duplicates will replace the previous version, similar to the OTel tags behavior

if (index != -1)
{
// Replace the existing key-value pair
return _attributes[index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be return _attributes[index].Value per the method comments. There is also a problem with this method in general because the Value could be null which would indicate that the key wasn't found. Should null be allowed as a value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed after using dictionary.

/// </summary>
public class Attributes
{
private readonly List<KeyValuePair<string, object>> _attributes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use a Dictionary or OrderedDictionary? It would solve some of the problems in this class and give you o(1) access.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used regular Dictionary since the OrderedDictionary doesn't match the regular dictionary performance for insertion and deletion, also the order restriction wasn't mentioned in the OTel specifications.

/// </summary>
/// <param name="key">The key of the attribute.</param>
/// <param name="value">The value of the attribute.</param>
public void Set(string key, object value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this allow setting null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update it to remove the existing item if the value is null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have a Remove method? Having null remove within a Set method is a code smell. It is also less discoverable by a user of this class. Or are you following a spec where this is the expected behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec wasn't very detailed and didn't contain any directions on the Attributes API other than set and get in addition to may contain additional methods
I was trying to match how it is implemented in the ActivityTagsCollection https://github.com/dotnet/runtime/blob/5535e31a712343a63f5d7d796cd874e563e5ac14/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivityTagsCollection.cs#L57, but we don't have to stick to what they have if we think it is a code smell, will update it.

Comment on lines +19 to +21
/// Represents an async measurement.
/// An async measurement records values, such as periodically or based on events.
/// The generic type parameter T specifies the type of value being recorded, such as int, long, etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend these lines to show up with newlines in intelliSense? They won't without . This same comment applies to other /// comments in this review

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand They won't without . because <para> doesn't appear in the formatted markdown 😅

This is how it looks without <para>
image

And here after adding <para> wrapping every sentence
image

I don't think it looks this bad without the newlines, but I'm open to change it if you think it is better this way, I've seen both ways done in our SDK.
For now I only added it in ClientConfig and AWSConfigs, since the comments there are kinda long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks fine/better without the <para> tags. I was only mentioning if you were trying to format your comment which it looks like you were that it wouldn't show up that way in intelliSense.

CLIENT,

/// <summary>
/// Represents a request received from a remote client.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it is CLIENT it comes from the service but when it is SERVER it comes from the remote client? Are the comments correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the SRA was more confusing than that, I will call everything a component instead, since a server can be considered a client of another component, and this is how OpenTelemetry calls them.

@muhammad-othman muhammad-othman force-pushed the muhamoth/DOTNET-7517-Observability-Add-Observability-APIs-to-the-SDK branch 2 times, most recently from 9308585 to 257d613 Compare June 16, 2024 21:17
/// </summary>
/// <param name="key">The key of the attribute.</param>
/// <param name="value">The value of the attribute.</param>
public void Set(string key, object value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have a Remove method? Having null remove within a Set method is a code smell. It is also less discoverable by a user of this class. Or are you following a spec where this is the expected behavior?

/// <summary>
/// Initializes a new instance of the <see cref="Attributes"/> class.
/// </summary>
/// <param name="attributes">An initial collection of key-value pairs to populate the attributes.</param>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be documented on what it does with duplicate values. Basically will silently use the last one but should it do that silently or should it throw an exception? Does a specification detail the desired behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, no details from the specs. Will add some details to the comments.

}
else
{
_attributes[key] = value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If key is null this will throw an exception. Is that the desired behavior?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the keys shouldn't be null https://opentelemetry.io/docs/specs/otel/common/#:~:text=The%20attribute%20key%20MUST%20be%20a%20non%2Dnull%20and%20non%2Dempty%20string
So either we throw an exception or leave it to the dictionary to throw the exception, I think it is fine to leave it to the dictionary unless you see any downside to that.

Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest looking into adding the System.Diagnostics implementation to prove out the abstract API will work for our needs. If you want to do that in a different PR that is okay accepting this might need changes when we start trying to do things for real.

namespace Amazon.Runtime.Telemetry
{
/// <summary>
/// Represents a collection of key-value pairs for attributes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain what attributes are and what they are used for.

/// <summary>
/// Stops the async measurement.
/// </summary>
public abstract void Stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having to call Stop methods should we use the pattern of making the type implement IDisposable and then we can rely on the dispose pattern?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I should remove this stop method, this is one of the discrepancies between OpenTelemtery specs and the .net implementation.
From OTel specs: https://opentelemetry.io/docs/specs/otel/metrics/api/#asynchronous-instrument-api:~:text=Where%20the%20API%20supports%20registration%20of%20callback%20functions%20after%20asynchronous%20instrumentation%20creation%2C%20the%20user%20MUST%20be%20able%20to%20undo%20registration%20of%20the%20specific%20callback%20after%20its%20registration%20by%20some%20means.

And I couldn't find a way to do the unregistration, we should map AsyncMeasurementHandle to ObservableInstrument and there is no way to stop it and we have to dispose the whole meter we used (which now implements IDisposable) based on this open-telemetry/opentelemetry-dotnet#3460 (comment)

Will add a reference for the Meter instead of the stop method.

/// This implementation does nothing when methods are called, and is used as a placeholder or default
/// implementation when tracing is not enabled or required.
/// </summary>
public class NoOpTraceSpan : TraceSpan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put all of the NoOp types in their own files and in their own separate namespace Amazon.Runtime.Telemetry.Tracing.NoOp. Also should they be internal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move them and update their access modifier.

@muhammad-othman
Copy link
Member Author

I would suggest looking into adding the System.Diagnostics implementation to prove out the abstract API will work for our needs. If you want to do that in a different PR that is okay accepting this might need changes when we start trying to do things for real.

@normj When I worked on the design I created a PoC that used System.Diagnostics and I did test tracing and one metric before working on this implementation and both matched the APIs that we have here.
I may find discrepancies between the APIs that we added and System.Diagnostics but it should be minor since the design was based on OpenTelemetry specs, and in this case I would prefer to make changes later since this PR is already kinda long, and we can safely do so as we are merging into a feature branch and we can make any needed changes.

@muhammad-othman muhammad-othman force-pushed the muhamoth/DOTNET-7517-Observability-Add-Observability-APIs-to-the-SDK branch from 257d613 to c31588a Compare June 19, 2024 15:32
@muhammad-othman muhammad-othman merged commit a692e9e into DOTNET-7462-Observability-for-the-NET-SDK Jun 20, 2024
1 check passed
@muhammad-othman muhammad-othman deleted the muhamoth/DOTNET-7517-Observability-Add-Observability-APIs-to-the-SDK branch June 20, 2024 13:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants