RFC: Support for external observability providers - Metrics #2015
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can.
Thanks a lot @roger-zhangg for taking the time to create this! I'm so excited we're kicking off the observability provider discussion. I like the general direction, but a few points need further thinking. For example, here are some initial questions:

If we think in terms of a provider who also brings their own serialization and validation, it could look something like this:
```python
from numbers import Number

from aws_lambda_powertools.metrics.providers import MetricsProvider


class DatadogMetricsProvider(MetricsProvider):
    METRIC_LIMIT = 200  # super().add_metric will use this value to validate, etc.

    def serialize(self, metrics: MetricSummary):
        # convert metrics into a custom format
        ...

    def validate(self, metrics: MetricSummary) -> bool:
        # validate metric schema
        ...
```

Ad-hoc comments
Let's validate with customers before we implement this one. I can see why a customer would want to have metrics duplicated in N providers; however, this gets expensive really quickly, often making it unfeasible. Please do correct me if I missed or misinterpreted the intent.
This is the core of the RFC that needs some further thinking. A Provider mechanism might be more powerful to extend later.
For now, let's use it as a reference to test, but not commit to making a Datadog provider part of our source code. We need to circle back with the Lambda and Partner teams, so we don't accidentally get caught in a race to add the next Observability Partner and end up managing tech debt from a 3rd party. OpenTelemetry format, however, would be a good goal.
Thanks for the comment, Heitor! Some thoughts on your questions:
Further thinking

Please correct me if I'm wrong. From my perspective, this custom serializer solution would give our customers more freedom in how they want to serialize/process their metrics. I don't think it would contradict the provider solution you've proposed; they could be two separate ways to define the metrics processing (and could even work together). For example, say we have a pre-defined provider class, but the user wants some custom conversion on the metrics before submitting: they could still use the custom serializer along with the provider. I wonder whether it would be smoother to provide the serializer option first and work on the provider part once we have more data?
I definitely see where you're coming from. On the surface, a serializer would be easy enough to maintain and to use. However, this area gets much deeper as you start digging into the ecosystem, how customers use and extend it, and how it affects non-developers' cognitive load.
At first glance, a serializer would solve the immediate need if all we cared about was Datadog or NewRelic. The minute you add OpenTelemetry, additional partners, or custom solutions like ELK, we accidentally go down the road of having multiple parameters and exceptions. It gets worse when you factor in the different ways each handles metric resolution, metric units, metric validation, high cardinality, and tags/dimensions - it's almost impossible to predict what customers will need next, so adding another parameter will accidentally make other customers' decision-making harder. Overall, this increases cognitive load, and we lose the value of Powertools that customers love and that kicked off this work: "As a customer, I want to use the X Observability Provider but keep Powertools straightforward UX".
If we shift towards a Provider approach, we can: (1) create a specification of what the behaviour of a provider should be, (2) create a CloudWatch EMF Provider from the new specification, and (3) refactor the existing Metrics utility to use it. This opens the door for external providers like Datadog, NewRelic, Lumigo, etc. to create their own Powertools-specific provider and offer it to their own customers. If customers eventually want to add value-add features themselves - e.g., a company policy to add a metric for each application cost center - they can easily do so by overriding a provider too 🎉. PS: I wrote this before our 1:1 call. Please do shout out if there are any areas you want me to clarify - THANK YOU for helping out!
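For illustration only, a base provider specification along the lines of (1) above could be sketched roughly as follows; the class and method names here (BaseMetricsProvider, serialize, flush) are hypothetical placeholders, not a committed API:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BaseMetricsProvider(ABC):
    """Contract that CloudWatch EMF, Datadog, or any custom provider could implement."""

    METRIC_LIMIT: int = 100  # providers can override buffering/validation limits

    @abstractmethod
    def add_metric(self, name: str, value: float, **kwargs: Any) -> None:
        """Buffer a single metric in the provider's internal store."""

    @abstractmethod
    def serialize(self) -> Dict[str, Any]:
        """Convert buffered metrics into the provider's wire format."""

    @abstractmethod
    def flush(self) -> None:
        """Emit serialized metrics, e.g. print EMF JSON or hand off to a vendor SDK."""
```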
Thanks Heitor, I've updated the RFC accordingly.
Re-reading...
Looks so much better, thanks a lot @roger-zhangg! Answering the questions you left open; please let me know if I missed any.

Potential challenges
We could add a new method for that. Let the provider and the customer own what goes into the args/kwargs, not us (dependency inversion):

```python
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext
from datadog_lambda.metrics import DatadogLPMetricsProvider

# Use the Datadog-defined metrics provider
datadog = DatadogLPMetricsProvider(flush=True, flush_limit=100)
metrics = Metrics(provider=datadog)


@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)
```

Metrics init

```python
class Metrics(...):
    def __init__(self, ...):
        self.namespace = resolve_namespace(...)
        self.provider = provider
        self.provider.inject_context(context={"namespace": self.namespace, ...})
```

Datadog provider

```python
class DatadogLPMetricsProvider(MetricsProvider):
    def serialize(self, metrics: MetricsSummary):
        app_name = self.app_name or self.context.get("namespace")
        ...
```
It's within the provider's responsibility to repurpose that data. If they don't repurpose it, we will still send that information for them to handle.

How to define the boundary between LP metrics and provider
That's data that I'd expect from your research so we could discuss it. For example, besides metrics, how do other providers handle dimensions (datapoint ownership) and high cardinality (metadata)? Answering this question is key to deriving a proposal for a base provider, and to figuring out whether we need to rethink minimum responsibilities. Take, for instance, areas that aren't common across providers: metric ...
Hi @heitorlessa! I need to clarify some doubts I have before proceeding with PR #2194. @roger-zhangg has already done most of the work of refactoring the metrics utility to extend it to accept external providers, and now we are testing with 2 providers: DataDog and OpenTelemetry. To implement these providers we are finding details that will need to be adapted in this implementation. First I'll write down the details we discovered for each of the providers, and then I have a few questions before we move on:

1 - DataDog

```python
from ddtrace import tracer
from datadog_lambda.metric import lambda_metric


def lambda_handler(event, context):
    # submit a custom metric
    lambda_metric(
        metric_name='TEST-DATADOG',
        value=12.40,
        tags=['product:latte', 'order:online']
    )

    return {
        'statusCode': 200,
        'body': "Hello"
    }
```

The JSON sent to the extension is:

```json
{
    "m": "TEST-DATADOG",
    "v": 12.40,
    "e": 1683677206,
    "t": [
        "product:latte",
        "order:online"
    ]
}
```

Looking at the lambda_metric signature I see this:

2 - OpenTelemetry

```python
import json

from opentelemetry import metrics

meter = metrics.get_meter("diceroller.meter")

# Now create a counter instrument to make measurements with
roll_counter = meter.create_counter(
    "roll_counter",
    unit="kb/s",
    description="The number of rolls by roll value",
)


def lambda_handler(event, context):
    roll_counter.add(1, {"roll.value": 1, "test": "x", "blablabla": "y"})
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
```

The JSON sent to the extension is:

```json
"scope":{
"name":"diceroller.meter",
"version":"",
"schema_url":""
},
"metrics":[
{
"name":"roll_counter",
"description":"The number of rolls by roll value",
"unit":"kb/s",
"data":{
"data_points":[
{
"attributes":{
"roll.value":1,
"test":"x",
"blablabla":"y"
},
"start_time_unix_nano":1683844957671570589,
"time_unix_nano":1683844959692710667,
"value":2
}
],
"aggregation_temporality":2,
"is_monotonic":true
}
}
],
"schema_url":""
} Looking at the add signature I see this: I started reviewing NewRelic as well and there are many other differences that make this difficult and may differ from the user experience we have today. For me it makes a lot of sense what you mentioned before:
I think we need to make a decision: do we keep the same user experience we have today and each provider builds their own providers/libraries adapting to our Metrics utility, or do we assume we will make some changes to cover at least these DataDog/OTel cases and change the user experience by adding more parameters to functions? Sorry for the long text and explanation, but I think we need to discuss all of this before moving on. Thank you
Hey Leandro, thanks for this informative summary. I looked into OTel's metrics and there is a big difference in their SDK: they provide different types of metric counters that emphasize the relation between a series of metrics, in addition to a simple counter, which differs from our current implementation.
Thank you both, great findings ;)

Given what we know now, it's best to take a step back and move in a different direction: a standalone class for each provider.

Based on your findings, it'd be a never-ending catch-up game finding lowest common denominators, AND a deteriorated experience filled with half-baked typing and escape hatches -- this fails one of our tenets: Progressive.

> Progressive. Utilities are designed to be incrementally adoptable for customers at any stage of their Serverless journey. They follow language idioms and their community's common practices.
Moving forward

Suggestions from the top of my head now...

- Have a go at creating a class per provider with our log_metrics() method. Focus on replicating our capture_cold_start=True first, and raise_on_empty_metrics=True.
- Think whether there's any value in still buffering metrics in memory with a sane default (reducing blocking I/O).
- Have a go at either proxying all methods from .provider for the new classes, or recreating their signatures for now with an eye on opportunities to make DX simpler for them.
- Keep the provider argument even for these standalone classes to make it easier for customers to test by swapping them with an InMemory fake provider (we could even provide that in the future).

As you go through this, it'll be more evident what value Powertools can add; then we focus on that, and later enrich these providers with shortcuts for creating helpful metrics, like a quick profiling metric for a closure.

Hope that helps
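To make these suggestions more concrete, here is a rough, non-committal sketch of what such a standalone provider class could look like; names like DatadogMetrics and InMemoryProvider are hypothetical and only illustrate the direction:

```python
import functools
from typing import Any, Callable, Dict, List, Optional


class InMemoryProvider:
    """Fake provider a customer could swap in for testing (hypothetical)."""

    def __init__(self) -> None:
        self.metrics: List[Dict[str, Any]] = []

    def add_metric(self, name: str, value: float, tags: Optional[List[str]] = None) -> None:
        self.metrics.append({"name": name, "value": value, "tags": tags or []})

    def flush(self) -> None:
        print(self.metrics)  # a real provider would serialize and emit here


class DatadogMetrics:
    """Standalone class mirroring the Powertools UX (log_metrics, cold start capture)."""

    _cold_start = True

    def __init__(self, provider: Optional[Any] = None) -> None:
        # keep the provider argument so tests can swap in an InMemory fake
        self.provider = provider or InMemoryProvider()

    def add_metric(self, name: str, value: float, tags: Optional[List[str]] = None) -> None:
        self.provider.add_metric(name=name, value=value, tags=tags)

    def log_metrics(self, func: Optional[Callable] = None, capture_cold_start_metric: bool = False):
        # support both @metrics.log_metrics and @metrics.log_metrics(capture_cold_start_metric=True)
        if func is None:
            return functools.partial(self.log_metrics, capture_cold_start_metric=capture_cold_start_metric)

        @functools.wraps(func)
        def wrapper(event: dict, context: Any):
            if capture_cold_start_metric and DatadogMetrics._cold_start:
                self.add_metric(name="ColdStart", value=1)
                DatadogMetrics._cold_start = False
            try:
                return func(event, context)
            finally:
                self.provider.flush()  # flush buffered metrics at the end of each invocation

        return wrapper
```

Usage would then stay close to today's DX, e.g. metrics = DatadogMetrics() followed by @metrics.log_metrics(capture_cold_start_metric=True) on the handler.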
Thanks Heitor, I'll start on the DataDog one first to see if that works for us. Just to confirm, you are talking about a new class standalone from our current Metrics class, right?
Updated the RFC with the new standalone design.
Closed as complete! For every new metrics provider, we will open a new issue + PR.
Is this related to an existing feature request or issue?
#1433
#2014
Which AWS Lambda Powertools utility does this relate to?
Metrics
Summary
Problem
Customers have been using Lambda Powertools alongside third-party observability providers for logging, metrics, and tracing. This RFC focuses on the metrics part (it is one piece of the broader support for observability providers; for Logging, please check out this RFC). Powertools for AWS Lambda (Powertools) provides a powerful and easy-to-use metrics class; however, it only supports the AWS CloudWatch Embedded Metric Format (EMF), which is unfriendly to third-party observability providers and makes it hard for our customers to ingest these metrics into other observability solutions, such as DataDog.
Goal
The goal of this RFC is to enable third-party observability providers like Datadog, NewRelic, Lumigo, etc. to create their own Powertools-specific metric provider (e.g., one emitting the DataDog metric format) and offer it to their own customers. And if customers eventually want to add additional value or features themselves, they can easily do so by overriding a provider too.
Use case
The typical use case for this utility is customers who use Lambda Powertools to collect metrics and want them ingested into a third-party observability provider.
Proposal
Current metric experience
The current Powertools’ metrics utility creates custom metrics asynchronously by logging metrics to standard output following Amazon CloudWatch Embedded Metric Format (EMF), and these metrics can then be seen in the CloudWatch console. This utility can aggregate up to 100 metrics using a single CloudWatch EMF object (which is in JSON format).
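For reference, the current CloudWatch EMF experience looks roughly like this (standard Powertools Metrics usage):

```python
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics(namespace="ServerlessAirline", service="booking")


@metrics.log_metrics  # serializes and prints an EMF JSON blob when the handler returns
def lambda_handler(event, context):
    metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)
```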
Metric provider proposal
For this new utility, we propose a new metrics class for observability providers, with an optional `provider` parameter that lets developers pass in a pre-defined observability provider or a custom one. The output will then be in an observability-provider-friendly format. The code snippet below is a rudimentary look at how this utility can be used and how it will function. The default use case for metrics today is `metrics = Metrics()`. After this provider feature lands, customers can still use the original CloudWatch metrics via `metrics = Metrics()` or `metrics = CloudWatchEMF()`. They can also use a third-party provider via, e.g., `metrics = DataDogMetrics()`.
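A rudimentary sketch of that usage follows; the provider class name and import path are hypothetical, borrowed from the discussion above rather than a final API:

```python
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

# hypothetical provider shipped by the observability partner
from datadog_lambda.metrics import DatadogLPMetricsProvider

metrics = Metrics(provider=DatadogLPMetricsProvider(flush_limit=100))


@metrics.log_metrics
def lambda_handler(event, context):
    # same UX as today; the provider controls serialization and output format
    metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)
```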
Self-defined metrics provider usage
If the customer would like to use another observability provider, or define their own metrics functions, we will define an interface that the customer can implement and pass to the Metrics class's `provider` parameter.
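A minimal sketch of such a customer-defined provider, assuming the hypothetical `MetricsProvider` base class and `provider` parameter proposed above:

```python
from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics.providers import MetricsProvider  # hypothetical import path


class MyCustomProvider(MetricsProvider):
    def add_metric(self, name: str, value: float, **kwargs) -> None:
        ...  # buffer the metric in whatever shape the target backend expects

    def serialize(self) -> dict:
        ...  # convert buffered metrics into the provider-friendly output format

    def flush(self) -> None:
        ...  # write to the log; optionally call a third-party SDK (see "Out of scope")


metrics = Metrics(provider=MyCustomProvider())
```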
Out of scope
Introducing a third-party observability provider dependency to submit metrics to an API is out of scope. The configuration included in this project should only support modifying the metrics output to the log. On their end, customers can still submit to an API if they implement the observability provider's SDK in the metrics' flush function.
Potential challenges
How to provide more value to customers with the current Metrics provider design: the `log_metrics` decorator and `_add_cold_start_metric` function in the base metrics provider.

Dependencies and Integrations
Dependencies
No additional dependencies required
Changes and Additions to Public Interfaces
`Metrics`, `MetricsManager` class: new `provider` parameter in the metrics class.

Performance Impact
Little or no performance impact expected
Backwards Compatibility and Upgrade Path
Full backward compatibility: old metrics code without the `provider` parameter will get exactly the same output as before.

Alternative solutions
Acknowledgment