RFC: Support for external observability providers - Logging #2014
Comments
Thanks a lot @seshubaws for taking the time to create this! I'm so excited we're kicking off the observability provider discussion. As I was reading, one thing struck me: only Python and TypeScript already provide this capability. This means we might not need an actual Provider capability, but more of an improvement to the ergonomics of how customers bring their own output. In Python, we use the standard logging library, which is absurdly flexible; that flexibility is also why it's so complex to configure. For Powertools Logger, customers tend to migrate from the standard logging library or from other loggers. Since we build atop the standard logging library, we expose the same mechanisms for customers to override and bring their company standards, for example:
```python
from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging.formatter import LambdaPowertoolsFormatter

class CustomFormatter(LambdaPowertoolsFormatter):
    def serialize(self, log: dict) -> str:
        """Serialize final structured log dict to JSON str"""
        log["event"] = log.pop("message")  # rename message key to event
        return self.json_serializer(log)  # use configured json serializer

logger = Logger(service="payment", logger_formatter=CustomFormatter())
logger.info("hello")
```

In this case, what I think we could do better here, but didn't before, is to either think of a simpler way (since this is already powerful enough), OR improve the existing mechanism with typing. What do you think?

Answering some comments and questions
Minor correction, Logger sends logs to standard output (stdout). That's why some customers even use it locally for scripts, or in platforms like Fargate/AppRunner/Beanstalk, despite not having access to all features. That's also true for all other languages. The reason we did this is to make it easier for customers to take control over stdout and transform/do as they please, with say sidecar containers... or in case of Lambda... Lambda Extensions Logs API.
We can ask the community. Turns out a customer shared that Logger output is recognized by Datadog out of the box when used with
That's an interesting idea - would you mind expanding the RFC to cover an initial take? We heard some ideas in the past of having a central configuration. Minor edits
Thank you for your comments @heitorlessa, it really clarifies a lot! I have revised the original RFC to integrate most of your comments, though I should call out some points:
In the revised RFC, I have proposed supporting Datadog and any other providers we want out of the box to make the current Logger Formatter simpler. We could add the TypedDict in addition to that, or do that as a fast follow, up to you.
I guess if we support Datadog formatting for logs out of the box like
we can just do that behind the scenes. Also, to determine the other providers we want to support out of the box, is this RFC the place where we are asking? Or is there some other forum for that kind of poll specifically?
Of course! Should we have that in a separate RFC though, since this one is just about the Logger and that would involve Logger, Tracer, and Metrics?
Awesome, I somehow missed your message @seshubaws (sorry!!). I'm finishing a documentation issue and going through all RFCs today.
Loved it! Gotta re-read the updated RFC now. As for your questions:
We can use the Poll feature in GitHub Discussions, then drive customers there from our docs, board, Discord, and social media :) It'll also be good to find out which ones don't work out of the box (only DataDog didn't work, as far as I remember).
Good idea! A new RFC would be better. I did learn from another customer interview last week that they tend to use one Logging provider (e.g., CloudWatch), and use another for Metrics and Traces primarily to optimize costs.
That's it! That's exactly the simple developer experience customers love about Powertools - you're awesome @seshubaws! Feel free to start the implementation (or let us know if you want us to!). Given how related it is, we could also address that as part of the same work. I've pinged a Serverless Partner SA for guidance on how to best ensure a fair chance for AWS Partners (whether to support a few out of the box, or do it based on customer demand and Partners' contributions). For now, assume Datadog as an example, as we can revert that easily if we need to. Thanks!
Hey @heitorlessa,
Hey @PierreKiwi no worries ;) Feel free to share them here. I've got one question to help us determine the implementation for DataDog.

Q: When using the new DatadogLogFormatter, would you expect only the timestamp format to change? Or, would you expect the logging keys to be remapped to follow DataDog's initial convention[1]?

Context

If you were starting a few weeks from now, we'd recommend using:

```python
Logger(service="payment", logger_formatter=DatadogLogFormatter())
```

What we're unsure about is what the formatter should change: the timestamp value only, or the logging keys to match what DataDog suggests as their default logger. I think it's the former, since the latter is less rich compared to what we provide out of the box. Thanks a lot!!

[1] DataDog log that @seshubaws found:

```json
{
  "datetime": "2021-11-22 15:32:02,145",
  "level": "INFO",
  "message": "Request from Alice received",
  "requestId": "9b1deb4d-3b7d-4bad"
}
```
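To make the second option concrete, here is a minimal, hypothetical sketch (not Powertools code) of what remapping keys to that convention could look like, with key names taken from the DataDog example above:

```python
# Hypothetical sketch of the "remap keys" option (not part of Powertools):
# rename Powertools log keys to the names in DataDog's default convention,
# passing any unmapped keys through unchanged.
POWERTOOLS_TO_DATADOG = {
    "timestamp": "datetime",
    "function_request_id": "requestId",
}

def remap_keys(log: dict) -> dict:
    """Return a copy of the structured log with keys renamed."""
    return {POWERTOOLS_TO_DATADOG.get(key, key): value for key, value in log.items()}

print(remap_keys({"timestamp": "2021-11-22 15:32:02,145", "level": "INFO"}))
```

Unmapped keys pass through unchanged, so the richer Powertools output would be preserved either way.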
Hello @heitorlessa,

Disclaimer: what I am going to describe below is what works for us at the moment, and I am more than happy to be corrected and learn better practices.

A bit of context first. We do have Lambda functions using AWS Lambda Powertools and Datadog dd-trace-py (both via layers). Our main goal is to have the traces, logs, and metrics correlated and ingested in Datadog with minimum effort. So, in the `__init__.py` of our Lambda, we usually have something like this:

```python
# import from Python standard library (like datetime, os, etc.)
from ddtrace import patch

# Patch specific packages before importing them
patch(
    botocore=True,  # note: this also patches boto3
    logging=True,
    requests=True,
)

# pylint: disable=wrong-import-position
import simple_salesforce
from aws_lambda_powertools import Logger
from aws_lambda_powertools.logging.utils import copy_config_to_registered_loggers
from boto3 import Session
# pylint: enable=wrong-import-position

# Extract some environment variables
ENV: str = os.environ.get("ENV", "dev")
LOG_LEVEL = os.environ.get("LOG_LEVEL", "DEBUG")
SAMPLE_RATE = float(os.environ.get("POWERTOOLS_LOGGER_SAMPLE_RATE", 0.1))
SERVICE_NAME = os.environ.get("POWERTOOLS_SERVICE_NAME", "something")
VERSION = os.environ.get("VERSION", "0.1.0")

# Configure logger (including 3rd party ones)
LOGGER = Logger(
    **{
        "env": ENV,
        "level": LOG_LEVEL,
        "location": "%(name)s.%(lineno)d",
        "sample_rate": SAMPLE_RATE,
        "service": SERVICE_NAME,
        "use_rfc3339": True,
        "version": VERSION,
    }
)
copy_config_to_registered_loggers(
    source_logger=LOGGER,
    include={
        "simple_salesforce",
    },
)
```

Ok, so quite a lot to unpack here... Let's start with the `patch` call, which must run before the patched packages are imported. We then continue our imports and eventually configure the root AWS Lambda Powertools `Logger`; apparently the extra keys we pass (`env`, `version`, etc.) are simply appended to every log record.
Finally, we set up our 3rd party loggers using the handy `copy_config_to_registered_loggers` utility.

Ok, so far so good. What does a log statement with this setup look like?

```json
{
  "level": "DEBUG",
  "location": "my.location.goes.here",
  "message": "Calculating since and to fields based on time (2023-04-20 07:32:54+00:00) and delta (5 minutes)...",
  "timestamp": "2023-04-20T07:33:02.637+00:00",
  "service": "our-service",
  "sampling_rate": "0.1",
  "env": "prod",
  "sample_rate": 0.1,
  "version": "1.0.842665798-c1a47030",
  "name": "...",
  "cold_start": false,
  "function_name": "function-name",
  "function_memory_size": "256",
  "function_arn": "function-arn",
  "function_request_id": "my-function-request-id",
  "dd.version": "1.0.842665798-c1a47030",
  "dd.env": "prod",
  "dd.service": "something",
  "dd.trace_id": "1262616779456844194",
  "dd.span_id": "2600278220644030616",
  "xray_trace_id": "1-6440eaae-3f50e7841185b74572f911a2"
}
```

As you can see, patching `logging` injects the `dd.*` keys (`dd.trace_id`, `dd.span_id`, etc.) into each record, which is what correlates our logs with traces. There is a bit of duplication (`env`/`dd.env`, `service`/`dd.service`, `version`/`dd.version`).

Note: we do use CDK and we leverage the Datadog CDK constructs (see here) so that we do not need to remember how to set those `DD_*` environment variables:

```typescript
const datadog = new Datadog(this, "datadog", {
  apiKeySecretArn: secretDatadog.secretArn, // DD_API_KEY_SECRET_ARN
  env: this.env.name, // DD_ENV
  enableDatadogLogs: true, // DD_SERVERLESS_LOGS_ENABLED (true by default but let's be explicit)
  enableDatadogTracing: true, // DD_TRACE_ENABLED (true by default but let's be explicit)
  enableMergeXrayTraces: true, // DD_MERGE_XRAY_TRACES
  extensionLayerVersion: config.get("extensions.datadogLayerVersion"),
  // Must align with Lambda function Python runtime
  pythonLayerVersion: config.get("extensions.datadogPythonLayerVersion"),
  service: this.props.serviceName, // DD_SERVICE
  tags: repository.getDatadogTags(), // DD_TAGS
  version: this.props.version, // DD_VERSION
});
```

Small digression: inspired by it, I have built something similar for AWS Lambda Powertools so I do not need to remember how to set up the layer and its environment variables:

```typescript
const awsLambdaPowertools = new AwsLambdaPowertools(this, "aws-lambda-powertools", {
  layerVersion: config.get("extensions.awsLambdaPowertoolsPythonLayerVersion"),
  service: this.props.serviceName, // POWERTOOLS_SERVICE_NAME
});
```

Ok, back to our sheep (I am a Kiwi after all!)... At this stage, we have our traces and logs correlated in Datadog (haven't looked much at metrics just yet), so we're pretty happy with it, and the setup is pretty simple. Could it be simplified? Perhaps. But I would conclude by saying that the bare minimum is definitely to have RFC3339 timestamps (we set `use_rfc3339=True` for that). Hope this long message with concrete setup and examples helps, and please let me know your feedback/suggestions.

Regards
This is "hot chips" @PierreKiwi!! It unblocks us. This means we can limit our DatadogLogFormatter to the timestamp format only and keep it super simple, helping with maintenance and discoverability (anyone searching for Datadog in the docs). In your case, keep using what you're doing. This new formatter will have the exact same effect as setting:

```python
LOGGER = Logger(
    **{
        "env": ENV,
        "level": LOG_LEVEL,
        "location": "%(name)s.%(lineno)d",
        "sample_rate": SAMPLE_RATE,
        "service": SERVICE_NAME,
        "use_rfc3339": True,
        "version": VERSION,
    }
)
```

Thank you so much! @seshubaws would you like me to create a PR for this, or would you like to tackle it?

Items we need to address in an upcoming PR:
PR is up! #2183

Great work @heitorlessa :)

Now merged! Before closing, I'll make a last comment by EOD to summarize key decisions and current UX to ease referencing in the future.
Final comments to close this RFC.
Last but not least, huge thank you @seshubaws for working on the RFC and tweaking it several times. Much appreciated!!

This is now released in version 2.15.0!
Is this related to an existing feature request or issue?
#1433
Which AWS Lambda Powertools utility does this relate to?
Logger
Summary
This RFC is the first of three that will detail how the Logger, Metrics, and Tracer utilities can be extended to support a bring-your-own-observability-provider format for customers who want to use platforms other than AWS CloudWatch.
This RFC is specifically for the Logger. Currently, in the Logger for Python (and TypeScript), customers can bring their own Logging Formatter when setting up their Logger to define the output format, allowing better integration with observability solutions other than CloudWatch.
Since this is already quite powerful, we are seeking to improve the existing mechanism by offering out of the box support for Datadog and a few other providers (TBD) for the Logger Formatter.
Metrics RFC: #2015
Tracer RFC: #2030
Use case
The primary use case for this utility is developers who want a simplified way to integrate the Logger Formatter with the most popular third-party observability providers (e.g., DataDog).
Proposal
Current Logger Formatter experience
Below is a simple snippet of how the Logger Formatter currently works:
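The original snippet did not survive extraction; as a self-contained stand-in, this sketch shows the same bring-your-own-formatter idea using only the standard logging library that Powertools Logger builds on (class and logger names here are illustrative; see the `CustomFormatter` example earlier in the thread for the actual Powertools API):

```python
# Self-contained stand-in for the missing snippet: a custom formatter that
# serializes each record as a structured JSON line, the same mechanism the
# Powertools Logger Formatter exposes.
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Serialize each log record as a structured JSON line."""
    def format(self, record: logging.LogRecord) -> str:
        log = {"level": record.levelname, "event": record.getMessage()}
        return json.dumps(log)

stream = io.StringIO()  # stand-in for stdout, to capture the output
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payment")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("hello")
print(stream.getvalue())
```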
Logger Formatter proposal
For this new utility, we propose a new parameter on the existing Logger that developers can use to specify which observability provider should receive their logs. The below code snippet is a rudimentary look at how this utility can be used and how it will function. Out of the box, we will support DataDog and a few other providers (TBD).
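As the original snippet did not survive extraction, here is a rough, self-contained sketch of the intended effect using a stub class (the stub name and behavior are assumptions, not the final implementation; per the discussion in the thread, only the timestamp format needs to change for Datadog):

```python
import datetime
import json

# Illustrative stub (NOT the real Powertools class): the out-of-the-box
# Datadog support only needs to switch the timestamp to RFC3339 so Datadog
# parses dates correctly; all other keys are left untouched.
# UTC is assumed here for simplicity.
class DatadogLogFormatterStub:
    def serialize(self, log: dict) -> str:
        parsed = datetime.datetime.strptime(log["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
        log["timestamp"] = parsed.isoformat(timespec="milliseconds") + "+00:00"
        return json.dumps(log)

record = {"timestamp": "2021-11-22 15:32:02,145", "level": "INFO", "message": "hello"}
formatted = DatadogLogFormatterStub().serialize(record)
print(formatted)
```

The proposed usage would then look like `Logger(service="payment", logger_formatter=DatadogLogFormatter())`, as shown earlier in the thread.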
Custom logger usage
If the customer would like to use another observability provider, or define their own logger functions, we will define an interface that the customer can implement and pass in to the Logger class.
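The interface snippet did not survive extraction either; below is a minimal sketch of what such a contract could look like (names are hypothetical, not the final Powertools API):

```python
import json
from abc import ABC, abstractmethod

# Hypothetical interface sketch (names are illustrative, not the final
# Powertools API): the Logger would accept any object implementing
# serialize() as its formatter.
class LogFormatter(ABC):
    @abstractmethod
    def serialize(self, log: dict) -> str:
        """Turn the structured log dict into the final output string."""

class MyProviderFormatter(LogFormatter):
    def serialize(self, log: dict) -> str:
        log = dict(log)  # avoid mutating the caller's dict
        log["event"] = log.pop("message")  # example remap: message -> event
        return json.dumps(log)

output = MyProviderFormatter().serialize({"message": "hello", "level": "INFO"})
print(output)
```

An instance of the custom class would then be passed to the Logger, mirroring the `logger_formatter` pattern shown earlier in the thread.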
Out of scope
Sending logs from Powertools to the customer's desired observability platform will not be in the scope of this project. The implementation should only support modifying the output of the Logger so that the customer can push them to their platform of choice using the Lambda Handler.
Potential challenges
We need to determine which platforms we want to support out-of-the-box (apart from Datadog).
Dependencies and Integrations
We will have to integrate with (and thus, have a dependency on) Datadog and any other platforms we decide to support out-of-the-box.
Alternative solutions
Acknowledgment