Add easy support for Datadog and possibly other observability solutions #1433
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can.
It is so rare to receive such a high-quality feature request that I want us to take the time to reply to you accordingly - please bear with us for an answer next week. Until then, we're laying the groundwork in our E2E and integration test framework to give us the confidence to offer what you're asking for -- either native support, or exposing the mechanisms we already have so customers can build it themselves. For anyone else reading this, please add your +1 to help us prioritise it. Thank you for taking the time to share such rich detail.
Hey @petarlishov, our new (internal) E2E framework took a lot longer than I expected to refactor, so I'm replying tomorrow morning to address your questions and a few asks as we started v2 in parallel.
I'll break down my answer in categories to make it easier to parse later.
Making Lambda Powertools more lightweight
I bet you'll be excited to hear that we've started working on v2 (minor breaking changes) to cut down the final package size to ~464K (compressed) 🎉 In v2, we are making all dependencies optional. @rubenfonseca is leading V2. We'll create an RFC to discuss the trade-offs of relying on the Lambda runtime's packages, and how we're thinking about our Lambda Layer v2.
Modularization is our medium-term game
This is an intermediate stage towards modularization in V3. This would need a major structural change but allow customers to pick and choose what they need, going as far as ~16K package size if one wants to. That however needs a ton of research and testing to make sure it's stable and maintainable - we plan to draft a RFC next year once we're comfortable with v2 outcomes.
Despite being a major version, we want to prevent disruptions as much as possible to our customers. We're working on our first upgrade guide, and for v3 we even have the ambition to create a linter plugin to help you upgrade faster.
Long-term, this will give us the structure we need to add support to non-Lambda runtimes like Fargate, Glue jobs, etc. We have some customers using it that way, but we haven't put a "it's supported" stamp on it yet. We could even expose our private integ/e2e testing utilities as a package for customers ;)
Datadog Log format
I'm not sure if you've tried it, but Logger supports Bringing Your Own Logging Formatter without forgoing Logger features and UX. We recommend that option for customers who only want to change the final format without having to maintain a different Logger implementation altogether. I'm not fully aware of what Datadog expects in a structured log or why it has difficulty querying a JSON field. That said, we're more than happy to investigate any non-breaking change we could make on our side if this spans more than Datadog - feature request please! In V3, we'll be able to create a ...
Datadog Metrics format
We could make this more extensible quite easily - wanna create an RFC? An RFC will help us agree on a contract for exposing a Metric Provider (a simple sink pattern), so that customers can use the same UX but have different outputs and validation mechanisms. So far we haven't invested much here, and this part of the code could easily be rearranged until we have a proper Provider - https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/metrics/base.py#L139
Datadog Trace compatibility
Tracing has an undocumented BaseProvider for that intent, but we haven't been able to put more thought into it. Now that we're making the X-Ray SDK optional in v2, this becomes a more interesting conversation to have; otherwise we'd be forcing customers to install the X-Ray SDK even when they were using a Datadog Provider. The hardest part in Tracing is patching modules and nomenclatures (e.g., segment/span, patching only X but not Y lib) --- wanna create an RFC for what minimum feature set the ...
I initially wanted to have a drop-in replacement Tracing Provider, but after digging into 3-4 tracing providers' libs I saw how much custom logic each provider does and I became less sure of it - an RFC can help us get there ;) We also looked into OpenTelemetry, but the cold start was too significant, and it was a moving target in terms of changes too. I think exposing our ...
Overall
We're heading in that direction, but we'd love help from the community in defining a good contract for Providers (Tracing already has one). Right now, our main focus is on operational excellence (E2E tests) to ensure V2 can be smooth sailing, and to pave the road for our future modularization story. We'll continue to respond to feature requests and greatly appreciate any help we can get - we can't wait to create new utilities and new extensibility mechanisms, but first we need confidence that large changes can be made ;)
Once again, thank you for creating such a comprehensive issue. Issues like this make me personally happy: we have a lot to do, but they also show that we've got a community who cares ;) Hopefully that answers your questions and remarks, please let us know otherwise!
PS: Join us on Discord, we'd love to have you if you aren't there already.
cc @rubenfonseca @leandrodamascena @mploski @am29d @saragerion @sliedig
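To make the "contract for Providers" idea a bit more concrete, here is a minimal sketch of what a tracing provider contract might look like. Everything in it (`TracingProvider`, `InMemoryProvider`, `trace`) is a hypothetical name made up for illustration; it is not the undocumented BaseProvider mentioned above and it does not cover module patching, which is the hard part.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Protocol


class TracingProvider(Protocol):
    """Hypothetical contract, NOT the Powertools BaseProvider."""

    def start_span(self, name: str) -> Any: ...

    def end_span(self, span: Any) -> None: ...

    def set_attribute(self, span: Any, key: str, value: Any) -> None: ...


class InMemoryProvider:
    """Toy provider that keeps finished spans in a list (handy for tests)."""

    def __init__(self) -> None:
        self.finished: List[Dict[str, Any]] = []

    def start_span(self, name: str) -> Dict[str, Any]:
        return {"name": name, "start": time.time(), "attributes": {}}

    def end_span(self, span: Dict[str, Any]) -> None:
        span["end"] = time.time()
        self.finished.append(span)

    def set_attribute(self, span: Dict[str, Any], key: str, value: Any) -> None:
        span["attributes"][key] = value


@contextmanager
def trace(provider: TracingProvider, name: str) -> Iterator[Any]:
    """Open a span, record any exception on it, and always close it."""
    span = provider.start_span(name)
    try:
        yield span
    except Exception as exc:
        provider.set_attribute(span, "error", repr(exc))
        raise
    finally:
        provider.end_span(span)
```

A caller would only ever touch `trace(provider, "name")`, so swapping X-Ray for another backend would mean swapping the provider object, at least under this simplified contract.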
Update: We estimate one RFC per feature (Tracer, Metrics, Logger) starting in mid-April/early May. This will help focus the discussion on a standard interface to help customers bring their own provider. Since we launched V2, the only difference for Tracer is that we'd stick with AWS X-Ray SDK provider as the default, while providing a built-in provider for OpenTelemetry - other 3rd party providers (e.g., Datadog, Lumigo, NewRelic, etc.) would be owned by them where we'd be happy to collaborate/coordinate. Thank you all!!
I'm removing the
UPDATE: We're adding support in Logger for Datadog as a start. We're working on a POC for Metrics, and adding the last refinements for Tracer Providers. For Logger, Datadog was the only provider that required a custom timestamp, so we've added a Formatter and documented our recommendation to use Lambda Extensions to avoid impacting performance. The only reason we're not adding OTel Log output now is that it's not Final yet - please feel free to open a feature request when that happens (for anyone reading who needs that).
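For readers wondering what a formatter with a custom timestamp looks like in general, here is a minimal sketch using only the standard library `logging` module. It is not the Powertools Formatter; `JsonTimestampFormatter` is a hypothetical name, and the ISO 8601 UTC timestamp with milliseconds is an assumption about what a backend might want, so check your provider's documentation for the exact format it expects.

```python
import json
import logging
from datetime import datetime, timezone


class JsonTimestampFormatter(logging.Formatter):
    """Hypothetical formatter: JSON lines with an ISO 8601 UTC timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        # record.created is seconds since the epoch (float)
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": ts.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonTimestampFormatter())
    logger = logging.getLogger("example")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("order %s created", "1234")
```

Only the output format changes here; the calling code keeps using the normal `logger.info(...)` API, which is the same trade-off the Bring Your Own Formatter option aims for.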
Quick update -- @roger-zhangg is working on the last feature: Observability Provider for Tracer. Once that's done, we'll close this issue, and start investigating an alternative solution for OTel as cold starts haven't significantly improved.
Use case
I have played around with the Datadog and the AWS Powertools Lambda layers and, as someone who needs to integrate with Datadog, I find the Datadog Lambda layer a good choice for getting that integration set up flawlessly.
But I also really enjoy some of the features that AWS Powertools have incorporated into their Lambda layer and as a developer I find it to be a very useful tool. Last time I checked (a while ago), the AWS Powertools layer was more lightweight as well.
Sadly, because both layers use similar packages (boto3 for example among other things), I believe they are not exactly compatible with each other. And either way, adding both would make our Lambdas' start times much worse as both layers are not exactly light either.
I have one collection of Lambdas using the Datadog layer and another collection using Powertools. I have thus noticed some of the differences that make Powertools tricky to integrate with Datadog, despite it being the more useful tool purely from a developer's perspective:
{"m": "Metric name", "v": "Metric value", "e": "Unix timestamp (seconds)", "t": "Array of tags"}
I know this is different from the embedded metric format which AWS already provides - https://aws.amazon.com/about-aws/whats-new/2019/11/amazon-cloudwatch-launches-embedded-metric-format - but maybe we can have a way to support both based on some configuration setting? Sadly, the metrics that Powertools provides at the moment are not easily ingestible into Datadog.
Solution/User Experience
Provide a configuration or an option to define the format when setting up loggers, metrics and traces, which would allow for better integration with observability solutions other than AWS CloudWatch and X-Ray.
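As a rough illustration of such a configuration switch for metrics, the sketch below picks a serializer based on an environment variable. The names `METRICS_FORMAT`, `emit_metric`, `to_datadog`, `to_emf` and the namespace `ExampleApp` are all hypothetical and do not exist in Powertools. The Datadog shape follows the m/v/e/t format quoted above; the other follows the CloudWatch embedded metric format, with tags assumed to be `key:value` strings and the unit fixed to `Count` for brevity.

```python
import json
import os
import time
from typing import Callable, Dict, List, Optional


def to_datadog(name: str, value: float, tags: List[str], timestamp: int) -> str:
    # Shape taken from the format quoted in this issue (timestamp in seconds)
    return json.dumps({"m": name, "v": value, "e": timestamp, "t": tags})


def to_emf(name: str, value: float, tags: List[str], timestamp: int) -> str:
    # Assumption: every tag is "key:value"; each becomes an EMF dimension
    dims = dict(tag.split(":", 1) for tag in tags)
    return json.dumps({
        "_aws": {
            "Timestamp": timestamp * 1000,  # EMF expects milliseconds
            "CloudWatchMetrics": [{
                "Namespace": "ExampleApp",  # hypothetical namespace
                "Dimensions": [list(dims)],
                "Metrics": [{"Name": name, "Unit": "Count"}],
            }],
        },
        **dims,
        name: value,
    })


SERIALIZERS: Dict[str, Callable[[str, float, List[str], int], str]] = {
    "datadog": to_datadog,
    "emf": to_emf,
}


def emit_metric(name: str, value: float, tags: Optional[List[str]] = None,
                timestamp: Optional[int] = None, fmt: Optional[str] = None) -> str:
    """Serialize one metric in the configured format and print it to stdout."""
    fmt = fmt or os.environ.get("METRICS_FORMAT", "emf")
    ts = timestamp if timestamp is not None else int(time.time())
    line = SERIALIZERS[fmt](name, value, tags or [], ts)
    print(line)
    return line
```

With this shape, calling code stays the same (`emit_metric("orders", 1, ["env:dev"])`) and only the setting decides which backend can ingest the log line.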
Alternative solutions
It may be possible to do all these already. Maybe a separate package that adds this compatibility can be created that works well with Powertools, as long as Powertools already has easy methods to manipulate its behaviour in the required ways to allow for the requested integration.