Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): add Datadog observability provider #2906

Merged
merged 44 commits into from
Aug 14, 2023

Conversation

roger-zhangg
Copy link
Member

@roger-zhangg roger-zhangg commented Aug 1, 2023

Issue number:
#2903
#2015

Summary

We didn't included the Datadog metrics provider in our initial metrics provider release due to a dependency conflict. Now that the conflict is resolved, This RFC is for adding back the previously included Datadog metrics provider

Changes

Please provide a summary of what's being changed
If our customer would like to use Datadog for metrics. This pre-defined Datadog metrics provider is the solution. With pre-defined Datadog metrics provider, a current Cloudwatch EMF metrics user can switch to use Datadog metrics by making few lines of code changes.

⚠️ Customer still need to setup either a Datadog Lambda Layer or a Datadog log forwarder to send the metrics.

User experience

from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.metrics.provider import DatadogMetrics
from aws_lambda_powertools.metrics.provider import DatadogMetricsProvider

# Use datadog-defined metrics provider
provider = DatadogMetricsProvider()
metrics = DatadogMetrics(provider=provider)

@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_metric(name="SuccessfulBooking", value=1, tags={"product":"ticket","order":"online"})
    
# JSON output to log
# {
#     "m": "SuccessfulBooking", 
#     "v": 1,
#     "e": 1678509106, 
#     "t": "['product:ticket', 'order:online']"
# }

    metrics.add_metric(name="SuccessfulBooking", value=1, product="ticket", order="online")
# JSON output to log
# {
#     "m": "SuccessfulBooking", 
#     "v": 1,
#     "e": 1678509106, 
#     "t": "['product:ticket', 'order:online']"
# }

How this provider works

stateDiagram-v2
    direction LR
    LambdaCode: Lambda code with Powertools for AWS Lambda
    DatadogSDK: Datadog SDK
    DatadogExtension: Datadog Lambda Extension
    Datadog: Datadog Dashboard

    LambdaCode --> DatadogSDK
    DatadogSDK --> DatadogExtension
    DatadogExtension --> Datadog
Loading

Please share what the user experience looks like before and after this change

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

refactor base and exception to resolve a circular import issue
add datadog provider tests
@roger-zhangg roger-zhangg requested a review from a team as a code owner August 1, 2023 22:52
@boring-cyborg boring-cyborg bot added dependencies Pull requests that update a dependency file metrics tests labels Aug 1, 2023
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 1, 2023
@heitorlessa
Copy link
Contributor

heitorlessa commented Aug 2, 2023

E2E tests are failing on Metrics after the provider merge - let's hold on before we continue here.

@github-actions github-actions bot added the feature New feature or functionality label Aug 2, 2023
@codecov-commenter
Copy link

codecov-commenter commented Aug 2, 2023

Codecov Report

Patch coverage: 99.26% and project coverage change: +0.13% 🎉

Comparison is base (0485c8a) 96.42% compared to head (400486c) 96.56%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2906      +/-   ##
===========================================
+ Coverage    96.42%   96.56%   +0.13%     
===========================================
  Files          171      175       +4     
  Lines         7690     7825     +135     
  Branches       570     1476     +906     
===========================================
+ Hits          7415     7556     +141     
+ Misses         219      217       -2     
+ Partials        56       52       -4     
Files Changed Coverage Δ
aws_lambda_powertools/metrics/metrics.py 96.36% <ø> (ø)
...da_powertools/metrics/provider/datadog/warnings.py 83.33% <83.33%> (ø)
aws_lambda_powertools/metrics/provider/base.py 95.65% <100.00%> (+0.41%) ⬆️
...da_powertools/metrics/provider/datadog/__init__.py 100.00% <100.00%> (ø)
...bda_powertools/metrics/provider/datadog/datadog.py 100.00% <100.00%> (ø)
...bda_powertools/metrics/provider/datadog/metrics.py 100.00% <100.00%> (ø)
aws_lambda_powertools/shared/constants.py 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@leandrodamascena
Copy link
Contributor

E2E tests are failing on Metrics after the provider merge - let's hold on before we continue here.

We fixed the problem in the PR #2910. We can go ahead with PR.

@pull-request-size pull-request-size bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 9, 2023
@leandrodamascena
Copy link
Contributor

leandrodamascena commented Aug 9, 2023

Still missing in this PR:

  • Add default tags to provide a similar experience as default_dimenions in the EMF provider.
  • Clean the code
  • Improve docstrings
  • Create documentation
  • Check code coverage and add # pragma: no cover to parts of code that we cannot cover with functional/unit tests.

@leandrodamascena leandrodamascena changed the title feat(metrics): re-adding Datadog metrics provider feat(metrics): add Datadog metrics provider Aug 9, 2023
@boring-cyborg boring-cyborg bot added the commons label Aug 9, 2023
@leandrodamascena
Copy link
Contributor

Hi @roger-zhangg and @heitorlessa! I think this PR is ready to review and I have some information to make easy the review and tests:

1 - TAGS
Datadog uses tags instead of dimensions and you can define one tag per metric. Therefore, a tag defined in the add_metric method takes precedence over a tag defined in the set_default_tags method or default_tags in @metrics.log_metrics.

2 - ENV
Similar to CloudWatchEMFProvider, the customer can set POWERTOOLS_METRICS_NAMESPACE to define the default metrics namespace. If the customer doesn't inform a namespace, then default will be assumed.

3 - Flushing metrics
We support both methods to flush a metrics:
3.1 - Using datadog SDK: https://docs.datadoghq.com/serverless/installation/python/?tab=datadogcli#monitor-custom-business-logic
3.2 - Using Datadog Log Forward: https://docs.datadoghq.com/logs/guide/forwarder/?tab=cloudformation

The customer just need to set the env variable DD_FLUSH_TO_LOG or setting flush_to_log in the Provider/Metric class.

We have these two supports thanks to @roger-zhangg fantastic job investigating supported methods.

4 - Live test

4.1 - Clone this branch
4.2 - Create a SAM application using any supported Python runtime
4.3 - Add in requirements.txt the directory where you cloned the branch. E.g: /home/leandro/DEVEL-PYTHON/lambda-powertools
4.4 - Add the code below in your app.py

from aws_lambda_powertools.utilities.typing import LambdaContext
from aws_lambda_powertools.metrics.provider.datadog import DatadogMetrics, DatadogProvider

provider = DatadogProvider()
metrics = DatadogMetrics(provider=provider)
metrics.set_default_tags(powertools="datadog")

@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event: dict, context: LambdaContext):
    
    #DATADOG
    metrics.add_metric(name="PRDataDog", value=1)

4.4 - If you want to deploy to your AWS account and send metrics to Datadog then follow steps 4.5 and 4.6, otherwise skip to 4.7

4.5 - In template.yaml you need to add environment variables DD_API_KEY and DD_SITE with respective values - you need to create an account in Datadog for this.
4.6 - Add the layers arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Extension:45 and arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Python310:75. Deploy to Lambda and test.

4.7 - If you want to test locally to see if the metrics are created, just add the environment variable DD_FLUSH_TO_LOG with value "True" and execute your Lambda locally.

It is still missing create the documentation, but I think the code is done.

Thanks so much again @roger-zhang! We can only get this far because you've done most of the work. 🚀

@roger-zhangg
Copy link
Member Author

Thanks Leandro for the refactor! Looks good to me

Copy link
Contributor

@heitorlessa heitorlessa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UX suggestions to improve the customer experience in setting default tags, and overriding tags within a add_metric operation.

@heitorlessa
Copy link
Contributor

heitorlessa commented Aug 14, 2023

Final tasks on docs for me in this last pass -- next time, we should really do this in a separate PR given the level of detail we need.


  • Consider /provider for each observability provider
  • Collapse navigation by default due to size
  • Enhance Datadog mermaid
  • Provide minimal terminologies to improve customers experience
  • Correct typo in terminologies
  • Remove duplicate messaging
  • Use blockquote instead of note in prime reading areas e.g., Install note on Datadog Forwarder
  • Shorten install wording to give info upfront
  • Fix SAM highlights
  • Keep add_metrics simple without provider, focusing on timestamp / no timestamp
  • Fix env vars placement
  • Ensure default tags call out the DatadogMetrics instance
  • Call out that Datadog Forwarder is a legacy option
  • Move Datadog Forwarder as an advanced, largely to stay out of focus
  • Ensure metric tag is called out instead of tag
  • Ensure we call out Datadog SDK config too (e.g., API Key)
  • Add callout why we use DD_FLUSH_TO_LOG in tests
  • Call out how to pass API Key for Datadog (e.g., DD_API_KEY or DD_. Secer)
  • Missing tags validation
  • Missing test for timestamp feature
  • Missing test for default_tags feature with two DatadogMetrics instance
  • Missing test for tags warning

Separate PR

  • Consider Event Handler index just like Metrics after nav collapse
  • Add mermaid for Metrics core too
  • Add Observability Provider section in Metrics.md

@heitorlessa heitorlessa changed the title feat(metrics): add Datadog metrics provider feat(metrics): add Datadog observability provider Aug 14, 2023
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 10 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@heitorlessa
Copy link
Contributor

approved - LGTM! I'd only ask to create a separate issue to investigate testing when datadog SDK isn't available, so we're fully covered.

@leandrodamascena leandrodamascena merged commit 8ca9017 into aws-powertools:develop Aug 14, 2023
@leandrodamascena leandrodamascena linked an issue Aug 14, 2023 that may be closed by this pull request
2 tasks
heitorlessa added a commit to seshubaws/aws-lambda-powertools-python that referenced this pull request Aug 15, 2023
heitorlessa added a commit to seshubaws/aws-lambda-powertools-python that referenced this pull request Aug 15, 2023
* develop:
  chore: cleanup, add test for single and nested
  fix(parameters): make cache aware of single vs multiple calls
  docs(roadmap): add GovCloud and China region item (aws-powertools#2960)
  docs(metrics): update Datadog integration diagram (aws-powertools#2954)
  chore(ci): changelog rebuild (aws-powertools#2958)
  chore(deps-dev): bump cfn-lint from 0.79.6 to 0.79.7 (aws-powertools#2956)
  chore(deps): bump actions/setup-node from 3.7.0 to 3.8.0 (aws-powertools#2957)
  chore(deps-dev): bump xenon from 0.9.0 to 0.9.1 (aws-powertools#2955)
  feat(metrics): add Datadog observability provider (aws-powertools#2906)
  feat(event_handler): allow stripping route prefixes using regexes (aws-powertools#2521)
  chore(ci): changelog rebuild (aws-powertools#2952)
  chore(deps): bump pypa/gh-action-pypi-publish from 1.8.9 to 1.8.10 (aws-powertools#2946)
  chore(deps): bump gitpython from 3.1.31 to 3.1.32 in /docs (aws-powertools#2948)
  chore(deps-dev): bump aws-cdk from 2.90.0 to 2.91.0 (aws-powertools#2947)
  chore(ci): changelog rebuild (aws-powertools#2945)
  chore(deps-dev): bump the boto-typing group with 1 update (aws-powertools#2944)
  chore(deps): bump pypa/gh-action-pypi-publish from 1.8.8 to 1.8.9 (aws-powertools#2943)

Signed-off-by: heitorlessa <[email protected]>
seshubaws added a commit to seshubaws/aws-lambda-powertools-python that referenced this pull request Sep 8, 2023
* fix(parameters): make cache aware of single vs multiple calls

Signed-off-by: heitorlessa <[email protected]>

* chore: cleanup, add test for single and nested

Signed-off-by: heitorlessa <[email protected]>

* chore(deps): bump pypa/gh-action-pypi-publish from 1.8.8 to 1.8.9 (aws-powertools#2943)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps-dev): bump the boto-typing group with 1 update (aws-powertools#2944)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Leandro Damascena <[email protected]>

* chore(ci): changelog rebuild (aws-powertools#2945)

Co-authored-by: Powertools for AWS Lambda (Python) bot <[email protected]>

* chore(deps-dev): bump aws-cdk from 2.90.0 to 2.91.0 (aws-powertools#2947)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump gitpython from 3.1.31 to 3.1.32 in /docs (aws-powertools#2948)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump pypa/gh-action-pypi-publish from 1.8.9 to 1.8.10 (aws-powertools#2946)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Leandro Damascena <[email protected]>

* chore(ci): changelog rebuild (aws-powertools#2952)

Co-authored-by: Powertools for AWS Lambda (Python) bot <[email protected]>

* feat(event_handler): allow stripping route prefixes using regexes (aws-powertools#2521)

Co-authored-by: Roy Assis <[email protected]>
Co-authored-by: Ruben Fonseca <[email protected]>

* feat(metrics): add Datadog observability provider (aws-powertools#2906)

Co-authored-by: Leandro Damascena <[email protected]>
Co-authored-by: heitorlessa <[email protected]>

* chore(deps-dev): bump xenon from 0.9.0 to 0.9.1 (aws-powertools#2955)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump actions/setup-node from 3.7.0 to 3.8.0 (aws-powertools#2957)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps-dev): bump cfn-lint from 0.79.6 to 0.79.7 (aws-powertools#2956)

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Leandro Damascena <[email protected]>

* chore(ci): changelog rebuild (aws-powertools#2958)

Co-authored-by: Powertools for AWS Lambda (Python) bot <[email protected]>

* docs(metrics): update Datadog integration diagram (aws-powertools#2954)

Co-authored-by: Leandro Damascena <[email protected]>

* docs(roadmap): add GovCloud and China region item (aws-powertools#2960)

* fix(parameters): make cache aware of single vs multiple calls

Signed-off-by: heitorlessa <[email protected]>

* chore: cleanup, add test for single and nested

Signed-off-by: heitorlessa <[email protected]>

* chore(test): remove itsdangerous from perf test

Signed-off-by: heitorlessa <[email protected]>

* chore(deps): remove itsdangerous dependencies

* chore: disable sockets in encryption sdk tests

Signed-off-by: heitorlessa <[email protected]>

* refactor(tests): use a test double

* chore: address make pr errors

Signed-off-by: heitorlessa <[email protected]>

---------

Signed-off-by: heitorlessa <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Leandro Damascena <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Powertools for AWS Lambda (Python) bot <[email protected]>
Co-authored-by: roy <[email protected]>
Co-authored-by: Roy Assis <[email protected]>
Co-authored-by: Ruben Fonseca <[email protected]>
Co-authored-by: Roger Zhang <[email protected]>
Co-authored-by: aal80 <[email protected]>
Co-authored-by: Seshu Brahma <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commons dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation feature New feature or functionality metrics size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC: Add Datadog metrics provider
4 participants