Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Openmetrics metrics collection #10752

Merged
merged 46 commits into from
Dec 8, 2021
Merged

Support Openmetrics metrics collection #10752

merged 46 commits into from
Dec 8, 2021

Conversation

ChristineTChen
Copy link
Contributor

@ChristineTChen ChristineTChen commented Nov 30, 2021

What does this PR do?

This PR adds support to collect metrics via Openmetrics/Prometheus
Py3 only.

Waiting on #10753

Metrics:

  • gauges are 1:1
  • counters end with .count if the name is suffixed with _total (this is a breaking change)

Motivation

Benefits to OpenmetricsV2:

  • consistency
  • easy to support new metrics in the future
  • clusters/tags already extracted to labels

Other:
Istio users can benefit from collecting Prometheus metrics because Envoy admin port is reserved.

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached

urseberry
urseberry previously approved these changes Nov 30, 2021
Copy link
Contributor

@urseberry urseberry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve for docs

@github-actions
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@github-actions
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@github-actions
Copy link

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

@github-actions
Copy link

github-actions bot commented Dec 1, 2021

The validations job has failed; please review the Files changed tab for possible suggestions to resolve.

Copy link
Contributor

@ofek ofek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

envoy/datadog_checks/envoy/check.py Outdated Show resolved Hide resolved
if sample.name.endswith("_total"):
parsed_sample_name = re.match("(.*)_total$", sample.name).groups()[0]

compiled_name = re.compile(metric_pattern)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's cache the compiled patterns in the outer scope using collections.defaultdict

envoy/tests/legacy/test_bench.py Outdated Show resolved Hide resolved
#
# cache_metrics: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is cache_metrics no longer supported? It looks like it's still used in the original implementation:

self.caching_metrics = self.instance.get('cache_metrics', True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm documenting openmetrics options only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have the legacy options still be in spec.yaml, but just have them hidden? Similar to when Istio was updated to only include OpenMetricsV2 options but legacy options were still included albeit hidden.

'included_metrics': [r'envoy\.cluster\.'],
'excluded_metrics': [r'envoy\.cluster\.out\.'],
},
'collect_server_info': {
Copy link
Contributor

@yzhan289 yzhan289 Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that collect_server_info is still kept for tests, but it's no longer in your conf.yaml.example. It's still in the legacy implementation. Should this be included as a config option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment, the readme links to the legacy implementation conf.yaml.example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to collect the version still

ofek
ofek previously approved these changes Dec 8, 2021
@ChristineTChen ChristineTChen merged commit 3114930 into master Dec 8, 2021
@ChristineTChen ChristineTChen deleted the cc/envoy-prom branch December 8, 2021 21:42
@mcblair
Copy link

mcblair commented Dec 17, 2021

@ChristineTChen I'm unable to tell if this functionality is present in any of the 7-rc agent images on gcr. I'm running envoy as a part of istio, and intended to follow the path of the new check.py.

My conf.d section in values.yaml is dead simple:

datadog:
  confd:
    envoy.yaml: |-
      init_config:
      instances:
        - openmetrics_endpoint: http://localhost:15090/stats/prometheus?usedonly

Yet the check still chokes on

stats_url is required

Then I found the source code leading me to this PR. Can someone point me to an agent image that contains this?

cswatt pushed a commit that referenced this pull request Jan 5, 2022
* Add logic for Envoy Openmetricsv2

* Add label remapper

* Add new metrics

* Finish adding other metrics

* reorganize metrics that should be transformed

* Introduce openmetrics_endpoint config option

* Add watchdog metrics transformers

* Add some more label extraction metrics

* refactor tests to move to legacy

* Add legacy and non legacy fixtures to test files

* Update readme

* Bump base req

* Fix style

* Mark legacy metrics

* Fix watchdog counter name

* document prometheus metrics in metadata csv

* Fix metadata csv

* Add e2e test

* FIx style

* Fix metadata format for validation

* Flaky metrics

* Only support openmetrics in latest api v3

* Fix test imports

* Enable Openmetrics option by default

* Fix import

* Fix style

* Update readme

* Update config stats_url wording

* Fix envoy import

* Remove py27 for openmetrics version

* Openmetrics endpoint should be optional

* Account for flaky metrics

* Document service checks

* Use unique name

* Update envoy/tests/legacy/test_bench.py

Co-authored-by: Ofek Lev <[email protected]>

* Move metrics map to metrics.py

* Update with feedback

* Use lambda

* simplify match

* Refactor metadata utils

* Support metadata collection in V2

* Use urlunparse

* Reintroduce legacy config options as hidden

Co-authored-by: Ofek Lev <[email protected]>
@coignetp coignetp mentioned this pull request Jan 13, 2022
4 tasks
@ofek ofek mentioned this pull request Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants