Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[envoy] new integration #1156

Merged
merged 15 commits into from
Mar 13, 2018
Merged

[envoy] new integration #1156

merged 15 commits into from
Mar 13, 2018

Conversation

ofek
Copy link
Contributor

@ofek ofek commented Feb 25, 2018

Notes

The 305 metrics were taken from:

✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/cluster_manager/cluster_stats.html#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/stats.html#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/stats#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/mongo_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/redis_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/rate_limit_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/client_ssl_auth_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/tcp_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/rds#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/buffer_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/dynamodb_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/grpc_http1_bridge_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/ip_tagging_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/rate_limit_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/router_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/cluster_manager/cds#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/runtime#statistics

Additional Notes

  • I've excluded undocumented metrics
  • We must exclude these and these until Stat name collisions envoyproxy/envoy#2683 is resolved.
  • To avoid enumerating every HTTP status code, I have intentionally excluded:
    • upstream_rq_<*> from here and here
    • each upstream_rq_total_xxx and upstream_rq_time_xxx from here
  • I've excluded experimental 3rd party metrics:
    • DynamoDB capacity.<operation_name>.__partition_id=<last_seven_characters_from_partition_id> from here
  • I've excluded deprecated metrics:
    • cluster.outlier_detection.ejections_total and cluster.outlier_detection.ejections_consecutive_5xx from here

Team

In an effort to simplify things for the new testing approach:

  • flake8 configuration is now in tox.ini, eliminating the need for setup.cfg.
  • tests/requirements.txt is now requirements-dev.txt.

Review

@DataDog/croissant

and

@mattklein123 @htuch @mrice32 @jmarantz Please feel free to chime in. You would be particularly interested in:

the parser (implementation is the same as discussed in Slack)
https://github.com/DataDog/integrations-core/pull/1156/files#diff-26242bcecd44a7dd35184c58c4813ed9

our descriptions of Envoy (tell me what you'd like it to say)
https://github.com/DataDog/integrations-core/pull/1156/files#diff-80bc075ec660cedefbacfbb08a41c682R5
https://github.com/DataDog/integrations-core/pull/1156/files#diff-4d2ed940a01536f3edd5499b23f267ccR4

@ofek ofek changed the title add Envoy integration [envoy] new integration Mar 7, 2018
@ofek ofek requested review from masci, hush-hush, truthbk and gmmeyer March 7, 2018 07:38
@ofek ofek added this to the 6.1 milestone Mar 7, 2018
@ofek ofek requested a review from a team March 7, 2018 16:53
# supply a list of tags. The admin endpoint must be accessible.
# https://www.envoyproxy.io/docs/envoy/latest/operations/admin

- stats_url: http://localhost:80/stats
Copy link

@htuch htuch Mar 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had a stats sink for emitting to DataDog, https://github.com/envoyproxy/data-plane-api/blob/master/envoy/config/metrics/v2/stats.proto#L157, this is a push model. How come we have pull here? I'd echo the same security concerns as in envoyproxy/data-plane-api#523. CC: @taiki45

edit: fixed tag to Taiki

Copy link
Contributor Author

@ofek ofek Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @htuch and @taiki45,

Thanks for sharing your concerns with us! We understand them, and this is what we are doing at the moment to address them:

  1. Until Admin endpoint security envoyproxy/envoy#2763 is resolved, we are offering our customers an alternative way to access Envoy's /stats endpoint.
  2. Our envoy integration supports SSL verification of the Envoy endpoint (and this requires Envoy to support SSL). If the SSL verification fails for any reason when enabled, the integration will not collect metrics.
  3. Our integration has the option to support basic authentication (but this requires Envoy to implement basic authentication).
  4. Our integration does not need to be aware of RBAC, but when Envoy implements it, our integration will use it transparently (via basic authentication).

At the moment, we do not see the benefit of supporting the dogstatsd connector, because that still does not prevent the admin endpoint on Envoy from being exposed to the entire trusted network. A customer could use that to be more secure in some network scenarios, but we would also like to provide our customers with a working agent integration for Envoy in its current form.

We also need to do some massaging of metrics in the integration for our backend, which the dogstatsd connector currently does not allow. Additionally, we are finding our parser a bit more resilient to stat name collisions. @jmarantz mentioned that he has a similar implementation for which he’ll issue a PR to Envoy soon.

We would be happy to enforce authentication by default, provided that Envoy implements it. We hope this addresses your concerns. Thanks again for sharing your thoughts with us!

Regards,
The Agent team at Datadog

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this all seems very reasonable. I'd add a couple of points for consideration:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch That's an interesting idea. Would I simply change "address": "tcp://0.0.0.0:8001" to "address": "tcp://127.0.0.1:8001"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofek Yes, that's right. And I assume that prevents the entire admin endpoints from being exposed.

I agree to move to pull-based stats because it can be easily scaled than push-based one, but I'd leave some comments to clarify the points.

At the moment, we do not see the benefit of supporting the dogstatsd connector, because that still does not prevent the admin endpoint on Envoy from being exposed to the entire trusted network. A customer could use that to be more secure in some network scenarios, but we would also like to provide our customers with a working agent integration for Envoy in its current form.

AFAIK, the datadog agent will be running in the customer's host (the same host as running Envoy process). If it's correct, we might be able to send stats to datadog agent without opening any Envoy admin endpoints. The scenario is:

  • Configure Envoy admin endpoint to bind local loopback address (127.0.0.1 in IPv4).
  • Send Envoy stats to datadog agent running on the same host using UDP with dog_statsd sink.

We also need to do some massaging of metrics in the integration for our backend, which the dogstatsd connector currently does not allow.

I suppose you can do that modifications with writing a statsd relay component which receives stats and modifies the stats then re-pushes to somewhere (probably datadog agent?).

IIUC, the current admin's /stats endpoint is missing histogram metrics. envoyproxy/envoy#1947 This can be resolved with envoyproxy/envoy#2736 which will implement pull-based stats with stats sink (I didn't fully take a look the PR yet).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ofek ack.

Copy link
Contributor

@gmmeyer gmmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot more to review, it'll probably come in stages, but these are the first two things that need to be fixed

@@ -0,0 +1 @@
requests==2.18.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to generate the requirements.txt through the general mechanism we use for this: pip-compile --generate-hashes --output-file requirements.txt requirements.in

this is a requirement conflict with datadog-agent-base and a lot of other checks. Please use the version we're already using. If you want to upgrade, you can do that in another PR as it requires a good bit of work to ensure there aren't any conflicts.

Copy link
Contributor

@masci masci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left few comments and a couple of questions but overall this is extremely good material! Tests are 💯with high coverage, 🍰

tasks.py Outdated
'disk',
'kubelet',
'vsphere',
'envoy',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's try to keep the list alphabetically sorted for better readability

self.log.warning(msg)
return

get_method = getattr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious: why this alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid a global lookup many times during the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it that much slower?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for me, mind to add a comment above that line? So it doesn't get wiped by mistake in the future.

@@ -0,0 +1,2 @@
class UnknownMetric(Exception):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add the license header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@@ -0,0 +1,1228 @@
from .utils import make_metric_tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: license header

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

from .metrics import METRIC_PREFIX, METRIC_TREE, METRICS


def parse_metric(metric):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a docstring roughly describing what this function does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

return METRIC_PREFIX + metric, tags, METRICS[metric]['method']


def reassemble_addresses(seq):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a docstring roughly describing what this function does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!


for metric in metrics:
parts = metric.split('.')
mapping = metric_tree
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We make mapping reference the root of the tree at every iteration to create the new branches.

gmmeyer
gmmeyer previously approved these changes Mar 12, 2018
Copy link
Contributor

@gmmeyer gmmeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything I was worried about has been addressed now! Once @masci's comments are addressed feel free to merge!

@ofek ofek merged commit 02d8d55 into master Mar 13, 2018
@ofek ofek deleted the ofek/envoy branch March 13, 2018 00:03
- stats_url: http://localhost:80/stats

# tags:
# - instance:foo
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you missed, probably you can use fixed string tags to distinguish Envoy instances. https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/metrics/v2/stats.proto#config-metrics-v2-tagspecifier I don't know the tag can work with admin /stats, but I'm using the fixed tag with dog_statsd sink. Please note the fixed tag feature is only available with Envoy >= v1.6.0.

Copy link
Contributor Author

@ofek ofek Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! We actually allow users to specify any tag they wish, which, as you said, is especially useful for distinguishing between different instances of an integration for metric aggregation.

@masci masci modified the milestones: 6.1, 6.1/5.23 Mar 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants