-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[envoy] new integration #1156
[envoy] new integration #1156
Conversation
# supply a list of tags. The admin endpoint must be accessible. | ||
# https://www.envoyproxy.io/docs/envoy/latest/operations/admin | ||
|
||
- stats_url: http://localhost:80/stats |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had a stats sink for emitting to DataDog, https://github.com/envoyproxy/data-plane-api/blob/master/envoy/config/metrics/v2/stats.proto#L157, this is a push model. How come we have pull here? I'd echo the same security concerns as in envoyproxy/data-plane-api#523. CC: @taiki45
edit: fixed tag to Taiki
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing your concerns with us! We understand them, and this is what we are doing at the moment to address them:
- Until Admin endpoint security envoyproxy/envoy#2763 is resolved, we are offering our customers an alternative way to access
Envoy
's/stats
endpoint. - Our
envoy
integration supports SSL verification of theEnvoy
endpoint (and this requiresEnvoy
to support SSL). If the SSL verification fails for any reason when enabled, the integration will not collect metrics. - Our integration has the option to support basic authentication (but this requires
Envoy
to implement basic authentication). - Our integration does not need to be aware of RBAC, but when
Envoy
implements it, our integration will use it transparently (via basic authentication).
At the moment, we do not see the benefit of supporting the dogstatsd
connector, because that still does not prevent the admin endpoint on Envoy
from being exposed to the entire trusted network. A customer could use that to be more secure in some network scenarios, but we would also like to provide our customers with a working agent integration for Envoy
in its current form.
We also need to do some massaging of metrics in the integration for our backend, which the dogstatsd
connector currently does not allow. Additionally, we are finding our parser a bit more resilient to stat name collisions. @jmarantz mentioned that he has a similar implementation for which he’ll issue a PR to Envoy
soon.
We would be happy to enforce authentication by default, provided that Envoy
implements it. We hope this addresses your concerns. Thanks again for sharing your thoughts with us!
Regards,
The Agent team at Datadog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this all seems very reasonable. I'd add a couple of points for consideration:
- With your example at https://github.com/DataDog/integrations-core/tree/ofek/envoy/envoy#secured-stats-endpoint, should you have the admin bind only to 127.0.0.1 to avoid incoming connections from NICs?
- I probably worded the above a little strongly; until Admin endpoint security envoyproxy/envoy#2763 is resolved we don't have a great story for access outside a trusted network (although I like your example as a temporary workaround, you could post it to Admin endpoint security envoyproxy/envoy#2763 once the above fix is made). We've recently been discussing this in Documentation for hystrix dashboard support feature (issue #1244) envoyproxy/data-plane-api#523 for Hystrix and agreed to separate out the admin security concerns from pull-based stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch That's an interesting idea. Would I simply change "address": "tcp://0.0.0.0:8001"
to "address": "tcp://127.0.0.1:8001"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ofek Yes, that's right. And I assume that prevents the entire admin endpoints from being exposed.
I agree to move to pull-based stats because it can be easily scaled than push-based one, but I'd leave some comments to clarify the points.
At the moment, we do not see the benefit of supporting the dogstatsd connector, because that still does not prevent the admin endpoint on Envoy from being exposed to the entire trusted network. A customer could use that to be more secure in some network scenarios, but we would also like to provide our customers with a working agent integration for Envoy in its current form.
AFAIK, the datadog agent will be running in the customer's host (the same host as running Envoy process). If it's correct, we might be able to send stats to datadog agent without opening any Envoy admin endpoints. The scenario is:
- Configure Envoy admin endpoint to bind local loopback address (127.0.0.1 in IPv4).
- Send Envoy stats to datadog agent running on the same host using UDP with dog_statsd sink.
We also need to do some massaging of metrics in the integration for our backend, which the dogstatsd connector currently does not allow.
I suppose you can do that modifications with writing a statsd relay component which receives stats and modifies the stats then re-pushes to somewhere (probably datadog agent?).
IIUC, the current admin's /stats
endpoint is missing histogram metrics. envoyproxy/envoy#1947 This can be resolved with envoyproxy/envoy#2736 which will implement pull-based stats with stats sink (I didn't fully take a look the PR yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ofek ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a lot more to review, it'll probably come in stages, but these are the first two things that need to be fixed
envoy/requirements.in
Outdated
@@ -0,0 +1 @@ | |||
requests==2.18.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to generate the requirements.txt through the general mechanism we use for this: pip-compile --generate-hashes --output-file requirements.txt requirements.in
this is a requirement conflict with datadog-agent-base and a lot of other checks. Please use the version we're already using. If you want to upgrade, you can do that in another PR as it requires a good bit of work to ensure there aren't any conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few comments and a couple of questions but overall this is extremely good material! Tests are 💯with high coverage, 🍰
tasks.py
Outdated
'disk', | ||
'kubelet', | ||
'vsphere', | ||
'envoy', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's try to keep the list alphabetically sorted for better readability
envoy/datadog_checks/envoy/envoy.py
Outdated
self.log.warning(msg) | ||
return | ||
|
||
get_method = getattr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious: why this alias?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid a global lookup many times during the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it that much slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for me, mind to add a comment above that line? So it doesn't get wiped by mistake in the future.
@@ -0,0 +1,2 @@ | |||
class UnknownMetric(Exception): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we add the license header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
@@ -0,0 +1,1228 @@ | |||
from .utils import make_metric_tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: license header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
from .metrics import METRIC_PREFIX, METRIC_TREE, METRICS | ||
|
||
|
||
def parse_metric(metric): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a docstring roughly describing what this function does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
return METRIC_PREFIX + metric, tags, METRICS[metric]['method'] | ||
|
||
|
||
def reassemble_addresses(seq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a docstring roughly describing what this function does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
|
||
for metric in metrics: | ||
parts = metric.split('.') | ||
mapping = metric_tree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We make mapping
reference the root of the tree at every iteration to create the new branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think everything I was worried about has been addressed now! Once @masci's comments are addressed feel free to merge!
- stats_url: http://localhost:80/stats | ||
|
||
# tags: | ||
# - instance:foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you missed, probably you can use fixed string tags to distinguish Envoy instances. https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/metrics/v2/stats.proto#config-metrics-v2-tagspecifier I don't know the tag can work with admin /stats
, but I'm using the fixed tag with dog_statsd sink. Please note the fixed tag feature is only available with Envoy >= v1.6.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We actually allow users to specify any tag they wish, which, as you said, is especially useful for distinguishing between different instances of an integration for metric aggregation.
Notes
The 305 metrics were taken from:
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/cluster_manager/cluster_stats.html#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/stats.html#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/stats#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/mongo_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/redis_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/rate_limit_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/client_ssl_auth_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/network_filters/tcp_proxy_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_conn_man/rds#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/buffer_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/dynamodb_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/grpc_http1_bridge_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/ip_tagging_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/rate_limit_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/router_filter#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/cluster_manager/cds#statistics
✔️ https://www.envoyproxy.io/docs/envoy/latest/configuration/runtime#statistics
Additional Notes
upstream_rq_<*>
from here and hereupstream_rq_total_xxx
andupstream_rq_time_xxx
from herecapacity.<operation_name>.__partition_id=<last_seven_characters_from_partition_id>
from herecluster.outlier_detection.ejections_total
andcluster.outlier_detection.ejections_consecutive_5xx
from hereTeam
In an effort to simplify things for the new testing approach:
flake8
configuration is now intox.ini
, eliminating the need forsetup.cfg
.tests/requirements.txt
is nowrequirements-dev.txt
.Review
@DataDog/croissant
and
@mattklein123 @htuch @mrice32 @jmarantz Please feel free to chime in. You would be particularly interested in:
the parser (implementation is the same as discussed in Slack)
https://github.com/DataDog/integrations-core/pull/1156/files#diff-26242bcecd44a7dd35184c58c4813ed9
our descriptions of Envoy (tell me what you'd like it to say)
https://github.com/DataDog/integrations-core/pull/1156/files#diff-80bc075ec660cedefbacfbb08a41c682R5
https://github.com/DataDog/integrations-core/pull/1156/files#diff-4d2ed940a01536f3edd5499b23f267ccR4