Add xDS-related metrics #4634
The xDS/config update-related metrics in the Envoy integration are currently partially out-of-sync with what Envoy reports. Shore up these metrics to be accurate.
(),
),
'method': 'monotonic_count',
},
Thanks for this PR. Looks good to me overall.
Could you test all or some of these new metrics in e2e by adding them to the list here?
integrations-core/envoy/tests/test_e2e.py
Lines 12 to 161 in 81792d0
'envoy.cluster.bind_errors',
'envoy.cluster.lb_healthy_panic',
'envoy.cluster.lb_local_cluster_not_ok',
'envoy.cluster.lb_recalculate_zone_structures',
'envoy.cluster.lb_subsets_active',
'envoy.cluster.lb_subsets_created',
'envoy.cluster.lb_subsets_fallback',
'envoy.cluster.lb_subsets_removed',
'envoy.cluster.lb_subsets_selected',
'envoy.cluster.lb_zone_cluster_too_small',
'envoy.cluster.lb_zone_no_capacity_left',
'envoy.cluster.lb_zone_number_differs',
'envoy.cluster.lb_zone_routing_all_directly',
'envoy.cluster.lb_zone_routing_cross_zone',
'envoy.cluster.lb_zone_routing_sampled',
'envoy.cluster.max_host_weight',
'envoy.cluster.membership_change',
'envoy.cluster.membership_healthy',
'envoy.cluster.membership_total',
'envoy.cluster.retry_or_shadow_abandoned',
'envoy.cluster.update_attempt',
'envoy.cluster.update_empty',
'envoy.cluster.update_failure',
'envoy.cluster.update_success',
'envoy.cluster.upstream_cx_active',
'envoy.cluster.upstream_cx_close_notify',
'envoy.cluster.upstream_cx_connect_attempts_exceeded',
'envoy.cluster.upstream_cx_connect_fail',
'envoy.cluster.upstream_cx_connect_timeout',
'envoy.cluster.upstream_cx_destroy',
'envoy.cluster.upstream_cx_destroy_local',
'envoy.cluster.upstream_cx_destroy_local_with_active_rq',
'envoy.cluster.upstream_cx_destroy_remote',
'envoy.cluster.upstream_cx_destroy_remote_with_active_rq',
'envoy.cluster.upstream_cx_destroy_with_active_rq',
'envoy.cluster.upstream_cx_http1_total',
'envoy.cluster.upstream_cx_http2_total',
'envoy.cluster.upstream_cx_max_requests',
'envoy.cluster.upstream_cx_none_healthy',
'envoy.cluster.upstream_cx_overflow',
'envoy.cluster.upstream_cx_protocol_error',
'envoy.cluster.upstream_cx_rx_bytes_buffered',
'envoy.cluster.upstream_cx_rx_bytes_total',
'envoy.cluster.upstream_cx_total',
'envoy.cluster.upstream_cx_tx_bytes_buffered',
'envoy.cluster.upstream_cx_tx_bytes_total',
'envoy.cluster.upstream_flow_control_backed_up_total',
'envoy.cluster.upstream_flow_control_drained_total',
'envoy.cluster.upstream_flow_control_paused_reading_total',
'envoy.cluster.upstream_flow_control_resumed_reading_total',
'envoy.cluster.upstream_rq_active',
'envoy.cluster.upstream_rq_cancelled',
'envoy.cluster.upstream_rq_completed',
'envoy.cluster.upstream_rq_maintenance_mode',
'envoy.cluster.upstream_rq_pending_active',
'envoy.cluster.upstream_rq_pending_failure_eject',
'envoy.cluster.upstream_rq_pending_overflow',
'envoy.cluster.upstream_rq_pending_total',
'envoy.cluster.upstream_rq_per_try_timeout',
'envoy.cluster.upstream_rq_retry',
'envoy.cluster.upstream_rq_retry_overflow',
'envoy.cluster.upstream_rq_retry_success',
'envoy.cluster.upstream_rq_rx_reset',
'envoy.cluster.upstream_rq_timeout',
'envoy.cluster.upstream_rq_total',
'envoy.cluster.upstream_rq_tx_reset',
'envoy.cluster.version',
'envoy.cluster_manager.active_clusters',
'envoy.cluster_manager.cluster_added',
'envoy.cluster_manager.cluster_modified',
'envoy.cluster_manager.cluster_removed',
'envoy.cluster_manager.warming_clusters',
'envoy.http.downstream_cx_active',
'envoy.http.downstream_cx_destroy',
'envoy.http.downstream_cx_destroy_active_rq',
'envoy.http.downstream_cx_destroy_local',
'envoy.http.downstream_cx_destroy_local_active_rq',
'envoy.http.downstream_cx_destroy_remote',
'envoy.http.downstream_cx_destroy_remote_active_rq',
'envoy.http.downstream_cx_drain_close',
'envoy.http.downstream_cx_http1_active',
'envoy.http.downstream_cx_http1_total',
'envoy.http.downstream_cx_http2_active',
'envoy.http.downstream_cx_http2_total',
'envoy.http.downstream_cx_idle_timeout',
'envoy.http.downstream_cx_protocol_error',
'envoy.http.downstream_cx_rx_bytes_buffered',
'envoy.http.downstream_cx_rx_bytes_total',
'envoy.http.downstream_cx_ssl_active',
'envoy.http.downstream_cx_ssl_total',
'envoy.http.downstream_cx_total',
'envoy.http.downstream_cx_tx_bytes_buffered',
'envoy.http.downstream_cx_tx_bytes_total',
'envoy.http.downstream_flow_control_paused_reading_total',
'envoy.http.downstream_flow_control_resumed_reading_total',
'envoy.http.downstream_rq_1xx',
'envoy.http.downstream_rq_2xx',
'envoy.http.downstream_rq_3xx',
'envoy.http.downstream_rq_4xx',
'envoy.http.downstream_rq_5xx',
'envoy.http.downstream_rq_active',
'envoy.http.downstream_rq_http1_total',
'envoy.http.downstream_rq_http2_total',
'envoy.http.downstream_rq_non_relative_path',
'envoy.http.downstream_rq_response_before_rq_complete',
'envoy.http.downstream_rq_rx_reset',
'envoy.http.downstream_rq_too_large',
'envoy.http.downstream_rq_total',
'envoy.http.downstream_rq_tx_reset',
'envoy.http.downstream_rq_ws_on_non_ws_route',
'envoy.http.no_cluster',
'envoy.http.no_route',
'envoy.http.rq_direct_response',
'envoy.http.rq_redirect',
'envoy.http.rq_total',
'envoy.http.rs_too_large',
'envoy.http.tracing.client_enabled',
'envoy.http.tracing.health_check',
'envoy.http.tracing.not_traceable',
'envoy.http.tracing.random_sampling',
'envoy.http.tracing.service_forced',
'envoy.listener.downstream_cx_active',
'envoy.listener.downstream_cx_destroy',
'envoy.listener.downstream_cx_total',
'envoy.listener.http.downstream_rq_1xx',
'envoy.listener.http.downstream_rq_2xx',
'envoy.listener.http.downstream_rq_3xx',
'envoy.listener.http.downstream_rq_4xx',
'envoy.listener.http.downstream_rq_5xx',
'envoy.listener_manager.listener_added',
'envoy.listener_manager.listener_create_failure',
'envoy.listener_manager.listener_create_success',
'envoy.listener_manager.listener_modified',
'envoy.listener_manager.listener_removed',
'envoy.listener_manager.total_listeners_active',
'envoy.listener_manager.total_listeners_draining',
'envoy.listener_manager.total_listeners_warming',
'envoy.runtime.load_error',
'envoy.runtime.load_success',
'envoy.runtime.num_keys',
'envoy.runtime.override_dir_exists',
'envoy.runtime.override_dir_not_exists',
'envoy.server.days_until_first_cert_expiring',
'envoy.server.live',
'envoy.server.memory_allocated',
'envoy.server.memory_heap_size',
'envoy.server.parent_connections',
'envoy.server.total_connections',
'envoy.server.uptime',
'envoy.server.version',
That would probably need some changes in setup files here: https://github.com/DataDog/integrations-core/tree/81792d0e48f8083fb288a411662ba4f1a39ba894/envoy/tests/docker/default
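The request above amounts to extending a flat list of expected metric names and checking that each one shows up in what a check run collected. A minimal, self-contained sketch of that comparison follows. It is not the repository's actual test code: the `missing_metrics` helper and the sample `reported` list are hypothetical, and the three metric names are simply copied from the list quoted above.

```python
from typing import Iterable, List

# Names copied from the list quoted above; a real test would use its own full list.
EXPECTED_METRICS = [
    'envoy.cluster.update_attempt',
    'envoy.cluster.update_success',
    'envoy.server.uptime',
]


def missing_metrics(expected: Iterable[str], reported: Iterable[str]) -> List[str]:
    """Return the expected metric names that were not reported, sorted for stable output."""
    reported_set = set(reported)
    return sorted(name for name in expected if name not in reported_set)


if __name__ == '__main__':
    # Hypothetical sample of the metric names collected by one check run.
    reported = ['envoy.server.uptime', 'envoy.cluster.update_attempt']
    print(missing_metrics(EXPECTED_METRICS, reported))
```

Reporting the full set of missing names at once, rather than failing on the first absent metric, makes it easier to tell a setup problem (many metrics missing) from a single mis-parsed metric.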
I've pushed another commit which adds this, but I'm seeing test failures with the newly-added metrics despite verifying that those metrics are properly parsed in the unit tests from my last commit, and verifying that the Envoy instance is now reporting those metrics with the added controlplane implementation. Do you know what might be causing that?
EDIT: Huh, looks like they passed just fine in CI. Guess it was a quirk on my machine 😄
3b62c50 to dc92e48
@csssuf You did an absolutely excellent job here, thanks!!!
What does this PR do?
The xDS/config update-related metrics in the Envoy integration are
currently partially out-of-sync with what Envoy reports. Shore up these
metrics to be accurate.
Motivation
Monitoring communication between Envoys and their control plane.
Review checklist (to be filled by reviewers)
changelog/ and integration/ labels attached