[envoy integration] Metrics missing #12855

Closed
Shuanglu opened this issue Sep 2, 2022 · 13 comments

@Shuanglu

Shuanglu commented Sep 2, 2022

Note: If you have a feature request, you should contact support so the request can be properly tracked.

Output of the info page

root@datadog-cluster-agent-69bc84c5c-rrkch:/# datadog-cluster-agent status
Getting the status from the agent.
2022-09-02 07:00:49 UTC | CLUSTER | WARN | (pkg/util/log/log.go:591 in func1) | Agent configuration relax permissions constraint on the secret backend cmd, Group can read and exec

===============================
Datadog Cluster Agent (v1.22.0)
===============================

  Status date: 2022-09-02 07:00:49.867 UTC (1662102049867)
  Agent start: 2022-08-30 08:45:54.797 UTC (1661849154797)
  Pid: 1
  Go Version: go1.17.11
  Build arch: amd64
  Agent flavor: cluster_agent
  Check Runners: 4
  Log Level: WARN

  Paths
  =====
    Config File: /etc/datadog-agent/datadog-cluster.yaml
    conf.d: /etc/datadog-agent/conf.d

  Clocks
  ======
    System time: 2022-09-02 07:00:49.867 UTC (1662102049867)

  Hostnames
  =========
    ec2-hostname: ****
    host_aliases: [***]
    hostname: ****
    instance-id: ***
    socket-fqdn: datadog-cluster-agent-69bc84c5c-rrkch
    socket-hostname: datadog-cluster-agent-69bc84c5c-rrkch
    hostname provider: container
    unused hostname providers:
      aws: Unable to determine hostname from EC2: Get "http://169.254.169.254/latest/meta-data/instance-id": dial tcp 169.254.169.254:80: connect: connection refused
      azure: azure_hostname_style is set to 'os'
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: GCE metadata API error: Get "http://169.254.169.254/computeMetadata/v1/instance/hostname": dial tcp 169.254.169.254:80: connect: connection refused

  Metadata
  ========

Leader Election
===============
  Leader Election Status:  Running
  Leader Name is: datadog-cluster-agent-69bc84c5c-r6r98
  Last Acquisition of the lease: Fri, 26 Aug 2022 14:02:50 UTC
  Renewed leadership: Fri, 02 Sep 2022 07:00:41 UTC
  Number of leader transitions: 13 transitions

Custom Metrics Server
=====================

  Data sources
  ------------
  URL: https://api.datadoghq.com

  
  ConfigMap name: default/datadog-custom-metrics
  External Metrics
  ----------------
    Total: 0
    Valid: 0
    

Cluster Checks Dispatching
==========================
  Status: Follower, redirecting to leader at 10.42.224.6

Admission Controller
====================
  
    Webhooks info
    -------------
      MutatingWebhookConfigurations name: datadog-webhook
      Created at: 2022-06-01T07:04:25Z
      ---------
        Name: datadog.webhook.config
        CA bundle digest: 4a037a372da419e0
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: default/datadog-cluster-agent-admission-controller - Port: 443 - Path: /injectconfig
      ---------
        Name: datadog.webhook.tags
        CA bundle digest: 4a037a372da419e0
        Object selector: &LabelSelector{MatchLabels:map[string]string{},MatchExpressions:[]LabelSelectorRequirement{LabelSelectorRequirement{Key:admission.datadoghq.com/enabled,Operator:NotIn,Values:[false],},},}
        Rule 1: Operations: [CREATE] - APIGroups: [] - APIVersions: [v1] - Resources: [pods]
        Service: default/datadog-cluster-agent-admission-controller - Port: 443 - Path: /injecttags
  
    Secret info
    -----------
    Secret name: webhook-certificate
    Secret namespace: default
    Created at: 2022-06-01T07:04:25Z
    CA bundle digest: 4a037a372da419e0
    Duration before certificate expiration: 6528h3m34.106622362s

=========
Collector
=========

  Running Checks
  ==============
    
    kubernetes_apiserver
    --------------------
      Instance ID: kubernetes_apiserver [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubernetes_apiserver.d/conf.yaml.default
      Total Runs: 16,860
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-09-02 07:00:42 UTC (1662102042000)
      Last Successful Execution Date : 2022-09-02 07:00:42 UTC (1662102042000)
      
    
    orchestrator
    ------------
      Instance ID: orchestrator:*** [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/orchestrator.d/conf.yaml.default
      Total Runs: 25,290
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2022-09-02 07:00:47 UTC (1662102047000)
      Last Successful Execution Date : 2022-09-02 07:00:47 UTC (1662102047000)
      
=========
Forwarder
=========

  Transactions
  ============
    Cluster: 0
    ClusterRole: 0
    ClusterRoleBinding: 0
    CronJob: 0
    DaemonSet: 0
    Deployment: 0
    Dropped: 0
    HighPriorityQueueFull: 0
    Ingress: 0
    Job: 0
    Node: 0
    PersistentVolume: 0
    PersistentVolumeClaim: 0
    Pod: 0
    ReplicaSet: 0
    Requeued: 300
    Retried: 94
    RetryQueueSize: 0
    Role: 0
    RoleBinding: 0
    Service: 0
    ServiceAccount: 0
    StatefulSet: 0

  Transaction Successes
  =====================
    Total number: 33719
    Successes By Endpoint:
      check_run_v1: 16,859
      intake: 1
      series_v1: 16,859

  Transaction Errors
  ==================
    Total number: 11
    Errors By Type:
      DNSErrors: 11

  On-disk storage
  ===============
    On-disk storage is disabled. Configure `forwarder_storage_max_size_in_bytes` to enable it.

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 1f056

=====================
Orchestrator Explorer
=====================
  Collection Status: Clusterchecks are activated but still warming up, the collection could be running on CLC Runners. To verify that we need the clusterchecks to be warmed up.
  Cluster Name: ***
  Cluster ID: ****
  Container scrubbing: enabled

  ======================
  Orchestrator Endpoints
  ======================
    https://orchestrator.datadoghq.com - API Key ending with: *****

  Status: Follower, cluster agent leader is: datadog-cluster-agent-69bc84c5c-r6r98

Additional environment details (Operating System, Cloud provider, etc):
There is a support case (901101), but it didn't make much progress.

Steps to reproduce the issue:

  1. I have Istio installed in my cluster and I need some metrics at the Envoy level, so I configured the annotations below on the app pods to scrape the Envoy metrics:
        ad.datadoghq.com/istio-proxy.check_names: '["envoy"]'
        ad.datadoghq.com/istio-proxy.init_configs: '[{}]'
        ad.datadoghq.com/istio-proxy.instances: |
            [
              {
                "openmetrics_endpoint": "http://%%host%%:15090/stats/prometheus",
                "histogram_buckets_as_distributions": "true",
                "log_requests": "true",
                "extra_metrics": 
                  [
                    {
                      "envoy_cluster_upstream_rq_time": 
                        {
                          "name": "cluster.upstream_rq_time"
                          "type": "histogram"
                        }
                    }
                  ]
              }
            ]
  2. Send some traffic from one pod to the other. The metrics below are visible at the Envoy metrics endpoint and in Prometheus.

Describe the results you received:
I could find these metrics at the endpoint, but I could not find them in the Datadog metrics explorer. Except for the first one, the others are included in your metrics dict:

  1. cluster.upstream_rq_time
  2. cluster.upstream_cx_rx_bytes_total
  3. cluster.upstream_cx_tx_bytes_total
  4. listener.downstream_cx_length_ms
  5. cluster.upstream_rq_xx (the raw metrics carry specific status codes; I'm guessing the agent will parse them?)
  6. Some metrics have data, but the values differ from the raw metrics or the Prometheus scrapes. Does Datadog do some aggregation in the metrics explorer?
  7. Support asked me to add 'status_url', but I guess it won't work for the v2 integration?
  8. Some metrics' 'type' is different from the type exposed by the pod; for example, 'counter' is converted to 'rate'. Is this expected, or is there a misconfiguration somewhere?

Describe the results you expected:
The agent scrapes those metrics and they appear in the Datadog metrics explorer.

Additional information you deem important (e.g. issue happens only occasionally):

@yzhan289
Contributor

yzhan289 commented Sep 9, 2022

Hi 👋, taking a look at your configuration and the missing metrics, I think the issue is that you are using the OpenMetrics implementation of the check rather than the legacy check. Except for the first one, the other metrics are listed here: https://github.com/DataDog/integrations-core/blob/7.38.2/envoy/metadata.csv.

If you want to collect those legacy metrics, can you take a look at the legacy configuration found here: https://github.com/DataDog/integrations-core/tree/7.33.x/envoy
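
For illustration, a minimal sketch of what legacy-style autodiscovery annotations could look like (the stats_url parameter and the 15000 admin port are assumptions based on the legacy check's example configuration; the Istio sidecar's admin interface may only listen on localhost, so verify against the linked 7.33.x README before relying on this):

    # Sketch only, assuming the legacy envoy check and the Envoy admin /stats endpoint
    ad.datadoghq.com/istio-proxy.check_names: '["envoy"]'
    ad.datadoghq.com/istio-proxy.init_configs: '[{}]'
    ad.datadoghq.com/istio-proxy.instances: |
        [
          {
            "stats_url": "http://%%host%%:15000/stats"
          }
        ]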

@yzhan289
Contributor

Let us know if you run into any issues, but I'll close this issue for now!

@Shuanglu
Author

The 'legacy' metrics you mentioned are listed in 'PROMETHEUS_METRICS_MAP'. Do they still need the 'legacy check'?
In addition, if only the legacy check works, does that mean the v2 integration collects different metrics than v1?

@burningalchemist

@yzhan289 I'm having the same issue, and I suspect the extra_metrics field has no effect. I believe envoy_cluster_upstream_rq_time is important to have as part of the integration, to balance the existing envoy.http.downstream_rq_time, while staying with openmetrics_endpoint. Would you mind reopening the issue?

@Shuanglu in the meantime did you find a solution?

@Shuanglu
Author

Shuanglu commented Nov 4, 2022

> @yzhan289 I'm having the same issue, and I suspect the extra_metrics field has no effect. I believe envoy_cluster_upstream_rq_time is important to have as part of the integration, to balance the existing envoy.http.downstream_rq_time, while staying with openmetrics_endpoint. Would you mind reopening the issue?
>
> @Shuanglu in the meantime did you find a solution?

nope... currently we use dogstatsd to submit our metrics...

@burningalchemist

@Shuanglu Oh I see, thanks. Yeah, that's a blocker for sure. 🤔

@yzhan289
Contributor

yzhan289 commented Nov 4, 2022

Hi, thanks @Shuanglu and @burningalchemist for bringing this up. I'll open a ticket internally to investigate the missing metrics and reopen this ticket. We will update this card if there are any new changes!

yzhan289 reopened this Nov 4, 2022
@yzhan289
Contributor

yzhan289 commented Nov 4, 2022

For listener.downstream_cx_length_ms, are you able to get listener.downstream_cx_length_ms.count? It looks like we are transforming the metrics matching downstream_cx with a .count suffix: https://github.com/DataDog/integrations-core/blob/7.38.2/envoy/datadog_checks/envoy/check.py#L82-L92. It doesn't look like this is happening for any metrics with upstream_cx, so that may be something we need to look into.

@burningalchemist

burningalchemist commented Nov 5, 2022

Hey @yzhan289, yes, listener.downstream_cx_length_ms.count works well 👍 It seems that the upstream_rq_time metric is simply ignored.

I'd also like to check whether we are talking about the same thing, since you're mentioning upstream connection metrics while I'm referring to upstream request metrics. I've shared the link below:
https://www.envoyproxy.io/docs/envoy/latest/configuration/upstream/cluster_manager/cluster_stats#dynamic-http-statistics

@burningalchemist

Hey @yzhan289, any updates on the issue? 🙂

@yzhan289
Contributor

Hey @burningalchemist, unfortunately we don't have any updates on this.

@burningalchemist

@yzhan289, I've managed to make the extra_metrics field in the annotations work using one of the latest releases, and I also got the required hints from the linked PR. 👍 I think the issue can be closed.

@Shuanglu, let me know if you need any help or something, happy to share. 😃
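
For anyone landing on this later, a rough sketch of the shape such an annotation can take (illustrative only: the endpoint and port, the boolean literal, and the extra_metrics mapping follow the OpenMetrics-based check's documented options as I understand them, so double-check everything against the conf.yaml.example of the envoy check release you run):

    # Sketch only, assuming a recent release of the OpenMetrics-based envoy check
    ad.datadoghq.com/istio-proxy.check_names: '["envoy"]'
    ad.datadoghq.com/istio-proxy.init_configs: '[{}]'
    ad.datadoghq.com/istio-proxy.instances: |
        [
          {
            "openmetrics_endpoint": "http://%%host%%:15090/stats/prometheus",
            "histogram_buckets_as_distributions": true,
            "extra_metrics": [
              {
                "envoy_cluster_upstream_rq_time": {
                  "name": "cluster.upstream_rq_time",
                  "type": "histogram"
                }
              }
            ]
          }
        ]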

@yzhan289
Contributor

@burningalchemist Yay glad to hear! I will close this issue now.
