CW Agent not recognizing enhanced_container_insights=true using EKS/ConfigMap #1030

Closed
kangadrewie opened this issue Feb 11, 2024 · 2 comments

Comments

@kangadrewie

Describe the bug
Recently, the eks-charts team added support for passing enhanced_container_insights through to the CW Agent ConfigMap (aws/eks-charts#1041).

However, even though my ConfigMap has Enhanced Observability turned on, the CW agent is not recognizing it. Looking at the config translator, none of the if enhancedContainerInsightsEnabled { branches are taken, so no enhanced metrics are added to the awsemf/containerinsights exporter config.

Is anyone able to reproduce this issue on EKS?

Steps to reproduce
Use aws-cloudwatch-metrics chart v0.0.10, which added Enhanced Observability metrics support for EKS (aws/eks-charts#1041).
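
A minimal install sketch for that chart version; the enhanced-observability value key below is illustrative (it is not confirmed in this thread), so check values.yaml in 0.0.10 for the exact name:

helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm upgrade --install aws-cloudwatch-metrics eks/aws-cloudwatch-metrics \
  --namespace kube-addons \
  --version 0.0.10 \
  --set clusterName="<cluster-name>" \
  --set enhancedContainerInsights=true   # illustrative key name; confirm against the chart's values.yaml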

What did you expect to see?
Enhanced Observability metrics pushed to CloudWatch, and the agent's OTEL config updated accordingly, when enhanced_container_insights=true.

What did you see instead?
Default Kubernetes Container Insights config being loaded.

What version did you use?
Version: v1.300032.3b392

What config did you use?
Config: see the ConfigMap under Additional context below.

Environment
OS: not specified (EKS nodes on EC2; the agent startup log reports linux/amd64)

Additional context

ConfigMap

apiVersion: v1
data:
  cwagentconfig.json: |
    {
      "logs": {
        "metrics_collected": {
          "kubernetes": {
            "cluster_name": "<cluster-name>",
            "enhanced_container_insights": "true",
            "metrics_collection_interval": 60
          }
        },
        "force_flush_interval": 5
      }
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: aws-cloudwatch-metrics
    meta.helm.sh/release-namespace: kube-addons
  creationTimestamp: "2024-02-10T18:21:50Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: aws-cloudwatch-metrics
    app.kubernetes.io/version: 1.300032.2b361
    helm.sh/chart: aws-cloudwatch-metrics-0.0.10
  name: aws-cloudwatch-metrics
  namespace: kube-addons
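
For comparison, the agent documentation shows enhanced_container_insights as a JSON boolean rather than a quoted string, which may be why the translator's enhancedContainerInsightsEnabled checks never fire here. A cwagentconfig.json in that form would look like this (same placeholder as above):

{
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "cluster_name": "<cluster-name>",
        "enhanced_container_insights": true,
        "metrics_collection_interval": 60
      }
    },
    "force_flush_interval": 5
  }
}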

Cloudwatch agent pod logs

D! [EC2] Found active network interface
I! imds retry client will retry 1 times
I! Detected the instance is EC2
2024/02/11 13:15:15 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json ...
/opt/aws/amazon-cloudwatch-agent/bin/default_linux_config.json does not exist or cannot read. Skipping it.
2024/02/11 13:15:15 Reading json config file path: /etc/cwagentconfig/..2024_02_11_13_15_14.2575907930/cwagentconfig.json ...
2024/02/11 13:15:15 Find symbolic link /etc/cwagentconfig/..data
2024/02/11 13:15:15 Find symbolic link /etc/cwagentconfig/cwagentconfig.json
2024/02/11 13:15:15 Reading json config file path: /etc/cwagentconfig/cwagentconfig.json ...
2024/02/11 13:15:15 I! Valid Json input schema.
I! Trying to detect region from ec2
2024/02/11 13:15:15 I! attempt to access ECS task metadata to determine whether I'm running in ECS.
2024/02/11 13:15:16 W! retry [0/3], unable to get http response from http://<>/v2/metadata, error: unable to get response from http://<>/v2/metadata, error: Get "http://<>/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024/02/11 13:15:17 W! retry [1/3], unable to get http response from http://<>/v2/metadata, error: unable to get response from http://<>/v2/metadata, error: Get "http://<>/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024/02/11 13:15:18 W! retry [2/3], unable to get http response from http://<>/v2/metadata, error: unable to get response from http://<>/v2/metadata, error: Get "http://<>/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2024/02/11 13:15:18 I! access ECS task metadata fail with response unable to get response from http://<>/v2/metadata, error: Get "http://<>/v2/metadata": context deadline exceeded (Client.Timeout exceeded while awaiting headers), assuming I'm not running in ECS.
2024/02/11 13:15:18 Configuration validation first phase succeeded
2024/02/11 13:15:18 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2024/02/11 13:15:18 D! config [agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = "ip-<>.ec2.internal"
  interval = "60s"
  logfile = ""
  logtarget = "lumberjack"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = true
  precision = ""
  quiet = false
  round_interval = false

[outputs]

  [[outputs.cloudwatchlogs]]
    force_flush_interval = "5s"
    log_stream_name = "ip-<>.ec2.internal"
    mode = "EC2"
    region = "us-east-1"
    region_type = "EC2M"
2024/02/11 13:15:18 I! Config has been translated into YAML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml
2024/02/11 13:15:18 D! config connectors: {}
exporters:
    awsemf/containerinsights:
        certificate_file_path: ""
        detailed_metrics: false
        dimension_rollup_option: NoDimensionRollup
        disable_metric_extraction: false
        eks_fargate_container_insights_enabled: false
        endpoint: ""
        enhanced_container_insights: false
        imds_retries: 1
        local_mode: false
        log_group_name: /aws/containerinsights/{ClusterName}/performance
        log_retention: 0
        log_stream_name: '{NodeName}'
        max_retries: 2
        metric_declarations:
            - dimensions:
                - - ClusterName
                  - Namespace
                  - PodName
                - - ClusterName
                - - ClusterName
                  - Namespace
                  - Service
                - - ClusterName
                  - Namespace
              label_matchers: []
              metric_name_selectors:
                - pod_cpu_utilization
                - pod_memory_utilization
                - pod_network_rx_bytes
                - pod_network_tx_bytes
                - pod_cpu_utilization_over_pod_limit
                - pod_memory_utilization_over_pod_limit
            - dimensions:
                - - ClusterName
                  - Namespace
                  - PodName
              label_matchers: []
              metric_name_selectors:
                - pod_number_of_container_restarts
            - dimensions:
                - - ClusterName
                  - Namespace
                  - PodName
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - pod_cpu_reserved_capacity
                - pod_memory_reserved_capacity
            - dimensions:
                - - ClusterName
                  - InstanceId
                  - NodeName
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - node_cpu_utilization
                - node_memory_utilization
                - node_network_total_bytes
                - node_cpu_reserved_capacity
                - node_memory_reserved_capacity
                - node_number_of_running_pods
                - node_number_of_running_containers
            - dimensions:
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - node_cpu_usage_total
                - node_cpu_limit
                - node_memory_working_set
                - node_memory_limit
            - dimensions:
                - - ClusterName
                  - InstanceId
                  - NodeName
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - node_filesystem_utilization
            - dimensions:
                - - ClusterName
                  - Namespace
                  - Service
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - service_number_of_running_pods
            - dimensions:
                - - ClusterName
                  - Namespace
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - namespace_number_of_running_pods
            - dimensions:
                - - ClusterName
              label_matchers: []
              metric_name_selectors:
                - cluster_node_count
                - cluster_failed_node_count
        metric_descriptors: []
        middleware: agenthealth/logs
        namespace: ContainerInsights
        no_verify_ssl: false
        num_workers: 8
        output_destination: cloudwatch
        parse_json_encoded_attr_values:
            - Sources
            - kubernetes
        profile: ""
        proxy_address: ""
        region: us-east-1
        request_timeout_seconds: 30
        resource_arn: ""
        resource_to_telemetry_conversion:
            enabled: true
        retain_initial_value_of_delta_metric: false
        role_arn: ""
        shared_credentials_file: []
        version: "0"
extensions:
    agenthealth/logs:
        is_usage_data_enabled: true
        stats:
            operations:
                - PutLogEvents
processors:
    batch/containerinsights:
        metadata_cardinality_limit: 1000
        metadata_keys: []
        send_batch_max_size: 0
        send_batch_size: 8192
        timeout: 5s
receivers:
    awscontainerinsightreceiver:
        add_container_name_metric_label: false
        add_full_pod_name_metric_label: false
        add_service_as_attribute: true
        certificate_file_path: ""
        cluster_name: <cluster-name>
        collection_interval: 1m0s
        container_orchestrator: eks
        enable_control_plane_metrics: false
        endpoint: ""
        imds_retries: 1
        leader_lock_name: cwagent-clusterleader
        leader_lock_using_config_map_only: true
        local_mode: false
        max_retries: 0
        no_verify_ssl: false
        num_workers: 0
        prefer_full_pod_name: false
        profile: ""
        proxy_address: ""
        region: us-east-1
        request_timeout_seconds: 0
        resource_arn: ""
        role_arn: ""
        shared_credentials_file: []
service:
    extensions:
        - agenthealth/logs
    pipelines:
        metrics/containerinsights:
            exporters:
                - awsemf/containerinsights
            processors:
                - batch/containerinsights
            receivers:
                - awscontainerinsightreceiver
    telemetry:
        logs:
            development: false
            disable_caller: false
            disable_stacktrace: false
            encoding: console
            error_output_paths: []
            initial_fields: {}
            level: info
            output_paths: []
            sampling:
                enabled: true
                initial: 2
                thereafter: 500
                tick: 10s
        metrics:
            address: ""
            level: None
            readers: []
        resource: {}
        traces:
            processors: []
            propagators: []
2024-02-11T13:15:18Z I! Starting AmazonCloudWatchAgent CWAgent/1.300032.3b392 (go1.21.5; linux; amd64) with log file  with log target lumberjack
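
To confirm what the translator produced from inside a running agent pod (a sketch; the daemonset name assumes the chart's default fullname, and the image is assumed to ship cat):

kubectl -n kube-addons exec daemonset/aws-cloudwatch-metrics -- \
  cat /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.yaml | grep enhanced_container_insights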
@kangadrewie
Author

Not a CW Agent issue -- it's a config issue in the helm chart; the PR referenced above fixes it.
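
For anyone else hitting this before the chart fix lands, checking how Helm rendered the flag into the ConfigMap makes the problem visible (release name and namespace taken from the ConfigMap above):

kubectl -n kube-addons get configmap aws-cloudwatch-metrics -o yaml
# in data."cwagentconfig.json", a quoted "true" (string) instead of true (boolean) is the likely culprit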

@ivan-sukhomlyn

@kangadrewie, thanks for taking care of that!
