[awsemfexporter] Group exported metrics by labels #2317

mxiamxia · 2021-02-10T01:10:45Z

This PR is the second part of splitting up #1891, which was originally authored by @kohrapha.
Currently, each incoming metric is pushed to CloudWatch Logs as a separate log event. Because many metrics share the same labels, this produces a lot of duplicate data. To address this, the PR batches metrics by their labels so that metrics with the same set of labels are exported together.

Specifically, metrics are batched together if they have the same:

label names + values
namespace
timestamp
log group name
log stream name
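
As a rough illustration of the grouping criteria listed above, the key can be modelled as a comparable struct. This is a minimal sketch, not the code in this PR: `groupKey`, `canonicalLabels`, and `newGroupKey` are hypothetical names, and the label map is flattened into a sorted string so that map iteration order cannot affect equality.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey is a hypothetical comparable key; the field names are
// illustrative and not taken from the exporter's source.
type groupKey struct {
	labels        string // canonical name=value pairs, sorted by name
	namespace     string
	timestampMs   int64
	logGroupName  string
	logStreamName string
}

// canonicalLabels renders a label map deterministically so that two maps
// with the same contents always produce the same string.
func canonicalLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)
	var sb strings.Builder
	for _, name := range names {
		// %q quotes both parts so separators inside values cannot collide.
		fmt.Fprintf(&sb, "%q=%q,", name, labels[name])
	}
	return sb.String()
}

func newGroupKey(labels map[string]string, namespace string, timestampMs int64, logGroup, logStream string) groupKey {
	return groupKey{
		labels:        canonicalLabels(labels),
		namespace:     namespace,
		timestampMs:   timestampMs,
		logGroupName:  logGroup,
		logStreamName: logStream,
	}
}

func main() {
	a := newGroupKey(map[string]string{"pod_name": "p1", "container_name": "c1"},
		"kubernetes-service-endpoints", 1606931694465, "awscollector-test", "metric-declarations")
	b := newGroupKey(map[string]string{"container_name": "c1", "pod_name": "p1"},
		"kubernetes-service-endpoints", 1606931694465, "awscollector-test", "metric-declarations")
	c := newGroupKey(map[string]string{"container_name": "c2", "pod_name": "p1"},
		"kubernetes-service-endpoints", 1606931694465, "awscollector-test", "metric-declarations")

	fmt.Println(a == b) // same labels and metadata, so they share a batch
	fmt.Println(a == c) // one label value differs, so they are batched separately
}
```

Because the struct contains only comparable fields, it can be used directly as a Go map key.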

The batched metrics are further split up if metric_declarations are defined: the filtered metrics are split by the set of metric declaration rules they match. Since metrics in a batch have the same labels, those that match the same metric declaration rules also have the same dimensions.
Caveat: two groups of filtered metrics can still end up with the same dimension sets if their metric declarations resolve to the same dimension sets. We currently don't check for this, so such groups are not merged.

Implementation Details

Since this PR includes a lot of refactoring, here is an overview of how the new metric translation logic works. Given a list of ResourceMetrics passed to emfExporter.pushMetricsData:

For each ResourceMetrics in the list, we add its metrics to groupedMetrics (a map of batched metrics).
For each metric within the ResourceMetrics, we create a CWMetricMetadata, which holds the metadata (i.e. namespace, timestamp, log group, log stream, instrumentation library name) associated with that metric. This is added to groupedMetrics for later processing.
We extract the DataPoints from each metric. For each DataPoint, we define its "group key" from its labels, namespace, timestamp, log group, and log stream, and use this key to add the metric to its corresponding group in groupedMetrics.
After translating all OT metrics into groupedMetrics, we iterate through each group and translate it into a CWMetric. At this stage, we filter out metrics if metric declarations are defined and set the dimensions for exported metrics (with rolled-up dimensions).
Finally, we translate the CWMetric into an EMF log and push it to CloudWatch using the log group and log stream found in the group's CWMetricMetadata.

Testing:
Tests were added for new functions and tests for modified functions were updated. Additionally, this PR was tested in a sample environment using an NGINX server on EKS. Given the following config (same as in #2):

exporters:
  awsemf:
    log_group_name: 'awscollector-test'
    region: 'us-west-2'
    log_stream_name: metric-declarations
    dimension_rollup_option: 'NoDimensionRollup'
    metric_declarations:
    - dimensions: [['Service', 'Namespace'], ['pod_name', 'container_name']]
      metric_name_selectors:
      - '^go_memstats_alloc_bytes_total$'
    - dimensions: [['app_kubernetes_io_component', 'Namespace'], ['app_kubernetes_io_name'], ['Invalid', 'Namespace']]
      metric_name_selectors:
      - '^go_goroutines$'
    - dimensions: [['Namespace', 'app_kubernetes_io_component', 'Namespace']]
      metric_name_selectors:
      - '^go_.+$'

we get the following cases:

batch with matched metrics

{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "_aws": {
        "CloudWatchMetrics": [
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_heap_alloc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_threads",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_alloc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_gc_cpu_fraction",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_heap_released_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_mcache_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_objects",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_last_gc_time_seconds",
                        "Unit": "s"
                    },
                    {
                        "Name": "go_memstats_mcache_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_frees_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_stack_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_buck_hash_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_idle_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_lookups_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_mallocs_total",
                        "Unit": ""
                    },
                    {
                        "Name": "go_memstats_mspan_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_next_gc_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_other_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_gc_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_heap_inuse_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_mspan_sys_bytes",
                        "Unit": "By"
                    },
                    {
                        "Name": "go_memstats_stack_inuse_bytes",
                        "Unit": "By"
                    }
                ]
            },
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ],
                    [
                        "app_kubernetes_io_name"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_goroutines",
                        "Unit": ""
                    }
                ]
            },
            {
                "Namespace": "kubernetes-service-endpoints",
                "Dimensions": [
                    [
                        "Namespace",
                        "Service"
                    ],
                    [
                        "container_name",
                        "pod_name"
                    ],
                    [
                        "Namespace",
                        "app_kubernetes_io_component"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "go_memstats_alloc_bytes_total",
                        "Unit": ""
                    }
                ]
            }
        ],
        "Timestamp": 1606931694465
    },
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "go_goroutines": 89,
    "go_memstats_alloc_bytes": 8168512,
    "go_memstats_alloc_bytes_total": 78897.33333333333,
    "go_memstats_buck_hash_sys_bytes": 1504910,
    "go_memstats_frees_total": 939.7833333333333,
    "go_memstats_gc_cpu_fraction": 0.000016842131408600387,
    "go_memstats_gc_sys_bytes": 5698672,
    "go_memstats_heap_alloc_bytes": 8168512,
    "go_memstats_heap_idle_bytes": 54452224,
    "go_memstats_heap_inuse_bytes": 10690560,
    "go_memstats_heap_objects": 58592,
    "go_memstats_heap_released_bytes": 51896320,
    "go_memstats_heap_sys_bytes": 65142784,
    "go_memstats_last_gc_time_seconds": 1606931634.4573667,
    "go_memstats_lookups_total": 0,
    "go_memstats_mallocs_total": 866.4166666666666,
    "go_memstats_mcache_inuse_bytes": 3472,
    "go_memstats_mcache_sys_bytes": 16384,
    "go_memstats_mspan_inuse_bytes": 149192,
    "go_memstats_mspan_sys_bytes": 229376,
    "go_memstats_next_gc_bytes": 12224112,
    "go_memstats_other_sys_bytes": 760066,
    "go_memstats_stack_inuse_bytes": 1966080,
    "go_memstats_stack_sys_bytes": 1966080,
    "go_memstats_sys_bytes": 75318272,
    "go_threads": 15,
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "kubernetes_node": "ip-192-168-46-33.us-west-2.compute.internal",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "process_cpu_seconds_total": 0.0016666666666666757,
    "process_max_fds": 1048576,
    "process_open_fds": 38,
    "process_resident_memory_bytes": 46612480,
    "process_start_time_seconds": 1606928481.44,
    "process_virtual_memory_bytes": 761430016,
    "process_virtual_memory_max_bytes": -1,
    "promhttp_metric_handler_requests_in_flight": 1
}

batch with no matched metrics

{
    "Namespace": "eks-aoc",
    "Service": "my-nginx-ingress-nginx-controller-metrics",
    "app_kubernetes_io_component": "controller",
    "app_kubernetes_io_instance": "my-nginx",
    "app_kubernetes_io_managed_by": "Helm",
    "app_kubernetes_io_name": "ingress-nginx",
    "app_kubernetes_io_version": "0.40.2",
    "container_name": "controller",
    "controller_class": "nginx",
    "controller_namespace": "eks-aoc",
    "controller_pod": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "helm_sh_chart": "ingress-nginx-3.7.1",
    "host": "a7710ecaa12b540be99c5bfd5ee07a1f-266546424.us-west-2.elb.amazonaws.com",
    "ingress": "ingress-nginx-demo",
    "kubernetes_node": "ip-192-168-46-33.us-west-2.compute.internal",
    "method": "GET",
    "namespace": "eks-traffic",
    "nginx_ingress_controller_bytes_sent": {
        "Max": 10000000,
        "Min": 10,
        "Count": 114,
        "Sum": 21888
    },
    "nginx_ingress_controller_request_duration_seconds": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 0.029000000000000026
    },
    "nginx_ingress_controller_request_size": {
        "Max": 100,
        "Min": 10,
        "Count": 114,
        "Sum": 15960
    },
    "nginx_ingress_controller_response_duration_seconds": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 0.020000000000000018
    },
    "nginx_ingress_controller_response_size": {
        "Max": 10,
        "Min": 0.005,
        "Count": 114,
        "Sum": 21888
    },
    "path": "/banana",
    "pod_name": "my-nginx-ingress-nginx-controller-77d5fd6977-ld9wg",
    "service": "banana-service",
    "status": "200"
}

codecov · 2021-02-10T01:29:36Z

Codecov Report

Merging #2317 (868dc42) into main (e43c235) will increase coverage by 1.02%.
The diff coverage is 99.04%.

@@            Coverage Diff             @@
##             main    #2317      +/-   ##
==========================================
+ Coverage   72.72%   73.75%   +1.02%     
==========================================
  Files         410      412       +2     
  Lines       25355    25475     +120     
==========================================
+ Hits        18440    18789     +349     
+ Misses       6368     6133     -235     
- Partials      547      553       +6

Flag	Coverage Δ
integration	`69.26% <ø> (?)`
unit	`72.77% <99.04%> (+0.05%)`	⬆️

Impacted Files	Coverage Δ
exporter/awsemfexporter/metric_translator.go	`97.70% <98.19%> (-0.79%)`	⬇️
exporter/awsemfexporter/datapoint.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/emf_exporter.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/grouped_metric.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/metric_declaration.go	`100.00% <100.00%> (ø)`
exporter/awsemfexporter/util.go	`100.00% <100.00%> (+5.55%)`	⬆️
internal/common/testing/container/container.go	`73.68% <0.00%> (ø)`
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

shaochengwang

Thanks

mxiamxia · 2021-02-17T19:11:37Z

@bogdandrutu Kind ping for review and merge. Thanks.

tigrannajaryan · 2021-02-18T16:10:34Z

@mxiamxia please resolve comments that are addressed and resolve merge conflicts.

bogdandrutu · 2021-02-19T18:07:27Z

@mxiamxia this needs a rebase

mxiamxia · 2021-02-21T06:31:55Z

Rebased the commits and resolved the conflicts.

mxiamxia requested review from anuraaga and djaglowski as code owners February 10, 2021 01:10

mxiamxia requested a review from a team February 10, 2021 01:10

github-actions bot assigned bogdandrutu Feb 10, 2021

shaochengwang reviewed Feb 11, 2021

receiver/stanzareceiver/go.sum Outdated

shaochengwang reviewed Feb 11, 2021

exporter/awsemfexporter/metric_translator.go

exporter/awsemfexporter/metric_declaration.go Outdated

mxiamxia force-pushed the batch_metrics branch 7 times, most recently from 780abbb to 32be859 Compare February 15, 2021 21:51

shaochengwang approved these changes Feb 16, 2021

mxiamxia force-pushed the batch_metrics branch 2 times, most recently from e229569 to ce8e948 Compare February 16, 2021 01:16

anuraaga approved these changes Feb 16, 2021

mxiamxia force-pushed the batch_metrics branch 7 times, most recently from 60f0eb4 to a38f82c Compare February 21, 2021 05:30

mxiamxia force-pushed the batch_metrics branch from a38f82c to 6078537 Compare February 21, 2021 05:31

mxiamxia requested a review from jpkrohling as a code owner February 21, 2021 05:31

Batch the metrics with the same dimensions into the same EMF log request

868dc42

mxiamxia force-pushed the batch_metrics branch from 6078537 to 868dc42 Compare February 21, 2021 06:29

bogdandrutu merged commit 0761ee3 into open-telemetry:main Feb 22, 2021

mxiamxia mentioned this pull request Mar 11, 2021

REQUEST: New membership for mxiamxia open-telemetry/community#677

Closed

6 tasks

pmatyjasek-sumo referenced this pull request in pmatyjasek-sumo/opentelemetry-collector-contrib Apr 28, 2021

Batch the metrics with the same dimensions into the same EMF log requ…

e5aaa53

…est (#2317)
