[Metricbeat] Adding AWS Cloudwatch Metricset #11734

kaiyan-sheng · 2019-04-09T22:50:06Z

Amazon CloudWatch monitors AWS resources and applications that run on AWS. Many services (e.g. EC2, RDS, SQS) periodically send monitoring metrics to CloudWatch so users can monitor the health and performance of their resources. We are in the process of adding metricsets for individual services, but since there are so many, it would be good to have a "free-form" cloudwatch metricset that collects monitoring metrics from a user-defined service even if that service is not yet supported by a dedicated metricset.

The basic idea for the cloudwatch metricset is to read the period, start-time, end-time, and metric-data-queries parameters from the config file aws.yml and pass them to the CloudWatch GetMetricData API, similar to using the CLI command documented at https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-data.html.
period is how frequently the cloudwatch metricset starts a new collection cycle.
start-time is the timestamp of the earliest data to be returned.
end-time is the timestamp of the latest data to be returned.
metric-data-queries is the location of the JSON file that defines the metric queries to run. For example:

[
    {
        "Id": "e1",
        "Expression": "m1 / m2",
        "Label": "ErrorRate"
    },
    {
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Errors",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    },
    {
        "Id": "m2",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Invocations",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    }
]

After this query succeeds, a step needs to be added to convert the get-metric-data output into events that are eventually pushed into ES. Some schema/mapping may need to be read from aws.yml for this as well.
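
As a rough illustration of that conversion step, here is a minimal Python sketch that flattens a GetMetricData-style response into one event per datapoint. The response keys (MetricDataResults, Id, Label, Timestamps, Values) follow the CloudWatch API; the event layout under "aws.cloudwatch" and the function name results_to_events are assumptions made for this example, not a proposed schema.

```python
from datetime import datetime, timezone


def results_to_events(response, namespace):
    """Flatten a GetMetricData-style response into one event per datapoint.

    The event field names below are hypothetical, chosen only for illustration.
    """
    events = []
    for result in response.get("MetricDataResults", []):
        # Timestamps and Values are parallel lists in the API response.
        for ts, value in zip(result.get("Timestamps", []), result.get("Values", [])):
            events.append({
                "@timestamp": ts.isoformat(),
                "aws": {
                    "cloudwatch": {
                        "namespace": namespace,
                        "query_id": result["Id"],
                        "label": result.get("Label", result["Id"]),
                        "value": value,
                    }
                },
            })
    return events


# Hand-written sample shaped like a GetMetricData response (not real data).
sample = {
    "MetricDataResults": [
        {
            "Id": "e1",
            "Label": "ErrorRate",
            "Timestamps": [datetime(2019, 4, 9, 22, 45, tzinfo=timezone.utc)],
            "Values": [0.25],
            "StatusCode": "Complete",
        }
    ]
}
events = results_to_events(sample, "MyApplication")
```

Emitting one event per datapoint keeps each document small; grouping several metrics with the same timestamp and dimensions into one event would be an alternative design.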

In the existing metricsets, we use list-metrics to get the list of metrics for a specific namespace. This list can be used as input to GetMetricData, replacing the metric-data-queries parameter.

Metricbeat Module / Dataset release checklist

This checklist is intended for developers who create or update a module, to make sure modules are consistent.

Modules

For a metricset to go GA, the following criteria should be met:

Metricbeat module

Example data.json exists and an automated way to generate it exists (go test -data)
A test environment in Docker exists for integration tests


kaiyan-sheng · 2019-04-09T22:51:00Z

@roncohen and @exekias Please feel free to comment on this 😄 Thanks for the great idea.

exekias · 2019-04-10T08:51:13Z

I think this metricset would be really useful! I have a few questions:

Perhaps we can handle the time parameters (start-time, end-time and period) for the user? We already have a period in all metricsets. start-time and end-time could be adjusted to get the last point in each run.
Instead of using a file with the queries, it would be possible to put them in the config as YAML (YAML is a superset of JSON), i.e.:

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: MyApplication
      metricname: Invocations
      dimensions:
        - name: FunctionName
          value: MyFunc
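
To make the mapping from such a config entry to a GetMetricData query concrete, here is a minimal Python sketch. The lowercase keys (namespace, metricname, dimensions) come from the YAML above and the output shape follows the JSON query format from the issue description; the function name config_to_query, the "m<index>" Id scheme, and the default stat of "Average" are assumptions for illustration.

```python
def config_to_query(entry, index, period_seconds, stat="Average"):
    """Translate one cloudwatch.metrics config entry into a MetricDataQuery dict.

    `stat` defaults to "Average" purely as an assumption for this sketch.
    """
    return {
        # Query ids must be unique within a request; "m<index>" is one simple scheme.
        "Id": f"m{index}",
        "MetricStat": {
            "Metric": {
                "Namespace": entry["namespace"],
                "MetricName": entry["metricname"],
                "Dimensions": [
                    {"Name": d["name"], "Value": d["value"]}
                    for d in entry.get("dimensions", [])
                ],
            },
            "Period": period_seconds,
            "Stat": stat,
        },
    }


# The config entry above, as it would look after YAML parsing.
entry = {
    "namespace": "MyApplication",
    "metricname": "Invocations",
    "dimensions": [{"name": "FunctionName", "value": "MyFunc"}],
}
query = config_to_query(entry, index=1, period_seconds=300)
```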

kaiyan-sheng · 2019-04-10T14:21:26Z

@exekias Yeah, end-time is probably not necessary at all. If we support start-time, the user can give a start-time in the past, so when the first collection starts, the cloudwatch metricset first picks up all the metrics from start-time to now. But that might introduce performance issues: if the start-time is too old, there might be too many metrics for the first collection.

Good suggestion on the config 👍
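
One way to picture the time handling discussed here is the following Python sketch: end-time is always "now", start-time defaults to one period back, and a user-supplied start-time in the past is clamped so the first collection cannot grow without bound. The function name collection_window and the max_lookback_seconds parameter (with its one-day default) are hypothetical, introduced only for this example.

```python
from datetime import datetime, timedelta, timezone


def collection_window(now, period_seconds, start_time=None, max_lookback_seconds=86400):
    """Return (start, end) for one collection run.

    max_lookback_seconds is an assumed safety cap, not an existing setting.
    """
    if start_time is None:
        # Regular run: fetch only the most recent period.
        start = now - timedelta(seconds=period_seconds)
    else:
        # First run with a user-provided start-time: never look back further than the cap.
        earliest_allowed = now - timedelta(seconds=max_lookback_seconds)
        start = max(start_time, earliest_allowed)
    return start, now


now = datetime(2019, 4, 10, 14, 0, tzinfo=timezone.utc)
regular = collection_window(now, period_seconds=300)
clamped = collection_window(now, period_seconds=300, start_time=now - timedelta(days=7))
```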

kaiyan-sheng · 2019-04-10T14:31:44Z

If we use list-metrics to get all metrics that are available for GetMetricData, it will save users a lot of time and effort generating the config, but it may not be as handy if users have specific metrics in mind.
list-metrics --namespace "AWS/SNS" output:

{
    "Metrics": [
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        }
    ]
}
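
As a sketch of how this list-metrics output could feed GetMetricData, the Python below turns each listed metric into a query and splits the queries into batches, since CloudWatch caps the number of queries per request. The helper names, the default stat of "Average", and the batch size used in the example are assumptions for illustration; the real per-request limit should be taken from the AWS documentation.

```python
def metrics_to_queries(list_metrics_output, period_seconds, stat="Average"):
    """Build one MetricDataQuery per metric returned by list-metrics."""
    queries = []
    for i, metric in enumerate(list_metrics_output.get("Metrics", [])):
        queries.append({
            "Id": f"m{i}",  # unique id per query within a request
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    # list-metrics already returns dimensions in the Name/Value shape.
                    "Dimensions": metric.get("Dimensions", []),
                },
                "Period": period_seconds,
                "Stat": stat,
            },
        })
    return queries


def batched(queries, batch_size):
    """Split queries into chunks of at most batch_size, one chunk per API request."""
    return [queries[i:i + batch_size] for i in range(0, len(queries), batch_size)]


# First three entries of the output above.
listed = {
    "Metrics": [
        {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
         "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
        {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
         "Dimensions": [{"Name": "TopicName", "Value": "CFO"}]},
        {"Namespace": "AWS/SNS", "MetricName": "NumberOfNotificationsFailed",
         "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
    ]
}
queries = metrics_to_queries(listed, period_seconds=300)
batches = batched(queries, batch_size=2)  # batch_size=2 only to show the split
```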

exekias · 2019-04-11T09:52:54Z

It sounds to me that some people will be interested in collecting all metrics from a namespace, so we could allow full namespace retrieval when no metricname is defined. I would expect this to be more common than cherry-picking just a few metrics.

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: AWS/SNS

kaiyan-sheng · 2019-04-11T14:31:23Z

@exekias Yep, I agree. It will be a much easier setup for users (when extra api query/data transfer cost is not a problem).

roncohen · 2019-04-15T14:44:42Z

great to get this effort started @kaiyan-sheng!

Some ideas:

Regarding start time: could we query ES for the most recent datapoint of the metric and continue from there? We could default to pulling in the last 1h if there's no existing data for a metric. Metricbeat might get restarted, and it would be great to continue from where it left off.
Could we have a list of default periods for each namespace? E.g. S3 looks like it should default to 1 day. This should be overridable in the config.
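
Both ideas can be sketched in a few lines of Python. The ES lookup itself is left out: last_datapoint stands for whatever timestamp such a query would return (or None when no data exists). The names resume_from, period_for, DEFAULT_PERIODS and FALLBACK_PERIOD are hypothetical; the 1 day default for S3 and the 1h lookback come from the comment above, while the 300s fallback is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Per-namespace default periods in seconds; S3 at 1 day is the example from this comment.
DEFAULT_PERIODS = {"AWS/S3": 86400}
FALLBACK_PERIOD = 300  # assumed default for namespaces without an entry


def period_for(namespace, configured=None):
    """Use the configured period if given, otherwise the namespace default."""
    if configured is not None:
        return configured
    return DEFAULT_PERIODS.get(namespace, FALLBACK_PERIOD)


def resume_from(last_datapoint, now, default_lookback=timedelta(hours=1)):
    """Continue from the latest stored datapoint, or look back 1h if there is none."""
    if last_datapoint is None:
        return now - default_lookback
    return last_datapoint


now = datetime(2019, 4, 15, 14, 0, tzinfo=timezone.utc)
fresh_start = resume_from(None, now)
restart = resume_from(now - timedelta(minutes=20), now)
```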

kaiyan-sheng · 2019-04-15T19:34:57Z

Good point @roncohen! For your second idea, I ran into this problem in #11798 just now. For each namespace, the collection period should be different.
Currently the configuration looks like:

- module: aws
  period: 86400s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/S3

The first question is whether the cloudwatch metricset should support configuring and collecting from more than one namespace.
If so, the configuration would look like:

- module: aws
  period: 300s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/EC2
    - namespace: AWS/S3
    - namespace: AWS/SQS

The second question: if we support multiple namespaces as in the config above, we need a default period for each namespace hardcoded in the code. For example, 300s for AWS/EC2, 86400s for AWS/S3, and 300s for AWS/SQS.

We could introduce a separate period config for each namespace, but I feel that would just make everything confusing. cloudwatch as a metricset should have only one period, which is the frequency at which the metricset runs. @exekias What are your thoughts on this?

exekias · 2019-04-16T08:17:55Z

I think supporting multiple namespaces is useful in many cases, especially when the period is the same. As period is a default setting, I wouldn't introduce a new config key for this. Users can always configure several instances of the metricset when they require different periods per namespace.

exekias · 2019-04-16T08:20:01Z

btw, for readability, period should accept other units, so something like 24h should work
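
To illustrate what accepting other units could mean, here is a small Python sketch that converts strings such as 300s or 24h into seconds. It is not how Metricbeat parses period; the function name parse_period and the supported unit set (s, m, h) are assumptions for this example.

```python
import re

# Seconds per supported unit suffix.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_period(text):
    """Convert a period like "300s" or "24h" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


one_day = parse_period("24h")
five_minutes = parse_period("300s")
```

With this, 24h and 86400s describe the same period, which is the readability gain suggested above.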

kaiyan-sheng self-assigned this Apr 9, 2019

kaiyan-sheng mentioned this issue Apr 9, 2019

AWS Metricbeat Module #10115

Closed

7 tasks

kaiyan-sheng added the Team:Integrations label Apr 9, 2019

exekias added the discuss label Apr 10, 2019

kaiyan-sheng mentioned this issue Apr 12, 2019

[Metricbeat] Adding cloudwatch metricset into AWS module #11798

Merged

kaiyan-sheng added the [zube]: In Progress label Apr 15, 2019

kaiyan-sheng added [zube]: In Review and removed [zube]: In Progress labels Apr 22, 2019

alvarolobato removed the discuss label Apr 23, 2019

kaiyan-sheng closed this as completed May 2, 2019

kaiyan-sheng added [zube]: Done and removed [zube]: In Review labels May 2, 2019

andresrc removed the [zube]: Done label May 8, 2019
