[Metricbeat] Adding AWS Cloudwatch Metricset #11734

Closed
13 tasks
kaiyan-sheng opened this issue Apr 9, 2019 · 10 comments

@kaiyan-sheng
Contributor

Amazon CloudWatch monitors AWS resources and applications that run on AWS. Many services (e.g. EC2, RDS, SQS) periodically send monitoring metrics to CloudWatch so that users can monitor the health and performance of their resources. We are in the process of adding metricsets for individual services, but since there are so many, it would be good to have a "free-form" cloudwatch metricset that collects monitoring metrics from a user-defined service even if that service is not yet supported by a separate metricset.

The basic idea for the cloudwatch metricset is to read the period, start-time, end-time, and metric-data-queries parameters from the config file aws.yml and pass them to the get-metric-data CloudWatch API, similar to the AWS CLI command described at https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/get-metric-data.html.
period is how frequently the cloudwatch metricset starts a new collection cycle.
start-time is the timestamp of the earliest data to be returned.
end-time is the timestamp of the latest data to be returned.
metric-data-queries is the location of the JSON file that defines the metric queries to run. For example:

[
    {
        "Id": "e1",
        "Expression": "m1 / m2",
        "Label": "ErrorRate"
    },
    {
        "Id": "m1",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Errors",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    },
    {
        "Id": "m2",
        "MetricStat": {
            "Metric": {
                "Namespace": "MyApplication",
                "MetricName": "Invocations",
                "Dimensions": [
                    {
                        "Name": "FunctionName",
                        "Value": "MyFunc"
                    }
                ]
            },
            "Period": 300,
            "Stat": "Sum",
            "Unit": "Count"
        },
        "ReturnData": false
    }
]

After this query succeeds, some processing needs to be added to convert the get-metric-data output into events that are eventually pushed into ES. A schema/mapping for this conversion might need to be read from aws.yml as well.
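To make the conversion step concrete, here is a minimal sketch in Python (the metricset itself would be written in Go). It assumes the response has the GetMetricData shape, with parallel Timestamps and Values lists under MetricDataResults, and that timestamps arrive as timezone-aware datetimes. The function name, the event field layout, and the sample values are all made up for illustration and are not an agreed schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def metric_data_to_events(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a GetMetricData-style response into one event per datapoint."""
    events = []
    for result in response.get("MetricDataResults", []):
        # Timestamps and Values are parallel lists: the i-th value
        # belongs to the i-th timestamp.
        pairs = zip(result.get("Timestamps", []), result.get("Values", []))
        for timestamp, value in pairs:
            events.append({
                "@timestamp": timestamp.astimezone(timezone.utc).isoformat(),
                # Hypothetical field layout, only for illustration.
                "aws": {
                    "cloudwatch": {
                        "id": result["Id"],
                        "label": result.get("Label", result["Id"]),
                        "value": value,
                    }
                },
            })
    return events


# Made-up sample response for the "ErrorRate" expression in the query above.
sample = {
    "MetricDataResults": [
        {
            "Id": "e1",
            "Label": "ErrorRate",
            "Timestamps": [
                datetime(2019, 4, 9, 12, 0, tzinfo=timezone.utc),
                datetime(2019, 4, 9, 12, 5, tzinfo=timezone.utc),
            ],
            "Values": [0.02, 0.05],
            "StatusCode": "Complete",
        }
    ]
}

for event in metric_data_to_events(sample):
    print(event)
```

One event per datapoint keeps the mapping simple; grouping several metrics with the same timestamp and dimensions into a single event would be an alternative worth discussing.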

In the existing metricsets, we use list-metrics to get a list of metrics for a specific namespace. This list of metrics can be used as input to GetMetricData, replacing the metric-data-queries parameter.
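As a rough illustration of that replacement, the Python sketch below turns list-metrics entries into the query structure GetMetricData expects. The function name and the default period and stat (300 and "Sum", borrowed from the JSON example above) are assumptions, not a settled design.

```python
from typing import Any, Dict, List


def build_metric_data_queries(
    metrics: List[Dict[str, Any]], period: int = 300, stat: str = "Sum"
) -> List[Dict[str, Any]]:
    """Build one MetricStat query per metric returned by list-metrics."""
    queries = []
    for index, metric in enumerate(metrics):
        queries.append({
            # Query ids must be unique and start with a lowercase letter.
            "Id": f"m{index}",
            "MetricStat": {
                "Metric": {
                    "Namespace": metric["Namespace"],
                    "MetricName": metric["MetricName"],
                    "Dimensions": metric.get("Dimensions", []),
                },
                "Period": period,
                "Stat": stat,
            },
        })
    return queries


listed = [
    {
        "Namespace": "AWS/SNS",
        "MetricName": "PublishSize",
        "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}],
    },
    {
        "Namespace": "AWS/SNS",
        "MetricName": "NumberOfMessagesPublished",
        "Dimensions": [{"Name": "TopicName", "Value": "CFO"}],
    },
]

queries = build_metric_data_queries(listed)
print([q["Id"] for q in queries])
```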

Metricbeat Module / Dataset release checklist

This checklist is intended for developers who create or update a module, to make sure modules are consistent.

Modules

For a metricset to go GA, the following criteria should be met:

  • Supported versions are documented
  • Supported operating systems are documented (if applicable)
  • Integration tests exist
  • System tests exist
  • Automated checks that all fields are documented
  • Documentation
  • Fields follow ECS and naming conventions
  • Dashboards exist (if applicable)
  • Kibana Home Tutorial (if applicable)
    • Open an issue in the EUI repo to add an icon for the module if one does not already exist.
    • Open PR against Kibana repo with tutorial. Examples can be found here.

Metricbeat module

  • Example data.json exists and an automated way to generate it exists (go test -data)
  • Test environment in Docker exists for integration tests
@kaiyan-sheng
Contributor Author

kaiyan-sheng commented Apr 9, 2019

@roncohen and @exekias Please feel free to comment on this 😄 Thanks for the great idea.

@kaiyan-sheng kaiyan-sheng self-assigned this Apr 9, 2019
@kaiyan-sheng kaiyan-sheng added the Team:Integrations Label for the Integrations team label Apr 9, 2019
@exekias exekias added the discuss Issue needs further discussion. label Apr 10, 2019
@exekias
Contributor

exekias commented Apr 10, 2019

I think this metricset would be really useful! I have a few questions:

  • Perhaps we can handle the time parameters (start-time, end-time and period) for the user? We already have a period in all metricsets. Could start-time and end-time be adjusted to get the last point in each run?

  • Instead of using a file with the queries, it would be possible to put them in the config as YAML (YAML is a superset of JSON), i.e.:

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: MyApplication
      metricname: Invocations
      dimensions:
        - name: FunctionName
          value: MyFunc
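For the first point, one possible reading is that each run simply queries the window covering the last period. A minimal Python sketch of that interpretation follows; the function name is hypothetical, and whether the window should also be aligned to period boundaries is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def collection_window(now: datetime, period: timedelta) -> Tuple[datetime, datetime]:
    """Return (start_time, end_time) covering the most recent period."""
    end_time = now
    start_time = now - period
    return start_time, end_time


# Example: a run at 12:00 UTC with a 300s period queries 11:55 to 12:00.
now = datetime(2019, 4, 10, 12, 0, tzinfo=timezone.utc)
start_time, end_time = collection_window(now, timedelta(seconds=300))
print(start_time.isoformat(), end_time.isoformat())
```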

@kaiyan-sheng
Contributor Author

@exekias Yeah, end-time is probably not necessary at all. If we support start-time, the user can give a start-time in the past, so when the first collection starts, the cloudwatch metricset can first pick up all the metrics from start-time to now. But that might introduce performance issues: if the start-time is too old, there might be too many metrics for the first collection.

Good suggestion on the config 👍

@kaiyan-sheng
Contributor Author

If we use list-metrics to get all metrics that are available for GetMetricData, it will save users a lot of time/effort generating the config, but it may not be as handy if users have specific metrics in mind to query.
list-metrics --namespace "AWS/SNS" output:

{
    "Metrics": [
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "PublishSize"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "NotifyMe"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfMessagesPublished"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsDelivered"
        },
        {
            "Namespace": "AWS/SNS",
            "Dimensions": [
                {
                    "Name": "TopicName",
                    "Value": "CFO"
                }
            ],
            "MetricName": "NumberOfNotificationsFailed"
        }
    ]
}

@exekias
Contributor

exekias commented Apr 11, 2019

It sounds to me like some people will be interested in collecting all metrics from a namespace, so we could allow full namespace retrieval when no metricname is defined. I would expect this to be more common than cherry-picking just a few metrics.

- module: aws
  metricsets: ["cloudwatch"]
  cloudwatch.metrics:
    - namespace: AWS/SNS
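A small Python sketch of how such a config entry could be applied to the list-metrics output: an entry with only a namespace selects everything in that namespace, while metricname and dimensions narrow the selection. The function names are hypothetical; the config keys follow the YAML proposed in this thread.

```python
from typing import Any, Dict, List


def matches(entry: Dict[str, Any], metric: Dict[str, Any]) -> bool:
    """Check whether a listed metric satisfies one cloudwatch.metrics entry."""
    if metric["Namespace"] != entry["namespace"]:
        return False
    # A missing metricname means "every metric in the namespace".
    name = entry.get("metricname")
    if name is not None and metric["MetricName"] != name:
        return False
    # Every configured dimension must be present on the metric.
    wanted = {(d["name"], d["value"]) for d in entry.get("dimensions", [])}
    present = {(d["Name"], d["Value"]) for d in metric.get("Dimensions", [])}
    return wanted <= present


def select_metrics(entry: Dict[str, Any], listed: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [metric for metric in listed if matches(entry, metric)]


listed = [
    {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
     "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
    {"Namespace": "AWS/SNS", "MetricName": "PublishSize",
     "Dimensions": [{"Name": "TopicName", "Value": "CFO"}]},
    {"Namespace": "AWS/SNS", "MetricName": "NumberOfNotificationsFailed",
     "Dimensions": [{"Name": "TopicName", "Value": "NotifyMe"}]},
]

print(len(select_metrics({"namespace": "AWS/SNS"}, listed)))
print(len(select_metrics({"namespace": "AWS/SNS", "metricname": "PublishSize"}, listed)))
```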

@kaiyan-sheng
Contributor Author

@exekias Yep, I agree. It will be a much easier setup for users (when extra api query/data transfer cost is not a problem).

@roncohen
Contributor

roncohen commented Apr 15, 2019

great to get this effort started @kaiyan-sheng!

Some ideas:

  • reg. start time: could we query for the metric in ES to get the most recent datapoint and continue from there? We could default to pulling in the last 1h if there's no existing data for a metric. metricbeat might get restarted and it would be great to continue from where it got to.
  • could we have a list of default periods for each namespace? e.g. S3 looks like it should default to 1 day. Should be overridable in the config.
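The start-time idea could look roughly like the Python sketch below. It assumes the most recent datapoint timestamp has already been looked up in ES (that lookup is not shown and would be a separate piece of work) and falls back to the last 1h when nothing is stored yet. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_start_time(
    last_datapoint: Optional[datetime],
    now: datetime,
    default_lookback: timedelta = timedelta(hours=1),
) -> datetime:
    """Continue from the last stored datapoint, or look back 1h by default."""
    if last_datapoint is None:
        # No existing data for this metric: pull in the default lookback window.
        return now - default_lookback
    return last_datapoint


now = datetime(2019, 4, 15, 12, 0, tzinfo=timezone.utc)
print(resolve_start_time(None, now).isoformat())
print(resolve_start_time(datetime(2019, 4, 15, 9, 30, tzinfo=timezone.utc), now).isoformat())
```

With this approach a restarted metricbeat would continue from where it got to, at the cost of one extra ES query per metric on startup.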

@kaiyan-sheng
Contributor Author

Good point @roncohen! For your second idea, I ran into this problem in #11798 just now. For each namespace, the collection period should be different.
Currently the configuration looks like:

- module: aws
  period: 86400s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/S3

First question: should the cloudwatch metricset support configuring/collecting from more than one namespace?
If so, the configuration would look like:

- module: aws
  period: 300s
  metricsets:
    - "cloudwatch"
  access_key_id: '${AWS_ACCESS_KEY_ID:""}'
  secret_access_key: '${AWS_SECRET_ACCESS_KEY:""}'
  session_token: '${AWS_SESSION_TOKEN:""}'
  default_region: '${AWS_REGION:us-west-1}'
  cloudwatch_metrics:
    - namespace: AWS/EC2
    - namespace: AWS/S3
    - namespace: AWS/SQS

Second question: if we support multiple namespaces like the config above, then we need to have a default period for each namespace hardcoded in the code. For example, for AWS/EC2 it would be 300s, for AWS/S3 it would be 86400s, and for AWS/SQS it would be 300s.
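A hardcoded table like that could be as small as the following Python sketch. The three values are the ones mentioned above; the fallback for unknown namespaces and the optional override argument are assumptions added for illustration.

```python
from typing import Optional

# Default collection periods in seconds, per namespace (values from this thread).
DEFAULT_PERIODS = {
    "AWS/EC2": 300,
    "AWS/S3": 86400,
    "AWS/SQS": 300,
}

# Assumed fallback for namespaces without a hardcoded default.
FALLBACK_PERIOD = 300


def period_for(namespace: str, override: Optional[int] = None) -> int:
    """Return the period for a namespace, preferring a user-supplied override."""
    if override is not None:
        return override
    return DEFAULT_PERIODS.get(namespace, FALLBACK_PERIOD)


print(period_for("AWS/S3"))
print(period_for("AWS/S3", override=3600))
```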

We could introduce a separate period config for each namespace, but I feel that would just make everything confusing. cloudwatch as a metricset should only have one period, which is the frequency at which this metricset runs. @exekias What are your thoughts on this?

@exekias
Contributor

exekias commented Apr 16, 2019

I think supporting multiple namespaces is useful in many cases, especially when the period is the same. As period is a default setting, I wouldn't introduce a new config key for this. Users can always configure several instances of the metricset when they require different periods per namespace.

@exekias
Contributor

exekias commented Apr 16, 2019

btw, for readability, period should accept other units, so something like 24h should work
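To illustrate what accepting units means, here is a small Python sketch of a duration parser that supports only s, m and h. In Go, time.ParseDuration already accepts strings like "24h" and "300s", so a real implementation would not need a hand-written parser; the function name here is hypothetical.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+)([smh])")


def parse_period(text: str) -> timedelta:
    """Parse strings such as "300s", "24h" or "1h30m" into a timedelta."""
    position = 0
    total_seconds = 0
    for match in _PART.finditer(text):
        # Parts must follow each other with no unparsed characters in between.
        if match.start() != position:
            break
        total_seconds += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"invalid period: {text!r}")
    return timedelta(seconds=total_seconds)


# "24h" and "86400s" describe the same period.
print(parse_period("24h") == parse_period("86400s"))
```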
