[AWS] Add number_of_workers and latency to all CloudWatch Logs based integrations (#5794)

* Add number_of_workers and latency

All CloudWatch Logs based integrations should have these options
available to them.

The `number_of_workers` is essential to increase the number of workers when
users decide to use `log_group_name_prefix`.

The `latency` can be required on the busiest log groups to deal with
potential latency.

* Add advanced options docs to all CloudWatch logs
zmoog authored Apr 12, 2023
1 parent a0f0502 commit fe8795c
Showing 38 changed files with 470 additions and 13 deletions.
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/cloudtrail.md
@@ -40,6 +40,24 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.
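The shift described above can be sketched in a few lines of Python. This is only an illustration, not the input's actual implementation: the `query_window` helper is hypothetical, and the assumption that each query covers one `scan_frequency` interval is ours, not something this change states.

```python
from datetime import datetime, timedelta, timezone


def query_window(now, scan_frequency, latency):
    """Return the (start, end) range to query, moved back in time by `latency`.

    Hypothetical helper: assumes each query covers one `scan_frequency` interval.
    """
    end = now - latency           # ignore the most recent `latency` worth of time
    start = end - scan_frequency  # assumed window length
    return start, end


now = datetime(2023, 4, 12, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(now, timedelta(minutes=1), timedelta(minutes=5))
print(start.isoformat(), end.isoformat())
# prints: 2023-04-12T11:54:00+00:00 2023-04-12T11:55:00+00:00
```

With a `5m` latency, a query issued at 12:00 stops at 11:55, so events that CloudWatch Logs publishes a few minutes late still fall inside a later window.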

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.
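As a rough way to reason about the sizing rule above, the following Python sketch counts how many sequential passes are needed before every matching log group has been read once. It assumes, as the paragraph suggests, that one worker reads one log group at a time; the helper names and the log group names are hypothetical.

```python
import math


def matching_groups(all_groups, prefix):
    """Return the log groups whose name starts with `prefix`."""
    return [g for g in all_groups if g.startswith(prefix)]


def rounds_needed(group_count, number_of_workers):
    """Sequential passes needed for the workers to cover every group once.

    Assumes each worker reads exactly one log group at a time.
    """
    return math.ceil(group_count / number_of_workers)


# Hypothetical log group names, for illustration only.
groups = [
    "/aws/lambda/orders",
    "/aws/lambda/payments",
    "/aws/lambda/users",
    "/aws/lambda/emails",
    "/aws/lambda/reports",
    "/aws/rds/main",
]
matched = matching_groups(groups, "/aws/lambda/")
print(len(matched))                    # prints: 5
print(rounds_needed(len(matched), 1))  # prints: 5
print(rounds_needed(len(matched), 5))  # prints: 1
```

Under these assumptions the default of `1` worker has to visit the five groups one after another, while `5` workers cover them in a single pass.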

## Logs reference

The `cloudtrail` data stream collects AWS CloudTrail logs. CloudTrail monitors events like
20 changes: 15 additions & 5 deletions packages/aws/_dev/build/docs/cloudwatch.md
@@ -40,13 +40,23 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced
### Advanced options

#### Latency
#### CloudWatch

Log events on the busies log groups may require a longer time before they are available to CloudWatch Logs.
The CloudWatch logs input has several advanced options to fit specific use cases.

The CloudWatch integration offers the `latency` setting to cope with this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.
##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

@@ -61,4 +71,4 @@ CloudWatch logs to monitor, store, and access log files from different sources.

{{event "cloudwatch_metrics"}}

{{fields "cloudwatch_metrics"}}
{{fields "cloudwatch_metrics"}}
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/ec2.md
@@ -40,6 +40,24 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

The `ec2` data stream supports both EC2 logs stored in AWS CloudWatch and EC2 logs stored in Amazon S3.
20 changes: 19 additions & 1 deletion packages/aws/_dev/build/docs/elb.md
@@ -47,6 +47,24 @@ For an application load balancer, see [enable access log for application load ba

For a network load balancer, see [enable access log for network load balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest//network/load-balancer-access-logs.html).

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

The `elb` dataset collects logs from AWS ELBs.
@@ -59,4 +77,4 @@ The `elb` dataset collects logs from AWS ELBs.

{{event "elb_metrics"}}

{{fields "elb_metrics"}}
{{fields "elb_metrics"}}
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/firewall.md
@@ -40,6 +40,24 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

The `firewall_logs` dataset collects AWS Network Firewall logs. Users can use these logs to
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/route53.md
@@ -37,6 +37,24 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

### Public Hosted Zone logs
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/vpcflow.md
@@ -58,6 +58,24 @@ This integration supports various plain text VPC flow log formats:
${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${type} ${pkt-srcaddr} ${pkt-dstaddr} ${region} ${az-id} ${sublocation-type} ${sublocation-id} ${pkt-src-aws-service} ${pkt-dst-aws-service} ${flow-direction} ${traffic-path}
```

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

> Note: The Parquet format is not supported.
18 changes: 18 additions & 0 deletions packages/aws/_dev/build/docs/waf.md
@@ -41,6 +41,24 @@ When you configure the AWS integration, you can collect data from as many AWS se
For step-by-step instructions on how to set up an integration, see the
{{ url "getting-started-observability" "Getting started" }} guide.

### Advanced options

#### CloudWatch

The CloudWatch logs input has several advanced options to fit specific use cases.

##### Latency

AWS CloudWatch Logs sometimes takes extra time to make the latest logs available to clients like the Agent.

The CloudWatch integration offers the `latency` setting to address this scenario. Latency translates the query's time range to consider the CloudWatch Logs latency. For example, a `5m` latency means the integration will query CloudWatch for logs available 5 minutes ago.

##### Number of workers

If you are collecting log events from multiple log groups using `log_group_name_prefix`, you should review the value of the `number_of_workers`.

The `number_of_workers` setting defines the number of workers assigned to reading from log groups. Each log group matching the `log_group_name_prefix` requires a worker to keep log ingestion as close to real-time as possible. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.

## Logs reference

The `waf` dataset is specifically for WAF logs. Export logs from Kinesis Data Firehose to Amazon S3 bucket which has SQS notification setup already.
7 changes: 6 additions & 1 deletion packages/aws/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.33.3"
changes:
- description: Add number_of_workers and latency to all CloudWatch Logs based integrations.
type: enhancement
link: https://github.com/elastic/integrations/pull/5794
- version: "1.33.2"
changes:
- description: Add missing permissions in the AWS Billing integration documentation.
@@ -12,7 +17,7 @@
- version: "1.33.0"
changes:
- description: Add latency configuration option on the CloudWatch Logs integration.
type: bugfix
type: enhancement
link: https://github.com/elastic/integrations/pull/5777
- version: "1.32.2"
changes:
@@ -50,6 +50,13 @@ scan_frequency: {{ scan_frequency }}
api_sleep: {{ api_sleep }}
{{/if}}

{{#if latency }}
latency: {{ latency }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}

{{#if credential_profile_name}}
credential_profile_name: {{credential_profile_name}}
{{/if}}
13 changes: 13 additions & 0 deletions packages/aws/data_stream/cloudtrail/manifest.yml
@@ -273,6 +273,19 @@ streams:
show_user: false
default: 200ms
description: This is used to sleep between AWS FilterLogEvents API calls inside the same collection period. `FilterLogEvents` API has a quota of 5 transactions per second (TPS)/account/Region. This value should only be adjusted when there are multiple Filebeats or multiple Filebeat inputs collecting logs from the same region and AWS account.
- name: latency
type: text
title: Latency
multi: false
required: false
show_user: false
description: "The amount of time required for the logs to be available to CloudWatch Logs. Sample values, `1m` or `5m` — see Golang [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) for more details. Latency translates the query's time range to consider the CloudWatch Logs latency. Example: `5m` means that the integration will query CloudWatch to search for logs available 5 minutes ago."
- name: number_of_workers
type: integer
title: Number of workers
required: false
show_user: false
description: The number of workers assigned to reading from log groups. Each worker will read log events from one of the log groups matching `log_group_name_prefix`. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.
- name: tags
type: text
title: Tags
@@ -56,6 +56,9 @@ api_sleep: {{ api_sleep }}
{{#if latency }}
latency: {{ latency }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}

{{#if credential_profile_name}}
credential_profile_name: {{credential_profile_name}}
6 changes: 6 additions & 0 deletions packages/aws/data_stream/cloudwatch_logs/manifest.yml
@@ -156,6 +156,12 @@ streams:
required: false
show_user: false
description: "The amount of time required for the logs to be available to CloudWatch Logs. Sample values, `1m` or `5m` — see Golang [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) for more details. Latency translates the query's time range to consider the CloudWatch Logs latency. Example: `5m` means that the integration will query CloudWatch to search for logs available 5 minutes ago."
- name: number_of_workers
type: integer
title: Number of workers
required: false
show_user: false
description: The number of workers assigned to reading from log groups. Each worker will read log events from one of the log groups matching the `log_group_name_prefix`. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.
- name: tags
type: text
title: Tags
@@ -50,6 +50,13 @@ scan_frequency: {{ scan_frequency }}
api_sleep: {{ api_sleep }}
{{/if}}

{{#if latency }}
latency: {{ latency }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}

{{#if credential_profile_name}}
credential_profile_name: {{credential_profile_name}}
{{/if}}
13 changes: 13 additions & 0 deletions packages/aws/data_stream/ec2_logs/manifest.yml
@@ -148,6 +148,19 @@ streams:
show_user: false
default: 200ms
description: This is used to sleep between AWS FilterLogEvents API calls inside the same collection period. `FilterLogEvents` API has a quota of 5 transactions per second (TPS)/account/Region. This value should only be adjusted when there are multiple Filebeats or multiple Filebeat inputs collecting logs from the same region and AWS account.
- name: latency
type: text
title: Latency
multi: false
required: false
show_user: false
description: "The amount of time required for the logs to be available to CloudWatch Logs. Sample values, `1m` or `5m` — see Golang [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) for more details. Latency translates the query's time range to consider the CloudWatch Logs latency. Example: `5m` means that the integration will query CloudWatch to search for logs available 5 minutes ago."
- name: number_of_workers
type: integer
title: Number of workers
required: false
show_user: false
description: The number of workers assigned to reading from log groups. Each worker will read log events from one of the log groups matching `log_group_name_prefix`. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.
- name: tags
type: text
title: Tags
@@ -50,6 +50,13 @@ scan_frequency: {{ scan_frequency }}
api_sleep: {{ api_sleep }}
{{/if}}

{{#if latency }}
latency: {{ latency }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}

{{#if credential_profile_name}}
credential_profile_name: {{credential_profile_name}}
{{/if}}
13 changes: 13 additions & 0 deletions packages/aws/data_stream/elb_logs/manifest.yml
@@ -148,6 +148,19 @@ streams:
show_user: false
default: 200ms
description: This is used to sleep between AWS FilterLogEvents API calls inside the same collection period. `FilterLogEvents` API has a quota of 5 transactions per second (TPS)/account/Region. This value should only be adjusted when there are multiple Filebeats or multiple Filebeat inputs collecting logs from the same region and AWS account.
- name: latency
type: text
title: Latency
multi: false
required: false
show_user: false
description: "The amount of time required for the logs to be available to CloudWatch Logs. Sample values, `1m` or `5m` — see Golang [time.ParseDuration](https://pkg.go.dev/time#ParseDuration) for more details. Latency translates the query's time range to consider the CloudWatch Logs latency. Example: `5m` means that the integration will query CloudWatch to search for logs available 5 minutes ago."
- name: number_of_workers
type: integer
title: Number of workers
required: false
show_user: false
description: The number of workers assigned to reading from log groups. Each worker will read log events from one of the log groups matching `log_group_name_prefix`. For example, if `log_group_name_prefix` matches five log groups, then `number_of_workers` should be set to `5`. The default value is `1`.
- name: tags
type: text
title: Tags
@@ -50,6 +50,13 @@ scan_frequency: {{ scan_frequency }}
api_sleep: {{ api_sleep }}
{{/if}}

{{#if latency }}
latency: {{ latency }}
{{/if}}
{{#if number_of_workers }}
number_of_workers: {{ number_of_workers }}
{{/if}}

{{#if credential_profile_name}}
credential_profile_name: {{credential_profile_name}}
{{/if}}