Adding Module and Example for ECS cluster monitoring with ecs_observer #211

Merged on Nov 2, 2023 (38 commits)
Commits
727674c
Adding Module and Example for ECS cluster monitoring with ecs_observer
ruchimo Aug 9, 2023
1900fc8
Adding Module and Example for ECS cluster monitoring with ecs_observer
ruchimo Aug 9, 2023
930721a
Merge branch 'main' into main
bonclay7 Aug 9, 2023
c1ac8b8
Incorporating PR comments
ruchimo Aug 10, 2023
52c14f4
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Aug 10, 2023
5968b45
Merge branch 'main' into main
ruchimo Aug 10, 2023
e2e7af7
Restructuring Examples and modules folder for ECS, Added content in m…
ruchimo Aug 10, 2023
3be9980
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Aug 10, 2023
0ef4294
Fixing path as per PR comments
ruchimo Aug 14, 2023
6071dbd
Parameterzing the config files, incorporated PR review comments
ruchimo Aug 14, 2023
63b1db9
Merge branch 'main' into main
bonclay7 Aug 17, 2023
3ca49bd
Adding condition for AMP WS and fixing AMP endpoint
ruchimo Aug 24, 2023
54a2a28
Merge branch 'main' into main
bonclay7 Aug 25, 2023
ac1058d
Adding Document for ECS Monitoring and parameterized some variables
ruchimo Sep 1, 2023
23c073a
Added sample dashboard
ruchimo Sep 1, 2023
c1d8303
Adding Document for ECS Monitoring and parameterized some variables
ruchimo Sep 1, 2023
47be776
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Sep 1, 2023
c272480
Merge branch 'main' into main
ruchimo Sep 1, 2023
b2ae876
Fixing failures detected by pre-commit
ruchimo Sep 4, 2023
298c7d5
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Sep 4, 2023
a94d212
Merge branch 'main' into main
ruchimo Sep 4, 2023
6170820
Fixing failures detected by pre-commit
ruchimo Sep 4, 2023
f591aff
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Sep 4, 2023
2343c10
Merge branch 'main' into main
ruchimo Sep 5, 2023
47364af
Merge branch 'main' into main
bonclay7 Sep 21, 2023
bf318db
Merge branch 'main' into main
bonclay7 Oct 25, 2023
86ebf4e
Fixing failures detected by pre-commit
ruchimo Oct 27, 2023
70f7e44
Pre-commit fixes
bonclay7 Oct 29, 2023
bdcdc0d
Fixing failures detected by pre-commit
ruchimo Oct 30, 2023
289a01d
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Oct 30, 2023
90ffa83
Fixing failures detected by pre-commit
ruchimo Oct 30, 2023
4440d0c
Pre-commit
bonclay7 Oct 30, 2023
0f126e7
Fixing HIGH security alerts detected by pre-commit
ruchimo Oct 30, 2023
d4d7c8e
Merge branch 'main' of github.com:ruchimo/terraform-aws-observability…
ruchimo Oct 30, 2023
a1c1821
Fixing HIGH security alerts detected by pre-commit
ruchimo Oct 31, 2023
59355d6
Fixing HIGH security alerts detected by pre-commit, 31stOct
ruchimo Oct 31, 2023
8919e6c
Add links after merge
bonclay7 Nov 2, 2023
382987b
2ndNov - Added condiotnal creation for Grafana WS and module versions…
ruchimo Nov 2, 2023
17 changes: 4 additions & 13 deletions modules/ecs-monitoring/README.md
@@ -4,22 +4,14 @@ This module provides ECS cluster monitoring with the following resources:

 - AWS Distro For OpenTelemetry Operator and Collector for Metrics and Traces
 - Creates Grafana Dashboards on Amazon Managed Grafana.
-- Create SSM Parameter to store the ADOT config yaml file
+- Creates SSM Parameter to store and distribute the ADOT config file

 ## Pre-requisites
 * ECS Cluster with EC2 using examples --> ecs-cluster-with-vpc
 * Create a `Prometheus Workspace` either using the Console or using the commented code under modules/ecs-monitoring/main.tf.
 * Update your existing App (workload) *ECS Task Definition* to add the label/environment variables below
   - Set ***ECS_PROMETHEUS_EXPORTER_PORT*** to point to the containerPort where the Prometheus metrics are exposed
   - Set ***Java_EMF_Metrics*** to true. The CloudWatch agent uses this flag to generate the embedded metric format in the log event.
-* Make sure to update the placeholder values in the below files
-  - configs/config.yaml
-    - region
-    - cluster_name
-    - cluster_region
-    - prometheusremotewrite --> endpoint
-  - task-definitions/otel_collector.json
-    - awslogs-region

 This module makes use of the below open source projects:
 * [aws-managed-grafana](https://github.com/terraform-aws-modules/terraform-aws-managed-service-grafana)
@@ -32,7 +24,7 @@ See examples using this Terraform modules in the **Amazon ECS** section of [this

 | Name | Version |
 |------|---------|
-| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.1.0 |
+| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.0.0 |
 | <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 5.0.0 |

 ## Providers
@@ -46,7 +38,6 @@ See examples using this Terraform modules in the **Amazon ECS** section of [this
 | Name | Source | Version |
 |------|--------|---------|
 | <a name="module_managed_grafana_default"></a> [managed\_grafana\_default](#module\_managed\_grafana\_default) | terraform-aws-modules/managed-service-grafana/aws | n/a |
-| <a name="module_managed_prometheus_default"></a> [managed\_prometheus\_default](#module\_managed\_prometheus\_default) | terraform-aws-modules/managed-service-prometheus/aws | n/a |

 ## Resources

@@ -62,6 +53,8 @@ See examples using this Terraform modules in the **Amazon ECS** section of [this
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
 | <a name="input_aws_ecs_cluster_name"></a> [aws\_ecs\_cluster\_name](#input\_aws\_ecs\_cluster\_name) | Name of your ECS cluster | `string` | n/a | yes |
+| <a name="input_ecs_adot_cpu"></a> [ecs\_adot\_cpu](#input\_ecs\_adot\_cpu) | CPU to be allocated for the ADOT ECS TASK | `string` | `"256"` | no |
+| <a name="input_ecs_adot_mem"></a> [ecs\_adot\_mem](#input\_ecs\_adot\_mem) | Memory to be allocated for the ADOT ECS TASK | `string` | `"512"` | no |
 | <a name="input_executionRoleArn"></a> [executionRoleArn](#input\_executionRoleArn) | ARN of the IAM Execution Role | `string` | n/a | yes |
 | <a name="input_taskRoleArn"></a> [taskRoleArn](#input\_taskRoleArn) | ARN of the IAM Task Role | `string` | n/a | yes |

@@ -71,6 +64,4 @@ See examples using this Terraform modules in the **Amazon ECS** section of [this
 |------|-------------|
 | <a name="output_grafana_workspace_endpoint"></a> [grafana\_workspace\_endpoint](#output\_grafana\_workspace\_endpoint) | The endpoint of the Grafana workspace |
 | <a name="output_grafana_workspace_id"></a> [grafana\_workspace\_id](#output\_grafana\_workspace\_id) | The ID of the Grafana workspace |
-| <a name="output_prometheus_workspace_endpoint"></a> [prometheus\_workspace\_endpoint](#output\_prometheus\_workspace\_endpoint) | Prometheus endpoint available for this workspace |
-| <a name="output_prometheus_workspace_id"></a> [prometheus\_workspace\_id](#output\_prometheus\_workspace\_id) | Identifier of the workspace |
 <!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
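The README prerequisite about `ECS_PROMETHEUS_EXPORTER_PORT` could look roughly like the following in a workload task definition. This is a minimal sketch and not part of the PR: the resource name `sample_app`, the image, and port `9404` are hypothetical placeholders. The `ecs_observer` extension in `configs/config.yaml` looks up the scrape port through the Docker label named by `port_label`, so the value is set under `dockerLabels`. Setting `Java_EMF_Metrics` as a Docker label as well is an assumption, since the README allows either a label or an environment variable.

```hcl
# Hypothetical workload task definition; name, image, and port are placeholders.
resource "aws_ecs_task_definition" "sample_app" {
  family                   = "sample-app"
  network_mode             = "bridge"
  requires_compatibilities = ["EC2"]

  container_definitions = jsonencode([
    {
      name      = "sample-app"
      image     = "example/sample-app:latest"
      essential = true
      memory    = 256

      # Port on which the application exposes Prometheus metrics
      portMappings = [
        { containerPort = 9404, protocol = "tcp" }
      ]

      dockerLabels = {
        # Matched by docker_labels.port_label in configs/config.yaml
        ECS_PROMETHEUS_EXPORTER_PORT = "9404"
        # Assumed to be a label here; the README also permits an environment variable
        Java_EMF_Metrics = "true"
      }
    }
  ])
}
```

The label value has to match the `containerPort` that actually serves metrics; otherwise the observer would write a target that Prometheus cannot scrape.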
36 changes: 18 additions & 18 deletions modules/ecs-monitoring/configs/config.yaml
@@ -1,52 +1,52 @@
 extensions:
   sigv4auth:
-    region: "us-east-1"
+    region: "${aws_region}"
     service: "aps"
   ecs_observer: # extension type is ecs_observer
-    cluster_name: 'aot-test-cluster' # cluster name need to configured manually
-    cluster_region: 'us-east-1' # region can be configured directly or use AWS_REGION env var
-    result_file: '/etc/ecs_sd_targets.yaml' # the directory for file must already exists
-    refresh_interval: 60s
+    cluster_name: "${cluster_name}" # cluster name needs to be configured manually
+    cluster_region: "${cluster_region}" # region can be configured directly or use AWS_REGION env var
+    result_file: "/etc/ecs_sd_targets.yaml" # the directory for the file must already exist
+    refresh_interval: ${refresh_interval}
     job_label_name: prometheus_job
     # JMX
     docker_labels:
-      - port_label: 'ECS_PROMETHEUS_EXPORTER_PORT'
+      - port_label: "ECS_PROMETHEUS_EXPORTER_PORT"

 receivers:
   otlp:
     protocols:
       grpc:
-        endpoint: 0.0.0.0:4317
+        endpoint: ${otlpGrpcEndpoint}
       http:
-        endpoint: 0.0.0.0:4318
+        endpoint: ${otlpHttpEndpoint}
   prometheus:
     config:
       scrape_configs:
         - job_name: "ecssd"
           file_sd_configs:
             - files:
-                - '/etc/ecs_sd_targets.yaml'
+                - "/etc/ecs_sd_targets.yaml"
           relabel_configs:
-            - source_labels: [ __meta_ecs_cluster_name ]
+            - source_labels: [__meta_ecs_cluster_name]
               action: replace
               target_label: ClusterName
-            - source_labels: [ __meta_ecs_service_name ]
+            - source_labels: [__meta_ecs_service_name]
               action: replace
               target_label: ServiceName
-            - source_labels: [ __meta_ecs_task_definition_family ]
+            - source_labels: [__meta_ecs_task_definition_family]
               action: replace
               target_label: TaskDefinitionFamily
-            - source_labels: [ __meta_ecs_task_launch_type ]
+            - source_labels: [__meta_ecs_task_launch_type]
               action: replace
               target_label: LaunchType
-            - source_labels: [ __meta_ecs_container_name ]
+            - source_labels: [__meta_ecs_container_name]
               action: replace
               target_label: container_name
             - action: labelmap
               regex: ^__meta_ecs_container_labels_(.+)$
-              replacement: '$$1'
+              replacement: "$$1"
   awsecscontainermetrics:
-    collection_interval: 15s
+    collection_interval: ${ecs_metrics_collection_interval}

 processors:
   resource:
@@ -111,7 +111,7 @@ processors:

 exporters:
   prometheusremotewrite:
-    endpoint: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-f9a9b3d8-511e-4640-9b2d-15fbd53f7209/api/v1/remote_write"
+    endpoint: "${amp_remote_write_ep}"
     auth:
       authenticator: sigv4auth
   logging:
@@ -127,4 +127,4 @@ service:
     metrics/ecs:
       receivers: [awsecscontainermetrics]
       processors: [filter]
-      exporters: [logging, prometheusremotewrite]
+      exporters: [logging, prometheusremotewrite]
24 changes: 24 additions & 0 deletions modules/ecs-monitoring/locals.tf
@@ -6,4 +6,28 @@ locals {
   region      = data.aws_region.current.name
   name        = "amg-ex-${replace(basename(path.cwd), "_", "-")}"
   description = "AWS Managed Grafana service for ${local.name}"
+
+  default_otel_values = {
+    aws_region                      = data.aws_region.current.name
+    cluster_name                    = var.aws_ecs_cluster_name
+    cluster_region                  = data.aws_region.current.name
+    refresh_interval                = "60s"
+    ecs_metrics_collection_interval = "15s"
+    amp_remote_write_ep             = "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-f9a9b3d8-511e-4640-9b2d-15fbd53f7209/api/v1/remote_write"
+    otlpGrpcEndpoint                = "0.0.0.0:4317"
+    otlpHttpEndpoint                = "0.0.0.0:4318"
+  }
+
+  ssm_param_value = yamlencode(
+    templatefile("${path.module}/configs/config.yaml", local.default_otel_values)
+  )
+
+  container_def_default_values = {
+    container_name = "adot_new"
+    otel_image_ver = "v0.31.0"
+    aws_region     = data.aws_region.current.name
+  }
+
+  container_definitions = templatefile("${path.module}/task_definitions/otel_collector.json", local.container_def_default_values)
+
 }
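The `amp_remote_write_ep` default in `locals.tf` is a fixed URL for one region and one workspace. One possible way to derive it instead is sketched below. It is only an illustration: `var.amp_workspace_id` is a hypothetical input that does not exist in this PR, and the URL shape is copied from the hard-coded value in the diff.

```hcl
# Sketch only: var.amp_workspace_id is a hypothetical input, not part of this PR.
variable "amp_workspace_id" {
  description = "ID of an existing Amazon Managed Service for Prometheus workspace (hypothetical input)"
  type        = string
}

data "aws_region" "current" {}

locals {
  # Same URL shape as the hard-coded default, with region and workspace ID substituted
  amp_remote_write_ep = "https://aps-workspaces.${data.aws_region.current.name}.amazonaws.com/workspaces/${var.amp_workspace_id}/api/v1/remote_write"
}

output "amp_remote_write_ep" {
  description = "Remote write endpoint that would be passed to the collector config template"
  value       = local.amp_remote_write_ep
}
```

With something like this in place, `default_otel_values.amp_remote_write_ep` could reference `local.amp_remote_write_ep`, so the rendered collector config follows the deployment region rather than `us-east-1`.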
15 changes: 9 additions & 6 deletions modules/ecs-monitoring/main.tf
@@ -1,9 +1,10 @@
-# SSM Parameter
+# SSM Parameter for storing and distributing the ADOT config
 resource "aws_ssm_parameter" "adot-config" {
-  name        = "/observability_aws/otel_collector_conf"
+  name        = "/terraform-aws-observability/otel_collector_config"
   description = "SSM parameter for aws-observability-accelerator/otel-collector-config"
   type        = "String"
-  value       = yamlencode(file("configs/config.yaml"))
+  value       = local.ssm_param_value
+  tier        = "Intelligent-Tiering"
 }

 ############################################
@@ -16,8 +17,10 @@ module "managed_grafana_default" {
   associate_license = false
 }

+#####################
 ## Commented this module, as AMP workspace is a prerequisite for this solution.
 ## You can use this code to create an AMP workspace
+#####################

 # module "managed_prometheus_default" {
 #   source = "terraform-aws-modules/managed-service-prometheus/aws"
@@ -33,9 +36,9 @@ resource "aws_ecs_task_definition" "adot_ecs_prometheus" {
   execution_role_arn       = var.executionRoleArn
   network_mode             = "bridge"
   requires_compatibilities = ["EC2"]
-  cpu                      = "256"
-  memory                   = "512"
-  container_definitions    = file("task_definitions/otel_collector.json")
+  cpu                      = var.ecs_adot_cpu
+  memory                   = var.ecs_adot_mem
+  container_definitions    = local.container_definitions
 }

 ############################################
16 changes: 8 additions & 8 deletions modules/ecs-monitoring/outputs.tf
@@ -8,12 +8,12 @@ output "grafana_workspace_endpoint" {
   value       = module.managed_grafana_default.workspace_endpoint
 }

-output "prometheus_workspace_id" {
-  description = "Identifier of the workspace"
-  value       = module.managed_prometheus_default.workspace_id
-}
+# output "prometheus_workspace_id" {
+#   description = "Identifier of the workspace"
+#   value       = module.managed_prometheus_default.workspace_id
+# }

-output "prometheus_workspace_endpoint" {
-  description = "Prometheus endpoint available for this workspace"
-  value       = module.managed_prometheus_default.workspace_prometheus_endpoint
-}
+# output "prometheus_workspace_endpoint" {
+#   description = "Prometheus endpoint available for this workspace"
+#   value       = module.managed_prometheus_default.workspace_prometheus_endpoint
+# }
6 changes: 3 additions & 3 deletions modules/ecs-monitoring/task-definitions/otel_collector.json
@@ -1,11 +1,11 @@
 [
   {
-    "name": "adot",
-    "image": "amazon/aws-otel-collector:v0.31.0",
+    "name": "${container_name}",
+    "image": "amazon/aws-otel-collector:${otel_image_ver}",
     "secrets": [
       {
         "name": "AOT_CONFIG_CONTENT",
-        "valueFrom": "/aws-observability-accelerator/otel-collector-config"
+        "valueFrom": "/terraform-aws-observability/otel_collector_config"
       }
     ],
     "logConfiguration": {
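This change brings the `valueFrom` path in the container definition in line with the SSM parameter name set in `main.tf`. The two strings are still maintained separately, so they could drift apart again. A possible way to keep them in sync is sketched below as a fragment meant to live inside the module. The local `adot_config_param_name` and the template placeholder `ssm_param_name` are hypothetical, and the sketch assumes the JSON template would be changed to use `"valueFrom": "${ssm_param_name}"`. The template path is written as it appears in `locals.tf`.

```hcl
locals {
  # Hypothetical local: the parameter name defined in exactly one place
  adot_config_param_name = "/terraform-aws-observability/otel_collector_config"

  container_definitions = templatefile("${path.module}/task_definitions/otel_collector.json", {
    container_name = "adot_new"
    otel_image_ver = "v0.31.0"
    aws_region     = data.aws_region.current.name    # data source declared elsewhere in the module
    ssm_param_name = local.adot_config_param_name    # assumed new placeholder in the JSON template
  })
}

resource "aws_ssm_parameter" "adot-config" {
  name  = local.adot_config_param_name
  type  = "String"
  value = local.ssm_param_value # rendered collector config, defined in locals.tf
  tier  = "Intelligent-Tiering"
}
```

Because both the parameter and the task definition read the same local, renaming the parameter would need a change in only one line.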
12 changes: 12 additions & 0 deletions modules/ecs-monitoring/variables.tf
@@ -12,3 +12,15 @@ variable "executionRoleArn" {
   description = "ARN of the IAM Execution Role"
   type        = string
 }
+
+variable "ecs_adot_cpu" {
+  description = "CPU to be allocated for the ADOT ECS TASK"
+  type        = string
+  default     = "256"
+}
+
+variable "ecs_adot_mem" {
+  description = "Memory to be allocated for the ADOT ECS TASK"
+  type        = string
+  default     = "512"
+}
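With the two new variables, a caller could size the ADOT task without editing the module. The block below is a hedged usage sketch: the relative `source` path, the cluster name, and both role ARNs are placeholders chosen for illustration, while the input names are taken from the module's README table.

```hcl
module "ecs_monitoring" {
  # Assumed relative path from an example directory; adjust to the actual layout
  source = "../../modules/ecs-monitoring"

  # Required inputs (placeholder values)
  aws_ecs_cluster_name = "my-ecs-cluster"
  taskRoleArn          = "arn:aws:iam::123456789012:role/my-adot-task-role"
  executionRoleArn     = "arn:aws:iam::123456789012:role/my-adot-execution-role"

  # Optional inputs added in this PR; the values shown are the declared defaults
  ecs_adot_cpu = "256"
  ecs_adot_mem = "512"
}
```

Both sizing inputs are declared as `string`, matching the `cpu` and `memory` arguments of `aws_ecs_task_definition`, so values are quoted.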
2 changes: 1 addition & 1 deletion modules/ecs-monitoring/versions.tf
@@ -1,5 +1,5 @@
 terraform {
-  required_version = ">= 1.1.0"
+  required_version = ">= 1.0.0"

   required_providers {
     aws = {