diff --git a/docs/patterns/existing-eks-observability-accelerators/existing-eks-nginx-observability.md b/docs/patterns/existing-eks-observability-accelerators/existing-eks-nginx-observability.md new file mode 100644 index 00000000..94b255ca --- /dev/null +++ b/docs/patterns/existing-eks-observability-accelerators/existing-eks-nginx-observability.md @@ -0,0 +1,229 @@ +# Existing EKS Cluster Nginx Observability Accelerator + +## Architecture + +The following figure illustrates the architecture of the Existing EKS Cluster NGINX pattern we will be deploying, using open source tools such as AWS Distro for OpenTelemetry (ADOT), Amazon Managed Grafana workspace and Prometheus. + + +The current example deploys the AWS Distro for OpenTelemetry Operator for Amazon EKS with its requirements and makes use of an existing Amazon Managed Grafana workspace. It creates a new Amazon Managed Service for Prometheus workspace. You will gain visibility into both the cluster and NGINX-based applications. + + +## Objective + +This pattern aims to add observability on top of an existing EKS cluster and NGINX workloads, with open source managed AWS services. + +## Prerequisites: + +Ensure that you have installed the following tools on your machine: + +1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) +2. [kubectl](https://Kubernetes.io/docs/tasks/tools/) +3. [cdk](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) +4. [npm](https://docs.npmjs.com/cli/v8/commands/npm-install) + +You will also need: + +1. Either an existing EKS cluster, or you can set up a new one with [Single New EKS Cluster Observability Accelerator](../single-new-eks-observability-accelerators/single-new-eks-cluster.md) +2. An OpenID Connect (OIDC) provider, associated with the above EKS cluster (Note: Single EKS Cluster Pattern takes care of that for you) + +## Deploying + +1. 
Edit `~/.cdk.json` by setting the name of your existing cluster: + +```json + "context": { + ... + "existing.cluster.name": "...", + ... + } +``` + +2. Edit `~/.cdk.json` by setting the kubectl role name; if you used Single New EKS Cluster Observability Accelerator to set up your cluster, the kubectl role name is printed in the output of the deployment, on your command-line interface (CLI): + +```json + "context": { + ... + "existing.kubectl.rolename":"...", + ... + } +``` + +3. Amazon Managed Grafana workspace: To visualize the collected metrics, you need an Amazon Managed Grafana workspace. If you have an existing workspace, create the environment variables as described below. To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/) + +!!! note +For the URL `https://g-xyz.grafana-workspace.us-east-1.amazonaws.com`, the workspace ID would be `g-xyz` + +```bash +export AWS_REGION= +export COA_AMG_WORKSPACE_ID=g-xyz +export COA_AMG_ENDPOINT_URL=https://g-xyz.grafana-workspace.us-east-1.amazonaws.com +``` + +!!! warning +Setting up environment variables `COA_AMG_ENDPOINT_URL` and `AWS_REGION` is mandatory for successful execution of this pattern. + +4. GRAFANA API KEY: Amazon Managed Grafana provides a control plane API for generating Grafana API keys. + +```bash +export AMG_API_KEY=$(aws grafana create-workspace-api-key \ + --key-name "grafana-operator-key" \ + --key-role "ADMIN" \ + --seconds-to-live 432000 \ + --workspace-id $COA_AMG_WORKSPACE_ID \ + --query key \ + --output text) +``` + +5. AWS SSM Parameter Store for GRAFANA API KEY: Update the Grafana API key secret in AWS SSM Parameter Store using the new Grafana API key created above. 
This will be referenced by the Grafana Operator deployment of our solution to access Amazon Managed Grafana from the Amazon EKS cluster. + +```bash +aws ssm put-parameter --name "/cdk-accelerator/grafana-api-key" \ + --type "SecureString" \ + --value $AMG_API_KEY \ + --region $AWS_REGION +``` + +6. Install project dependencies by running `npm install` in the main folder of this cloned repository. + +7. The actual settings for dashboard URLs are expected to be specified in the CDK context. Typically this is the `cdk.json` file in the current directory or `~/.cdk.json` in your home directory. + +Example settings: Update the context in the `cdk.json` file located in the `cdk-eks-blueprints-patterns` directory + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : 
"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_NGINX_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/nginx/nginx.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/nginx" + } + ] + }, + "nginx.pattern.enabled": true + } +``` + +8. Once all pre-requisites are set you are ready to deploy the pipeline. Run the following command from the root of this repository to deploy the pipeline stack: + +```bash +make build +make pattern existing-eks-opensource-observability deploy +``` + +## Deploy an example Nginx application + +In this section we will deploy sample application and extract metrics using AWS OpenTelemetry collector. + +1. Add NGINX ingress controller add-on into [lib/existing-eks-opensource-observability-pattern/index.ts](../../../lib/existing-eks-opensource-observability-pattern/index.ts) in add-on array. 
+``` + const addOns: Array<blueprints.ClusterAddOn> = [ + new blueprints.addons.CloudWatchLogsAddon({ + logGroupPrefix: `/aws/eks/${stackId}`, + logRetentionDays: 30 + }), + new blueprints.addons.XrayAdotAddOn(), + new blueprints.addons.FluxCDAddOn({"repositories": [fluxRepository]}), + new GrafanaOperatorSecretAddon(), + new blueprints.addons.NginxAddOn({ + name: "ingress-nginx", + chart: "ingress-nginx", + repository: "https://kubernetes.github.io/ingress-nginx", + version: "4.7.2", + namespace: "nginx-ingress-sample", + values: { + controller: { + metrics: { + enabled: true, + service: { + annotations: { + "prometheus.io/port": "10254", + "prometheus.io/scrape": "true" + } + } + } + } + } + }), + ]; +``` + +2. Deploy the pattern again +``` +make pattern existing-eks-opensource-observability deploy +``` + +3. Verify that the application is running +``` +kubectl get pods -n nginx-ingress-sample +``` + +4. Set the `EXTERNAL_IP` variable to the value of the EXTERNAL-IP column in the row of the NGINX ingress controller (as listed by `kubectl get service -n nginx-ingress-sample`). +``` +EXTERNAL_IP=your-nginx-controller-external-ip +``` + +5. Start some sample NGINX traffic by entering the following command. +``` +SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic +curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_traffic/nginx-traffic/nginx-traffic-sample.yaml | +sed "s/{{external_ip}}/$EXTERNAL_IP/g" | +sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" | +kubectl apply -f - +``` + +## Verify the resources + +``` +kubectl get pod -n nginx-sample-traffic +``` + +## Visualization +1. Prometheus datasource on Grafana +- After a successful deployment, this will open the Prometheus datasource configuration on Grafana. You should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana. + +2. Grafana dashboards +- Go to the Dashboards panel of your Grafana workspace. 
You should see a list of dashboards under `Observability Accelerator Dashboards`. + +![Dashboard](../images/nginx-dashboard.png) + +3. Amazon Managed Service for Prometheus rules and alerts +- Open the Amazon Managed Service for Prometheus console and view the details of your workspace. Under the Rules management tab, you should find new rules deployed. + +To set up your alert receiver with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html) + +## Verify the resources + +Please see [Single New Nginx Observability Accelerator](../single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md). + +## Teardown + +You can tear down the whole CDK stack with the following command: + +```bash +make pattern existing-eks-opensource-observability destroy +``` + +If you set up your cluster with Single New EKS Cluster Observability Accelerator, you also need to run: + +```bash +make pattern single-new-eks-cluster destroy +``` diff --git a/docs/patterns/images/nginx-dashboard.png b/docs/patterns/images/nginx-dashboard.png new file mode 100644 index 00000000..fc9302a0 Binary files /dev/null and b/docs/patterns/images/nginx-dashboard.png differ diff --git a/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md new file mode 100644 index 00000000..2e78fc24 --- /dev/null +++ b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md @@ -0,0 +1,137 @@ +# Single New EKS Cluster Open Source Observability Accelerator - monitoring Nginx applications + +## Objective + +This pattern demonstrates how to use the _New EKS Cluster Open Source Observability Accelerator_ with Nginx-based workloads. + +## Prerequisites + +Ensure that you have installed the following tools on your machine. 
+ +1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) +2. [kubectl](https://Kubernetes.io/docs/tasks/tools/) +3. [cdk](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) +4. [npm](https://docs.npmjs.com/cli/v8/commands/npm-install) + +## Deploying + +Please follow the _Deploying_ instructions of the [New EKS Cluster Open Source Observability Accelerator](./single-new-eks-opensource-observability.md) pattern, except for step 7, where you need to replace "context" in `~/.cdk.json` with the following: + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : 
"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_NGINX_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/nginx/nginx.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/nginx" + } + ] + }, + "nginx.pattern.enabled": true + } +``` + +!! warning This scenario might need larger worker node for the pod. + + +Once completed the rest of the _Deploying_ steps, you can move on with the deployment of the Nginx workload. + +## Deploy an example Nginx application + +In this section we will deploy sample application and extract metrics using AWS OpenTelemetry collector. + +1. Add NGINX ingress controller add-on into [lib/single-new-eks-opensource-observability-pattern/index.ts](../../../lib/single-new-eks-opensource-observability-pattern/index.ts) in add-on array. +``` + const addOns: Array = [ + new blueprints.addons.CloudWatchLogsAddon({ + logGroupPrefix: `/aws/eks/${stackId}`, + logRetentionDays: 30 + }), + new blueprints.addons.XrayAdotAddOn(), + new blueprints.addons.FluxCDAddOn({"repositories": [fluxRepository]}), + new GrafanaOperatorSecretAddon(), + new blueprints.addons.NginxAddOn({ + name: "ingress-nginx", + chart: "ingress-nginx", + repository: "https://kubernetes.github.io/ingress-nginx", + version: "4.7.2", + namespace: "nginx-ingress-sample", + values: { + controller: { + metrics: { + enabled: true, + service: { + annotations: { + "prometheus.io/port": "10254", + "prometheus.io/scrape": "true" + } + } + } + } + } + }), + ]; +``` + +2. Deploy pattern again +``` +make pattern single-new-eks-opensource-observability deploy +``` + +3. Verify if the application is running +``` +kubectl get pods -n nginx-ingress-sample +``` + +4. 
Set the `EXTERNAL_IP` variable to the value of the EXTERNAL-IP column in the row of the NGINX ingress controller (as listed by `kubectl get service -n nginx-ingress-sample`). +``` +EXTERNAL_IP=your-nginx-controller-external-ip +``` + +5. Start some sample NGINX traffic by entering the following command. +``` +SAMPLE_TRAFFIC_NAMESPACE=nginx-sample-traffic +curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/master/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/sample_traffic/nginx-traffic/nginx-traffic-sample.yaml | +sed "s/{{external_ip}}/$EXTERNAL_IP/g" | +sed "s/{{namespace}}/$SAMPLE_TRAFFIC_NAMESPACE/g" | +kubectl apply -f - +``` + +## Verify the resources + +``` +kubectl get pod -n nginx-sample-traffic +``` + +## Visualization + +Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a new dashboard named `NGINX`, under `Observability Accelerator Dashboards`. + +![Dashboard](../images/nginx-dashboard.png) + +## Teardown + +You can tear down the whole CDK stack with the following command: + +```bash +make pattern single-new-eks-opensource-observability destroy +``` diff --git a/lib/common/resources/amp-config/nginx/alerting-rules.yml b/lib/common/resources/amp-config/nginx/alerting-rules.yml new file mode 100644 index 00000000..aa03da81 --- /dev/null +++ b/lib/common/resources/amp-config/nginx/alerting-rules.yml @@ -0,0 +1,31 @@ +groups: + - name: Nginx-HTTP-4xx-error-rate + rules: + - alert: metric:alerting_rule + expr: sum(rate(nginx_http_requests_total{status=~"^4.."}[1m])) / sum(rate(nginx_http_requests_total[1m])) * 100 > 5 + for: 1m + labels: + severity: critical + annotations: + summary: Nginx high HTTP 4xx error rate (instance {{ $labels.instance }}) + description: "Too many HTTP requests with status 4xx (> 5%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - name: Nginx-HTTP-5xx-error-rate + rules: + - alert: metric:alerting_rule + expr: sum(rate(nginx_http_requests_total{status=~"^5.."}[1m])) / 
sum(rate(nginx_http_requests_total[1m])) * 100 > 5 + for: 1m + labels: + severity: critical + annotations: + summary: Nginx high HTTP 5xx error rate (instance {{ $labels.instance }}) + description: "Too many HTTP requests with status 5xx (> 5%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" + - name: Nginx-high-latency + rules: + - alert: metric:alerting_rule + expr: histogram_quantile(0.99, sum(rate(nginx_http_request_duration_seconds_bucket[2m])) by (le, host, node)) > 3 + for: 2m + labels: + severity: warning + annotations: + summary: Nginx latency high (instance {{ $labels.instance }}) + description: "Nginx p99 latency is higher than 3 seconds\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" \ No newline at end of file diff --git a/lib/existing-eks-opensource-observability-pattern/index.ts b/lib/existing-eks-opensource-observability-pattern/index.ts index 32bcec5b..03563eff 100644 --- a/lib/existing-eks-opensource-observability-pattern/index.ts +++ b/lib/existing-eks-opensource-observability-pattern/index.ts @@ -9,7 +9,7 @@ import * as eks from 'aws-cdk-lib/aws-eks'; export default class ExistingEksOpenSourceobservabilityPattern { async buildAsync(scope: cdk.App, id: string) { - + const stackId = `${id}-observability-accelerator`; const clusterName = utils.valueFromContext(scope, "existing.cluster.name", undefined); const kubectlRoleName = utils.valueFromContext(scope, "existing.kubectl.rolename", undefined); @@ -19,7 +19,7 @@ export default class ExistingEksOpenSourceobservabilityPattern { const ampWorkspaceName = process.env.COA_AMP_WORKSPACE_NAME! 
|| 'observability-amp-Workspace'; const ampWorkspace = blueprints.getNamedResource(ampWorkspaceName) as unknown as amp.CfnWorkspace; const ampEndpoint = ampWorkspace.attrPrometheusEndpoint; - const ampWorkspaceArn = ampWorkspace.attrArn; + const ampWorkspaceArn = ampWorkspace.attrArn; const amgEndpointUrl = process.env.COA_AMG_ENDPOINT_URL; const sdkCluster = await blueprints.describeCluster(clusterName, region); // get cluster information using EKS APIs const vpcId = sdkCluster.resourcesVpcConfig?.vpcId; @@ -70,6 +70,19 @@ export default class ExistingEksOpenSourceobservabilityPattern { ); } + if (utils.valueFromContext(scope, "nginx.pattern.enabled", false)) { + ampAddOnProps.openTelemetryCollector = { + manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestParameterMap: { + javaScrapeSampleLimit: 1000, + javaPrometheusMetricsEndpoint: "/metrics" + } + }; + ampAddOnProps.ampRules?.ruleFilePaths.push( + __dirname + '/../common/resources/amp-config/nginx/alerting-rules.yml' + ); + } + Reflect.defineMetadata("ordered", true, blueprints.addons.GrafanaOperatorAddon); const addOns: Array<blueprints.ClusterAddOn> = [ new blueprints.addons.CloudWatchLogsAddon({ @@ -77,7 +90,7 @@ export default class ExistingEksOpenSourceobservabilityPattern { logRetentionDays: 30 }), new blueprints.addons.XrayAdotAddOn(), - new blueprints.addons.FluxCDAddOn({"repositories": [fluxRepository]}), + new blueprints.addons.FluxCDAddOn({ "repositories": [fluxRepository] }), new GrafanaOperatorSecretAddon(), ]; diff --git a/lib/single-new-eks-opensource-observability-pattern/index.ts b/lib/single-new-eks-opensource-observability-pattern/index.ts index af95f4a5..7ce62951 100644 --- a/lib/single-new-eks-opensource-observability-pattern/index.ts +++ b/lib/single-new-eks-opensource-observability-pattern/index.ts @@ -52,6 +52,19 @@ export default class SingleNewEksOpenSourceobservabilityPattern { ); } + if (utils.valueFromContext(scope, "nginx.pattern.enabled", false)) { + 
ampAddOnProps.openTelemetryCollector = { + manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestParameterMap: { + javaScrapeSampleLimit: 1000, + javaPrometheusMetricsEndpoint: "/metrics" + } + }; + ampAddOnProps.ampRules?.ruleFilePaths.push( + __dirname + '/../common/resources/amp-config/nginx/alerting-rules.yml' + ); + } + Reflect.defineMetadata("ordered", true, blueprints.addons.GrafanaOperatorAddon); const addOns: Array<blueprints.ClusterAddOn> = [ new blueprints.addons.CloudWatchLogsAddon({ @@ -60,7 +73,7 @@ export default class SingleNewEksOpenSourceobservabilityPattern { }), new blueprints.addons.XrayAdotAddOn(), new blueprints.addons.FluxCDAddOn({"repositories": [fluxRepository]}), - new GrafanaOperatorSecretAddon(), + new GrafanaOperatorSecretAddon() ]; ObservabilityBuilder.builder()