diff --git a/docs/patterns/existing-eks-observability-accelerators/existing-eks-apiserver-observability.md b/docs/patterns/existing-eks-observability-accelerators/existing-eks-apiserver-observability.md new file mode 100644 index 00000000..8572f01e --- /dev/null +++ b/docs/patterns/existing-eks-observability-accelerators/existing-eks-apiserver-observability.md @@ -0,0 +1,162 @@ +# Existing EKS Cluster Observability Accelerator - API Server Monitoring + +## Objective + +This pattern adds observability, including API server monitoring, to an existing EKS cluster using AWS managed open source services. + +## Prerequisites + +Ensure that you have installed the following tools on your machine: + +1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) +2. [kubectl](https://Kubernetes.io/docs/tasks/tools/) +3. [cdk](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) +4. [npm](https://docs.npmjs.com/cli/v8/commands/npm-install) + +You will also need: + +1. Either an existing EKS cluster, or a new one that you set up with [Single New EKS Cluster Observability Accelerator](../single-new-eks-observability-accelerators/single-new-eks-cluster.md) +2. An OpenID Connect (OIDC) provider associated with the above EKS cluster (Note: Single EKS Cluster Pattern takes care of that for you) + +## Deploying + +1. Edit `~/.cdk.json` by setting the name of your existing cluster: + +```json + "context": { + ... + "existing.cluster.name": "...", + ... + } +``` + +2. Edit `~/.cdk.json` by setting the kubectl role name; if you used Single New EKS Cluster Observability Accelerator to set up your cluster, the kubectl role name is provided in the output of the deployment, on your command-line interface (CLI): + +```json + "context": { + ... + "existing.kubectl.rolename":"...", + ... + } +``` + +3. Amazon Managed Grafana workspace: To visualize the collected metrics, you need an Amazon Managed Grafana workspace. 
If you have an existing workspace, create the environment variables as described below. To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/). + +!!! note +For the URL `https://g-xyz.grafana-workspace.us-east-1.amazonaws.com`, the workspace ID is `g-xyz`. + +```bash +export AWS_REGION= +export COA_AMG_WORKSPACE_ID=g-xyz +export COA_AMG_ENDPOINT_URL=https://g-xyz.grafana-workspace.us-east-1.amazonaws.com +``` + +!!! warning +Setting the environment variables `COA_AMG_ENDPOINT_URL` and `AWS_REGION` is mandatory for successful execution of this pattern. + +4. Grafana API key: Amazon Managed Grafana provides a control plane API for generating Grafana API keys. + +```bash +export AMG_API_KEY=$(aws grafana create-workspace-api-key \ + --key-name "grafana-operator-key" \ + --key-role "ADMIN" \ + --seconds-to-live 432000 \ + --workspace-id "$COA_AMG_WORKSPACE_ID" \ + --query key \ + --output text) +``` + +5. AWS SSM Parameter Store for the Grafana API key: Store the new Grafana API key as a secret in AWS SSM Parameter Store. The Grafana Operator deployment of this solution references this parameter to access Amazon Managed Grafana from the Amazon EKS cluster. + +```bash +aws ssm put-parameter --name "/cdk-accelerator/grafana-api-key" \ + --type "SecureString" \ + --value "$AMG_API_KEY" \ + --region "$AWS_REGION" +``` + +6. Install project dependencies by running `npm install` in the main folder of this cloned repository. + +7. The settings for dashboard URLs are expected to be specified in the CDK context. Typically the context lives in the `cdk.json` file of the current directory or in `~/.cdk.json` in your home directory. 
+ +Example settings: update the context in the `cdk.json` file located in the `cdk-eks-blueprints-patterns` directory: + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_WORKLOADS_API_BASIC_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-basic.json", + "GRAFANA_WORKLOADS_API_ADVANCED_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-advanced.json", + 
"GRAFANA_WORKLOADS_API_TROUBLESHOOTING_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-troubleshooting.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/apiserver" + } + ] + }, + "apiserver.pattern.enabled": true + } +``` + +8. Once all prerequisites are set, you are ready to deploy the pipeline. Run the following commands from the root of this repository to deploy the pipeline stack: + +```bash +make build +make pattern existing-eks-opensource-observability deploy +``` + + +## Visualization + +Log in to your Grafana workspace and navigate to the Dashboards panel. You should see three new dashboards named `Kubernetes/Kube-apiserver (basic)`, `Kubernetes/Kube-apiserver (advanced)`, and `Kubernetes/Kube-apiserver (troubleshooting)` under `Observability Accelerator Dashboards`: + +![Dashboard](../images/all-dashboards-apiserver.png) + +Open the `Kubernetes/Kube-apiserver (basic)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-basic.png) + +Open the `Kubernetes/Kube-apiserver (advanced)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-advanced.png) + +Open the `Kubernetes/Kube-apiserver (troubleshooting)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-troubleshooting.png) + + +## Verify the resources + +Please see [Single New Nginx Observability Accelerator](../single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md). 
+ +## Teardown + +You can tear down the whole CDK stack with the following command: + +```bash +make pattern existing-eks-opensource-observability destroy +``` + +If you set up your cluster with Single New EKS Cluster Observability Accelerator, you also need to run: + +```bash +make pattern single-new-eks-cluster destroy +``` diff --git a/docs/patterns/existing-eks-observability-accelerators/existing-eks-opensource-observability.md b/docs/patterns/existing-eks-observability-accelerators/existing-eks-opensource-observability.md index edb8bce7..fd935876 100644 --- a/docs/patterns/existing-eks-observability-accelerators/existing-eks-opensource-observability.md +++ b/docs/patterns/existing-eks-observability-accelerators/existing-eks-opensource-observability.md @@ -160,6 +160,48 @@ If you need Java observability you can instead use: } ``` +If you want to deploy API Server dashboards along with Java observability you can instead use: + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : 
"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_JAVA_JMX_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/java/default.json", + "GRAFANA_APISERVER_BASIC_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-basic.json", + "GRAFANA_APISERVER_ADVANCED_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-advanced.json", + "GRAFANA_APISERVER_TROUBLESHOOTING_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-troubleshooting.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/java" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/apiserver" + } + ] + }, + "java.pattern.enabled": true, + "apiserver.pattern.enabled": true + } +``` + 8. Once all pre-requisites are set you are ready to deploy the pipeline. 
Run the following command from the root of this repository to deploy the pipeline stack: ```bash diff --git a/docs/patterns/images/all-dashboards-apiserver.png b/docs/patterns/images/all-dashboards-apiserver.png new file mode 100644 index 00000000..68850a6a Binary files /dev/null and b/docs/patterns/images/all-dashboards-apiserver.png differ diff --git a/docs/patterns/images/apiserver-advanced.png b/docs/patterns/images/apiserver-advanced.png new file mode 100644 index 00000000..e29b7a4e Binary files /dev/null and b/docs/patterns/images/apiserver-advanced.png differ diff --git a/docs/patterns/images/apiserver-basic.png b/docs/patterns/images/apiserver-basic.png new file mode 100644 index 00000000..4a954e2a Binary files /dev/null and b/docs/patterns/images/apiserver-basic.png differ diff --git a/docs/patterns/images/apiserver-troubleshooting.png b/docs/patterns/images/apiserver-troubleshooting.png new file mode 100644 index 00000000..e686af4f Binary files /dev/null and b/docs/patterns/images/apiserver-troubleshooting.png differ diff --git a/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-apiserver-opensource-observability.md b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-apiserver-opensource-observability.md new file mode 100644 index 00000000..b10239f2 --- /dev/null +++ b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-apiserver-opensource-observability.md @@ -0,0 +1,79 @@ +# Single New EKS Cluster Open Source Observability Accelerator - API Server monitoring + +## Objective + +This pattern demonstrates how to use the _New EKS Cluster Open Source Observability Accelerator_ with API Server monitoring. + +## Prerequisites + +Ensure that you have installed the following tools on your machine. + +1. [aws cli](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) +2. [kubectl](https://Kubernetes.io/docs/tasks/tools/) +3. 
[cdk](https://docs.aws.amazon.com/cdk/v2/guide/getting_started.html#getting_started_install) +4. [npm](https://docs.npmjs.com/cli/v8/commands/npm-install) + +## Deploying + +Please follow the _Deploying_ instructions of the [New EKS Cluster Open Source Observability Accelerator](./single-new-eks-opensource-observability.md) pattern, except for step 7, where you need to replace "context" in `~/.cdk.json` with the following: + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_WORKLOADS_API_BASIC_URL" : 
"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-basic.json", + "GRAFANA_WORKLOADS_API_ADVANCED_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-advanced.json", + "GRAFANA_WORKLOADS_API_TROUBLESHOOTING_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-troubleshooting.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/apiserver" + } + ] + }, + "apiserver.pattern.enabled": true + } +``` + +## Visualization + +Log in to your Grafana workspace and navigate to the Dashboards panel. You should see three new dashboards named `Kubernetes/Kube-apiserver (basic)`, `Kubernetes/Kube-apiserver (advanced)`, and `Kubernetes/Kube-apiserver (troubleshooting)` under `Observability Accelerator Dashboards`: + +![Dashboard](../images/all-dashboards-apiserver.png) + +Open the `Kubernetes/Kube-apiserver (basic)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-basic.png) + +Open the `Kubernetes/Kube-apiserver (advanced)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-advanced.png) + +Open the `Kubernetes/Kube-apiserver (troubleshooting)` dashboard and you should be able to view its visualization as shown below: + +![NodeExporter_Dashboard](../images/apiserver-troubleshooting.png) + +## Teardown + +You can tear down the whole CDK stack with the following command: + +```bash +make pattern single-new-eks-opensource-observability destroy +``` \ No newline at end of file diff --git 
a/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-graviton-opensource-observability.md b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-graviton-opensource-observability.md index 38d22e1c..38d26260 100644 --- a/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-graviton-opensource-observability.md +++ b/docs/patterns/single-new-eks-observability-accelerators/single-new-eks-graviton-opensource-observability.md @@ -159,6 +159,48 @@ If you need Java observability you can instead use: } ``` +If you want to deploy API Server dashboards along with Java observability you can instead use: + +```typescript + "context": { + "fluxRepository": { + "name": "grafana-dashboards", + "namespace": "grafana-operator", + "repository": { + "repoUrl": "https://github.com/aws-observability/aws-observability-accelerator", + "name": "grafana-dashboards", + "targetRevision": "main", + "path": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + "values": { + "GRAFANA_CLUSTER_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/cluster.json", + "GRAFANA_KUBELET_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/kubelet.json", + "GRAFANA_NSWRKLDS_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/namespace-workloads.json", + "GRAFANA_NODEEXP_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodeexporter-nodes.json", + "GRAFANA_NODES_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/nodes.json", + "GRAFANA_WORKLOADS_DASH_URL" : 
"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/infrastructure/workloads.json", + "GRAFANA_JAVA_JMX_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/java/default.json", + "GRAFANA_APISERVER_BASIC_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-basic.json", + "GRAFANA_APISERVER_ADVANCED_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-advanced.json", + "GRAFANA_APISERVER_TROUBLESHOOTING_DASH_URL" : "https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/eks/apiserver/apiserver-troubleshooting.json" + }, + "kustomizations": [ + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/infrastructure" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/java" + }, + { + "kustomizationPath": "./artifacts/grafana-operator-manifests/eks/apiserver" + } + ] + }, + "java.pattern.enabled": true, + "apiserver.pattern.enabled": true + } +``` + 8. Once all pre-requisites are set you are ready to deploy the pipeline. 
Run the following command from the root of this repository to deploy the pipeline stack: ```bash diff --git a/lib/common/resources/amp-config/apiserver/recording-rules.yml b/lib/common/resources/amp-config/apiserver/recording-rules.yml new file mode 100644 index 00000000..ccbb028a --- /dev/null +++ b/lib/common/resources/amp-config/apiserver/recording-rules.yml @@ -0,0 +1,115 @@ +groups: + - name: apiserver-monitoring + rules: + - expr: sum by (cluster, code, verb) (increase(apiserver_request_total{job="apiserver",verb=~"LIST|GET|POST|PUT|PATCH|DELETE",code=~"2.."}[1h])) + record: code_verb:apiserver_request_total:increase1h + - expr: sum by (cluster, code, verb) (increase(apiserver_request_total{job="apiserver",verb=~"LIST|GET|POST|PUT|PATCH|DELETE",code=~"3.."}[1h])) + record: code_verb:apiserver_request_total:increase1h + - expr: sum by (cluster, code, verb) (increase(apiserver_request_total{job="apiserver",verb=~"LIST|GET|POST|PUT|PATCH|DELETE",code=~"4.."}[1h])) + record: code_verb:apiserver_request_total:increase1h + - expr: sum by (cluster, code, verb) (increase(apiserver_request_total{job="apiserver",verb=~"LIST|GET|POST|PUT|PATCH|DELETE",code=~"5.."}[1h])) + record: code_verb:apiserver_request_total:increase1h + - expr: sum by (cluster,code,resource) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET"}[5m])) + labels: + verb: read + record: code_resource:apiserver_request_total:rate5m + - expr: sum by (cluster,code,resource) (rate(apiserver_request_total{job="apiserver",verb=~"POST|PUT|PATCH|DELETE"}[5m])) + labels: + verb: write + record: code_resource:apiserver_request_total:rate5m + - expr: sum by (cluster, verb, scope, le) (increase(apiserver_request_slo_duration_seconds_bucket[1h])) + record: cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase1h + - expr: sum by (cluster, verb, scope, le) (avg_over_time(cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase1h[30d]) + * 24 * 30) + record: 
cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d + - expr: |- + 1 - ( + ( + # write too slow + sum by (cluster) (cluster_verb_scope:apiserver_request_slo_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"}) + - + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"}) + ) + + ( + # read too slow + sum by (cluster) (cluster_verb_scope:apiserver_request_slo_duration_seconds_count:increase30d{verb=~"LIST|GET"}) + - + ( + ( + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"}) + or + vector(0) + ) + + + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"}) + + + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"}) + ) + ) + + # errors + sum by (cluster) (code:apiserver_request_total:increase30d{code=~"5.."} or vector(0)) + ) + / + sum by (cluster) (code:apiserver_request_total:increase30d) + labels: + verb: all + record: apiserver_request:availability30d + - expr: |- + 1 - ( + sum by (cluster) (cluster_verb_scope:apiserver_request_slo_duration_seconds_count:increase30d{verb=~"LIST|GET"}) + - + ( + # too slow + ( + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"}) + or + vector(0) + ) + + + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"}) + + + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"}) + ) + + + # errors + sum by (cluster) (code:apiserver_request_total:increase30d{verb="read",code=~"5.."} or vector(0)) + 
) + / + sum by (cluster) (code:apiserver_request_total:increase30d{verb="read"}) + labels: + verb: read + record: apiserver_request:availability30d + - expr: |- + 1 - ( + ( + # too slow + sum by (cluster) (cluster_verb_scope:apiserver_request_slo_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"}) + - + sum by (cluster) (cluster_verb_scope_le:apiserver_request_slo_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"}) + ) + + + # errors + sum by (cluster) (code:apiserver_request_total:increase30d{verb="write",code=~"5.."} or vector(0)) + ) + / + sum by (cluster) (code:apiserver_request_total:increase30d{verb="write"}) + labels: + verb: write + record: apiserver_request:availability30d + - expr: histogram_quantile(0.99, sum by (cluster, le, resource) (rate(apiserver_request_slo_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[5m]))) + > 0 + labels: + quantile: "0.99" + verb: read + record: cluster_quantile:apiserver_request_slo_duration_seconds:histogram_quantile + - expr: histogram_quantile(0.99, sum by (cluster, le, resource) (rate(apiserver_request_slo_duration_seconds_bucket{job="apiserver",verb=~"POST|PUT|PATCH|DELETE",subresource!~"proxy|attach|log|exec|portforward"}[5m]))) + > 0 + labels: + quantile: "0.99" + verb: write + record: cluster_quantile:apiserver_request_slo_duration_seconds:histogram_quantile + - expr: | + histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT"}[5m])) without(instance, pod)) + labels: + quantile: "0.9" + record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile \ No newline at end of file diff --git a/lib/common/resources/otel-collector-config.yml b/lib/common/resources/otel-collector-config.yml index 8f0b6d55..83542ab9 100644 --- a/lib/common/resources/otel-collector-config.yml +++ 
b/lib/common/resources/otel-collector-config.yml @@ -55,6 +55,33 @@ spec: regex: (.+) target_label: __metrics_path__ replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor + {{ if enableAPIserverJob }} + - job_name: 'apiserver' + scheme: https + tls_config: + ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + insecure_skip_verify: true + bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token + kubernetes_sd_configs: + - role: endpoints + relabel_configs: + - source_labels: + [ + __meta_kubernetes_namespace, + __meta_kubernetes_service_name, + __meta_kubernetes_endpoint_port_name, + ] + action: keep + regex: default;kubernetes;https + metric_relabel_configs: + - action: keep + source_labels: [__name__] + - source_labels: [__name__, le] + separator: ; + regex: apiserver_request_duration_seconds_bucket;(0.15|0.2|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2|3|3.5|4|4.5|6|7|8|9|15|25|40|50) + replacement: $1 + action: drop + {{ end }} - job_name: serviceMonitor/default/kube-prometheus-stack-prometheus-node-exporter/0 honor_timestamps: true scrape_interval: 30s diff --git a/lib/existing-eks-opensource-observability-pattern/index.ts b/lib/existing-eks-opensource-observability-pattern/index.ts index 03563eff..480f082d 100644 --- a/lib/existing-eks-opensource-observability-pattern/index.ts +++ b/lib/existing-eks-opensource-observability-pattern/index.ts @@ -6,6 +6,7 @@ import * as amp from 'aws-cdk-lib/aws-aps'; import { ObservabilityBuilder } from '@aws-quickstart/eks-blueprints'; import * as cdk from "aws-cdk-lib"; import * as eks from 'aws-cdk-lib/aws-eks'; +import * as fs from 'fs'; export default class ExistingEksOpenSourceobservabilityPattern { async buildAsync(scope: cdk.App, id: string) { @@ -56,9 +57,21 @@ export default class ExistingEksOpenSourceobservabilityPattern { } }; + const jsonString = fs.readFileSync(__dirname + '/../../cdk.json', 'utf-8'); + const jsonStringnew = JSON.parse(jsonString); + let doc = 
utils.readYamlDocument(__dirname + '/../common/resources/otel-collector-config.yml'); + doc = utils.changeTextBetweenTokens( + doc, + "{{ if enableAPIserverJob }}", + "{{ end }}", + jsonStringnew.context["apiserver.pattern.enabled"] + ); + console.log(doc); + fs.writeFileSync(__dirname + '/../common/resources/otel-collector-config-new.yml', doc); + if (utils.valueFromContext(scope, "java.pattern.enabled", false)) { ampAddOnProps.openTelemetryCollector = { - manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestPath: __dirname + '/../common/resources/otel-collector-config-new.yml', manifestParameterMap: { javaScrapeSampleLimit: 1000, javaPrometheusMetricsEndpoint: "/metrics" @@ -70,9 +83,16 @@ export default class ExistingEksOpenSourceobservabilityPattern { ); } + if (utils.valueFromContext(scope, "apiserver.pattern.enabled", false)) { + ampAddOnProps.enableAPIServerJob = true; + ampAddOnProps.ampRules?.ruleFilePaths.push( + __dirname + '/../common/resources/amp-config/apiserver/recording-rules.yml' + ); + } + if (utils.valueFromContext(scope, "nginx.pattern.enabled", false)) { ampAddOnProps.openTelemetryCollector = { - manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestPath: __dirname + '/../common/resources/otel-collector-config-new.yml', manifestParameterMap: { javaScrapeSampleLimit: 1000, javaPrometheusMetricsEndpoint: "/metrics" diff --git a/lib/single-new-eks-opensource-observability-pattern/graviton-index.ts b/lib/single-new-eks-opensource-observability-pattern/graviton-index.ts index 1969b1e9..8bf7d8b1 100644 --- a/lib/single-new-eks-opensource-observability-pattern/graviton-index.ts +++ b/lib/single-new-eks-opensource-observability-pattern/graviton-index.ts @@ -6,6 +6,7 @@ import * as amp from 'aws-cdk-lib/aws-aps'; import * as eks from 'aws-cdk-lib/aws-eks'; import * as ec2 from 'aws-cdk-lib/aws-ec2'; import { ObservabilityBuilder } from '@aws-quickstart/eks-blueprints'; +import * as fs 
from 'fs'; export default class SingleNewEksGravitonOpenSourceObservabilityPattern { constructor(scope: Construct, id: string) { @@ -37,9 +38,21 @@ export default class SingleNewEksGravitonOpenSourceObservabilityPattern { } }; + const jsonString = fs.readFileSync(__dirname + '/../../cdk.json', 'utf-8'); + const jsonStringnew = JSON.parse(jsonString); + let doc = utils.readYamlDocument(__dirname + '/../common/resources/otel-collector-config.yml'); + doc = utils.changeTextBetweenTokens( + doc, + "{{ if enableAPIserverJob }}", + "{{ end }}", + jsonStringnew.context["apiserver.pattern.enabled"] + ); + console.log(doc); + fs.writeFileSync(__dirname + '/../common/resources/otel-collector-config-new.yml', doc); + if (utils.valueFromContext(scope, "java.pattern.enabled", false)) { ampAddOnProps.openTelemetryCollector = { - manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestPath: __dirname + '/../common/resources/otel-collector-config-new.yml', manifestParameterMap: { javaScrapeSampleLimit: 1000, javaPrometheusMetricsEndpoint: "/metrics" @@ -51,6 +64,13 @@ export default class SingleNewEksGravitonOpenSourceObservabilityPattern { ); } + if (utils.valueFromContext(scope, "apiserver.pattern.enabled", false)) { + ampAddOnProps.enableAPIServerJob = true; + ampAddOnProps.ampRules?.ruleFilePaths.push( + __dirname + '/../common/resources/amp-config/apiserver/recording-rules.yml' + ); + } + Reflect.defineMetadata("ordered", true, blueprints.addons.GrafanaOperatorAddon); const addOns: Array = [ new blueprints.addons.CloudWatchLogsAddon({ diff --git a/lib/single-new-eks-opensource-observability-pattern/index.ts b/lib/single-new-eks-opensource-observability-pattern/index.ts index 7ce62951..c31ae87c 100644 --- a/lib/single-new-eks-opensource-observability-pattern/index.ts +++ b/lib/single-new-eks-opensource-observability-pattern/index.ts @@ -4,6 +4,7 @@ import * as blueprints from '@aws-quickstart/eks-blueprints'; import { GrafanaOperatorSecretAddon } 
from './grafanaoperatorsecretaddon'; import * as amp from 'aws-cdk-lib/aws-aps'; import { ObservabilityBuilder } from '@aws-quickstart/eks-blueprints'; +import * as fs from 'fs'; export default class SingleNewEksOpenSourceobservabilityPattern { constructor(scope: Construct, id: string) { @@ -38,9 +39,21 @@ export default class SingleNewEksOpenSourceobservabilityPattern { } }; + const jsonString = fs.readFileSync(__dirname + '/../../cdk.json', 'utf-8'); + const jsonStringnew = JSON.parse(jsonString); + let doc = utils.readYamlDocument(__dirname + '/../common/resources/otel-collector-config.yml'); + doc = utils.changeTextBetweenTokens( + doc, + "{{ if enableAPIserverJob }}", + "{{ end }}", + jsonStringnew.context["apiserver.pattern.enabled"] + ); + console.log(doc); + fs.writeFileSync(__dirname + '/../common/resources/otel-collector-config-new.yml', doc); + if (utils.valueFromContext(scope, "java.pattern.enabled", false)) { ampAddOnProps.openTelemetryCollector = { - manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestPath: __dirname + '/../common/resources/otel-collector-config-new.yml', manifestParameterMap: { javaScrapeSampleLimit: 1000, javaPrometheusMetricsEndpoint: "/metrics" @@ -52,9 +65,16 @@ export default class SingleNewEksOpenSourceobservabilityPattern { ); } + if (utils.valueFromContext(scope, "apiserver.pattern.enabled", false)) { + ampAddOnProps.enableAPIServerJob = true, + ampAddOnProps.ampRules?.ruleFilePaths.push( + __dirname + '/../common/resources/amp-config/apiserver/recording-rules.yml' + ); + } + if (utils.valueFromContext(scope, "nginx.pattern.enabled", false)) { ampAddOnProps.openTelemetryCollector = { - manifestPath: __dirname + '/../common/resources/otel-collector-config.yml', + manifestPath: __dirname + '/../common/resources/otel-collector-config-new.yml', manifestParameterMap: { javaScrapeSampleLimit: 1000, javaPrometheusMetricsEndpoint: "/metrics" diff --git a/mkdocs.yml b/mkdocs.yml index 
f4d5b42b..c7018ae7 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -28,13 +28,15 @@ nav: - Existing Cluster: - AWS Native : patterns/existing-eks-observability-accelerators/existing-eks-awsnative-observability.md - OSS : patterns/existing-eks-observability-accelerators/existing-eks-opensource-observability.md - - OSS nginx Mon : patterns/existing-eks-observability-accelerators/existing-eks-nginx-observability.md + - OSS Nginx Mon : patterns/existing-eks-observability-accelerators/existing-eks-nginx-observability.md + - OSS Apiserver Mon : patterns/existing-eks-observability-accelerators/existing-eks-apiserver-observability.md - Mixed : patterns/existing-eks-observability-accelerators/existing-eks-mixed-observability.md - New Cluster: - AWS Native : patterns/single-new-eks-observability-accelerators/single-new-eks-awsnative-observability.md - OSS : patterns/single-new-eks-observability-accelerators/single-new-eks-opensource-observability.md - OSS Java Mon : patterns/single-new-eks-observability-accelerators/single-new-eks-java-opensource-observability.md - - OSS nginx Mon : patterns/single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md + - OSS Nginx Mon : patterns/single-new-eks-observability-accelerators/single-new-eks-nginx-opensource-observability.md + - OSS Apiserver Mon: patterns/single-new-eks-observability-accelerators/single-new-eks-apiserver-opensource-observability.md - Mixed : patterns/single-new-eks-observability-accelerators/single-new-eks-mixed-observability.md - Graviton OSS : patterns/single-new-eks-observability-accelerators/single-new-eks-graviton-opensource-observability.md - Logs: logs.md