Move all dashboards to GitOps (#175)

* Typo
* Remove Grafana provider
* Temp: move dashbaords to gitOps
* Move external labels to resource attributes
* Avoid DDoS with using 0.0.0.0
* Pre-commit
* Transition in two steps

  Will need to remove provider in a separate version to provide a transition path as removing this will break terraform and leave orphans in the state
* Move patterns' dashboards creation to gitOps

  Standardize config objects for patterns as well
* Pre-commit
* Create AMP dashboard from external source with Grafana provider
* Fix deprecated option
* Fix Flux requirements
* Run pre-commit
* Update example with operator
* Cleanup examples
* Update multicluster example
* Update multicluster example
* Drop dead variable
* Update docs
* Change GitOps branch name
* Update docs
* Replacing Secrets Manager to SSM to store Grafana API Key (#178)
* Fixing SSM
* Fixing SSM
* Replacing Secrets Manager with SSM
* Replacing Secrets Manager with SSM
* Update architecture diagram
* Update architecture diagram
* Update README.md
* Update index.md
* Fixing Grafana Operator Version
* Fix multicluster example
* Update docs

---------

Co-authored-by: Ela AWS <[email protected]>
Co-authored-by: Elamaran Shanmugam <[email protected]>
aws-observability · Jun 12, 2023 · fa38a90
1 parent c5e4c0c
commit fa38a90

Showing 46 changed files with 361 additions and 4,460 deletions.
diff --git a/README.md b/README.md
@@ -17,7 +17,8 @@ your custom applications.
 You also can monitor your Amazon Managed Service for Prometheus workspaces ingestion,
 costs, active series with [this module](./modules/managed-prometheus-monitoring).
 
-<img width="1501" alt="image" src="docs/images/dark-o11y-accelerator-amp-xray.png">
+![image](https://github.com/aws-observability/terraform-aws-observability-accelerator/assets/10175027/e83f8709-f754-4192-90f2-e3de96d2e26c)
+
 
 ## Documentation
 
@@ -33,15 +34,6 @@ visit the [Amazon EKS cluster monitoring documentation](https://aws-observabilit
 The sections below demonstrate how you can leverage AWS Observability Accelerator
 to enable monitoring to an existing EKS cluster.
 
-### v2.x changes
-
-v2+ releases introduces couple of breaking changes compared to previous versions:
-
-- `modules/workloads/infra` module moves to `modules/eks-monitoring`
-- All EKS configuration options moves from the base  module to the `eks-monitoring` module
-- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visibility
-- All examples have been updated to reflect these changes
-- Introducing GitOps for Grafana contents (Dashboards, Folders and Data sources) with [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/)
 
 ### Base Module
 
@@ -161,14 +153,13 @@ If you are interested in contributing, see the
 | <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.1.0 |
 | <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 4.0.0 |
 | <a name="requirement_awscc"></a> [awscc](#requirement\_awscc) | >= 0.24.0 |
-| <a name="requirement_grafana"></a> [grafana](#requirement\_grafana) | 1.25.0 |
+| <a name="requirement_grafana"></a> [grafana](#requirement\_grafana) | >= 1.25.0 |
 
 ## Providers
 
 | Name | Version |
 |------|---------|
 | <a name="provider_aws"></a> [aws](#provider\_aws) | >= 4.0.0 |
-| <a name="provider_grafana"></a> [grafana](#provider\_grafana) | 1.25.0 |
 
 ## Modules
 
@@ -180,8 +171,6 @@ No modules.
 |------|------|
 | [aws_prometheus_alert_manager_definition.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_alert_manager_definition) | resource |
 | [aws_prometheus_workspace.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_workspace) | resource |
-| [grafana_data_source.amp](https://registry.terraform.io/providers/grafana/grafana/1.25.0/docs/resources/data_source) | resource |
-| [grafana_folder.this](https://registry.terraform.io/providers/grafana/grafana/1.25.0/docs/resources/folder) | resource |
 | [aws_grafana_workspace.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/grafana_workspace) | data source |
 | [aws_region.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/region) | data source |
 
@@ -190,12 +179,10 @@ No modules.
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
 | <a name="input_aws_region"></a> [aws\_region](#input\_aws\_region) | AWS Region | `string` | n/a | yes |
-| <a name="input_create_dashboard_folder"></a> [create\_dashboard\_folder](#input\_create\_dashboard\_folder) | Boolean flag to enable Amazon Managed Grafana folder and dashboards | `bool` | `true` | no |
-| <a name="input_create_prometheus_data_source"></a> [create\_prometheus\_data\_source](#input\_create\_prometheus\_data\_source) | Boolean flag to enable Amazon Managed Grafana datasource | `bool` | `true` | no |
 | <a name="input_enable_alertmanager"></a> [enable\_alertmanager](#input\_enable\_alertmanager) | Creates Amazon Managed Service for Prometheus AlertManager for all workloads | `bool` | `false` | no |
 | <a name="input_enable_managed_prometheus"></a> [enable\_managed\_prometheus](#input\_enable\_managed\_prometheus) | Creates a new Amazon Managed Service for Prometheus Workspace | `bool` | `true` | no |
 | <a name="input_grafana_api_key"></a> [grafana\_api\_key](#input\_grafana\_api\_key) | Grafana API key for the Amazon Managed Grafana workspace | `string` | n/a | yes |
-| <a name="input_managed_grafana_workspace_id"></a> [managed\_grafana\_workspace\_id](#input\_managed\_grafana\_workspace\_id) | Amazon Managed Grafana Workspace ID | `string` | `""` | no |
+| <a name="input_managed_grafana_workspace_id"></a> [managed\_grafana\_workspace\_id](#input\_managed\_grafana\_workspace\_id) | Amazon Managed Grafana Workspace ID | `string` | n/a | yes |
 | <a name="input_managed_prometheus_workspace_id"></a> [managed\_prometheus\_workspace\_id](#input\_managed\_prometheus\_workspace\_id) | Amazon Managed Service for Prometheus Workspace ID | `string` | `""` | no |
 | <a name="input_managed_prometheus_workspace_region"></a> [managed\_prometheus\_workspace\_region](#input\_managed\_prometheus\_workspace\_region) | Region where Amazon Managed Service for Prometheus is deployed | `string` | `null` | no |
 | <a name="input_tags"></a> [tags](#input\_tags) | Additional tags (e.g. `map('BusinessUnit`,`XYZ`) | `map(string)` | `{}` | no |
@@ -205,15 +192,10 @@ No modules.
 | Name | Description |
 |------|-------------|
 | <a name="output_aws_region"></a> [aws\_region](#output\_aws\_region) | AWS Region |
-| <a name="output_grafana_dashboard_folder_created"></a> [grafana\_dashboard\_folder\_created](#output\_grafana\_dashboard\_folder\_created) | Boolean value indicating if the module created a dashboard folder in Amazon Managed Grafana |
-| <a name="output_grafana_dashboards_folder_id"></a> [grafana\_dashboards\_folder\_id](#output\_grafana\_dashboards\_folder\_id) | Grafana folder ID for automatic dashboards. Required by workload modules |
-| <a name="output_grafana_prometheus_datasource_test"></a> [grafana\_prometheus\_datasource\_test](#output\_grafana\_prometheus\_datasource\_test) | Grafana save & test URL for Amazon Managed Prometheus workspace |
 | <a name="output_managed_grafana_workspace_endpoint"></a> [managed\_grafana\_workspace\_endpoint](#output\_managed\_grafana\_workspace\_endpoint) | Amazon Managed Grafana workspace endpoint |
-| <a name="output_managed_grafana_workspace_id"></a> [managed\_grafana\_workspace\_id](#output\_managed\_grafana\_workspace\_id) | Amazon Managed Grafana workspace ID |
 | <a name="output_managed_prometheus_workspace_endpoint"></a> [managed\_prometheus\_workspace\_endpoint](#output\_managed\_prometheus\_workspace\_endpoint) | Amazon Managed Prometheus workspace endpoint |
 | <a name="output_managed_prometheus_workspace_id"></a> [managed\_prometheus\_workspace\_id](#output\_managed\_prometheus\_workspace\_id) | Amazon Managed Prometheus workspace ID |
 | <a name="output_managed_prometheus_workspace_region"></a> [managed\_prometheus\_workspace\_region](#output\_managed\_prometheus\_workspace\_region) | Amazon Managed Prometheus workspace region |
-| <a name="output_prometheus_data_source_created"></a> [prometheus\_data\_source\_created](#output\_prometheus\_data\_source\_created) | Boolean value indicating if the module created a prometheus data source in Amazon Managed Grafana |
 <!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
 
 ## Contributing

diff --git a/docs/concepts.md b/docs/concepts.md
@@ -39,22 +39,24 @@ The grafana-operator is a Kubernetes operator built to help you manage your Graf
 
 GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that offers you the ability to manage the state of multiple Kubernetes clusters leveraging the best practices of version control, immutable artifacts, and automation. Flux  is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, BitBucket). Now, with grafana-operator supporting the management of external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms using CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.
 
-We have setup a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) using flux to sync our GitHub Repository to add Grafana Datasources, folder and Dashboards to Amazon Managed Grafana using Grafana Operator. GitRepository defines a Source to produce an Artifact for a Git repository revision. Kustomization defines a pipeline for fetching, decrypting, building, validating and applying Kustomize overlays or plain Kubernetes manifests. we are also using [Flux Post build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to dynamically render variables such as AMG_AWS_REGION, AMP_ENDPOINT_URL, AMG_ENDPOINT_URL,GRAFANA_NODEEXP_DASH_URL on the YAML manifests during deployment time to avoid hardcoding on the YAML manifests stored in Git repo.
+We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) using Flux to sync our GitHub repository and add Grafana data sources, folders and dashboards to Amazon Managed Grafana using Grafana Operator. GitRepository defines a Source to produce an Artifact for a Git repository revision. Kustomization defines a pipeline for fetching, decrypting, building, validating and applying Kustomize overlays or plain Kubernetes manifests. We are also using [Flux post-build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to dynamically render variables such as AMG_AWS_REGION, AMP_ENDPOINT_URL, AMG_ENDPOINT_URL, GRAFANA_NODEEXP_DASH_URL in the YAML manifests at deployment time, to avoid hardcoding them in the YAML manifests stored in the Git repo.
 
-We have placed our declarative code snippet to create an Amazon Managed Service For Promethes datasource and Grafana Dashboard in Amazon Managed Grafana in our [AWS Observabiity Accelerator GitHub Repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have setup a GitRepository to point to the AWS Observabiity Accelerator GitHub Repository and `Kustomization` for flux to sync Git Repository with artifacts in `./artifacts/grafana-operator-manifests` path in the AWS Observabiity Accelerator GitHub Repository. You can use this extension of our solution to point your own Kubernetes manifests to create Grafana Datasources and personified Grafana Dashboards of your choice using GitOps with Grafana Operator and Flux in Kubernetes native way with altering and redeploying this solution for changes to Grafana resources.
+We have placed our declarative code snippets to create an Amazon Managed Service for Prometheus data source and Grafana dashboard in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub Repository](https://github.com/aws-observability/aws-observability-accelerator). We have set up a GitRepository to point to the AWS Observability Accelerator GitHub Repository and a `Kustomization` for Flux to sync the Git repository with artifacts in the `./artifacts/grafana-operator-manifests/*` path of that repository. You can use this extension of our solution to point to your own Kubernetes manifests to create Grafana data sources and personalized Grafana dashboards of your choice using GitOps with Grafana Operator and Flux in a Kubernetes-native way with altering and redeploying this solution for changes to Grafana resources.
 
 
 
-## v2.x changes
+## Release notes
 
-v2.x [releases](https://github.com/aws-observability/terraform-aws-observability-accelerator/releases) introduce
-couple of breaking changes compared to previous versions:
+We encourage you to use our [release versions](https://github.com/aws-observability/terraform-aws-observability-accelerator/releases)
+as much as possible to avoid breaking changes when deploying Terraform modules. You can
+also read our change log on the releases page. Here's an example of using a fixed version:
+
+```hcl
+module "eks_monitoring" {
+    source = "github.com/aws-observability/terraform-aws-observability-accelerator//modules/managed-prometheus-monitoring?ref=v2.5.0"
+}
+```
 
-- `modules/workloads/infra` module moves to `modules/eks-monitoring`
-- EKS configuration options moves from the base  module to the `eks-monitoring` module
-- EKS workload modules **java,nginx** merge into `eks-monitoring` as configuration options (patterns),
-see [examples](https://github.com/aws-observability/terraform-aws-observability-accelerator/tree/main/examples)
-- Examples have been updated to reflect these changes
 
 ## Base module
 
@@ -138,4 +140,4 @@ classDiagram
 
 If you are new to AWS Observability services, or want to dive deeper into them,
 check our [One Observability Workshop](https://catalog.workshops.aws/observability/)
-for a hands-on experience in a self-paced environement or at an AWS venue.
+for a hands-on experience in a self-paced environment or at an AWS venue.
diff --git a/docs/eks/index.md b/docs/eks/index.md
@@ -111,25 +111,40 @@ terraform apply
 
 ## Visualization
 
-#### 1. Prometheus data source on Grafana
 
-Make sure to open the link in the output. After a successful deployment, this will open
-the Prometheus data source configuration on Grafana.
-Click `Save & test` and you should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana.
+#### 1. Grafana dashboards
 
-```bash
-terraform output grafana_prometheus_datasource_test
-```
+Log in to your Grafana workspace and navigate to the Dashboards panel. You should see a list of dashboards under the `Observability Accelerator Dashboards` folder.
+<img width="1540" alt="image" src="https://user-images.githubusercontent.com/10175027/190000716-29e16698-7c90-49d6-8c37-79ca1790e2cc.png">
 
-#### 2. Grafana dashboards
+Open a specific dashboard and you should be able to view its visualization
+<img width="2056" alt="cluster headlines" src="https://user-images.githubusercontent.com/10175027/199110753-9bc7a9b7-1b45-4598-89d3-32980154080e.png">
 
-Go to the Dashboards panel of your Grafana workspace. You should see a list of dashboards under the `Observability Accelerator Dashboards`
+With v2.5 and above, the dashboards are managed with a Grafana Operator running in your cluster.
+To view all dashboards as Kubernetes objects from the cluster, run:
 
-<img width="1540" alt="image" src="https://user-images.githubusercontent.com/10175027/190000716-29e16698-7c90-49d6-8c37-79ca1790e2cc.png">
+```console
+kubectl get grafanadashboards -A
+NAMESPACE          NAME                                   AGE
+grafana-operator   cluster-grafanadashboard               138m
+grafana-operator   java-grafanadashboard                  143m
+grafana-operator   kubelet-grafanadashboard               13h
+grafana-operator   namespace-workloads-grafanadashboard   13h
+grafana-operator   nginx-grafanadashboard                 134m
+grafana-operator   node-exporter-grafanadashboard         13h
+grafana-operator   nodes-grafanadashboard                 13h
+grafana-operator   workloads-grafanadashboard             13h
+```
 
-Open a specific dashboard and you should be able to view its visualization
+You can inspect more details per dashboard using this command:
+
+```console
+kubectl describe grafanadashboards cluster-grafanadashboard -n grafana-operator
+```
+
+Grafana Operator and Flux always work together to synchronize your dashboards with Git.
+If you delete your dashboards by accident, they will be re-provisioned automatically.
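One way to check this synchronization from the cluster is sketched below. It assumes `kubectl` is configured for the cluster and that the Flux custom resources are installed; the function names are hypothetical helpers, not part of the accelerator.

```shell
# Hypothetical helper: list the dashboards managed by Grafana Operator.
# The grafana-operator namespace is taken from the command output above.
list_dashboards() {
  kubectl get grafanadashboards -n grafana-operator
}

# Hypothetical helper: list Flux Kustomization objects in all namespaces
# so you can check whether the last reconciliation succeeded.
flux_sync_status() {
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io -A
}
```

Calling `list_dashboards` after Flux has reconciled should show any deleted dashboard object again; how long that takes depends on the configured sync interval.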
 
-<img width="2056" alt="cluster headlines" src="https://user-images.githubusercontent.com/10175027/199110753-9bc7a9b7-1b45-4598-89d3-32980154080e.png">
 
 #### 3. Amazon Managed Service for Prometheus rules and alerts
 
@@ -216,19 +231,20 @@ export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
   --output text)
 ```
 
-- Next, lets grab the Grafana API key secret name from AWS Secrets Manager. The keyname should start with `terraform-..`
+- Finally, update the Grafana API key stored in AWS Systems Manager (SSM) Parameter Store using the above new Grafana API key:
 
 ```bash
-aws secretsmanager list-secrets
+aws ssm put-parameter \
+    --name "/terraform-accelerator/grafana-api-key" \
+    --type "SecureString" \
+    --value "{\"GF_SECURITY_ADMIN_APIKEY\": \"${GO_AMG_API_KEY}\"}" \
+    --region <Your AWS Region>
 ```
 
-- Finally, update the Grafana API key secret in AWS Secrets Manager using the above new Grafana API key:
+- If the issue persists, you can force the synchronization by deleting the `externalsecret` Kubernetes object.
 
 ```bash
-aws secretsmanager update-secret \
-    --secret-id  <Your Secret Name> \
-    --secret-string "{\"GF_SECURITY_ADMIN_APIKEY\": \"${GO_AMG_API_KEY}\"}" \
-    --region <Your AWS Region>
+kubectl delete externalsecret/external-secrets-sm -n grafana-operator
 ```
 
 ### 2. Upgrade from 2.1.0 or earlier

diff --git a/docs/eks/java.md b/docs/eks/java.md
@@ -32,7 +32,7 @@ Make sure to refresh your temporary Grafana API key
 
 ```bash
 export TF_VAR_managed_grafana_workspace_id=g-xxx
-export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
+export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 7200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
 ```
 
 ## Deploy

diff --git a/docs/eks/multicluster.md b/docs/eks/multicluster.md
@@ -11,7 +11,7 @@ Using the example [eks-cluster-with-vpc](https://aws-observability.github.io/ter
    1. `eks-cluster-1`
    2. `eks-cluster-2`
 
-#### 2. Amazon Managed Serivce for Prometheus (AMP) workspace
+#### 2. Amazon Managed Service for Prometheus (AMP) workspace
 
 We recommend that you create a new AMP workspace. To do that you can run the following command.
 
@@ -48,7 +48,7 @@ Ensure you have the following necessary IAM permissions
 * `grafana.DeleteWorkspaceApiKey`
 
 ```sh
-export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
+export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 7200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
 ```
 
 ## Setup
@@ -70,8 +70,8 @@ Verify by looking at the file `variables.tf` that there are two EKS clusters tar
 
 The difference in deployment between these clusters is that Terraform, when setting up the EKS cluster behind variable `eks_cluster_1_id` for observability, also sets up:
 
-* Dashboard folder and files in `AMG`
-* Prometheus and Java, alerting and recording rules in `AMP`
+* Dashboard folder and files in Amazon Managed Grafana
+* Prometheus and Java, alerting and recording rules in Amazon Managed Service for Prometheus
 
 !!! warning
     To override the defaults, create a `terraform.tfvars` and change the default values of the variables.