Using GitOps with Grafana Operator to create Grafana Dashboard with Flux. #164

Merged
merged 23 commits into from
Jun 5, 2023
73acbd9
Grafana With GitOps Feature
elamaran11 May 25, 2023
d5b912c
Grafana With GitOps Feature
elamaran11 May 25, 2023
baff02e
Grafana With GitOps Feature
elamaran11 May 25, 2023
1b1263b
Merge branch 'main' into feature/gitopsWithGrafana
bonclay7 May 31, 2023
a6e3566
Merge branch 'main' into feature/gitopsWithGrafana
bonclay7 Jun 1, 2023
3b1a71b
Fix setup logs retention policy (#169)
bonclay7 Jun 2, 2023
13835ec
Fixing GitOps Repo
elamaran11 Jun 2, 2023
2b86ee9
Merge branch 'main' into feature/gitopsWithGrafana
bonclay7 Jun 2, 2023
28cc0ba
Merge branch 'main' into feature/gitopsWithGrafana
bonclay7 Jun 2, 2023
0d26b54
Commenting out the NodeExp Dash
elamaran11 Jun 2, 2023
1a43621
Commenting out the NodeExp Dash
elamaran11 Jun 2, 2023
8f78879
Adding all Grafana Dashboards
elamaran11 Jun 2, 2023
e1569bf
Adding all Grafana Dashboards
elamaran11 Jun 2, 2023
8d44533
Fixing Grafana Operator Version and cleaning full boards
elamaran11 Jun 3, 2023
db631aa
Fixing Grafana Operator Version and cleaning full boards
elamaran11 Jun 3, 2023
9d6ae4d
Fixing Grafana Operator Version and cleaning full boards
elamaran11 Jun 3, 2023
e17d80c
Fixing Grafana Operator Version and cleaning full boards
elamaran11 Jun 3, 2023
03d3b59
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
f00aca7
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
6839ecc
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
442594b
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
606c275
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
491cea1
Fixing Grafana Operator Version and cleaning full boards and PR Issues
elamaran11 Jun 3, 2023
2 changes: 2 additions & 0 deletions examples/eks-multicluster/main.tf
@@ -33,6 +33,7 @@ module "eks_cluster_1_monitoring" {
managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id
managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
managed_prometheus_workspace_region = module.aws_observability_accelerator.managed_prometheus_workspace_region
grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

java_config = {
enable_alerting_rules = true
@@ -76,6 +77,7 @@ module "eks_cluster_2_monitoring" {
managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id
managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
managed_prometheus_workspace_region = module.aws_observability_accelerator.managed_prometheus_workspace_region
grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

java_config = {
enable_alerting_rules = false # addressed while setting up the eks_cluster_1 EKS cluster
1 change: 1 addition & 0 deletions examples/existing-cluster-java/main.tf
@@ -74,6 +74,7 @@ module "eks_monitoring" {
grafana_api_key = var.grafana_api_key
target_secret_name = "grafana-admin-credentials"
target_secret_namespace = "grafana-operator"
grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

eks_cluster_id = var.eks_cluster_id

1 change: 1 addition & 0 deletions examples/existing-cluster-nginx/main.tf
@@ -72,6 +72,7 @@ module "eks_monitoring" {
grafana_api_key = var.grafana_api_key
target_secret_name = "grafana-admin-credentials"
target_secret_namespace = "grafana-operator"
grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

# control the publishing of dashboards by specifying the boolean value for the variable 'enable_dashboards', default is 'true'
# the intention to publish is overruled depending upon whether grafana dashboard folder is created by the observability accelerator
1 change: 1 addition & 0 deletions examples/existing-cluster-with-base-and-infra/main.tf
@@ -82,6 +82,7 @@ module "eks_monitoring" {
grafana_api_key = var.grafana_api_key
target_secret_name = "grafana-admin-credentials"
target_secret_namespace = "grafana-operator"
grafana_url = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

# control the publishing of dashboards by specifying the boolean value for the variable 'enable_dashboards', default is 'true'
# the intention to publish is overruled depending upon whether grafana dashboard folder is created by the observability accelerator
88 changes: 80 additions & 8 deletions modules/eks-monitoring/README.md
@@ -29,8 +29,8 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | >= 4.0.0 |
| <a name="provider_grafana"></a> [grafana](#provider\_grafana) | >= 1.25.0 |
| <a name="provider_helm"></a> [helm](#provider\_helm) | >= 2.4.1 |
| <a name="provider_kubectl"></a> [kubectl](#provider\_kubectl) | >= 1.14 |

## Modules

@@ -49,16 +49,12 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
|------|------|
| [aws_prometheus_rule_group_namespace.alerting_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource |
| [aws_prometheus_rule_group_namespace.recording_rules](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/prometheus_rule_group_namespace) | resource |
| [grafana_dashboard.cluster](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [grafana_dashboard.kubelet](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [grafana_dashboard.nodeexp_nodes](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [grafana_dashboard.nodes](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [grafana_dashboard.nsworkload](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [grafana_dashboard.workloads](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/dashboard) | resource |
| [helm_release.fluxcd](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [helm_release.grafana_operator](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [helm_release.kube_state_metrics](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [helm_release.prometheus_node_exporter](https://registry.terraform.io/providers/hashicorp/helm/latest/docs/resources/release) | resource |
| [kubectl_manifest.flux_gitrepository](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [kubectl_manifest.flux_kustomization](https://registry.terraform.io/providers/gavinbunney/kubectl/latest/docs/resources/manifest) | resource |
| [aws_caller_identity.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_eks_cluster.eks_cluster](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/eks_cluster) | data source |
| [aws_partition.current](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/partition) | data source |
@@ -87,8 +83,19 @@ This module makes use of the open source [kube-prometheus-stack](https://github.
| <a name="input_enable_recording_rules"></a> [enable\_recording\_rules](#input\_enable\_recording\_rules) | Enables or disables Managed Prometheus recording rules | `bool` | `true` | no |
| <a name="input_enable_tracing"></a> [enable\_tracing](#input\_enable\_tracing) | (Experimental) Enables tracing with AWS X-Ray. This changes the deploy mode of the collector to daemon set. Requirement: adot add-on <= 0.58-build.0 | `bool` | `false` | no |
| <a name="input_flux_config"></a> [flux\_config](#input\_flux\_config) | FluxCD configuration | <pre>object({<br> create_namespace = bool<br> k8s_namespace = string<br> helm_chart_name = string<br> helm_chart_version = string<br> helm_release_name = string<br> helm_repo_url = string<br> helm_settings = map(string)<br> helm_values = map(any)<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart_name": "flux2",<br> "helm_chart_version": "2.7.0",<br> "helm_release_name": "observability-fluxcd-addon",<br> "helm_repo_url": "https://fluxcd-community.github.io/helm-charts",<br> "helm_settings": {},<br> "helm_values": {},<br> "k8s_namespace": "flux-system"<br>}</pre> | no |
| <a name="input_flux_gitrepository_branch"></a> [flux\_gitrepository\_branch](#input\_flux\_gitrepository\_branch) | Flux GitRepository Branch | `string` | `"feature/allDashboards"` | no |
| <a name="input_flux_gitrepository_url"></a> [flux\_gitrepository\_url](#input\_flux\_gitrepository\_url) | Flux GitRepository URL | `string` | `"https://github.com/aws-observability/aws-observability-accelerator"` | no |
| <a name="input_flux_kustomization_path"></a> [flux\_kustomization\_path](#input\_flux\_kustomization\_path) | Flux Kustomization Path | `string` | `"./artifacts/grafana-operator-manifests"` | no |
| <a name="input_flux_name"></a> [flux\_name](#input\_flux\_name) | Flux GitRepository and Kustomization Name | `string` | `"grafana-dashboards"` | no |
| <a name="input_go_config"></a> [go\_config](#input\_go\_config) | Grafana Operator configuration | <pre>object({<br> create_namespace = bool<br> helm_chart = string<br> helm_name = string<br> k8s_namespace = string<br> helm_release_name = string<br> helm_chart_version = string<br> })</pre> | <pre>{<br> "create_namespace": true,<br> "helm_chart": "oci://ghcr.io/grafana-operator/helm-charts/grafana-operator",<br> "helm_chart_version": "v5.0.0-rc1",<br> "helm_name": "grafana-operator",<br> "helm_release_name": "grafana-operator",<br> "k8s_namespace": "grafana-operator"<br>}</pre> | no |
| <a name="input_grafana_api_key"></a> [grafana\_api\_key](#input\_grafana\_api\_key) | Grafana API key for the Amazon Managed Grafana workspace | `string` | n/a | yes |
| <a name="input_grafana_cluster_dashboard_url"></a> [grafana\_cluster\_dashboard\_url](#input\_grafana\_cluster\_dashboard\_url) | Dashboard URL for Cluster Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/cluster.json"` | no |
| <a name="input_grafana_kubelet_dashboard_url"></a> [grafana\_kubelet\_dashboard\_url](#input\_grafana\_kubelet\_dashboard\_url) | Dashboard URL for Kubelet Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/kubelet.json"` | no |
| <a name="input_grafana_namespace_workloads_dashboard_url"></a> [grafana\_namespace\_workloads\_dashboard\_url](#input\_grafana\_namespace\_workloads\_dashboard\_url) | Dashboard URL for Namespace Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/namespace-workloads.json"` | no |
| <a name="input_grafana_node_exporter_dashboard_url"></a> [grafana\_node\_exporter\_dashboard\_url](#input\_grafana\_node\_exporter\_dashboard\_url) | Dashboard URL for Node Exporter Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/nodeexporter-nodes.json"` | no |
| <a name="input_grafana_nodes_dashboard_url"></a> [grafana\_nodes\_dashboard\_url](#input\_grafana\_nodes\_dashboard\_url) | Dashboard URL for Nodes Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/nodes.json"` | no |
| <a name="input_grafana_url"></a> [grafana\_url](#input\_grafana\_url) | Endpoint URL of Amazon Managed Grafana workspace | `string` | n/a | yes |
| <a name="input_grafana_workloads_dashboard_url"></a> [grafana\_workloads\_dashboard\_url](#input\_grafana\_workloads\_dashboard\_url) | Dashboard URL for Workloads Grafana Dashboard JSON | `string` | `"https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/1d731aca31cdeb26e9fe9d017e609a5ba1621a30/artifacts/grafana-dashboards/workloads.json"` | no |
| <a name="input_helm_config"></a> [helm\_config](#input\_helm\_config) | Helm Config for Prometheus | `any` | `{}` | no |
| <a name="input_irsa_iam_permissions_boundary"></a> [irsa\_iam\_permissions\_boundary](#input\_irsa\_iam\_permissions\_boundary) | IAM permissions boundary for IRSA roles | `string` | `null` | no |
| <a name="input_irsa_iam_role_path"></a> [irsa\_iam\_role\_path](#input\_irsa\_iam\_role\_path) | IAM role path for IRSA roles | `string` | `"/"` | no |
@@ -117,7 +124,7 @@ This module makes use of the open source [kube-prometheus-stack](https://github.

## Troubleshooting

When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.
1. When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.

```bash
Error: cannot patch "prometheus-node-exporter" with kind DaemonSet: DaemonSet.apps "prometheus-node-exporter" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"prometheus-node-exporter", "app.kubernetes.io/name":"prometheus-node-exporter"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
@@ -129,3 +136,68 @@ This is due to the upgrade of the node-exporter chart from v2 to v4. Manually de
kubectl -n prometheus-node-exporter delete daemonset -l app=prometheus-node-exporter
terraform apply
```

2. If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your Grafana Operator pod using the commands below:

```bash
kubectl get pods -n grafana-operator
```

Output:

```
NAME READY STATUS RESTARTS AGE
grafana-operator-866d4446bb-nqq5c 1/1 Running 0 3h17m
```

```bash
kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
```

Output:

```
1.6857285045556655e+09 ERROR error reconciling datasource {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
```

If you see the above `Expired API key` error in the logs, your Grafana API key has expired. Use the following procedure to update your `grafana-api-key`:

- First, create a new Grafana API key.

```bash
export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
--key-name "grafana-operator-key-new" \
--key-role "ADMIN" \
--seconds-to-live 432000 \
--workspace-id <YOUR_WORKSPACE_ID> \
--query key \
--output text)
```

- Next, look up the name of the Grafana API key secret in AWS Secrets Manager. The secret name should start with `terraform-..`

```bash
aws secretsmanager list-secrets
```

- Finally, update the Grafana API key secret in AWS Secrets Manager with the new Grafana API key created above:

```bash
aws secretsmanager update-secret \
--secret-id <Your Secret Name> \
--secret-string "${GO_AMG_API_KEY}" \
--region <Your AWS Region>
```
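The procedure above can be sketched as a few small shell helpers. This is a minimal sketch, not part of the module: the function names are hypothetical, and the `terraform-` prefix is taken from the step above. `days_to_seconds` only shows how a value such as `432000` (5 days) for `--seconds-to-live` is derived.

```bash
#!/usr/bin/env bash

# Return success if the log text on stdin contains the expired-key error
# shown in the operator log output above.
has_expired_key_error() {
  grep -q 'Expired API key'
}

# Convert a key lifetime in days into the value for --seconds-to-live.
days_to_seconds() {
  local days="$1"
  echo $(( days * 24 * 60 * 60 ))
}

# List Secrets Manager secret names that start with the given prefix.
# Uses a JMESPath filter instead of scanning the full list-secrets output.
find_secrets_by_prefix() {
  local prefix="$1"
  aws secretsmanager list-secrets \
    --query "SecretList[?starts_with(Name, '${prefix}')].Name" \
    --output text
}
```

For example, pipe the output of `kubectl logs <operator-pod> -n grafana-operator` into `has_expired_key_error` to confirm the diagnosis, run `days_to_seconds 5` to get the lifetime used above, and run `find_secrets_by_prefix terraform-` to narrow down the secret name.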

## Grafana datasources and dashboards via GitOps on Amazon Managed Grafana

We have upgraded our solution to use [grafana-operator](https://github.com/grafana-operator/grafana-operator#:~:text=The%20grafana%2Doperator%20is%20a,an%20easy%20and%20scalable%20way.) and [Flux](https://fluxcd.io/) to create Grafana datasources, folders, and dashboards via GitOps on Amazon Managed Grafana.

The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside and outside Kubernetes. It lets you create and manage Grafana dashboards, datasources, and other resources declaratively across multiple instances in an easy and scalable way. With grafana-operator, you can add AWS data sources such as Amazon Managed Service for Prometheus, Amazon CloudWatch, and AWS X-Ray to Amazon Managed Grafana, and create Grafana dashboards on Amazon Managed Grafana, from your Amazon EKS cluster. This lets us use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes-native way, which in turn lets us apply GitOps mechanisms, using CNCF projects such as Flux, to those resources.

GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that lets you manage the state of multiple Kubernetes clusters using the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, Bitbucket). Now that grafana-operator supports managing external Grafana instances such as Amazon Managed Grafana, operations teams can use GitOps mechanisms, with CNCF projects such as Flux, to create and manage the lifecycle of resources in Amazon Managed Grafana.

We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and a [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) using Flux to sync our GitHub repository and add Grafana datasources, folders, and dashboards to Amazon Managed Grafana through Grafana Operator. A GitRepository defines a Source that produces an Artifact for a Git repository revision. A Kustomization defines a pipeline for fetching, decrypting, building, validating, and applying Kustomize overlays or plain Kubernetes manifests. We also use [Flux post-build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to render variables such as `AMG_AWS_REGION`, `AMP_ENDPOINT_URL`, `AMG_ENDPOINT_URL`, and `GRAFANA_NODEEXP_DASH_URL` into the YAML manifests at deployment time, which avoids hardcoding these values in the manifests stored in the Git repository.
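To make the post-build substitution concrete, the sketch below writes out a Flux `Kustomization` manifest resembling what this setup describes. It is illustrative only and is not the manifest the module generates. The endpoint values are placeholders, and `interval` and `prune` are assumptions. The `apiVersion` depends on the Flux version installed in your cluster. The name, namespace, and path come from the module defaults listed in the inputs table.

```bash
#!/usr/bin/env bash

# Placeholder values (assumptions). In the module these are supplied by Terraform.
AMG_AWS_REGION="${AMG_AWS_REGION:-us-west-2}"
AMP_ENDPOINT_URL="${AMP_ENDPOINT_URL:-https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-EXAMPLE/}"
AMG_ENDPOINT_URL="${AMG_ENDPOINT_URL:-https://g-EXAMPLE.grafana-workspace.us-west-2.amazonaws.com}"
GRAFANA_NODEEXP_DASH_URL="${GRAFANA_NODEEXP_DASH_URL:-https://raw.githubusercontent.com/aws-observability/aws-observability-accelerator/main/artifacts/grafana-dashboards/nodeexporter-nodes.json}"

# Write the manifest to a temporary directory so nothing in the repo is touched.
manifest_dir="$(mktemp -d)"
manifest="${manifest_dir}/flux-kustomization.yaml"

# The shell fills in the values here. Flux later replaces ${VAR} references
# found in the synced manifests with the entries under postBuild.substitute.
cat > "${manifest}" <<EOF
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2  # assumption: match your Flux version
kind: Kustomization
metadata:
  name: grafana-dashboards
  namespace: flux-system
spec:
  interval: 1m0s
  path: ./artifacts/grafana-operator-manifests
  prune: true
  sourceRef:
    kind: GitRepository
    name: grafana-dashboards
  postBuild:
    substitute:
      AMG_AWS_REGION: "${AMG_AWS_REGION}"
      AMP_ENDPOINT_URL: "${AMP_ENDPOINT_URL}"
      AMG_ENDPOINT_URL: "${AMG_ENDPOINT_URL}"
      GRAFANA_NODEEXP_DASH_URL: "${GRAFANA_NODEEXP_DASH_URL}"
EOF

echo "wrote ${manifest}"
```

A manifest in the Git repository can then reference `${AMP_ENDPOINT_URL}` instead of a literal URL, and Flux fills in the value when it applies the Kustomization.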

We have placed our declarative manifests for creating an Amazon Managed Service for Prometheus datasource and Grafana dashboards in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have set up a GitRepository that points to the AWS Observability Accelerator GitHub repository and a `Kustomization` so that Flux syncs the artifacts under the `./artifacts/grafana-operator-manifests` path of that repository. You can use this extension of our solution to point to your own Kubernetes manifests and create Grafana datasources and personalized Grafana dashboards of your choice, using GitOps with Grafana Operator and Flux in a Kubernetes-native way, without altering and redeploying this solution for changes to Grafana resources.
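Once Flux is pointed at a repository, it can help to confirm that the sync and the Grafana resources exist in the cluster. The helper below is a rough sketch, not part of the module, and `check_gitops_sync` is a hypothetical name. It assumes the `flux-system` default namespace from `flux_config` and uses fully qualified resource names, with the Grafana API group taken from the operator log shown in the troubleshooting section.

```bash
#!/usr/bin/env bash

# Show the Flux source and Kustomization, then the Grafana Operator resources.
# Argument 1 (optional): namespace where Flux objects live, default flux-system.
check_gitops_sync() {
  local flux_ns="${1:-flux-system}"

  kubectl get gitrepositories.source.toolkit.fluxcd.io -n "${flux_ns}"
  kubectl get kustomizations.kustomize.toolkit.fluxcd.io -n "${flux_ns}"
  kubectl get grafanadatasources.grafana.integreatly.org --all-namespaces
  kubectl get grafanadashboards.grafana.integreatly.org --all-namespaces
}
```

Run `check_gitops_sync` after `terraform apply`. If the GitRepository or Kustomization is not ready, its status message usually indicates whether the URL, branch, or path is the problem.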