Update docs (#172)
bonclay7 authored Jun 6, 2023
1 parent 0392f5b commit 1f16bec
Showing 7 changed files with 92 additions and 303 deletions.
5 changes: 3 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@ To explore the complete project documentation, please visit our [documentation s

## Getting started

To quickstart with a complete workflow and view Amazon EKS infrastructure dashboards,
To quick start with a complete workflow and view Amazon EKS infrastructure dashboards,
visit the [Amazon EKS cluster monitoring documentation](https://aws-observability.github.io/terraform-aws-observability-accelerator/eks/)

## How it works
@@ -39,8 +39,9 @@ v2+ releases introduces couple of breaking changes compared to previous versions

- `modules/workloads/infra` module moves to `modules/eks-monitoring`
- All EKS configuration options move from the base module to the `eks-monitoring` module
- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visiblity
- All EKS workload modules `modules/workloads/{java,nginx}` merge into `eks-monitoring` as configuration options (patterns), see [examples](./examples) to provide a more complete visibility
- All examples have been updated to reflect these changes
- Introducing GitOps for Grafana contents (Dashboards, Folders and Data sources) with [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/)

### Base Module

13 changes: 13 additions & 0 deletions docs/concepts.md
@@ -31,6 +31,19 @@ you need to track changes as part of a Git repository or CI/CD pipeline.
!!! warning
When using `tfvars` files, always be careful to not store and commit any secrets (keys, passwords, ...)

## Grafana contents via GitOps on Amazon Managed Grafana

We have upgraded our solution to use [grafana-operator](https://github.com/grafana-operator/grafana-operator#:~:text=The%20grafana%2Doperator%20is%20a,an%20easy%20and%20scalable%20way.) and [Flux](https://fluxcd.io/) to create Grafana data sources, folders, and dashboards via GitOps on Amazon Managed Grafana.

The grafana-operator is a Kubernetes operator built to help you manage your Grafana instances inside and outside Kubernetes. It lets you create and manage Grafana dashboards, data sources, and other resources declaratively across multiple instances in an easy and scalable way. With grafana-operator, you can add AWS data sources such as Amazon Managed Service for Prometheus, Amazon CloudWatch, and AWS X-Ray to Amazon Managed Grafana, and create Grafana dashboards on Amazon Managed Grafana from your Amazon EKS cluster. This enables us to use our Kubernetes cluster to create and manage the lifecycle of resources in Amazon Managed Grafana in a Kubernetes-native way. It ultimately enables us to apply GitOps mechanisms, using CNCF projects such as Flux, to those resources.

GitOps is a way of managing application and infrastructure deployment so that the whole system is described declaratively in a Git repository. It is an operational model that offers you the ability to manage the state of multiple Kubernetes clusters leveraging the best practices of version control, immutable artifacts, and automation. Flux is a declarative, GitOps-based continuous delivery tool that can be integrated into any CI/CD pipeline. It gives users the flexibility of choosing their Git provider (GitHub, GitLab, BitBucket). Now, with grafana-operator supporting the management of external Grafana instances such as Amazon Managed Grafana, operations personas can use GitOps mechanisms using CNCF projects such as Flux to create and manage the lifecycle of resources in Amazon Managed Grafana.

We have set up a [GitRepository](https://fluxcd.io/flux/components/source/gitrepositories/) and a [Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/) with Flux to sync our GitHub repository and add Grafana data sources, folders, and dashboards to Amazon Managed Grafana using Grafana Operator. A GitRepository defines a Source to produce an Artifact for a Git repository revision. A Kustomization defines a pipeline for fetching, decrypting, building, validating, and applying Kustomize overlays or plain Kubernetes manifests. We also use [Flux post-build variable substitution](https://fluxcd.io/flux/components/kustomize/kustomization/#post-build-variable-substitution) to dynamically render variables such as `AMG_AWS_REGION`, `AMP_ENDPOINT_URL`, `AMG_ENDPOINT_URL`, and `GRAFANA_NODEEXP_DASH_URL` in the YAML manifests at deployment time, which avoids hardcoding these values in the manifests stored in the Git repository.
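
To illustrate what post-build variable substitution does to a manifest, here is a minimal sketch in plain shell. It only emulates the effect (replacing `${NAME}` placeholders with values) and is not how Flux implements it; the function name `substitute_vars` and the sample values are hypothetical.

```bash
#!/usr/bin/env bash
# Emulates the effect of placeholder substitution on a manifest.
# Reads a manifest on stdin and replaces each ${NAME} placeholder with the
# value given as a NAME=value argument. Illustration only, not Flux itself.
substitute_vars() {
  local content pair name value placeholder
  content="$(cat)"
  for pair in "$@"; do
    name="${pair%%=*}"                 # text before the first '='
    value="${pair#*=}"                 # text after the first '='
    placeholder='${'"$name"'}'         # literal ${NAME}
    content=${content//"$placeholder"/"$value"}
  done
  printf '%s\n' "$content"
}

# Example (sample values are assumptions):
#   printf '%s\n' 'region: ${AMG_AWS_REGION}' \
#     | substitute_vars AMG_AWS_REGION=eu-central-1
```

Keeping placeholders in Git and supplying the values at deployment time means the same manifests can be reused across regions and workspaces.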

We have placed our declarative code snippets to create an Amazon Managed Service for Prometheus data source and a Grafana dashboard in Amazon Managed Grafana in our [AWS Observability Accelerator GitHub Repository](https://github.com/aws-observability/aws-observability-accelerator/tree/main/artifacts/grafana-operator-manifests). We have set up a `GitRepository` that points to the AWS Observability Accelerator GitHub Repository and a `Kustomization` for Flux to sync the Git repository with the artifacts in the `./artifacts/grafana-operator-manifests` path of that repository. You can use this extension of our solution to point to your own Kubernetes manifests and create Grafana data sources and personalized Grafana dashboards of your choice using GitOps with Grafana Operator and Flux in a Kubernetes-native way with altering and redeploying this solution for changes to Grafana resources.
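
As a rough sketch of the setup described above, the following shell function prints a `GitRepository` and a `Kustomization` that sync a repository path and pass one substitution variable. The resource name `grafana-contents`, the `flux-system` namespace, the intervals, the `main` branch, and the `v1` API versions are assumptions for illustration (older Flux releases use `v1beta2` APIs); the manifests actually deployed by this project may differ.

```bash
#!/usr/bin/env bash
# Prints a hypothetical Flux GitRepository + Kustomization pair as YAML.
# Names, namespace, intervals, branch, and API versions are assumptions.
render_flux_sync() {
  local repo_url="$1" manifests_path="$2" aws_region="$3"
  cat <<EOF
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: grafana-contents        # hypothetical name
  namespace: flux-system        # assumed namespace
spec:
  interval: 5m
  url: ${repo_url}
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: grafana-contents        # hypothetical name
  namespace: flux-system        # assumed namespace
spec:
  interval: 5m
  prune: true
  path: ${manifests_path}
  sourceRef:
    kind: GitRepository
    name: grafana-contents
  postBuild:
    substitute:
      AMG_AWS_REGION: "${aws_region}"
EOF
}

# Example (review the output before applying it to a cluster):
#   render_flux_sync https://github.com/aws-observability/aws-observability-accelerator \
#     ./artifacts/grafana-operator-manifests eu-central-1
```

To sync your own Grafana manifests, you would pass your own repository URL and path instead.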



## v2.x changes

77 changes: 70 additions & 7 deletions docs/eks/index.md
@@ -14,7 +14,7 @@ The Amazon EKS infrastructure Terraform modules focuses on metrics collection to
Managed Service for Prometheus using the [AWS Distro for OpenTelemetry Operator](https://docs.aws.amazon.com/eks/latest/userguide/opentelemetry.html) for Amazon EKS. It deploys the [node exporter](https://github.com/prometheus/node_exporter) and [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) in your cluster.

It provides default dashboards that give you comprehensive visibility into your nodes,
namespaces, pods, and kubelet operations health. Finally, you get curated Prometheus recording rules
namespaces, pods, and Kubelet operations health. Finally, you get curated Prometheus recording rules
and alerts to operate your cluster.

Additionally, you can optionally collect custom Prometheus metrics from your applications running
@@ -72,9 +72,9 @@ aws amp create-workspace --alias observability-accelerator --query '.workspaceId

#### 5. Amazon Managed Grafana workspace

To run this example you need an Amazon Managed Grafana workspace. If you have
To visualize metrics collected, you need an Amazon Managed Grafana workspace. If you have
an existing workspace, create an environment variable as described below.
To create a new workspace, visit our supporting example for Grafana.
To create a new workspace, visit [our supporting example for Grafana](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/managed-grafana/)

!!! note
For the URL `https://g-xyz.grafana-workspace.eu-central-1.amazonaws.com`, the workspace ID would be `g-xyz`
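
If you prefer to derive the workspace ID from the workspace URL in a script, a minimal sketch could look like the following. It assumes the ID is always the first DNS label of the host name, as in the note above; the function name `workspace_id_from_url` is hypothetical.

```bash
#!/usr/bin/env bash
# Extracts the workspace ID (first DNS label) from a workspace URL.
# Assumes URLs shaped like https://<id>.grafana-workspace.<region>.amazonaws.com
workspace_id_from_url() {
  local url="$1" host
  host="${url#*://}"            # drop the scheme
  host="${host%%/*}"            # drop any path
  printf '%s\n' "${host%%.*}"   # keep the first DNS label
}

# Example:
#   export TF_VAR_managed_grafana_workspace_id="$(workspace_id_from_url \
#     https://g-xyz.grafana-workspace.eu-central-1.amazonaws.com)"
```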
@@ -91,8 +91,14 @@ run the `apply` or `destroy` command.

Ensure you have necessary IAM permissions (`CreateWorkspaceApiKey, DeleteWorkspaceApiKey`)

!!! note
Starting with v2.5.x, we use Grafana Operator and External Secrets to
manage Grafana contents. Your API Key will be stored securely on AWS Secrets Manager
and the Grafana Operator will use it to sync dashboards, folders and data sources.
Read more [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/concepts/).

```bash
export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 1200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
export TF_VAR_grafana_api_key=`aws grafana create-workspace-api-key --key-name "observability-accelerator-$(date +%s)" --key-role ADMIN --seconds-to-live 7200 --workspace-id $TF_VAR_managed_grafana_workspace_id --query key --output text`
```

## Deploy
@@ -105,10 +111,10 @@ terraform apply

## Visualization

#### 1. Prometheus datasource on Grafana
#### 1. Prometheus data source on Grafana

Make sure to open the link in the output. After a successful deployment, this will open
the Prometheus datasource configuration on Grafana.
the Prometheus data source configuration on Grafana.
Click `Save & test` and you should see a notification confirming that the Amazon Managed Service for Prometheus workspace is ready to be used on Grafana.

```bash
@@ -135,7 +141,7 @@ Open the Amazon Managed Service for Prometheus console and view the details of y
To set up your alert receiver with Amazon SNS, follow [this documentation](https://docs.aws.amazon.com/prometheus/latest/userguide/AMP-alertmanager-receiver.html)


## Custom metrics collection
## Custom Prometheus metrics collection

In addition to the cluster metrics, if you are interested in collecting Prometheus
metrics from your pods, you can set up `custom metrics collection`.
@@ -170,6 +176,63 @@ sum(up{job="custom-metrics"}) by (container_name, cluster, nodename)

## Troubleshooting

### 1. Grafana dashboards missing or Grafana API key expired

If you don't see the Grafana dashboards in your Amazon Managed Grafana console, check the logs of your Grafana Operator pod using the commands below:

```bash
kubectl get pods -n grafana-operator
```

Output:

```console
NAME                                READY   STATUS    RESTARTS   AGE
grafana-operator-866d4446bb-nqq5c   1/1     Running   0          3h17m
```

```bash
kubectl logs grafana-operator-866d4446bb-nqq5c -n grafana-operator
```

Output:

```console
1.6857285045556655e+09 ERROR error reconciling datasource {"controller": "grafanadatasource", "controllerGroup": "grafana.integreatly.org", "controllerKind": "GrafanaDatasource", "GrafanaDatasource": {"name":"grafanadatasource-sample-amp","namespace":"grafana-operator"}, "namespace": "grafana-operator", "name": "grafanadatasource-sample-amp", "reconcileID": "72cfd60c-a255-44a1-bfbd-88b0cbc4f90c", "datasource": "grafanadatasource-sample-amp", "grafana": "external-grafana", "error": "status: 401, body: {\"message\":\"Expired API key\"}\n"}
github.com/grafana-operator/grafana-operator/controllers.(*GrafanaDatasourceReconciler).Reconcile
```
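
The check above can also be scripted. The sketch below scans log text on standard input for the `Expired API key` message shown in the sample output; the function name `has_expired_api_key` is hypothetical, and the Deployment name `grafana-operator` used in the usage comment is an assumption inferred from the pod name.

```bash
#!/usr/bin/env bash
# Succeeds (exit code 0) if the log text on stdin reports an expired
# Grafana API key, as in the reconcile error shown above.
has_expired_api_key() {
  grep -q 'Expired API key'
}

# Example (Deployment name is an assumption):
#   if kubectl logs deployment/grafana-operator -n grafana-operator | has_expired_api_key; then
#     echo "Grafana API key expired, rotate it"
#   fi
```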

If you see the above `Expired API key` error in the logs, your Grafana API key has expired. Use the following operational procedure to update your `grafana-api-key`:

- First, let's create a new Grafana API key.

```bash
export GO_AMG_API_KEY=$(aws grafana create-workspace-api-key \
  --key-name "grafana-operator-key-new" \
  --key-role "ADMIN" \
  --seconds-to-live 432000 \
  --workspace-id <YOUR_WORKSPACE_ID> \
  --query key \
  --output text)
```

- Next, let's grab the Grafana API key secret name from AWS Secrets Manager. The secret name should start with `terraform-..`

```bash
aws secretsmanager list-secrets
```

- Finally, update the Grafana API key secret in AWS Secrets Manager using the above new Grafana API key:

```bash
aws secretsmanager update-secret \
  --secret-id <Your Secret Name> \
  --secret-string "${GO_AMG_API_KEY}" \
  --region <Your AWS Region>
```
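
The three steps above can be combined into one script. This is a minimal sketch that reuses the same AWS CLI commands and flags; the function names, the timestamped key name, and the default time-to-live of 432000 seconds (5 days) are assumptions you can adjust.

```bash
#!/usr/bin/env bash
# Lists Secrets Manager secret names that start with "terraform-".
find_grafana_key_secret() {
  aws secretsmanager list-secrets \
    --query "SecretList[?starts_with(Name, 'terraform-')].Name" \
    --output text
}

# Creates a new Grafana API key and stores it in the given secret.
# Arguments: workspace ID, secret name, AWS region, optional TTL in seconds.
rotate_grafana_api_key() {
  local workspace_id="$1" secret_id="$2" region="$3" ttl="${4:-432000}"
  local new_key
  new_key="$(aws grafana create-workspace-api-key \
    --key-name "grafana-operator-key-$(date +%s)" \
    --key-role "ADMIN" \
    --seconds-to-live "$ttl" \
    --workspace-id "$workspace_id" \
    --query key \
    --output text)" || return 1
  # Stop before overwriting the secret if no key came back.
  [ -n "$new_key" ] || { echo "empty API key returned" >&2; return 1; }
  aws secretsmanager update-secret \
    --secret-id "$secret_id" \
    --secret-string "$new_key" \
    --region "$region"
}

# Example:
#   rotate_grafana_api_key <YOUR_WORKSPACE_ID> <Your Secret Name> <Your AWS Region>
```

The function stops before touching the secret if key creation fails, so a failed run leaves the existing secret value unchanged.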

### 2. Upgrade from 2.1.0 or earlier

When you upgrade the eks-monitoring module from v2.1.0 or earlier, the following error may occur.

```bash
5 changes: 2 additions & 3 deletions docs/index.md
@@ -25,9 +25,8 @@ traces collection, dashboards and alerts for monitoring:
- NGINX workloads (running on Amazon EKS)
- Java/JMX workloads (running on Amazon EKS)
- Amazon Managed Service for Prometheus workspaces with Amazon CloudWatch
- Installs Grafana Operator to add AWS data sources and create Grafana Dashboards to Amazon Managed Grafana.
- Installs FluxCD to perform GitOps sync of a Git Repo to EKS Cluster. We will use this later for creating Grafana Dashboards and AWS datasources to Amazon Managed Grafana.
- Installs External Secrets Operator to retrieve and Sync the Grafana API keys.
- [Grafana Operator](https://github.com/grafana-operator/grafana-operator) and [Flux CD](https://fluxcd.io/) to manage Grafana contents (AWS data sources, Grafana Dashboards) with GitOps
- External Secrets Operator to retrieve and sync the Grafana API keys

These modules can be directly configured in your existing Terraform
configurations or ready to be deployed in our packaged
91 changes: 1 addition & 90 deletions examples/eks-cluster-with-vpc/README.md
@@ -8,93 +8,4 @@ This example deploys the following Basic EKS Cluster with VPC
- Creates Internet gateway for Public Subnets and NAT Gateway for Private Subnets
- Creates EKS Cluster Control plane with one managed node group

## How to Deploy

### Prerequisites

Ensure that you have installed the following tools in your Mac or Windows Laptop before start working with this module and run Terraform Plan and Apply

1. [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html)
2. [Kubectl](https://Kubernetes.io/docs/tasks/tools/)
3. [Terraform](https://learn.hashicorp.com/tutorials/terraform/install-cli)

### Minimum IAM Policy

> **Note**: The policy resource is set as `*` to allow all resources, this is not a recommended practice.
You can find the policy [here](min-iam-policy.json)


### Deployment Steps

#### Step 1: Clone the repo using the command below

```sh
git clone https://github.com/aws-observability/terraform-aws-observability-accelerator.git
```

#### Step 2: Run Terraform INIT

Initialize a working directory with configuration files

```sh
cd examples/eks-cluster-with-vpc/
terraform init
```

#### Step 3: Run Terraform PLAN

Verify the resources created by this execution

```sh
export TF_VAR_aws_region=<ENTER YOUR REGION> # Select your own region
export TF_VAR_cluster_name=<ENTER YOUR CLUSTER NAME> # Enter your cluster name
terraform plan
```

#### Step 4: Finally, Terraform APPLY

**Deploy the pattern**

```sh
terraform apply
```

Enter `yes` to apply.

### Configure `kubectl` and test cluster

EKS Cluster details can be extracted from terraform output or from AWS Console to get the name of cluster.
This following command used to update the `kubeconfig` in your local machine where you run kubectl commands to interact with your EKS Cluster.

#### Step 5: Run `update-kubeconfig` command

`~/.kube/config` file gets updated with cluster details and certificate from the below command

aws eks --region <enter-your-region> update-kubeconfig --name <cluster-name>

#### Step 6: List all the worker nodes by running the command below

kubectl get nodes

#### Step 7: List all the pods running in `kube-system` namespace

kubectl get pods -n kube-system

## Cleanup

To clean up your environment, destroy the Terraform modules in reverse order.

Destroy the Kubernetes Add-ons, EKS cluster with Node groups and VPC

```sh
terraform destroy -target="module.eks_blueprints_kubernetes_addons" -auto-approve
terraform destroy -target="module.eks_blueprints" -auto-approve
terraform destroy -target="module.vpc" -auto-approve
```

Finally, destroy any additional resources that are not in the above modules

```sh
terraform destroy -auto-approve
```
You can view the full documentation for this example [here](https://aws-observability.github.io/terraform-aws-observability-accelerator/helpers/new-eks-cluster/)