# EKS-based Prow build cluster

This folder contains Terraform configs and modules needed to provision and
bootstrap an EKS-based Prow build cluster.

## Environments

There are two different environments, i.e. clusters:

* Production - the cluster that's used as a build cluster and that's connected
  to the Prow control plane and GCP
* Canary - the cluster that's used to verify infrastructure changes before
  applying them to the production cluster. **This cluster is not connected to
  the Prow control plane or GCP**

### Choosing the environment

Set the `WORKSPACE_NAME` environment variable to `prod` or `canary`.

Production:

```bash
export WORKSPACE_NAME=prod
```

Canary:

```bash
export WORKSPACE_NAME=canary
```

### Differences between production and canary

* cluster name
* cluster admin IAM role name
* secrets manager IAM policy name
* canary is missing the k8s-prow OIDC provider and the corresponding role
* subnet setup is different
* instance type and autoscaling parameters (mainly for cost savings)

## Interacting with clusters

You'll mainly interact with the clusters using kubectl and Terraform. The
former requires a kubeconfig, which can be obtained using the `aws` CLI.

Production:

```bash
# Fetch & update kubeconfig.
aws eks update-kubeconfig --region us-east-2 --name prow-build-cluster
```

Canary:

```bash
aws eks update-kubeconfig --region us-east-2 --name prow-build-canary-cluster
```

This is going to update your `~/.kube/config` (unless specified otherwise).
Once you've fetched the kubeconfig, you need to update it to add the assume
role arguments, otherwise you'll have no access to the cluster (e.g. you'll
get an Unauthorized error).

Open the kubeconfig in a text editor of your choice and update `args` for the
appropriate cluster:

* Production:
  ```yaml
  args:
  - --region
  - us-east-2
  - eks
  - get-token
  - --cluster-name
  - prow-build-cluster
  - --role-arn
  - arn:aws:iam::468814281478:role/Prow-Cluster-Admin
  ```
* Canary:
  ```yaml
  args:
  - --region
  - us-east-2
  - eks
  - get-token
  - --cluster-name
  - prow-build-canary-cluster
  - --role-arn
  - arn:aws:iam::468814281478:role/canary-Prow-Cluster-Admin
  ```
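
To verify the assume-role setup before reaching for kubectl, you can run the
same command the kubeconfig's `exec` plugin runs and check that it prints a
token. This is an optional sanity check (production values shown; substitute
the canary cluster name and role ARN for canary):

```bash
# This is the command kubectl runs under the hood; it should print a token
# document rather than an access-denied error.
aws eks get-token \
  --region us-east-2 \
  --cluster-name prow-build-cluster \
  --role-arn arn:aws:iam::468814281478:role/Prow-Cluster-Admin
```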

## Running Terraform

**WARNING: We strongly recommend using the provided Makefile to avoid
mistakes due to selecting the wrong environment!!!**

We have a Makefile that can be used to execute Terraform targeting the
correct environment. The Makefile uses the following environment variables
to control Terraform (a sketch of how they might be wired up follows the
list):

* `WORKSPACE_NAME` (default: `canary`, can be `prod`)
* `ASSUME_ROLE` (default: `true`) - whether to authenticate to AWS using the
  provided credentials or by assuming the Prow-Cluster-Admin role
* `DEPLOY_K8S_RESOURCES` (default: `true`) - whether to deploy the Kubernetes
  resources defined via Terraform
* `TF_ARGS` (default: none) - additional command-line flags and arguments
  to pass to Terraform
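
As a rough mental model, these variables presumably end up as an ordinary
workspace selection plus Terraform flags, along these lines (a minimal
sketch; the variable names passed via `-var` are hypothetical and the
Makefile itself is the source of truth):

```bash
# Hypothetical sketch of what `make plan` might run; check the Makefile
# for the actual commands and variable names.
terraform workspace select "${WORKSPACE_NAME:-canary}"
terraform plan \
  -var "assume_role=${ASSUME_ROLE:-true}" \
  -var "deploy_kubernetes_resources=${DEPLOY_K8S_RESOURCES:-true}" \
  ${TF_ARGS}
```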

### Commands

**WARNING: Make sure to read the whole document before creating a cluster
for the first time, as additional steps are needed!**

Init (**make sure to run this command before getting started with the configs**):

```bash
make init
```

Plan:

```bash
make plan
```

Apply:

```bash
make apply
```

Destroy:

```bash
make destroy
```
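
These targets can be combined with the environment variables described above.
For example, to plan against production while passing an extra flag through
to Terraform (the flag here is just an illustration):

```bash
# Plan the production workspace, skipping the state refresh step.
WORKSPACE_NAME=prod TF_ARGS="-refresh=false" make plan
```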

## Provisioning Cluster

Running the installation from scratch is different from consecutive
invocations of Terraform.

We first need to create an IAM role that we're later going to assume and use
for creating the cluster. Note that the principal that created the cluster is
considered a cluster admin, so we want to make sure that we assume the IAM
role before starting the cluster creation process.
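
Because the creating principal ends up as the cluster admin, it's worth
double-checking which identity your current credentials resolve to before
kicking off the run (an optional sanity check):

```bash
# Prints the account ID and ARN of the current caller; make sure this is
# the identity you expect to become the cluster admin.
aws sts get-caller-identity
```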

Additionally, we can't provision a cluster and deploy Kubernetes resources in
the same Terraform run. That's because Terraform cannot plan Kubernetes
resources without the cluster being created, so we first create the cluster,
then run Terraform again to deploy the Kubernetes resources.

That said, the cluster creation is done in four phases:

- Phase 1: create the IAM role and policies
- Phase 2: create everything else
- Phase 3: deploy the Kubernetes resources managed by Terraform
- Phase 4: deploy the Kubernetes resources not managed by Terraform

**WARNING: Before getting started, make sure the `WORKSPACE_NAME` environment
variable is set to the correct value!!!**

### Phase 0: preparing the environment

Before getting started, make sure to set the needed environment variables:

```bash
export WORKSPACE_NAME=canary # or prod
export ASSUME_ROLE=false # the role to be assumed will be created in phase 1
export DEPLOY_K8S_RESOURCES=false
```

### Phase 1: creating the IAM role and policies

We're now going to create the IAM role and attach policies to it.
This step is done by applying the appropriate `iam` module:

```bash
TF_ARGS="-target=module.iam" make apply
```

Ignore Terraform warnings about incomplete state; this is expected, as
we're using the `-target` flag.

### Phase 2: create the EKS cluster

With the IAM role in place, we can assume it and use it to create the EKS
cluster and other needed resources.

First, set the `ASSUME_ROLE` environment variable to `true`:

```bash
export ASSUME_ROLE=true
```

Then run Terraform again:

```bash
make apply
```

### Phase 3: deploy the Kubernetes resources

With the EKS cluster in place, we can deploy the Kubernetes resources managed
by Terraform. First, make sure to set the `DEPLOY_K8S_RESOURCES` environment
variable to `true`:

```bash
export DEPLOY_K8S_RESOURCES=true
```

Then run the `apply` command again:

```bash
make apply
```

At this point, the cluster should be fully functional. You should fetch the
kubeconfig before proceeding, as described at the beginning of this document.
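
Once the kubeconfig is fetched and patched with the assume-role arguments, a
quick smoke test confirms the cluster is reachable and the nodes have joined:

```bash
# Both commands should succeed without an Unauthorized error.
kubectl get nodes
kubectl get pods --all-namespaces
```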

### Phase 4: deploy the Kubernetes resources not managed by Terraform

Not all Kubernetes resources are managed by Terraform. We're working on
streamlining this, but until then, you have to deploy those resources
manually:

- Create the required namespaces:
  ```bash
  kubectl apply -f ./resources/namespaces.yaml
  ```
- Create the cluster roles and role bindings:
  ```bash
  kubectl apply -f ./resources/rbac
  ```
- Create the required resources in the `kube-system` and `test-pods`
  namespaces:
  ```bash
  kubectl apply -f ./resources/kube-system
  kubectl apply -f ./resources/test-pods
  ```
- Follow the appropriate instructions to deploy the
  [node-termination-handler](./resources/node-termination-handler/README.md)
  and [the monitoring stack](./resources/monitoring/README.md).

## Removing cluster

Same as for the installation, cluster removal requires running Terraform
twice. **IMPORTANT**: this is possible only for users with the
`AdministratorAccess` policy assigned. The cluster can be removed by running
the following commands:

```bash
export WORKSPACE_NAME= # choose between canary/prod
# First remove the resources running on the cluster and the IAM role. This
# fails once the assumed role gets deleted.
make destroy
# Clean up the rest.
ASSUME_ROLE=false make destroy
```