Commit

Minor validation and docs update (#695)
- Enforce cluster name is between [1, 19] characters to prevent empty
cluster names and too long cluster names from being propagated down to
IRSA role creation and causing role creation to fail due to role length
being greater than 64 characters
- Highlight which deployment add-on steps can be skipped when following
the Terraform deployment guides


**Testing:**
- Tested cluster name length validation via empty string,
"ack-sagemaker-controller-irsa-tf-vanilla-uwiqgwfq-ap-southeast-1", and
"ack-sagemaker-controller-irsa-tf-vanilla-uwiqgwfqu3-ap-southeast-1"

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
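
The `[1, 19]` bound and the 64-character role limit described above can be sanity-checked with a short shell sketch. It assumes, based only on the shape of the two test strings in the commit message, that the IRSA role name is built as `ack-sagemaker-controller-irsa-<cluster_name>-<region>`; that pattern is an inference rather than something this commit confirms, and the helper names `cluster_name_is_valid` and `irsa_role_name` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the role-name pattern is inferred from the test strings in the
# commit message and is an assumption, not taken from the Terraform sources.

IAM_ROLE_NAME_MAX=64   # role length limit cited in the commit message
CLUSTER_NAME_MAX=19    # upper bound enforced by the new validation block

# Mirrors the Terraform condition: length > 0 && length <= 19.
cluster_name_is_valid() {
  local name="$1"
  [ "${#name}" -gt 0 ] && [ "${#name}" -le "$CLUSTER_NAME_MAX" ]
}

# Hypothetical helper: builds a role name in the shape of the test strings.
irsa_role_name() {
  local cluster="$1" region="$2"
  printf 'ack-sagemaker-controller-irsa-%s-%s' "$cluster" "$region"
}

# Report, for each candidate, the name length, the resulting role length,
# and whether the validation would accept the cluster name.
for cluster in "" "tf-vanilla-uwiqgwfq" "tf-vanilla-uwiqgwfqu3"; do
  role="$(irsa_role_name "$cluster" "ap-southeast-1")"
  if cluster_name_is_valid "$cluster"; then
    verdict="accepted"
  else
    verdict="rejected"
  fi
  printf 'cluster="%s" len=%d role_len=%d (limit %d) %s\n' \
    "$cluster" "${#cluster}" "${#role}" "$IAM_ROLE_NAME_MAX" "$verdict"
done
```

Under this assumed pattern, a 19-character cluster name in `ap-southeast-1` lands exactly on the 64-character limit, while a 21-character name exceeds it, which is consistent with the two strings used for testing.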
rrrkharse authored Apr 24, 2023
1 parent b014c04 commit bba1bc9
Showing 9 changed files with 114 additions and 22 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -40,7 +40,7 @@ install-jq:
sudo apt-get install jq -y

install-terraform:
$(eval TERRAFORM_VERSION:=1.2.7)
$(eval TERRAFORM_VERSION:=1.4.5)
curl "https://releases.hashicorp.com/terraform/$(TERRAFORM_VERSION)/terraform_$(TERRAFORM_VERSION)_linux_amd64.zip" -o "terraform.zip"
unzip -o -q terraform.zip
sudo install -o root -g root -m 0755 terraform /usr/local/bin/terraform
5 changes: 5 additions & 0 deletions deployments/cognito-rds-s3/terraform/variables.tf
@@ -2,6 +2,11 @@
variable "cluster_name" {
description = "Name of cluster"
type = string

validation {
condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19
error_message = "The cluster name must be between [1, 19] characters"
}
}

variable "cluster_region" {
5 changes: 5 additions & 0 deletions deployments/cognito/terraform/variables.tf
@@ -2,6 +2,11 @@
variable "cluster_name" {
description = "Name of cluster"
type = string

validation {
condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19
error_message = "The cluster name must be between [1, 19] characters"
}
}

variable "cluster_region" {
5 changes: 5 additions & 0 deletions deployments/rds-s3/terraform/variables.tf
@@ -2,6 +2,11 @@
variable "cluster_name" {
description = "Name of cluster"
type = string

validation {
condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19
error_message = "The cluster name must be between [1, 19] characters"
}
}

variable "cluster_region" {
5 changes: 5 additions & 0 deletions deployments/vanilla/terraform/variables.tf
@@ -2,6 +2,11 @@
variable "cluster_name" {
description = "Name of cluster"
type = string

validation {
condition = length(var.cluster_name) > 0 && length(var.cluster_name) <= 19
error_message = "The cluster name must be between [1, 19] characters"
}
}

variable "cluster_region" {
39 changes: 35 additions & 4 deletions website/content/en/docs/add-ons/load-balancer/guide.md
@@ -10,6 +10,8 @@ This tutorial shows how to expose Kubeflow over a load balancer on AWS.

Follow this guide only if you are **not** using `Cognito` as the authentication provider in your deployment. Cognito-integrated deployment is configured with the AWS Load Balancer controller by default to create an ingress-managed Application Load Balancer and exposes Kubeflow via a hosted domain.

> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below.

## Background

Kubeflow does not offer a generic solution for connecting to Kubeflow over a Load Balancer because this process is highly dependent on your environment and cloud provider. On AWS, we use the [AWS Load Balancer (ALB) controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/), which satisfies the Kubernetes [Ingress resource](https://kubernetes.io/docs/concepts/services-networking/ingress/) to create an [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) (ALB). When you create a Kubernetes `Ingress`, an ALB is provisioned that load balances application traffic.
@@ -37,8 +39,15 @@ This guide assumes that you have:

## Create Load Balancer


#### Setup for Manifest deployments

If you prefer to create a load balancer using automated scripts, you **only** need to follow the steps in the [automated script section](#automated-script). You can read the following sections in this guide to understand what happens when you run the automated script or to walk through all of the steps manually.

#### Setup for Terraform deployments

Follow the manual steps below.

### Create domain and certificates

You need a registered domain and TLS certificate to use HTTPS with the Load Balancer. Since your top level domain (e.g. `example.com`) can be registered at any service provider, for uniformity and to take advantage of the integration provided between Route53, ACM, and Application Load Balancer, you will create a separate [subdomain](https://en.wikipedia.org/wiki/Subdomain) (e.g. `platform.example.com`) to host Kubeflow and a corresponding hosted zone in Route53 to route traffic for this subdomain. To get TLS support, you will need certificates for both the root domain (`*.example.com`) and subdomain (`*.platform.example.com`) in the region where your platform will run (your EKS cluster region).
@@ -86,7 +95,9 @@ If you choose DNS validation for the validation of the certificates, you will be
```bash
printf 'certArn='$certArn'' > awsconfigs/common/istio-ingress/overlays/https/params.env
```
### Configure Load Balancer controller
### Configure Load Balancer Controller

> Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set `enable_aws_load_balancer_controller = false`.

Set up resources required for the Load Balancer controller:

@@ -103,6 +114,7 @@ Set up resources required for the Load Balancer controller:
```
- `kubernetes.io/role/internal-elb`. Add this tag only to private subnets.
- `kubernetes.io/role/elb`. Add this tag only to public subnets.

1. The Load Balancer controller uses [IAM roles for service accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) (IRSA) to access AWS services. An OIDC provider must exist for your cluster to use IRSA. Create an OIDC provider and associate it with your EKS cluster by running the following command if your cluster doesn’t already have one:
```bash
eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --approve
@@ -113,15 +125,30 @@ Set up resources required for the Load Balancer controller:
export LBC_POLICY_ARN=$(aws iam create-policy --policy-name $LBC_POLICY_NAME --policy-document file://awsconfigs/infra_configs/iam_alb_ingress_policy.json --output text --query 'Policy.Arn')
eksctl create iamserviceaccount --name aws-load-balancer-controller --namespace kube-system --cluster ${CLUSTER_NAME} --region ${CLUSTER_REGION} --attach-policy-arn ${LBC_POLICY_ARN} --override-existing-serviceaccounts --approve
```

1. Configure the parameters for [load balancer controller](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/common/aws-alb-ingress-controller/base/params.env) with the cluster name.
```bash
printf 'clusterName='$CLUSTER_NAME'' > awsconfigs/common/aws-alb-ingress-controller/base/params.env
```

### Build Manifests and deploy components
Run the following command to build and install the components specified in the Load Balancer [kustomize](https://github.com/awslabs/kubeflow-manifests/blob/main/deployments/add-ons/load-balancer/kustomization.yaml) file.
### Install Load Balancer Controller

> Important: Skip this step if you are using a Terraform deployment since the AWS Load Balancer Controller is installed by default unless you set `enable_aws_load_balancer_controller = false`.

Run the following command to build and install the Load Balancer controller [kustomize](https://github.com/awslabs/kubeflow-manifests/blob/main/awsconfigs/common/aws-alb-ingress-controller/base/kustomization.yaml) file.

```bash
while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 30; done
kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f -
kubectl wait --for condition=established crd/ingressclassparams.elbv2.k8s.aws
kustomize build awsconfigs/common/aws-alb-ingress-controller/base | kubectl apply -f -
```

### Create Ingress

Create an ingress that will use the certificate you specified in `certArn`.

```bash
kustomize build awsconfigs/common/istio-ingress/overlays/https | kubectl apply -f -
```

### Update the domain with ALB address
@@ -140,6 +167,8 @@ while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -;

### Automated script

> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#create-load-balancer).

1. Install dependencies for the script
```bash
cd tests/e2e
@@ -198,6 +227,8 @@ while ! kustomize build deployments/add-ons/load-balancer | kubectl apply -f -;

## Clean up

> Important: Terraform deployment users should not follow these clean up steps and should manually delete resources created while following the [Manual setup instructions](#create-load-balancer).

To delete the resources created in this guide, run the following commands from the root of your repository:
> Note: Make sure that you have the configuration file created by the script in `tests/e2e/utils/load_balancer/config.yaml`. If you did not use the script, plug in the name, ARN, or ID of the resources that you created in the configuration file by referring to the sample in Step 4 of the [previous section](#automated-script).
```bash
25 changes: 20 additions & 5 deletions website/content/en/docs/add-ons/storage/efs/guide.md
@@ -6,6 +6,8 @@ weight = 10

This guide describes how to use Amazon EFS as Persistent storage on top of an existing Kubeflow deployment.

> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below.

## 1.0 Prerequisites
For this guide, we assume that you already have an EKS Cluster with Kubeflow installed. The EFS CSI Driver can be installed and configured as a separate resource on top of an existing Kubeflow deployment. See the [deployment options]({{< ref "/docs/deployment" >}}) and [general prerequisites]({{< ref "/docs/deployment/vanilla/guide.md" >}}) for more information.

@@ -37,9 +39,18 @@ export CLAIM_NAME=<efs-claim>

## 2.0 Set up EFS

#### Setup for Manifest deployments

You can either use Automated or Manual setup to set up the resources required. If you choose the manual route, you get another choice between **static and dynamic provisioning**, so pick whichever suits you. On the other hand, for the automated script we currently only support **dynamic provisioning**. Whichever combination you pick, be sure to continue picking the appropriate sections through the rest of this guide.

#### Setup for Terraform deployments

Follow the Manual setup to set up the resources required. As part of the Manual setup, you get another choice between **static and dynamic provisioning**, so pick whichever suits you.

### 2.1 [Option 1] Automated setup

> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#22-option-2-manual-setup).

The script automates all the manual resource creation steps but is currently only available for **Dynamic Provisioning** option.
It performs the required cluster configuration, creates an EFS file system and it also takes care of creating a storage class for dynamic provisioning. Once done, move to section 3.0.
1. Run the following commands from the `tests/e2e` directory:
@@ -80,7 +91,11 @@ If you prefer to manually setup each component then you can follow this manual g
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
```

#### 1. Install the EFS CSI driver
#### 1. Driver install and IAM configuration

> Important: Skip this step if you are using a Terraform deployment since EFS CSI driver is installed by default unless you set `enable_aws_efs_csi_driver = false`.

##### 1.1 Install the EFS CSI driver
We recommend installing the EFS CSI Driver v1.5.4 directly from [the aws-efs-csi-driver GitHub repo](https://github.com/kubernetes-sigs/aws-efs-csi-driver) as follows:

```bash
@@ -95,7 +110,7 @@ NAME ATTACHREQUIRED PODINFOONMOUNT MODES AGE
efs.csi.aws.com false false Persistent 5d17h
```

#### 2. Create the IAM Policy for the CSI driver
##### 1.2. Create the IAM Policy for the CSI driver
The CSI driver's service account (created during installation) requires IAM permission to make calls to AWS APIs on your behalf. Here, we will be annotating the Service Account `efs-csi-controller-sa` with an IAM Role which has the required permissions.

1. Download the IAM policy document from GitHub as follows.
Expand Down Expand Up @@ -129,15 +144,15 @@ eksctl create iamserviceaccount \
kubectl describe -n kube-system serviceaccount efs-csi-controller-sa
```

#### 3. Manually create an instance of the EFS filesystem
#### 2. Manually create an instance of the EFS filesystem
Please refer to the official [AWS EFS CSI Document](https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html#efs-create-filesystem) for detailed instructions on creating an EFS filesystem.

> Note: For this guide, we assume that you are creating your EFS Filesystem in the same VPC as your EKS Cluster.

#### Choose between dynamic and static provisioning
In the following section, you have to choose between setting up [dynamic provisioning](https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/) or setting up static provisioning.

#### 4. [Option 1] Dynamic provisioning
#### 3. [Option 1] Dynamic provisioning
1. Use the `$file_system_id` you recorded in section 3 above or use the AWS Console to get the filesystem id of the EFS file system you want to use. Now edit the `dynamic-provisioning/sc.yaml` file by replacing `<YOUR_FILE_SYSTEM_ID>` with your `fs-xxxxxx` file system id. You can also change it using the following command:
```bash
file_system_id=$file_system_id yq e '.parameters.fileSystemId = env(file_system_id)' -i $GITHUB_STORAGE_DIR/efs/dynamic-provisioning/sc.yaml
@@ -161,7 +176,7 @@ kubectl apply -f $GITHUB_STORAGE_DIR/efs/dynamic-provisioning/pvc.yaml

Note: The `StorageClass` is a cluster-scoped resource, which means you only need to do this step once per cluster.

#### 4. [Option 2] Static Provisioning
#### 3. [Option 2] Static Provisioning
Using [this sample](https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/examples/kubernetes/multiple_pods), we provided the required spec files in the sample subdirectory. However, you can create the PVC another way.

1. Use the `$file_system_id` you recorded in section 3 above or use the AWS Console to get the filesystem id of the EFS file system you want to use. Now edit the last line of the static-provisioning/pv.yaml file to specify the `volumeHandle` field to point to your EFS filesystem. Replace `$file_system_id` if it is not already set.
24 changes: 20 additions & 4 deletions website/content/en/docs/add-ons/storage/fsx-for-lustre/guide.md
@@ -6,6 +6,8 @@ weight = 20

This guide describes how to use Amazon FSx as Persistent storage on top of an existing Kubeflow deployment.

> Note: For Terraform deployment users, some steps that should be skipped will have a note indicating such below.

## 1.0 Prerequisites
For this guide, we assume that you already have an EKS Cluster with Kubeflow installed. The FSx CSI Driver can be installed and configured as a separate resource on top of an existing Kubeflow deployment. See the [deployment options]({{< ref "/docs/deployment" >}}) and [general prerequisites]({{< ref "/docs/deployment/vanilla/guide.md" >}}) for more information.

@@ -36,9 +38,19 @@ export CLAIM_NAME=<fsx-claim>
```

## 2.0 Setup FSx for Lustre

#### Setup for Manifest deployments

You can either use Automated or Manual setup. We currently only support **Static provisioning** for FSx.

#### Setup for Terraform deployments

Follow the Manual setup. We currently only support **Static provisioning** for FSx.

### 2.1 [Option 1] Automated setup

> Important: Terraform deployment users should not follow these Automated setup instructions and should follow the [Manual setup instructions](#22-option-2-manual-setup).

The script automates all the manual resource creation steps but is currently only available for **Static Provisioning** option.
It performs the required cluster configuration, creates an FSx file system and it also takes care of creating a storage class for static provisioning. Once done, move to section 3.0.
1. Run the following commands from the `tests/e2e` directory:
@@ -74,7 +86,11 @@ The script applies some default values for the file system name, performance mod
### 2.2 [Option 2] Manual setup
If you prefer to set up each component manually, follow this manual guide.

#### 1. Install the FSx CSI Driver
#### 1. Driver install and IAM configuration

> Important: Skip this step if you are using a Terraform deployment since FSx CSI driver is installed by default unless you set `enable_aws_fsx_csi_driver = false`.

##### 1. Install the FSx CSI Driver
We recommend installing the FSx CSI Driver v0.9.0 directly from [the aws-fsx-csi-driver GitHub repository](https://github.com/kubernetes-sigs/aws-fsx-csi-driver) as follows:

```bash
@@ -89,7 +105,7 @@ NAME ATTACHREQUIRED PODINFOONMOUNT MODES AGE
fsx.csi.aws.com false false Persistent 14s
```

#### 2. Create the IAM Policy for the CSI Driver
##### 2. Create the IAM Policy for the CSI Driver
The CSI driver's service account (created during installation) requires IAM permission to make calls to AWS APIs on your behalf. Here, we will be annotating the Service Account `fsx-csi-controller-sa` with an IAM Role which has the required permissions.

1. Create the policy using the json file provided as follows:
Expand Down Expand Up @@ -117,12 +133,12 @@ eksctl create iamserviceaccount \
kubectl describe -n kube-system serviceaccount fsx-csi-controller-sa
```

#### 3. Create an instance of the FSx Filesystem
#### 2. Create an instance of the FSx Filesystem
Please refer to the official [AWS FSx CSI documentation](https://docs.aws.amazon.com/fsx/latest/LustreGuide/getting-started-step1.html) for detailed instructions on creating an FSx filesystem.

Note: For this guide, we assume that you are creating your FSx Filesystem in the same VPC as your EKS Cluster.

#### 4. Static provisioning
#### 3. Static provisioning
[Using this sample from official Kubeflow Docs](https://www.kubeflow.org/docs/distributions/aws/customizing-aws/storage/#amazon-fsx-for-lustre)

1. Use the AWS Console to get the filesystem id of the FSx volume you want to use. You could also use the following command to list all the volumes available in your region. Either way, make sure that `file_system_id` is set.