docs: terraforming an EKS cluster with autoscaling and EFS. #9427

Merged on May 28, 2024 (3 commits).
4 changes: 4 additions & 0 deletions docs/setup-cluster/k8s/setup-eks-cluster.rst
@@ -10,6 +10,10 @@
information that is obsolete. This documentation is preserved because it may contain useful
insights relevant to legacy systems.

See the `GitHub repo <https://github.com/determined-ai/determined/tree/main/examples/deploy/eks>`_ for
an up-to-date example of Terraform code that deploys Determined on EKS with autoscaling and EFS
support.

Determined can be installed on a cluster that is hosted on a managed Kubernetes service such as
`Amazon EKS <https://aws.amazon.com/eks/>`_. This document describes how to set up an EKS cluster
with GPU-enabled nodes. The recommended setup includes deploying a cluster with a single non-GPU
3 changes: 3 additions & 0 deletions examples/deploy/eks/.gitignore
@@ -0,0 +1,3 @@
.terraform
terraform.tfstate*
.terraform.tfstate*
166 changes: 166 additions & 0 deletions examples/deploy/eks/.terraform.lock.hcl
(Generated lock file; diff not rendered.)

43 changes: 43 additions & 0 deletions examples/deploy/eks/README.md
@@ -0,0 +1,43 @@
# Terraformed EKS cluster for Determined

This is example Terraform code that configures an EKS cluster to run Determined.

Supported features:
- autoscaling via Karpenter
- PostgreSQL volume on EBS
- shared file system on EFS

Based on the [original Karpenter example](https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/karpenter).

## Prerequisites

- Terraform
- Helm
- AWS CLI
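
One quick way to confirm that the prerequisites are installed is to check that each binary is on `PATH`. The sketch below is illustrative and not part of this example's code; the `missing_tools` helper name is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: print each named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three prerequisites listed above.
missing=$(missing_tools terraform helm aws)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```

This only checks that the commands exist; it does not verify versions or AWS credentials.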

## Installation

First, edit the `locals` section in `main.tf` to set your cluster name and AWS region.

```bash
$ terraform init
$ terraform apply -auto-approve
$ aws eks --region <AWS REGION> update-kubeconfig --name <CLUSTER NAME>
$ helm install determined determined-ai/determined --values values.yaml
```
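
The cluster name and region are set in `main.tf` and then repeated on the command line, so it may help to keep them in shell variables to avoid a mismatch. The following is a sketch under stated assumptions, not part of the example: `CLUSTER_NAME`, `AWS_REGION`, `DRY_RUN`, and the `run` and `install_determined` helpers are names introduced here for illustration, and the values must match what you set in `locals`. By default it runs in dry-run mode and only prints the commands.

```bash
#!/bin/sh
# Assumed values: these must match the `locals` section in main.tf.
CLUSTER_NAME="${CLUSTER_NAME:-my-determined-cluster}"  # hypothetical name
AWS_REGION="${AWS_REGION:-us-west-2}"
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print commands, 0 = execute them

# Hypothetical helper: print the command, and run it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# The same four installation steps as above, stopping at the first failure.
install_determined() {
  run terraform init &&
  run terraform apply -auto-approve &&
  run aws eks --region "$AWS_REGION" update-kubeconfig --name "$CLUSTER_NAME" &&
  run helm install determined determined-ai/determined --values values.yaml
}

install_determined
```

Setting `DRY_RUN=0` in the environment before running the script executes the commands instead of printing them.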

## Teardown

Warning: shut down all jobs in Determined first.

```bash
$ helm uninstall determined
$ terraform destroy -auto-approve
```
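
To make the warning harder to skip, the teardown commands can be wrapped in a confirmation prompt. This is a minimal sketch, not part of the example; `confirm_jobs_stopped` and `teardown` are hypothetical helper names, and the check relies on the operator's answer rather than on querying Determined.

```bash
#!/bin/sh
# Hypothetical guard: succeeds only if the operator types "yes".
confirm_jobs_stopped() {
  printf 'Have all Determined jobs been shut down? Type "yes" to continue: ' >&2
  read -r answer || return 1
  [ "$answer" = "yes" ]
}

# Hypothetical wrapper around the teardown commands shown above.
teardown() {
  if confirm_jobs_stopped; then
    helm uninstall determined && terraform destroy -auto-approve
  else
    echo "Teardown aborted." >&2
    return 1
  fi
}

# Usage: source this file, then run `teardown` once all jobs are stopped.
```

Any answer other than `yes`, including end of input, aborts before `helm` or `terraform` is invoked.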

## Future work

In the future, we may want to:
- Make the code configurable. Currently, custom configurations require changing the Terraform code directly.
- Rework this code as a `det deploy eks` utility.
- Switch from a Helm-installed PostgreSQL instance backed by an EBS volume to a Terraform-provisioned RDS instance.