diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 8ef3f7f710..e5cd44e6d3 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/streetsidesoftware/cspell-cli
-    rev: v8.15.1
+    rev: v8.15.2
     hooks:
       - id: cspell
         args: [--exclude, 'ADOPTERS.md', --exclude, '.pre-commit-config.yaml', --exclude, '.gitignore', --exclude, '*.drawio', --exclude, 'mkdocs.yml', --exclude, '.helmignore', --exclude, '.github/workflows/*', --exclude, 'patterns/istio-multi-cluster/*', --exclude, 'patterns/blue-green-upgrade/*', --exclude, '/patterns/vpc-lattice/cross-cluster-pod-communication/*', --exclude, 'patterns/bottlerocket/*', --exclude, 'patterns/nvidia-gpu-efa/generate-efa-nccl-test.sh']
diff --git a/patterns/gitops/getting-started-argocd/README.md b/patterns/gitops/getting-started-argocd/README.md
index 6ac4df89f4..350479fc5f 100644
--- a/patterns/gitops/getting-started-argocd/README.md
+++ b/patterns/gitops/getting-started-argocd/README.md
@@ -117,7 +117,7 @@ The output looks like the following:
 Bootstrap the addons using ArgoCD:

 ```shell
-kubectl apply -f bootstrap/addons.yaml
+kubectl apply --server-side -f bootstrap/addons.yaml
 ```

 ### Monitor GitOps Progress for Addons
@@ -188,7 +188,7 @@ echo "ArgoCD URL: https://$(kubectl get svc -n argocd argo-cd-argocd-server -o j
 Deploy a sample application located in [k8s/game-2048.yaml](k8s/game-2048.yaml) using ArgoCD:

 ```shell
-kubectl apply -f bootstrap/workloads.yaml
+kubectl apply --server-side -f bootstrap/workloads.yaml
 ```

 ### Monitor GitOps Progress for Workloads
diff --git a/patterns/istio/README.md b/patterns/istio/README.md
index 9d4b717a25..9e4129409e 100644
--- a/patterns/istio/README.md
+++ b/patterns/istio/README.md
@@ -36,7 +36,7 @@ cluster with deployed Istio.
 for ADDON in kiali jaeger prometheus grafana
 do
   ADDON_URL="https://raw.githubusercontent.com/istio/istio/release-1.20/samples/addons/$ADDON.yaml"
-  kubectl apply -f $ADDON_URL
+  kubectl apply --server-side -f $ADDON_URL
 done
 ```

@@ -177,7 +177,7 @@ kubectl port-forward svc/jaeger 16686:16686 -n istio-system
           - containerPort: 5000
   EOF

-  kubectl apply -f helloworld.yaml -n sample
+  kubectl apply --server-side -f helloworld.yaml -n sample
   ```

   ```text
@@ -239,7 +239,7 @@ kubectl port-forward svc/jaeger 16686:16686 -n istio-system
         optional: true
   EOF

-  kubectl apply -f sleep.yaml -n sample
+  kubectl apply --server-side -f sleep.yaml -n sample
   ```

   ```text
diff --git a/patterns/karpenter-mng/README.md b/patterns/karpenter-mng/README.md
index 00a58aaa1f..5779ee4c48 100644
--- a/patterns/karpenter-mng/README.md
+++ b/patterns/karpenter-mng/README.md
@@ -54,13 +54,13 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
 2. Provision the Karpenter `EC2NodeClass` and `NodePool` resources which provide Karpenter the necessary configurations to provision EC2 resources:

    ```sh
-   kubectl apply -f karpenter.yaml
+   kubectl apply --server-side -f karpenter.yaml
    ```

 3. Once the Karpenter resources are in place, Karpenter will provision the necessary EC2 resources to satisfy any pending pods in the scheduler's queue. You can demonstrate this with the example deployment provided. First deploy the example deployment which has the initial number replicas set to 0:

    ```sh
-   kubectl apply -f example.yaml
+   kubectl apply --server-side -f example.yaml
    ```

 4. When you scale the example deployment, you should see Karpenter respond by quickly provisioning EC2 resources to satisfy those pending pod requests:
diff --git a/patterns/karpenter/README.md b/patterns/karpenter/README.md
index 10c0593b84..b353c3274a 100644
--- a/patterns/karpenter/README.md
+++ b/patterns/karpenter/README.md
@@ -47,13 +47,13 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
 2. Provision the Karpenter `EC2NodeClass` and `NodePool` resources which provide Karpenter the necessary configurations to provision EC2 resources:

    ```sh
-   kubectl apply -f karpenter.yaml
+   kubectl apply --server-side -f karpenter.yaml
    ```

 3. Once the Karpenter resources are in place, Karpenter will provision the necessary EC2 resources to satisfy any pending pods in the scheduler's queue. You can demonstrate this with the example deployment provided. First deploy the example deployment which has the initial number replicas set to 0:

    ```sh
-   kubectl apply -f example.yaml
+   kubectl apply --server-side -f example.yaml
    ```

 4. When you scale the example deployment, you should see Karpenter respond by quickly provisioning EC2 resources to satisfy those pending pod requests:
diff --git a/patterns/ml-container-cache/README.md b/patterns/ml-container-cache/README.md
index 25fb37fd60..b984db8b5f 100644
--- a/patterns/ml-container-cache/README.md
+++ b/patterns/ml-container-cache/README.md
@@ -81,13 +81,13 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
 4. Once the EKS cluster and node group have been provisioned, you can deploy the provided example pod that will use a cached image to verify the time it takes for the pod to reach a ready state.

    ```sh
-   kubectl apply -f pod-cached.yaml
+   kubectl apply --server-side -f pod-cached.yaml
    ```

    You can contrast this with the time it takes for a pod that is not cached on a node by using the provided `pod-uncached.yaml` file. This works by simply using a pod that doesn't have a toleration for nodes that contain NVIDIA GPUs, which is where the cached images are provided in this example.

    ```sh
-   kubectl apply -f pod-uncached.yaml
+   kubectl apply --server-side -f pod-uncached.yaml
    ```

    You can also do the same steps above but using the small, utility CLI [ktime](https://github.com/clowdhaus/ktime) which can either collect the pod events to measure the time duration to reach a ready state, or it can deploy a pod manifest and return the same:
diff --git a/patterns/nvidia-gpu-efa/README.md b/patterns/nvidia-gpu-efa/README.md
index d223fff869..937383689e 100644
--- a/patterns/nvidia-gpu-efa/README.md
+++ b/patterns/nvidia-gpu-efa/README.md
@@ -36,8 +36,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
 ## Validate

 !!! note
-
-    Desired instance type can be specified in [eks.tf](eks.tf#L36).
+    Desired instance type can be specified in [eks.tf](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/d5ddd10afef9b4fd3e0cbba865645f0f522992ac/patterns/nvidia-gpu-efa/eks.tf#L38).
     Values shown below will change based on the instance type selected (i.e. - `p5.48xlarge` has 8 GPUs and 32 EFA interfaces).
     A list of EFA-enabled instance types is available [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html#efa-instance-types).
     If you are using an on-demand capacity reservation (ODCR) for your instance type, please uncomment the `capacity_reservation_specification` block in `eks.tf`
@@ -66,36 +65,25 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
    To deploy the MPI operator execute the following:

    ```sh
-   kubectl apply -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.4.0/deploy/v2beta1/mpi-operator.yaml
-   ```
-
-   ```text
-   namespace/mpi-operator created
-   customresourcedefinition.apiextensions.k8s.io/mpijobs.kubeflow.org created
-   serviceaccount/mpi-operator created
-   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-admin created
-   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-edit created
-   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-view created
-   clusterrole.rbac.authorization.k8s.io/mpi-operator created
-   clusterrolebinding.rbac.authorization.k8s.io/mpi-operator created
-   deployment.apps/mpi-operator created
-   ```
-
-   In addition to deploying the operator, please apply a patch to the mpi-operator clusterrole
-   to allow the mpi-operator service account access to `leases` resources in the `coordination.k8s.io` apiGroup.
-
-   ```sh
-   kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/kubeflow/mpi-operator/clusterrole-mpi-operator.yaml
+   kubectl apply --server-side -f https://raw.githubusercontent.com/kubeflow/mpi-operator/v0.6.0/deploy/v2beta1/mpi-operator.yaml
    ```

    ```text
-   clusterrole.rbac.authorization.k8s.io/mpi-operator configured
+   namespace/mpi-operator serverside-applied
+   customresourcedefinition.apiextensions.k8s.io/mpijobs.kubeflow.org serverside-applied
+   serviceaccount/mpi-operator serverside-applied
+   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-admin serverside-applied
+   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-edit serverside-applied
+   clusterrole.rbac.authorization.k8s.io/kubeflow-mpijobs-view serverside-applied
+   clusterrole.rbac.authorization.k8s.io/mpi-operator serverside-applied
+   clusterrolebinding.rbac.authorization.k8s.io/mpi-operator serverside-applied
+   deployment.apps/mpi-operator serverside-applied
    ```

 3. EFA info test

    This test prints a list of available EFA interfaces by using the `/opt/amazon/efa/bin/fi_info` utility.
-   The script [generate-efa-info-test.sh](generate-efa-info-test.sh) creates an MPIJob manifest file named `efa-info-test.yaml`. It assumes that there are two cluster nodes with 8 GPU's per node and 32 EFA adapters. If you are not using `p5.48xlarge` instances in your cluster, you may adjust the settings in the script prior to running it.
+   The script [generate-efa-info-test.sh](https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/patterns/nvidia-gpu-efa/generate-efa-info-test.sh) creates an MPIJob manifest file named `efa-info-test.yaml`. It assumes that there are two cluster nodes with 8 GPU's per node and 32 EFA adapters. If you are not using `p5.48xlarge` instances in your cluster, you may adjust the settings in the script prior to running it.

    `NUM_WORKERS` - number of nodes you want to run the test on
    `GPU_PER_WORKER` - number of GPUs available on each node
@@ -108,7 +96,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
    To start the test apply the generated manifest to the cluster:

    ```sh
-   kubectl apply -f ./efa-info-test.yaml
+   kubectl apply --server-side -f ./efa-info-test.yaml
    ```

    ```text
@@ -186,7 +174,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
    This script creates a file named `efa-nccl-test.yaml`. Apply the manifest to start the EFA nccl test.

    ```sh
-   kubectl apply -f ./efa-nccl-test.yaml
+   kubectl apply --server-side -f ./efa-nccl-test.yaml

    ```text
    mpijob.kubeflow.org/efa-nccl-test created
diff --git a/patterns/wireguard-with-cilium/README.md b/patterns/wireguard-with-cilium/README.md
index c558968885..059292def1 100644
--- a/patterns/wireguard-with-cilium/README.md
+++ b/patterns/wireguard-with-cilium/README.md
@@ -20,7 +20,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started
 1. Deploy the example pods:

    ```sh
-   kubectl apply -f example.yaml
+   kubectl apply --server-side -f example.yaml
    ```

    ```text
@@ -100,7 +100,7 @@ See [here](https://aws-ia.github.io/terraform-aws-eks-blueprints/getting-started

    ```sh
    kubectl create ns cilium-test
-   kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.14.1/examples/kubernetes/connectivity-check/connectivity-check.yaml
+   kubectl apply --server-side -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.14.1/examples/kubernetes/connectivity-check/connectivity-check.yaml
    ```

    ```text