diff --git a/release/RELEASE_CHECKLIST.md b/release/RELEASE_CHECKLIST.md
index f8a55bfff9aec..e4770a1cb6fdd 100644
--- a/release/RELEASE_CHECKLIST.md
+++ b/release/RELEASE_CHECKLIST.md
@@ -31,6 +31,7 @@ This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst d
 - [ ] Test passing
 - [ ] Results added to `release/release_logs`
 - [ ] microbenchmark
+- [ ] `kubernetes` manual release tests pass
 - [ ] ``weekly`` release test suite
 - [ ] Test passing
diff --git a/release/RELEASE_PROCESS.rst b/release/RELEASE_PROCESS.rst
index 59da95846cf3c..b2f7d05db5492 100644
--- a/release/RELEASE_PROCESS.rst
+++ b/release/RELEASE_PROCESS.rst
@@ -172,6 +172,9 @@ Release tests are added and maintained by the respective teams.
 As another example, if you just want to kick off all nightly RLLib tests, select the respective test suite and specify ``rllib`` in the test file filter.
+6. **Kubernetes tests must be run manually.** Refer to ``kubernetes_manual_tests/README.md``.
+   Feel free to ping the code owner(s) of OSS Kubernetes support to run these.
+
 Identify and Resolve Release Blockers
 -------------------------------------
 If a release blocking issue arises in the course of testing, you should
diff --git a/release/kubernetes_manual_tests/README.md b/release/kubernetes_manual_tests/README.md
new file mode 100644
index 0000000000000..12b61f272b079
--- /dev/null
+++ b/release/kubernetes_manual_tests/README.md
@@ -0,0 +1,25 @@
+# ray-k8s-tests
+
+These tests are not automated and thus **must be run manually** for each release.
+If you have trouble running them, contact the code owner(s) for OSS Kubernetes support.
+
+## How to run
+1. Configure kubectl and Helm 3 to access a K8s cluster.
+2. `git checkout releases/`
+3. You might have to `pip install` the Ray wheel for the relevant commit locally (or `pip install -e`) in a conda env; see the Ray client note below.
+4. `cd` to this directory.
+5. `IMAGE=rayproject/ray: bash k8s_release_tests.sh`
+6. Test outcomes will be reported at the end of the output.
+
+This runs three tests and does the necessary resource creation/teardown. The tests typically take about 15 minutes to finish.
+
+## Notes
+0. Anyscale employees: You should have access to create a K8s cluster using either GKE or EKS; ask the OSS Kubernetes code owner if in doubt.
+1. Your Ray cluster should be able to accommodate 30 1-CPU pods to run all of the tests.
+2. These tests use basic Ray client functionality -- your locally installed Ray version may need to be updated to match the one in the release image.
+3. The tests do a poor job of cleaning up the Ray client port-forwarding process -- if a test fails, a port-forwarding process may be left running in the background. To identify the rogue process, run `ps aux | grep "port-forward"`, then `kill` it.
+4. Some errors will appear on the screen during the run -- that's normal; error recovery is being tested.
+
+## Running individual tests
+To run one of the three tests on its own, replace `k8s_release_tests.sh` in step 5 of **How to run** with `k8s-test.sh`, `helm-test.sh`, or `k8s-test-scale.sh`.
+Only the last of these needs 30 1-CPU pods; 10 are enough for either of the other two. The scale test is currently somewhat flaky, so rerun it if it fails.
diff --git a/release/kubernetes_manual_tests/helm-test.sh b/release/kubernetes_manual_tests/helm-test.sh
new file mode 100755
index 0000000000000..273ddb5c1cc11
--- /dev/null
+++ b/release/kubernetes_manual_tests/helm-test.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+set -x
+kubectl create namespace helm-test
+kubectl create namespace helm-test2
+KUBERNETES_OPERATOR_TEST_NAMESPACE=helm-test KUBERNETES_OPERATOR_TEST_IMAGE="$IMAGE" python ../../python/ray/tests/kubernetes_e2e/test_helm.py
+kubectl delete namespace helm-test
+kubectl delete namespace helm-test2
+kubectl delete -f ../../deploy/charts/ray/crds/cluster_crd.yaml
diff --git a/release/kubernetes_manual_tests/k8s-test-scale.sh b/release/kubernetes_manual_tests/k8s-test-scale.sh
new file mode 100755
index 0000000000000..59ea06c80f5f1
--- /dev/null
+++ b/release/kubernetes_manual_tests/k8s-test-scale.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+set -x
+kubectl create namespace scale-test
+kubectl create namespace scale-test2
+KUBERNETES_OPERATOR_TEST_NAMESPACE=scale-test KUBERNETES_OPERATOR_TEST_IMAGE="$IMAGE" python ../../python/ray/tests/kubernetes_e2e/test_k8s_operator_scaling.py
+kubectl -n scale-test delete --all rayclusters
+kubectl -n scale-test2 delete --all rayclusters
+kubectl delete -f ../../deploy/components/operator_cluster_scoped.yaml
+kubectl delete namespace scale-test
+kubectl delete namespace scale-test2
+kubectl delete -f ../../deploy/charts/ray/crds/cluster_crd.yaml
diff --git a/release/kubernetes_manual_tests/k8s-test.sh b/release/kubernetes_manual_tests/k8s-test.sh
new file mode 100755
index 0000000000000..aa0ec6325d880
--- /dev/null
+++ b/release/kubernetes_manual_tests/k8s-test.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+set -x
+kubectl create namespace basic-test
+kubectl apply -f ../../deploy/charts/ray/crds/cluster_crd.yaml
+KUBERNETES_OPERATOR_TEST_NAMESPACE=basic-test KUBERNETES_OPERATOR_TEST_IMAGE="$IMAGE" python ../../python/ray/tests/kubernetes_e2e/test_k8s_operator_basic.py
+kubectl -n basic-test delete --all rayclusters
+kubectl -n basic-test delete deployment ray-operator
+kubectl delete namespace basic-test
+kubectl delete -f ../../deploy/charts/ray/crds/cluster_crd.yaml
diff --git a/release/kubernetes_manual_tests/k8s_release_tests.sh b/release/kubernetes_manual_tests/k8s_release_tests.sh
new file mode 100644
index 0000000000000..6576dcdabfa39
--- /dev/null
+++ b/release/kubernetes_manual_tests/k8s_release_tests.sh
@@ -0,0 +1,30 @@
+#!/bin/bash
+set -x
+IMAGE="$IMAGE" bash k8s-test.sh
+BASIC_SUCCEEDED=$?
+IMAGE="$IMAGE" bash helm-test.sh
+HELM_SUCCEEDED=$?
+IMAGE="$IMAGE" bash k8s-test-scale.sh
+SCALE_SUCCEEDED=$?
+
+if (( BASIC_SUCCEEDED == 0 ))
+then
+    echo "k8s-test.sh succeeded"
+else
+    echo "k8s-test.sh test failed"
+fi
+
+if (( HELM_SUCCEEDED == 0 ))
+then
+    echo "helm-test.sh test succeeded";
+else
+    echo "helm-test.sh test failed"
+fi
+
+if (( SCALE_SUCCEEDED == 0))
+then
+    echo "k8s-test-scale.sh test succeeded";
+else
+    echo "k8s-test-scale.sh failed. Try re-running just the k8s-test-scale.sh. It's expected to be flaky."
+fi
+
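A note on exit codes: each wrapper script in this diff ends with `kubectl delete` cleanup commands, and a bash script without an explicit `exit` returns the status of its last command. The `$?` values captured in `k8s_release_tests.sh` may therefore reflect the final cleanup step rather than the Python test. One possible way to carry the test's status through cleanup is sketched below. `run_test` and `cleanup` are hypothetical stand-ins (not part of this diff) for the `python ...` invocation and the `kubectl delete` commands, respectively.

```shell
#!/bin/bash
# Hypothetical stand-in for the python test invocation; fails with status 3.
run_test() { return 3; }

# Hypothetical stand-in for the kubectl delete cleanup commands; succeeds.
cleanup() { return 0; }

run_with_cleanup() {
    run_test
    local status=$?   # save the test's status before cleanup overwrites $?
    cleanup
    return "$status"  # report the test's status, not the cleanup's
}

if run_with_cleanup; then
    echo "test succeeded"
else
    # In the else branch, $? still holds the status of the if condition.
    echo "test failed with status $?"
fi
# prints: test failed with status 3
```

In a wrapper script the same idea would amount to saving `$?` right after the `python` line and ending the script with `exit` on that saved value, so the summary at the end of `k8s_release_tests.sh` reports the test result.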