Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kuberay][autoscaler] Use new autoscaling fields from the KubeRay operator #25386

Merged
merged 35 commits into from
Jun 9, 2022

Conversation

DmitriGekhtman
Copy link
Contributor

@DmitriGekhtman DmitriGekhtman commented Jun 2, 2022

Why are these changes needed?

This PR incorporates changes from KubeRay PRs ray-project/kuberay#163 ray-project/kuberay#274 ray-project/kuberay#278

  • adds logic to process the new fields upscalingMode and idleTimeoutSeconds in the KubeRay CR
  • refactors the example config to use kuberay's ability to autogenerate an autoscaler container
  • uses the new autoscalerOptions.image field to set the autoscaler image in the test
  • uses a small idleTimeoutSeconds to test scale-down in the autoscaler e2e test

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Integration tests

@DmitriGekhtman
Copy link
Contributor Author

Hmm...getting some logs claiming the operator doesn't have permissions to list service accounts, but the RBAC looks as expected...need to debug more.

@DmitriGekhtman
Copy link
Contributor Author

Ok, having the issue with the operator not being able to list service accounts in upstream KubeRay as well.
Also reproduced with some dummy objects in my K8s cluster. Maybe my K8s cluster is messed up somehow?

@DmitriGekhtman
Copy link
Contributor Author

Ok...no, definitely not my K8s cluster...going to keep debugging.

@DmitriGekhtman
Copy link
Contributor Author

...lol...
https://github.com/ray-project/kuberay/pull/285/files
I really wish K8s would validate that.

@DmitriGekhtman
Copy link
Contributor Author

There's some incorrect downscaling behavior which is causing the test to fail. Going to debug.

@DmitriGekhtman
Copy link
Contributor Author

Welcome, codeowner SWAT team.

@DmitriGekhtman
Copy link
Contributor Author

There's some incorrect downscaling behavior which is causing the test to fail. Going to debug.

For once, autoscaler was correct, I (test) was wrong. Test fixed. Let's wait for CI to run.

@DmitriGekhtman
Copy link
Contributor Author

Head pod failed to start in the CI, very strange...will probably have to debug the CI environment.

@DmitriGekhtman
Copy link
Contributor Author

It's an issue with ImagePull policy for the autoscaler container. This will require another quick fix to the KubeRay repo.

@DmitriGekhtman
Copy link
Contributor Author

...crossing fingers that it works and I don't have to debug it in the CI environment again....

@DmitriGekhtman
Copy link
Contributor Author

Ah. There's the green check mark.
Will let the CI run to completion before merging.

@DmitriGekhtman
Copy link
Contributor Author

100% green, except for the forsaken MacOS tests, which never run.
Merging.

@DmitriGekhtman DmitriGekhtman merged commit 836b085 into ray-project:master Jun 9, 2022
@DmitriGekhtman DmitriGekhtman deleted the dmitri/align-repos branch June 10, 2022 15:27
DmitriGekhtman added a commit to ray-project/kuberay that referenced this pull request Jun 19, 2022
Updates the autoscaler image to reflect changes from ray-project/ray#25386
lowang-bh pushed a commit to lowang-bh/kuberay that referenced this pull request Sep 24, 2023
Updates the autoscaler image to reflect changes from ray-project/ray#25386
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants