Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KubeRay][Autoscaler] Make KubeRay CRD version configurable #40357

Merged
merged 2 commits into from
Oct 16, 2023

Conversation

kevin85421
Copy link
Member

@kevin85421 kevin85421 commented Oct 16, 2023

Why are these changes needed?

The KubeRay community has decided to bump the CRD version from v1alpha1 to v1 in the KubeRay v1.0.0-rc.1 release. Initially, I also planned to delete CRD v1alpha1 in v1.0.0-rc.1. However, the Ray Autoscaler has hard-coded the CRD version to v1alpha1. As a result, the Autoscaler will throw an exception when attempting to fetch data from the Kubernetes API server, given that CRD v1alpha1 no longer exists. See ray-project/kuberay#1492 (comment) for more details.

This PR makes the CRD version configurable, and KubeRay v1.0.0-rc.1 will automatically inject the environment variable KUBERAY_CRD_VER (ray-project/kuberay#1496).

  • Compatibility between KubeRay and Ray
    • KubeRay v0.6.0: Support all Ray versions
      • KubeRay v0.6.0 only has CRD v1alpha1. It won't inject the KUBERAY_CRD_VER environment variable into the Autoscaler sidecar container. Therefore, Ray 2.8.0 (including this PR) will fall back to CRD v1alpha1. This ensures that KubeRay v0.6.0 remains compatible with Ray versions that incorporate this PR.
    • KubeRay v1.0.0: Support all Ray versions
      • KubeRay v1.0.0 supports both CRD v1alpha1 and v1, with v1 being the storage version. It automatically injects the KUBERAY_CRD_VER environment variable set to v1. If users are on KubeRay v1.0.0, the Autoscaler in Ray 2.8.0 and later versions (including this PR) will use CRD v1. For Ray versions older than 2.8.0, the Autoscaler will use CRD v1alpha1 since those versions don't include this PR.
    • KubeRay v1.1.0: Support Ray 2.8.0 and later
      • In KubeRay v1.1.0, I plan to remove CRD v1alpha1, leaving only CRD v1. Ray versions without this PR will no longer be supported. As a result, KubeRay v1.1.0 might be released concurrently with Ray 2.10.0.
      • [Update: May 10, 2024]: I won't remove the CRD v1alpha1 in KubeRay v1.1.0 due to compatibility issues with Kueue. Instead, I will remove it in KubeRay v1.2.0.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(
  • KubeRay v0.6.0: CI

  • KubeRay v1.0.0 (including CRD v1alpha1 and v1): Try to fetch both CRD v1alpha1 and CRD v1 from Kubernetes API server in the Autoscaler sidecar container, and both results are the same.
    Screen Shot 2023-10-15 at 11 46 07 PM

  • KubeRay v1.1.0 (only has CRD v1): Try to fetch both CRD v1alpha1 and CRD v1 from Kubernetes API server in the Autoscaler sidecar container. Fail to fetch data when I use CRD v1alpha1.
    Screen Shot 2023-10-15 at 11 27 09 PM

Signed-off-by: kaihsun <[email protected]>
Signed-off-by: kaihsun <[email protected]>
@kevin85421
Copy link
Member Author

cc @rickyyx

@kevin85421
Copy link
Member Author

cc @gvspraveen

Copy link
Contributor

@rickyyx rickyyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - Do we need any additional test coverage for this?

@kevin85421
Copy link
Member Author

@rickyyx The current Ray CI covers part of it (the latest KubeRay release). KubeRay CI tests both the nightly KubeRay and the latest KubeRay release. After v1.0.0 is released, I will add a new CI test in KubeRay CI that only uses CRD v1.

@kevin85421
Copy link
Member Author

Open an issue to track the progress ray-project/kuberay#1499.

@kevin85421
Copy link
Member Author

Hi @rickyyx, does #40357 (comment) make sense to you? If it makes sense to you, could you please merge this PR? It's blocking the KubeRay release. Thank you!

@davidxia
Copy link
Contributor

just noticed a small typo in the description

In KubeRay v1.0.0, I plan to remove CRD v1alpha1

should be

In KubeRay v1.1.0, I plan to remove CRD v1alpha1

@kevin85421
Copy link
Member Author

@davidxia good catch! Updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants