Added support for ephemeral volumes and ingress creation support #1312
@psschwei This extends your PR by adding ephemeral volumes, which are a sort of ephemeral PVC. Please take a look.
Closes ray-project#1309. I don't know why, but after downgrading the kind version to 0.11.1, I never observed the issue again, whereas with 0.20.0 the issue is consistently reproducible. I haven't investigated which kind version between 0.11.1 and 0.20.0 is the first one that caused the issue. The reason for choosing 0.11.1 is that this is the version used in Ray CI, and we haven't observed the issue in Ray CI. https://github.com/architkulkarni/ray/blob/5cb837dbaf1e5875f4f365e67cec6b09d90bf710/ci/k8s/prep-k8s-environment.sh#L8 Signed-off-by: Archit Kulkarni <[email protected]>
Do not update pod labels if they haven't changed
Sets up the Buildkite CI pipeline to test the RayJob sample YAML files using kind. Related issue number Closes ray-project#1246 --------- Signed-off-by: Archit Kulkarni <[email protected]>
Api server makefile
Upgrade to Go 1.19
Fix release actions
KubeRay memory / scalability benchmark
Signed-off-by: Boris Lublinsky <[email protected]>
…t#1342) Bump the golangci-lint version in the api server makefile
…ay-project#1340) * add service yaml for nlp * Documentation fixes * Fix instructions * Apply suggestions from code review Co-authored-by: Kai-Hsun Chen <[email protected]> Signed-off-by: Praveen <[email protected]> * Fix tolerations comment * review comments * Update docs/guidance/stable-diffusion-rayservice.md Signed-off-by: Kai-Hsun Chen <[email protected]> --------- Signed-off-by: Praveen <[email protected]> Signed-off-by: Kai-Hsun Chen <[email protected]> Co-authored-by: Kai-Hsun Chen <[email protected]>
proto/cluster.proto
@@ -221,22 +221,22 @@ message HeadGroupSpec {
  string service_type = 3;
  // Optional. Enable Ingress
  // if Ingress is enabled, we might have to specify annotation IngressClassAnnotationKey, for the cluster itself, defining Ingress class
  bool enableIngress = 4;
  bool enableIngress = 11;
Nitpick: Kind of odd to have a higher number so high up in the code. Can you please move lines 222-224 past 239.
…1336) Removed use of the of BUILD_FLAGS in apiserver makefile
…istening to Kubernetes events (ray-project#1341) Redefine the behavior for deleting Pods and stop listening to Kubernetes events
@blublinsky Could you give an example of an ephemeral volume REST API request? I have tried, but the API server just crashes.
Error:
See the entire error message in debug_ephemeral_vol.txt
@@ -197,7 +200,16 @@ message Volume {
    HOSTTOCONTAINER = 1;
    BIDIRECTIONAL = 2;
  }
  MountPropagationMode mount_propagation_mode = 7;
  MountPropagationMode mount_propagation_mode = 7;
  // If indicate ephemeral, we need to let user specify volumeClaimTemplate
Should there be a volumeClaimTemplate message?
Here we are specifying the volumeClaimTemplate parameters: storage class, storage size, and access mode. The actual volumeClaimTemplate is built from them.
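As a rough illustration of what "built from them" could look like, here is a minimal sketch in Go. It is not the code from this PR: VolumeParams, buildClaimTemplate, and the local struct types are hypothetical stand-ins (the real implementation would use the Kubernetes API types), and the ReadWriteOnce default is an assumption. Only the JSON keys follow the Kubernetes generic ephemeral volume layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VolumeParams is a hypothetical holder for the three parameters
// mentioned above: storage class, storage size, and access mode.
type VolumeParams struct {
	StorageClassName string
	Storage          string
	AccessMode       string
}

// The types below are simplified stand-ins for the Kubernetes
// PersistentVolumeClaimTemplate shape, kept local so the sketch
// compiles with the standard library only.
type resourceRequirements struct {
	Requests map[string]string `json:"requests"`
}

type claimSpec struct {
	AccessModes      []string             `json:"accessModes"`
	StorageClassName *string              `json:"storageClassName,omitempty"`
	Resources        resourceRequirements `json:"resources"`
}

type claimTemplate struct {
	Spec claimSpec `json:"spec"`
}

// buildClaimTemplate turns the flat parameters into a claim template.
// Assumption: an empty access mode falls back to ReadWriteOnce, and an
// empty storage class is omitted so the cluster default applies.
func buildClaimTemplate(p VolumeParams) claimTemplate {
	mode := p.AccessMode
	if mode == "" {
		mode = "ReadWriteOnce"
	}
	spec := claimSpec{
		AccessModes: []string{mode},
		Resources: resourceRequirements{
			Requests: map[string]string{"storage": p.Storage},
		},
	}
	if p.StorageClassName != "" {
		sc := p.StorageClassName
		spec.StorageClassName = &sc
	}
	return claimTemplate{Spec: spec}
}

func main() {
	tpl := buildClaimTemplate(VolumeParams{Storage: "10Gi"})
	// Wrap the template the way a pod volume entry nests it.
	volume := map[string]interface{}{
		"name":      "test-ephemeral",
		"ephemeral": map[string]interface{}{"volumeClaimTemplate": tpl},
	}
	out, err := json.MarshalIndent(volume, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the three parameters flat on the Volume message means API users never have to write a full claim template by hand; the trade-off is that only those three knobs are exposed.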
Changed the comment to make it clear
Also @tedhtchang, if you use ephemeral volumes, you need to specify at least the size. See this example from cluster_test.go:
    var testEphemeralVolume = &api.Volume{
        Name:       "test-ephemeral",
        VolumeType: api.Volume_EPHEMERAL,
        MountPath:  "/ephimeral/dir",
        Storage:    "10Gi",
    }
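For the REST side, a request fragment and the "size is required" check might look roughly like the Go sketch below. This is an assumption-heavy illustration, not the API server's code: the Volume struct is a hypothetical trimmed-down mirror of api.Volume, validateVolume is a made-up helper, and the JSON keys (name, volumeType, mountPath, storage) are assumed from the usual lowerCamelCase protobuf JSON mapping rather than taken from this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Volume is a hypothetical, trimmed-down mirror of api.Volume.
// The JSON keys are assumed, not confirmed against the API server.
type Volume struct {
	Name       string `json:"name"`
	VolumeType string `json:"volumeType"`
	MountPath  string `json:"mountPath"`
	Storage    string `json:"storage,omitempty"`
}

// validateVolume rejects an ephemeral volume without a storage size,
// returning an error instead of letting the server fail later.
func validateVolume(v Volume) error {
	if v.VolumeType == "EPHEMERAL" && v.Storage == "" {
		return errors.New("ephemeral volume " + v.Name + ": storage size is required")
	}
	return nil
}

func main() {
	vol := Volume{
		Name:       "test-ephemeral",
		VolumeType: "EPHEMERAL",
		MountPath:  "/ephemeral/dir",
		Storage:    "10Gi",
	}
	if err := validateVolume(vol); err != nil {
		fmt.Println("rejected:", err)
		return
	}
	// Print the fragment that would sit in the volumes list of a request.
	body, err := json.MarshalIndent(vol, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Dropping the Storage field from vol makes validateVolume return an error, which is the kind of explicit failure a client would want instead of a server crash.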
Adds the field RuntimeEnvYAML to the RayJob CRD which accepts a multi-line YAML string. This format is preferred for two reasons: (1) consistency with the ServeConfigV2 format, which is also a Ray configuration specified as a multi-line YAML string; (2) related to the above, it allows using snake_case fields without modification. We preserve the older field RuntimeEnv which accepts a base64-encoded string of the runtime env. We mark it as deprecated in the documentation. We raise an error if both fields are specified. Related issue number Closes ray-project#1195 --------- Signed-off-by: Archit Kulkarni <[email protected]>
… api_server_volumes_ingress
@blublinsky The error handling worked for me. Thanks.
Sure @tedhtchang.
I still couldn't get it to work. Here is my setup:
Nginx ingress controller
Kuberay
Apiserver run on separate terminal
Compute template
raycluster with ingress
kubectl get ingress myraycluster-head-ingress -oyaml
curl 127.0.0.1/myraycluster/
@tedhtchang
With these annotations in place, the correct Ingress is created and you can use the browser to access the dashboard.
@tedhtchang can you please approve this so that @kevin85421 can merge it?
@blublinsky The browser just opened a blank page. The curl http://localhost/myraycluster/ worked.
@tedhtchang
Yes. I think so...
LGTM. We should add some examples to the README.md later.
@kevin85421 you can use #1409 in place of this. It only has the required files.
Closing this PR because the replacement, #1409, has been merged.
Why are these changes needed?
The current implementation of the API server is still not completely on par with the operator's capabilities. Two important things that are missing:
This PR adds support for those. For individual PVCs, this PR leverages generic ephemeral volumes (https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes), which are similar to PVCs but are completely controlled by the pod. The downside is that they do not survive pod crashes, which is probably OK, as they are mostly used for scratch data in the pod, for example, disk spilling. The upside of this approach is that it is not necessary to manage the PVC lifecycle.
Related issue number
closes #1087
Checks