Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Environment variable to select Ray container is lower case sensitive. #1078

Closed
2 tasks done
enozcan opened this issue May 11, 2023 · 6 comments
Closed
2 tasks done
Labels
bug Something isn't working

Comments

@enozcan
Copy link

enozcan commented May 11, 2023

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

RAY=true environment variable to identify Ray container does not seem to be working properly due to the check here:

if env.Name == strings.ToLower("ray") && env.Value == strings.ToLower("true") {

where the check does not match with the command a few lines below:

log.Info("Head pod container with index " + strconv.Itoa(i) + " identified as Ray container based on env RAY=true.")

It actually checks for ray=true env variable in a case sensitive manner.

Reproduction script

--

Anything else

It must align with the logged statement for RAY=true format. Also not sure if that is still valid as I have not seen that within the docs but in the source.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@enozcan enozcan added the bug Something isn't working label May 11, 2023
@kevin85421
Copy link
Member

Good catch! Would you mind opening a PR to fix it?

@Yicheng-Lu-llll
Copy link
Contributor

Yicheng-Lu-llll commented May 11, 2023

In my understanding, It is better to still set the ray container to be the first container (containers[0]). There are segments of code that do not utilize the getRayContainerIndex method. For instance:

for _, port := range podTemplate.Spec.Containers[0].Ports {

podTemplate.Spec.Containers[0].Ports = append(podTemplate.Spec.Containers[0].Ports, metricsPort)

A thorough examination may need to fix this issue.

@enozcan
Copy link
Author

enozcan commented May 11, 2023

Would you mind opening a PR to fix it?

@kevin85421 Sure, just let me know about the final decision, e.g. case sensitive or not, both for key and value.

It is better to still set the ray container to be the first container (containers[0])

@Yicheng-Lu-llll That is a strict restriction, and from user point of view having this env var option (and documenting it clearly) would be a better alternative.

@kevin85421
Copy link
Member

  • I noticed that there is a public function that is similar to getRayContainerIndex. Maybe we can consider removing getRayContainerIndex and using FindRayContainerIndex instead.

    func FindRayContainerIndex(spec corev1.PodSpec) (index int) {

  • What's the current behavior if Ray container is not the first one? Will it always fail?

@enozcan
Copy link
Author

enozcan commented May 12, 2023

What's the current behavior if Ray container is not the first one? Will it always fail?

I believe it silently causes unexpected behavior. Say I define 2 containers, linkerd and main respectively. Operator will check for linkerd container as it’s the first one, override its command and args (via merging the original one with ray start) - since linkerd command does not have “ray start”. In that case the other container which is supposed to be Ray worker will be useless. Consider the case where linkerd container resources are relatively small compared to the main one, or pre-installed libraries do not exist in the linkerd but main container, etc. So I don’t think it always causes failure, but causes other problems that are going to be visible only when pod is inspected through describe, operator annotations, etc.

@kevin85421
Copy link
Member

#1379 enforces the Ray container must be the first app container in a Pod.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants