
[Feature] Update sample configurations with larger resource requests #417

Closed · DmitriGekhtman opened this issue Jul 26, 2022 · 4 comments · Fixed by #426

Labels
enhancement New feature or request

Comments

@DmitriGekhtman (Collaborator)

Search before asking

  • I have searched the issues and found no similar feature request.

Description

The KubeRay sample configurations show Ray clusters with tiny Ray nodes (1 CPU, etc.).
It's fine to keep a couple of such configurations for local experiments, but the majority of the samples should emphasize real-life applications, where a Ray node is sized to take up a sizable fraction of a K8s node (if not the entire node).

Ray is not meant to operate with 1-CPU nodes.
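
For illustration, a sample worker group sized to occupy most of a K8s node might look roughly like the fragment below. The group name, image tag, and resource numbers are placeholders (assuming 16-CPU / 64Gi nodes), with a little headroom left for system pods:

```yaml
# Sketch of a more realistically sized worker group for a sample RayCluster.
# All names and sizes here are illustrative, not a proposal for specific values.
workerGroupSpecs:
- groupName: large-group            # placeholder name
  replicas: 2
  minReplicas: 2
  maxReplicas: 10
  rayStartParams: {}
  template:
    spec:
      containers:
      - name: ray-worker
        image: rayproject/ray:2.0.0  # placeholder image tag
        resources:
          # Request nearly the whole node, leaving headroom for system pods.
          requests:
            cpu: "14"
            memory: "54Gi"
          limits:
            cpu: "14"
            memory: "54Gi"
```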

Use case

Prevent users from burning themselves with tiny Ray nodes.

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!
DmitriGekhtman added the enhancement (New feature or request) label on Jul 26, 2022
@akanso (Collaborator) commented Jul 26, 2022

We use multiple worker groups in a single Ray cluster. Each group has a different function: some have 6 or 12 cores and others have only 2 cores.

I think in our examples and docs we can add a note that users should not limit themselves to a single core, but I don't think we should "prevent" them :)

@DmitriGekhtman (Collaborator, Author)

Thanks Ali, great point.

I'm just curious -- 2 cores is a little small -- what's the application for that?

@akanso (Collaborator) commented Jul 26, 2022

We dynamically add and remove simulators used for RL during training; some simulators do not need more than 0.5 cores. We can share the sim config using the Ray GCS, which is why they sit in their own worker group. When the training is complete, we can remove the entire Ray cluster with all of its workers.
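
To make the discussion concrete, a heterogeneous cluster along the lines Ali describes could be sketched with two worker groups in the same RayCluster: a large group for the main workload and a lightweight group for simulators. The group names, image tag, and sizes below are purely illustrative, not taken from any actual config:

```yaml
# Fragment of a RayCluster spec with two differently sized worker groups.
workerGroupSpecs:
# Large workers for the main training workload.
- groupName: trainer-group          # illustrative name
  replicas: 4
  rayStartParams: {}
  template:
    spec:
      containers:
      - name: ray-worker
        image: rayproject/ray:2.0.0
        resources:
          requests:
            cpu: "12"
            memory: "48Gi"
          limits:
            cpu: "12"
            memory: "48Gi"
# Lightweight workers for RL simulators that need well under a core each.
- groupName: simulator-group        # illustrative name
  replicas: 8
  rayStartParams: {}
  template:
    spec:
      containers:
      - name: ray-worker
        image: rayproject/ray:2.0.0
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "1"
            memory: "1Gi"
```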

DmitriGekhtman added this to the v0.3.0 release milestone on Jul 26, 2022
@DmitriGekhtman (Collaborator, Author)

Ray folks consider this important enough for preventing user confusion that it's a release blocker -- I've added the label.
