[Core feature] Allow setting separate resource configs for different replica types in Kubeflow Jobs #3308
Comments
Thank you for opening your first issue here! 🛠
Thank you for reporting this. We also need separate configs for the Spark task (driver, worker) and the Ray task (head node, worker node). We can add those in a separate PR.
This could be implemented similarly to the current Dask integration. Essentially, for the
@yubofredwang this is awesome, I think it looks great so far! A few thoughts:
@hamersaw thanks for the feedback!
I think this is ready for a PR, and we can continue to iterate there. Does that sound good? Quick replies: 1 / 2. Let's put the resource
@hamersaw thanks for the feedback. I am working on formalizing the other related changes and will create a PR once that is done.
Motivation: Why do you think this is important?
Users commonly need to set different resources for different types of replicas in Kubeflow Jobs such as TFJob and MPIJob.
For example, a user would want to run TFJob using the following resources:
However, this is not currently supported in Flyte.
Goal: What should the final outcome look like, ideally?
Users should be able to override the resources specified in the task definition by providing additional per-replica resource configs in the TFJob task config.
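To make the intended override behaviour concrete, here is a minimal sketch in plain Python. It does not use the flytekit or Kubeflow APIs: the `Resources` dataclass, `merge`, and `resolve_replica_resources` are hypothetical names introduced only for illustration, and the replica names and resource values are made up. The assumption is that a replica-level config overrides only the fields it sets, and everything else falls back to the task-level resources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource request; NOT the flytekit Resources class."""
    cpu: Optional[str] = None
    mem: Optional[str] = None
    gpu: Optional[str] = None


def merge(base: Resources, override: Optional[Resources]) -> Resources:
    """Return `base` with every field that `override` sets replaced."""
    if override is None:
        return base
    return Resources(
        cpu=override.cpu if override.cpu is not None else base.cpu,
        mem=override.mem if override.mem is not None else base.mem,
        gpu=override.gpu if override.gpu is not None else base.gpu,
    )


def resolve_replica_resources(
    task_resources: Resources,
    replica_overrides: Dict[str, Resources],
    replica_types: List[str],
) -> Dict[str, Resources]:
    """Compute the effective resources for each replica type.

    Replica types without an override inherit the task-level resources.
    """
    return {
        replica: merge(task_resources, replica_overrides.get(replica))
        for replica in replica_types
    }


# Illustrative values only: task-level defaults plus per-replica overrides.
task_level = Resources(cpu="1", mem="2Gi")
overrides = {
    "worker": Resources(cpu="4", mem="16Gi", gpu="1"),
    "ps": Resources(mem="8Gi"),
}
resolved = resolve_replica_resources(task_level, overrides, ["chief", "worker", "ps"])
```

With these inputs, `chief` keeps the task-level resources, `worker` is fully overridden, and `ps` overrides only memory while inheriting CPU from the task definition. Whether partial overrides should merge like this or replace the task-level resources wholesale is a design choice for the actual implementation.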
Describe alternatives you've considered
We could make the resources field of the task function accept a new type, TFJobResources, and implement different handling in the related backend plugins. However, this requires extensive code changes and undermines the consistency of task definitions.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?