Python SDK for Kubeflow Training Operator #1380
Comments
/kind question |
/cc @kubeflow/wg-training-leads |
I think we should have it, but do not have the bandwidth for it now. |
Yes. |
FYI, Ketan pinged me earlier and it seems a Flyte user wants to use it to submit jobs: flyteorg/flyte#1375 |
@alembiewski Do you have any specific requirements? The existing SDK should work out of the box since we have not changed the API yet. If that works for you, we can consider extending it to other frameworks. If not, what level of abstraction do you need? |
@Jeffwan, I think a unified SDK that supports multiple frameworks could reduce code duplication between the client APIs by introducing a generic API client that supports multiple model types (the pytorchjob and tfjob APIs look very similar to each other). This approach would streamline adding new frameworks to the SDK, but could be tricky to implement. The current model with a separate SDK per framework works for us as well, although there are a few serious limitations:
It seems to me that such limitations could be dropped just by regenerating the SDK models with OpenAPI generator as it was done recently for Katib: kubeflow/katib#1572, although maybe there are limitations that I'm not aware of. |
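To illustrate the generic-client idea discussed above, here is a minimal sketch, not the SDK's actual design. It assumes the TFJob and PyTorchJob custom resources differ mainly in their kind, plural name, and replica-specs field; the names `JobKind`, `JOB_KINDS`, and `build_job` are hypothetical and do not come from the existing SDK.

```python
from dataclasses import dataclass

# API group and version shared by the Kubeflow training job CRDs.
GROUP = "kubeflow.org"
VERSION = "v1"


@dataclass(frozen=True)
class JobKind:
    """Per-framework details that a generic client would need (hypothetical helper)."""
    kind: str                 # Kubernetes kind, e.g. "TFJob"
    plural: str               # resource plural used in API paths
    replica_specs_field: str  # name of the replica-specs field under .spec


# Hypothetical registry: supporting a new framework means adding one entry.
JOB_KINDS = {
    "tfjob": JobKind("TFJob", "tfjobs", "tfReplicaSpecs"),
    "pytorchjob": JobKind("PyTorchJob", "pytorchjobs", "pytorchReplicaSpecs"),
}


def build_job(framework, name, namespace, replica_specs):
    """Build a custom-resource body for the given framework.

    Raises KeyError if the framework is not registered in JOB_KINDS.
    """
    job_kind = JOB_KINDS[framework]
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": job_kind.kind,
        "metadata": {"name": name, "namespace": namespace},
        "spec": {job_kind.replica_specs_field: replica_specs},
    }


# Example: the same function serves both frameworks.
worker = {"replicas": 2, "template": {"spec": {"containers": []}}}
tf_body = build_job("tfjob", "mnist", "default", {"Worker": worker})
pt_body = build_job("pytorchjob", "mnist", "default", {"Worker": worker})
```

A body built this way could then be submitted with the Kubernetes Python client's `CustomObjectsApi.create_namespaced_custom_object(group, version, namespace, plural, body)`, using `GROUP`, `VERSION`, and the `plural` from the registry. Whether a thin wrapper like this is enough, or typed models generated from OpenAPI are still needed, is the open question in this thread.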
I agree with @alembiewski, it's better to have a unified SDK for the training operator.
Yes, we should regenerate the SDK with OpenAPI to support the latest Kubernetes Python client version. |
@Jeffwan We use a custom Python module to submit code. We just reference the Go API in Flyte. |
Agree. It would be hard for all projects to have the same version. We can try to have the same version as others.
Yeah, I think we should at least regenerate it to reflect the latest changes. I cut #1389 and it introduces some unnecessary files. We will resolve that later. |
@Jeffwan, any updates on this? Is there anything I can help with? |
It was auto-closed. I'll keep it open for the PyPI release update. |
Thanks, @Jeffwan! I built and uploaded the package to the TestPyPI repository for testing: |
Hey @Jeffwan, is the testing of the SDK still ongoing? Is there an estimate of when the package will be published to PyPI? |
Hi @alembiewski, sorry I missed your last message. The testing is done and we are good to go!
We can address it in README.md. For the training blog, I'd really like to promote this work in kubeflow/blog#110. WDYT? |
Sounds good, thanks @Jeffwan! Looking forward to the SDK being released. Please let me know if any help is needed with publishing it to PyPI, glad to help with that. |
@alembiewski If you can lend a hand, that would be great. Could you upload this package? Please add @terrytangyuan Can you share your account? |
Mine is terrytangyuan (same as my GitHub ID). Please add us as “owner”s once the package is uploaded to PyPI. Thank you! |
The package has been uploaded to PyPI: https://pypi.org/project/kubeflow-training/ 🚀 🎉 |
Great job, cheers! |
Are there any plans to add a Python SDK for the new all-in-one Kubeflow Training Operator?