Python SDK for Kubeflow Training Operator #1380

alembiewski · 2021-08-26T08:20:50Z

Are there any plans to add a Python SDK for the new all-in-one Kubeflow Training Operator?

alembiewski · 2021-08-26T08:21:59Z

/kind question

gaocegege · 2021-08-26T08:29:36Z

/cc @kubeflow/wg-training-leads

gaocegege · 2021-08-26T08:30:05Z

I think we should have it, but we do not have the bandwidth for it right now.

johnugeorge · 2021-08-26T09:52:19Z

Yes.
/help

google-oss-robot · 2021-08-26T09:52:20Z

@johnugeorge:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

> Yes.
> /help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Jeffwan · 2021-08-27T05:35:51Z

FYI

Ketan pinged me earlier, and it seems a Flyte user wants to use it to submit jobs. flyteorg/flyte#1375

Jeffwan · 2021-08-27T05:37:29Z

@alembiewski Do you have any specific requirements? The existing SDK should work out of the box since we have not changed the API yet. If that works for you, we can consider extending it to other frameworks. If not, what level of abstraction do you need?

alembiewski · 2021-08-27T10:50:38Z

@Jeffwan, I think a unified SDK that supports multiple frameworks could reduce code duplication between the client APIs by introducing a generic API client that supports multiple model types (the PyTorchJob and TFJob APIs look very similar to each other). This approach would streamline the process of adding new frameworks to the SDK, but could be tricky to implement.

The current model, with a separate SDK per framework, works for us as well, although there are a few serious limitations:

- The Kubernetes client version lock-in makes the SDK incompatible with other Kubeflow SDKs, such as KFServing 0.5.1, KFP, and the upcoming Katib SDK.
- There are inconsistencies between the API and the SDK (e.g.: Update swagger.json schema for TFJobSpec to include RunPolicy #1278).

It seems to me that such limitations could be dropped just by regenerating the SDK models with OpenAPI generator as it was done recently for Katib: kubeflow/katib#1572, although maybe there are limitations that I'm not aware of.
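To make the "generic API client" idea above concrete, here is a minimal sketch of what a framework-agnostic wrapper could look like. It is only an illustration under stated assumptions, not the SDK's actual design: `GenericJobClient` and the `PLURALS` table are hypothetical names, and the CRD group, version, and plural names are assumed. The injected `api` object is expected to behave like `kubernetes.client.CustomObjectsApi`, whose `create_namespaced_custom_object` and `get_namespaced_custom_object` methods take the arguments used here.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Assumed CRD coordinates for the training operator's job types.
GROUP = "kubeflow.org"
VERSION = "v1"
PLURALS = {"TFJob": "tfjobs", "PyTorchJob": "pytorchjobs"}


@dataclass
class GenericJobClient:
    """Hypothetical framework-agnostic client (not the real SDK class)."""

    # Any object exposing the CustomObjectsApi methods used below,
    # e.g. kubernetes.client.CustomObjectsApi().
    api: Any

    def _plural(self, kind: str) -> str:
        # Supporting a new framework only requires a new PLURALS entry.
        try:
            return PLURALS[kind]
        except KeyError:
            raise ValueError(f"unsupported job kind: {kind}") from None

    def create(self, job: Dict[str, Any], namespace: str = "default") -> Any:
        # The job kind selects the resource; the body is passed through as-is.
        return self.api.create_namespaced_custom_object(
            GROUP, VERSION, namespace, self._plural(job["kind"]), job
        )

    def get(self, kind: str, name: str, namespace: str = "default") -> Any:
        return self.api.get_namespaced_custom_object(
            GROUP, VERSION, namespace, self._plural(kind), name
        )
```

Because the per-framework differences live in one lookup table, the create and get code paths are shared, which is the duplication reduction described above. Against a real cluster, the wrapper would presumably be constructed as `GenericJobClient(kubernetes.client.CustomObjectsApi())` after loading a kubeconfig.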

andreyvelich · 2021-08-27T11:43:43Z

I agree with @alembiewski, it's better to have a unified SDK for the training operator.
Our users can then simply run: pip install kubeflow-training to use it.

> It seems to me that such limitations could be dropped just by regenerating the SDK models with OpenAPI generator as it was done recently for Katib: kubeflow/katib#1572, although maybe there are limitations that I'm not aware of.

Yes, we should regenerate the SDK with OpenAPI to support the latest Kubernetes Python client version.
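When diagnosing the client version lock-in discussed above, it can help to see which versions of the Kubernetes client and the Kubeflow SDKs are installed side by side. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_versions` is hypothetical, and the distribution names other than `kubeflow-training` are assumptions about how the packages are published on PyPI.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    names = ["kubernetes", "kubeflow-training", "kfserving", "kfp", "kubeflow-katib"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

If two SDKs pin different exact versions of the Kubernetes client, only one of those pins can be satisfied in a single environment, which is the incompatibility described earlier in the thread.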

kumare3 · 2021-08-27T14:48:33Z

> FYI
>
> Ketan pinged me earlier, and it seems a Flyte user wants to use it to submit jobs. flyteorg/flyte#1375

@Jeffwan we use a custom Python module to submit code. We just reference the Go API in Flyte.

Jeffwan · 2021-08-29T22:38:44Z

> The Kubernetes client version lock-in makes the SDK incompatible with other Kubeflow SDKs, such as KFServing 0.5.1, KFP, and the upcoming Katib SDK.

Agreed. It would be hard for all projects to use the same version, but we can try to match the others.

> There are inconsistencies between the API and the SDK (e.g.: Update swagger.json schema for TFJobSpec to include RunPolicy #1278).

Yeah, I think we should at least regenerate it to reflect the latest changes. I cut #1389, which introduces some unnecessary files; we will resolve that later.

alembiewski · 2021-09-21T16:43:47Z

@Jeffwan, any updates on this? Is there anything I can help with?

Jeffwan · 2021-10-04T00:51:35Z

It was auto-closed. I'll keep it open for the PyPI release update.

alembiewski · 2021-10-04T08:59:52Z

Thanks, @Jeffwan! I built and uploaded the package to the TestPyPI repository for testing:
https://test.pypi.org/project/kubeflow-training/1.3.0/#description
After it is verified and tested, we can then release it to PyPI. Should we update the release notes mentioning the updated SDK for 1.3.0 after the package is published?

alembiewski · 2021-10-13T19:26:03Z

Hey @Jeffwan, is the testing of the SDK still ongoing? Is there an estimate of when the package will be published to PyPI?

Jeffwan · 2021-10-14T00:07:31Z

> Hey @Jeffwan, is the testing of the SDK still ongoing? Is there an estimate of when the package will be published to PyPI?

Hi @alembiewski, sorry, I missed your last message. The testing is done and we are good to go!

> Should we update the release notes mentioning the updated SDK for 1.3.0 after the package is published?

We can address it in README.md. As for the training blog, I'd really like to promote this work in kubeflow/blog#110. WDYT?

alembiewski · 2021-10-14T08:29:16Z

Sounds good, thanks @Jeffwan! Looking forward to the SDK being released. Please let me know if any help is needed with publishing it to PyPI; I'm glad to help with that.

Jeffwan · 2021-10-16T00:09:45Z

@alembiewski If you could give a hand, that would be great. Could you upload this package? Please add andreyvelich, jiaxin.shan and @terrytangyuan as maintainers as well.

@terrytangyuan Can you share your account?

terrytangyuan · 2021-10-16T00:14:56Z

Mine is terrytangyuan (same as my GitHub ID).

Please add us as “owner”s once the package is uploaded to PyPI. Thank you!

alembiewski · 2021-10-16T19:53:41Z

The package has been uploaded to PyPI: https://pypi.org/project/kubeflow-training/ 🚀 🎉
Added accounts mentioned above as owners, please check your mailbox.

Jeffwan · 2021-10-17T01:36:40Z

Great job, cheers!

google-oss-robot added the kind/question label Aug 26, 2021

google-oss-robot added the help wanted label Aug 26, 2021

alembiewski mentioned this issue Sep 22, 2021

Add Python SDK for Kubeflow Training Operator #1420

Merged

google-oss-robot closed this as completed in #1420 Oct 3, 2021

Jeffwan reopened this Oct 4, 2021

alembiewski closed this as completed Oct 17, 2021

shaowei-su mentioned this issue Apr 1, 2022

Add setup info for Python SDK kubeflow/mpi-operator#463

Closed
