fix(sdk): accelerator type setting in kfp #11373
An API-breaking change was introduced in 2.10 that removed the deprecated accelerator fields from the compilation step. As a result, the driver is unable to fetch the old fields. This change reintroduces the deprecated fields to allow a proper deprecation period in which the driver can adjust to the new values. Signed-off-by: Humair Khan <[email protected]>
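As an illustration of what "reintroducing the deprecated fields" means for the compiled output, here is a minimal sketch of a compile-step helper that writes both the deprecated and the new accelerator fields. The helper name `build_accelerator_config` is hypothetical, and the key names (`type`/`count` deprecated, `resourceType`/`resourceCount` new) are assumed from the camelCase JSON/YAML form of the pipeline spec's accelerator config; this is not the SDK's actual compiler code.

```python
from typing import Any, Dict


def build_accelerator_config(accelerator_type: str, count: int) -> Dict[str, Any]:
    """Hypothetical helper: emit deprecated and new accelerator fields side by side.

    Key names are an assumption based on the camelCase pipeline spec form.
    """
    return {
        # Deprecated fields, kept so drivers that only read these keep working.
        "type": accelerator_type,
        "count": count,
        # New fields (assumed here to be string-valued).
        "resourceType": accelerator_type,
        "resourceCount": str(count),
    }


print(build_accelerator_config("nvidia.com/gpu", 1))
```

Emitting both sets of keys is what lets an older driver and an updated driver consume the same compiled spec during the deprecation window.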
A follow-up to this would be to update the driver code. The Go proto changes snuck in here; they should probably have been added with the implementation PR for the accelerator changes, which makes me think we really should be testing for proto changes and generated compiled diffs. We now just need to update the driver code to use the new fields instead of the deprecated ones. If we are expecting no other downstream impacts, we can remove the deprecated fields in KFP 2.4 or 2.5.
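The driver update described above amounts to "prefer the new fields, fall back to the deprecated ones". The real driver is written in Go; the sketch below only models that lookup order in Python. The function name `resolve_accelerator` is hypothetical and the key names are the same assumed camelCase names as in the pipeline spec.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_accelerator(accel: Dict[str, Any]) -> Optional[Tuple[str, int]]:
    """Hypothetical driver-side lookup: new fields first, deprecated fields as fallback."""
    accel_type = accel.get("resourceType") or accel.get("type")
    raw_count = accel.get("resourceCount") or accel.get("count")
    if not accel_type or raw_count in (None, ""):
        return None  # no accelerator requested
    return accel_type, int(raw_count)


# Works with specs from either side of the deprecation.
print(resolve_accelerator({"type": "nvidia.com/gpu", "count": 1}))
print(resolve_accelerator({"resourceType": "nvidia.com/gpu", "resourceCount": "2"}))
```

Once every supported driver reads the new fields this way, the compiler can stop emitting the deprecated ones without breaking anything.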
cc @chensun PTAL. I think this one is severe enough to warrant a patch release for 2.10; being unable to use GPUs is no good.
Sounds reasonable for a patch release. I'll target it for this week.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: chensun
It should be a hard rule that any time we accidentally make a breaking API change, it warrants a patch release to undo the breakage. Of course, it's best if we never let breaking API changes sneak in 😅 It would also be good to have some announcement mechanism to let people know that SDK 2.10.0 is broken in this way. Perhaps we can add a note to the releases page? Shrug.
@gregsheremeta Agreed. FWIW I sent out a note in the community Slack, but a more permanent notice somewhere would be ideal. We could use the KFP discussions for this (we can pin those), or we could pin the issue itself.
In an earlier PR, new API fields were introduced to set accelerator resources via the pipeline spec, and `type` and `count` were deprecated. However, the follow-up implementation PR removed these fields from the compilation step entirely. This has effectively broken setting accelerator types in KFP: the driver expects `type` and `count` to be present, so no accelerator is being set at all.

This PR reintroduces these fields (see the test case for what the resulting pipeline spec looks like). This allows a proper deprecation period on the API, so we can adjust the driver accordingly.
To verify the changes, you can compile the following sample pipeline before and after the changes:
After the changes, this should compile to:
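One way to check a compiled spec mechanically is sketched below: after loading the compiled YAML into a dict (for example with PyYAML's `yaml.safe_load`), report any executor whose accelerator config lacks either the deprecated or the new keys. The nesting path `deploymentSpec.executors.<name>.container.resources.accelerator`, the key names, and the executor name `exec-comp` are assumptions for illustration; the function `missing_accelerator_keys` is hypothetical.

```python
from typing import Any, Dict, List

DEPRECATED_KEYS = ("type", "count")
NEW_KEYS = ("resourceType", "resourceCount")


def missing_accelerator_keys(pipeline_spec: Dict[str, Any]) -> Dict[str, List[str]]:
    """Hypothetical check: list missing accelerator keys per executor.

    Assumes resources live under
    deploymentSpec.executors.<name>.container.resources.accelerator.
    """
    problems: Dict[str, List[str]] = {}
    executors = pipeline_spec.get("deploymentSpec", {}).get("executors", {})
    for name, executor in executors.items():
        accel = executor.get("container", {}).get("resources", {}).get("accelerator")
        if accel is None:
            continue  # this executor does not request an accelerator
        missing = [key for key in DEPRECATED_KEYS + NEW_KEYS if key not in accel]
        if missing:
            problems[name] = missing
    return problems


# Hypothetical executor compiled with only the new fields (the 2.10.0 behavior).
spec = {"deploymentSpec": {"executors": {"exec-comp": {"container": {"resources": {
    "accelerator": {"resourceType": "nvidia.com/gpu", "resourceCount": "1"}}}}}}}
print(missing_accelerator_keys(spec))  # {'exec-comp': ['type', 'count']}
```

An empty result means every accelerator-requesting executor carries both sets of fields.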
The resulting executor pod should have:
Fixes #11374