fix v1 pytorch job plugin with elastic policy #359
Signed-off-by: Yubo Wang <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #359 +/- ##
==========================================
+ Coverage 62.62% 64.02% +1.40%
==========================================
Files 152 152
Lines 12789 10378 -2411
==========================================
- Hits 8009 6645 -1364
+ Misses 4168 3122 -1046
+ Partials 612 611 -1
So the elastic policy for task type version 1 was never being applied because we were only looking at the elastic config from version 0?
The configs for version 0 and 1 have the same functions right? Can we just wrap this in an interface and remove a lot of the boilerplate code? Something like:
type elasticConfig interface {
    GetMinReplicas() uint32
    GetMaxReplicas() uint32
    // ...
}
var elasticConfig elasticConfig
if taskTemplate.TaskTypeVersion == 0 {
    // ...
    elasticConfig = pytorchTaskExtraArgs.GetElasticConfig()
} else {
    // ...
    elasticConfig = kfPytorchTaskExtraArgs.GetElasticConfig()
}
if elasticConfig != nil {
    // leave as is
}
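A minimal, self-contained sketch of the interface approach suggested above. The struct types `v0ElasticConfig` and `v1ElasticConfig` and the helper `selectElasticConfig` are hypothetical stand-ins for the generated protobuf messages and the plugin code, and the `uint32` return types simply follow the snippet above; none of these are taken from the actual plugin. One detail worth guarding against in Go: assigning a nil pointer directly to an interface variable yields a non-nil interface, so the final `!= nil` check would pass even when no elastic config was set.

```go
package main

import "fmt"

// elasticConfig is the shared view over both config versions
// (hypothetical; getter names follow the review comment).
type elasticConfig interface {
	GetMinReplicas() uint32
	GetMaxReplicas() uint32
}

// v0ElasticConfig stands in for the version 0 elastic config message.
type v0ElasticConfig struct{ MinReplicas, MaxReplicas uint32 }

func (c *v0ElasticConfig) GetMinReplicas() uint32 { return c.MinReplicas }
func (c *v0ElasticConfig) GetMaxReplicas() uint32 { return c.MaxReplicas }

// v1ElasticConfig stands in for the version 1 elastic config message.
type v1ElasticConfig struct{ MinReplicas, MaxReplicas uint32 }

func (c *v1ElasticConfig) GetMinReplicas() uint32 { return c.MinReplicas }
func (c *v1ElasticConfig) GetMaxReplicas() uint32 { return c.MaxReplicas }

// selectElasticConfig picks the config that matches the task type version.
// It returns an untyped nil when the chosen config is absent, so callers
// can rely on a plain `cfg != nil` check.
func selectElasticConfig(version int32, v0 *v0ElasticConfig, v1 *v1ElasticConfig) elasticConfig {
	if version == 0 {
		if v0 == nil {
			return nil // avoid wrapping a nil pointer in the interface
		}
		return v0
	}
	if v1 == nil {
		return nil
	}
	return v1
}

func main() {
	// Version 1 task with an elastic config: the policy is applied.
	cfg := selectElasticConfig(1, nil, &v1ElasticConfig{MinReplicas: 1, MaxReplicas: 4})
	if cfg != nil {
		fmt.Printf("elastic policy: min=%d max=%d\n", cfg.GetMinReplicas(), cfg.GetMaxReplicas())
	}

	// Version 0 task without an elastic config: nothing to apply.
	if selectElasticConfig(0, nil, nil) == nil {
		fmt.Println("no elastic policy configured")
	}
}
```

With this shape, the code after the version switch only deals with the interface, so the v0 and v1 branches no longer need duplicated policy-building logic.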
Signed-off-by: Yubo Wang <[email protected]>
Yes, it was not handled initially. Very good suggestion. I updated the code! Thanks.
Signed-off-by: Yubo Wang <[email protected]>
* fix pytorch job plugin elastic policy
Signed-off-by: Yubo Wang <[email protected]>
* add ElasticConfig interface
Signed-off-by: Yubo Wang <[email protected]>
* add more testing
Signed-off-by: Yubo Wang <[email protected]>
---------
Signed-off-by: Yubo Wang <[email protected]>
Co-authored-by: Yubo Wang <[email protected]>
TL;DR
PR #345 introduced a bug: the PyTorch v1 task template (task type version 1) was handled incorrectly, so its elastic policy was never applied.
This PR adds elastic config parsing for v1. It uses exactly the same handling logic as v0, since the v1 elastic config is kept the same as in v0.
Follow-up PR in flytekit: PR #1690