Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

fix v1 pytorch job plugin with elastic policy #359

Merged
merged 3 commits into from
Jun 14, 2023

Conversation

yubofredwang
Copy link
Contributor

@yubofredwang yubofredwang commented Jun 12, 2023

TL;DR

There was a bug that introduced in PR #345. The pytorch v2 task template is handled incorrectly.
Added Elastic Config parsing for v1. It follows the exact same handling logic as v0 since they are kept as same as v0.

Follow-up PR in flytekit: PR #1690

Type

  • Bug Fix
  • Feature
  • Plugin

Are all requirements met?

  • Code completed
  • Smoke tested
  • Unit tests added
  • Code documentation added
  • Any pending items have an associated Issue

@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Merging #359 (f451499) into master (2a7a6f9) will increase coverage by 1.40%.
The diff coverage is 100.00%.

❗ Current head f451499 differs from pull request most recent head 1deafe2. Consider uploading reports for the commit 1deafe2 to get more accurate results

@@            Coverage Diff             @@
##           master     #359      +/-   ##
==========================================
+ Coverage   62.62%   64.02%   +1.40%     
==========================================
  Files         152      152              
  Lines       12789    10378    -2411     
==========================================
- Hits         8009     6645    -1364     
+ Misses       4168     3122    -1046     
+ Partials      612      611       -1     
Flag Coverage Δ
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
go/tasks/logs/logging_utils.go 90.00% <100.00%> (+2.50%) ⬆️
go/tasks/pluginmachinery/tasklog/template.go 97.80% <100.00%> (+0.83%) ⬆️
...o/tasks/plugins/k8s/kfoperators/pytorch/pytorch.go 79.86% <100.00%> (+3.32%) ⬆️

... and 130 files with indirect coverage changes

hamersaw
hamersaw previously approved these changes Jun 13, 2023
Copy link
Contributor

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the elastic policy for task type version 1 was never being applied because we were only looking at the elastic config from version 0?

The configs for version 0 and 1 have the same functions right? Can we just wrap this in an interface and remove a lot of the boilerplate code? Something like:

type elasticConfig interface {
    func GetMinReplicas() uint32
    func GetMaxReplicas() uint32
    // ...
}

var elasticConfig elasticConfig
if taskTemplate.TaskTypeVersion == 0 {
    // ...
    elasticConfig = pytorchTaskExtraArgs.GetElasticConfig()
else {
    // ...
    elasticConfig = kfPytorchTaskExtraArgs.GetElasticConfig()
}

if elasticConfig != nil {
    // leave as is
}

@hamersaw hamersaw self-requested a review June 13, 2023 17:47
@hamersaw hamersaw dismissed their stale review June 13, 2023 17:47

approved by mistake

Signed-off-by: Yubo Wang <[email protected]>
@yubofredwang
Copy link
Contributor Author

So the elastic policy for task type version 1 was never being applied because we were only looking at the elastic config from version 0?

The configs for version 0 and 1 have the same functions right? Can we just wrap this in an interface and remove a lot of the boilerplate code? Something like:

type elasticConfig interface {
    func GetMinReplicas() uint32
    func GetMaxReplicas() uint32
    // ...
}

var elasticConfig elasticConfig
if taskTemplate.TaskTypeVersion == 0 {
    // ...
    elasticConfig = pytorchTaskExtraArgs.GetElasticConfig()
else {
    // ...
    elasticConfig = kfPytorchTaskExtraArgs.GetElasticConfig()
}

if elasticConfig != nil {
    // leave as is
}

yes, it was not handled initially. very good suggestion. I updated the code! thanks

Signed-off-by: Yubo Wang <[email protected]>
@hamersaw hamersaw merged commit 318fa6b into flyteorg:master Jun 14, 2023
eapolinario pushed a commit that referenced this pull request Sep 6, 2023
* fix pytorch job plugin elastic policy

Signed-off-by: Yubo Wang <[email protected]>

* add ElasticConfig interface

Signed-off-by: Yubo Wang <[email protected]>

* add more testing

Signed-off-by: Yubo Wang <[email protected]>

---------

Signed-off-by: Yubo Wang <[email protected]>
Co-authored-by: Yubo Wang <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants