Add API to rerun the pipeline #1720

IronPan · 2019-08-02T11:32:56Z

This change will add a new API for rerunning a failed pipeline without repeating the succeeded steps.

Instead of using the Argo workflow UID as the run ID, generate the run ID in the API server and attach it to the workflow as a label. Change the persistence agent to look up and sync runs keyed by that ID label. This is needed because re-creating an Argo workflow produces a new workflow UID, so the UID is no longer a unique identifier for a run.
Replace {{workflow.uid}} with the run ID. {{workflow.uid}} is Argo-specific and should be an implementation detail of KFP. See #1683 (Hiding Argo's workflow.uid placeholder behind DSL) for related work to hide this placeholder in the SDK.
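
A minimal sketch of the idea described above, not the code from this PR: the run ID is generated once and stamped onto the workflow as a label, so a re-created workflow (new UID) still maps to the same run. The `workflow` struct, the helper names, and the literal label value are assumptions for illustration; only the `util.LabelKeyWorkflowRunId` identifier appears in the diff.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// labelKeyWorkflowRunID stands in for util.LabelKeyWorkflowRunId from the diff.
// The literal value is an assumption made for this sketch.
const labelKeyWorkflowRunID = "pipeline/runid"

// workflow is a hypothetical stand-in for the Argo Workflow object.
type workflow struct {
	UID    string // assigned by Kubernetes; differs after re-creation
	Labels map[string]string
}

// newRunID returns a random RFC 4122 version 4 UUID string.
func newRunID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// setRunID stamps the run ID onto the workflow as a label.
func setRunID(wf *workflow, runID string) {
	if wf.Labels == nil {
		wf.Labels = map[string]string{}
	}
	wf.Labels[labelKeyWorkflowRunID] = runID
}

// runIDOf returns the run ID label and whether it is present.
func runIDOf(wf *workflow) (string, bool) {
	id, ok := wf.Labels[labelKeyWorkflowRunID]
	return id, ok
}

func main() {
	runID, err := newRunID()
	if err != nil {
		panic(err)
	}
	first := &workflow{UID: "uid-1"}
	setRunID(first, runID)

	// A retry re-creates the workflow: the UID changes, the label does not.
	retried := &workflow{UID: "uid-2"}
	setRunID(retried, runID)

	a, _ := runIDOf(first)
	b, _ := runIDOf(retried)
	fmt.Println("same run:", a == b, "same UID:", first.UID == retried.UID)
}
```

Anything that previously keyed on the workflow UID (such as the persistence agent) would read the label through something like `runIDOf` instead.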

neuromage

@IronPan can you explain the use-case here a little bit? I'd like to understand this in the context of caching with metadata. When we resubmit a run, is this submitting the same pipeline? How is it different from cloning the run?

neuromage · 2019-08-02T15:15:29Z

backend/src/apiserver/client_manager.go

@@ -67,6 +67,7 @@ type ClientManager struct {
 	swfClient              scheduledworkflowclient.ScheduledWorkflowInterface
 	time                   util.TimeInterface
 	uuid                   util.UUIDGeneratorInterface
+	randomString 					 util.RandomStringInterface


neuromage · 2019-08-02T15:20:43Z

backend/src/apiserver/client_manager.go

@@ -119,6 +120,11 @@ func (c *ClientManager) UUID() util.UUIDGeneratorInterface {
 	return c.uuid
 }

+func (c *ClientManager) RandomString() util.RandomStringInterface{


I strongly recommend getting rid of this ClientManager abstraction. It adds unnecessary complexity to the code. This function is a perfect example of code indirection that adds to cognitive burden for the reader of the code. There is no need for generating a random string to be a function of ClientManager. We can push this to its own function where it is used, and in tests, replace that with a mock as needed, i.e.

file where used:

var randomStringGenFn = func() string { return random() }

and then in tests:

randomStringGenFn = func() string { return "abc" }

I would +100 getting rid of it. Maybe do it separately?
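
For reference, a self-contained sketch of the function-variable approach suggested in this thread. It is an illustration under assumptions, not code from the PR: `workflowName` is a hypothetical caller, and the production generator here simply hex-encodes random bytes.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// randomStringGenFn is a package-level seam: production code calls it
// through the variable, and tests reassign it to a deterministic stub.
var randomStringGenFn = func() string {
	b := make([]byte, 5)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b) // 10 hex characters
}

// workflowName is a hypothetical caller that needs a random suffix.
func workflowName(prefix string) string {
	return prefix + "-" + randomStringGenFn()
}

func main() {
	fmt.Println(workflowName("run")) // random suffix

	// What a test would do: swap in a stub, then restore the original.
	orig := randomStringGenFn
	randomStringGenFn = func() string { return "abc" }
	defer func() { randomStringGenFn = orig }()

	fmt.Println(workflowName("run")) // prints "run-abc"
}
```

The trade-off is that the seam is mutable package state, so tests that replace it should restore it and should not run in parallel with other tests that depend on it.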

neuromage · 2019-08-02T15:22:24Z

backend/src/apiserver/resource/resource_manager.go

@@ -64,6 +65,7 @@ type ResourceManager struct {
 	scheduledWorkflowClient scheduledworkflowclient.ScheduledWorkflowInterface
 	time                    util.TimeInterface
 	uuid                    util.UUIDGeneratorInterface
+	randomString 						util.RandomStringInterface


backend/src/apiserver/server/util.go

IronPan · 2019-08-05T22:05:26Z

/test kubeflow-pipeline-e2e-test

neuromage · 2019-08-06T15:08:22Z

backend/src/apiserver/client/pod.go

+	return clientSet.CoreV1().Pods(namespace), nil
+}
+
+// creates a new client for the Kubernetes pod.


Nit: CreatePodClientOrFatal creates....

https://golang.org/doc/effective_go.html#commentary

thanks. done

neuromage · 2019-08-06T15:09:07Z

backend/src/apiserver/client/pod.go

+	"time"
+)
+
+func CreatePodClient(namespace string) (v1.PodInterface, error) {


Should this be public?

made it private.

neuromage · 2019-08-06T15:10:22Z

backend/src/apiserver/client/pod.go

+}
+
+// creates a new client for the Kubernetes pod.
+func CreatePodClientOrFatal(namespace string, initConnectionTimeout time.Duration) v1.PodInterface{


Format the line?

Also, do we really want to die here? This kills the APIServer. Why not just return an error to the client? This is a blocking call anyway.

This is called during API server startup. A failure here implies something is wrong with the cluster that can't be recovered from. This is what we do for the other clients in the same folder too.

neuromage · 2019-08-06T15:24:35Z

backend/src/apiserver/resource/resource_manager.go

-	if err = deletePods(podsToDelete, newWorkflow.ObjectMeta.Namespace); err != nil {
-		return util.NewInternalServerError(err, "Retry run failed. Failed to clean up the failed pods from previous run.")
-	}
+	//


Commented code?

Sorry, it was not ready yet. It's ready now.

neuromage · 2019-08-06T15:25:33Z

backend/src/apiserver/resource/pod_fake.go

@@ -0,0 +1,79 @@
+package resource


Why do we need this? Client side mocks are really difficult to maintain.

This follows the same pattern as the other fakes in the same folder. It's used for testing the Retry logic.

neuromage · 2019-08-06T15:27:33Z

backend/src/agent/persistence/worker/metrics_reporter.go

@@ -49,7 +49,11 @@ func (r MetricsReporter) ReportMetrics(workflow *util.Workflow) error {
 	if workflow.Status.Nodes == nil {
 		return nil
 	}
-	runID := string(workflow.UID)
+	if _, ok := workflow.ObjectMeta.Labels[util.LabelKeyWorkflowRunId]; !ok {
+		// Skip reporting if the workflow doesn't have the run id label


When does this happen?

It's unlikely to happen. It only happens if someone deliberately calls this API with a workflow that was not created by KFP.
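
The guard in the diff above can be sketched in isolation as follows. This is an illustration under assumptions, not the PR's code: the `workflow` struct and `reportMetrics` signature are hypothetical, and the literal label value stands in for `util.LabelKeyWorkflowRunId`.

```go
package main

import "fmt"

// labelKeyWorkflowRunID stands in for util.LabelKeyWorkflowRunId; the
// literal value is an assumption made for this sketch.
const labelKeyWorkflowRunID = "pipeline/runid"

// workflow is a hypothetical stand-in for util.Workflow.
type workflow struct {
	Name   string
	Labels map[string]string
}

// reportMetrics returns the run ID it would report under and whether
// reporting happened. Workflows without the run ID label are skipped
// rather than treated as an error.
func reportMetrics(wf *workflow) (string, bool) {
	runID, ok := wf.Labels[labelKeyWorkflowRunID]
	if !ok {
		// Not created by KFP: nothing to key the metrics on.
		return "", false
	}
	return runID, true
}

func main() {
	kfp := &workflow{Name: "wf-a", Labels: map[string]string{labelKeyWorkflowRunID: "run-123"}}
	foreign := &workflow{Name: "wf-b"} // no labels at all

	fmt.Println(reportMetrics(kfp))     // prints "run-123 true"
	fmt.Println(reportMetrics(foreign)) // prints " false"
}
```

Indexing a nil map is safe in Go, so a workflow with no labels at all takes the same skip path as one that is only missing the run ID label.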

IronPan · 2019-08-06T23:30:32Z

/assign @neuromage @Ark-kun

IronPan · 2019-08-06T23:56:41Z

@hongye-sun @neuromage @Ark-kun This PR is ready for another round of review

neuromage

/lgtm

Thanks @IronPan!

IronPan · 2019-08-07T07:39:59Z

/approve

k8s-ci-robot · 2019-08-07T07:40:04Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IronPan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [IronPan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

IronPan · 2019-08-07T19:07:45Z

/test kubeflow-pipeline-sample-test

neuromage · 2019-08-07T19:11:17Z

/lgtm

IronPan and others added 24 commits August 1, 2019 01:05

add resubmit proto

aab80cd

add compiled code

6f85bd0

fix

67ca697

add resubmit proto

9d1510a

add

13df218

refactor

29fe282

update builder

9c6f9c7

refactor

83e2e2a

Merge branch 'cache' of https://github.com/IronPan/pipelines into cache

ce5f96f

refactor

c0c7d80

refactor

48a23a3

refactor

eae813b

refactor

87cf310

refactor

89fd63a

add test

5ab51dd

add test

1ccee05

add test

d53d3a3

add test

75a9db9

fix test

40000c9

fix test

b4fabbb

fix test

8a8c4d9

fix test

dbe4d5f

fix test

028802e

fix test

fe8fe51

k8s-ci-robot assigned hongye-sun Aug 2, 2019

k8s-ci-robot added the size/XXL label Aug 2, 2019

k8s-ci-robot requested review from neuromage and paveldournov August 2, 2019 11:34

neuromage reviewed Aug 2, 2019

hongye-sun reviewed Aug 2, 2019

IronPan added 4 commits August 5, 2019 10:49

add error handling

02469a7

reorder the call

277545e

Merge remote-tracking branch 'upstream/master' into cache

6b19e03

remove logic to update the database entry

6e4d2a7

add mock

3cae9dc

neuromage reviewed Aug 6, 2019

IronPan added 3 commits August 6, 2019 13:54

add tests for rerousrce manager

be0d9ba

update error handling logic

bfe866f

fix tests

e50fa8c

k8s-ci-robot assigned Ark-kun and neuromage Aug 6, 2019

address comments

91064c2

neuromage reviewed Aug 7, 2019

k8s-ci-robot added the lgtm label Aug 7, 2019

k8s-ci-robot added the approved label Aug 7, 2019

Merge remote-tracking branch 'upstream/master' into cache

f534992

k8s-ci-robot removed the lgtm label Aug 7, 2019

k8s-ci-robot added the lgtm label Aug 7, 2019

IronPan merged commit a9602fb into kubeflow:master Aug 7, 2019

This was referenced Aug 10, 2019

Add integration test for Run's Retry API #1800

Closed

Will pipelines support the same behavior as argo retry #1198

Closed

elikatsis mentioned this pull request Jan 8, 2020

VolumeOp doesn't support GC after workflow deletion #1779

Closed

magdalenakuhn17 pushed a commit to magdalenakuhn17/pipelines that referenced this pull request Oct 22, 2023

Update Alibi Explainers to 0.6.0 (kubeflow#1720)

4bca76d

* Update Alibi Explainers to 0.6.0 * fix lint * Update shap to 0.39.0 * Fix tests * update explainer tabular test
