Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL refactor #619

Merged
merged 14 commits into from
Jan 9, 2019
Merged

DSL refactor #619

merged 14 commits into from
Jan 9, 2019

Conversation

gaoning777
Copy link
Contributor

@gaoning777 gaoning777 commented Jan 3, 2019

combine two sanitize name function into one;
add more comments;
Aggregate certain function with similar functionality.


This change is Reviewable

Copy link
Contributor

@qimingj qimingj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One major comment.

@@ -108,7 +105,7 @@ def add_op(self, op: _container_op.ContainerOp, define_only: bool):
op: An operator of ContainerOp or its inherited type.
"""

kubernetes_name = _make_kubernetes_name(op.human_name)
kubernetes_name = _sanitize_k8s_name(op.human_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the previous change, so I am adding my thoughts here: one design goal is to hide k8s as much as possible in "dsl" layer, and push the k8s stuff to compiler (I used to call it "argo compiler). This way the DSL layer is more generic, and that's why there are "dsl" and "compiler" separate directories.

I feel like we don't have to sanitize the pipeline name here; We can store it as it is (so respect user's choice) and in compiler https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/compiler/compiler.py#L457 we can sanitize there. That way, we can move the util to compiler since it is very k8s specific.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for the DSL hiding k8s. However, the pipeline sanitizes the name for operators which will be part of the final argo yaml. (in Pipeline.add_op() function).
I can sanitize the op names all in the compiler codes, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. Sorry I mixed pipeline name with step name. I think we can sanitize the name in compiler too as you mentioned.

That way, a different "compiler" may choose to sanitize it in a different way, or even not sanitize it at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@Ark-kun Ark-kun Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pipeline class (do not confuse with the @pipeline decorator) is a compiler helper class. It can probably moved to DSL compiler. It's only used during the compilation and it makes sense to sanitize the names/ids at this point. The original name remains untouched in op.human_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pipeline class is used in the @pipeline decorator and it might not be a good idea to move the Pipeline class to the Compiler because the dsl would depend on the compiler otherwise. Then there would be a dependency loop since the compiler depends on the dsl library. Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pipeline class is used in the @pipeline decorator

This is an implementation detail. We've talked with @qimingj and AFAIK he agreed that it was a good idea to unlink them like we did with @python_component. I've prepared the code for the most common case, but there was a slight problem for multi-pipeline file compilation. If we deprecate that feature (it's not currently used anywhere) we can unlink them easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That way, a different "compiler" may choose to sanitize it in a different way, or even not sanitize it at all.

I agree with that. That's why I moved the sanitization code from the ContainerOp to the compiler-relater Pipeline class. As you remember, I tried to detangle the ContainerOp from the Pipeline even more, by making it not required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a proposal: Let's break the dependency between the DSL classes and the compiler by adding generic events/hooks for events like ContainerOp creation or @pipeline application. The compiler can set the handlers for those hooks to execute some compiler-specific code. This way, the DSL does not depend on the compiler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ark-kun can you send a separate change and we can discuss more details? I feel it should be a separate change from this one. Small changes are more manageable.

The reason we had a "Pipeline" class are: 1) we need a "scope" that can record all ContainerOp objects. Hence the global Pipeline._default_pipeline variable. 2) Someday we can expose the Pipeline class to support dynamic pipeline construction in DSL (we don't expose that now because we want to limit the surfacing area and promote pipeline function).

Events/hooks is one option. We can compare the approaches by 1) Keep DSL as simple as it is now.
2) Favor simplicity for compiler provider. 3) Hopefully reduce or remove global variables.

Copy link
Contributor

@qimingj qimingj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jan 4, 2019
@qimingj
Copy link
Contributor

qimingj commented Jan 4, 2019

/lgtm

@qimingj
Copy link
Contributor

qimingj commented Jan 4, 2019

/lgtm

@gaoning777
Copy link
Contributor Author

/test kubeflow-pipeline-build-image

@gaoning777
Copy link
Contributor Author

/test kubeflow-pipeline-e2e-test

# Sanitize operator names and param names
sanitized_ops = {}
for op in p.ops.values():
sanitized_name = K8sHelper.sanitize_k8s_name(op.name)
Copy link
Contributor

@Ark-kun Ark-kun Jan 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK op.name is already sanitized and made unique when it's being added to the pipeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. You've removed that sanitization.
I'm not sure this is a good move. It's easier to fix the name before all the output references have been generated and passed around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation was to move the k8s related functions from DSL to compilers such that the implementations of another compiler in the future will not depend on K8s. The PR aims to moves the sanitization to the compiler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation was to move the k8s related functions from DSL to compilers such that the implementations of another compiler in the future will not depend on K8s. The PR aims to moves the sanitization to the compiler.

I agree with both those ideas. What's debatable is whether the Pipeline class belongs to the compiler or DSL.

if param.op_name:
param.op_name = K8sHelper.sanitize_k8s_name(param.op_name)
for param in op.outputs.values():
param.name = K8sHelper.sanitize_k8s_name(param.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this will work at this stage? The argument placeholders are probably already embedded into op.command and op.args as strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The op.args are handled in the _op_to_template function in this PR.
However, I have not handled the op.command. Do we already support parameterized commands?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We did. See 875efea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this PR for the command as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jan 7, 2019
@qimingj
Copy link
Contributor

qimingj commented Jan 9, 2019

/lgtm

@gaoning777
Copy link
Contributor Author

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit d3c4add into kubeflow:master Jan 9, 2019
Linchin pushed a commit to Linchin/pipelines that referenced this pull request Apr 11, 2023
* There is a bug in the code to detect completed jobs. For completed
  jobs the condition can be of type "Complete". As a result, we aren't
  properly detecting when jobs have been completed and rerunning them
  if needed.

Related to kubeflow#762
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024
* add init github actions

* Remove old comments

* Update test_kfp_samples.sh

* add condition for venv
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants