
build_python_component does not automatically configure the file outputs such as mlpipeline-ui-metadata in the containerop #2317

Closed
gaoning777 opened this issue Oct 7, 2019 · 13 comments

Comments

@gaoning777
Contributor

When the component Python function contains an output such as mlpipeline-metrics.json or mlpipeline-ui-metadata.json, the generated ContainerOp (by build_python_component) does not contain the corresponding file_output.

@Ark-kun
Contributor

Ark-kun commented Oct 8, 2019

This does not seem to be a regression, since artifacts did not previously appear in ContainerOp.file_outputs.
However, this issue should be fixed as part of my efforts to improve data passing.
Thank you for the reminder.
I wanted to clean up this part of ContainerOp, but forgot that it's critical for passing artifacts when using ContainerOp.

I'll fix this issue.

@gaoning777
Contributor Author

Agreed that this is not a regression, but it is a bug that we need to fix.
Thanks, Alexey

@Ark-kun
Contributor

Ark-kun commented Oct 12, 2019

I discovered two problems:

  1. This is a breaking change that will break some of the existing samples: there are some components (especially the GCP ones) that have only one output, so in the samples we refer to that output as task.output. Adding a mlpipeline-ui-metadata output makes task.output stop working, because there are then multiple outputs (see the sketch after this comment).
  2. Artifact names. The DSL compiler prepends the task name to every output name: task_name-output_name (probably the easiest way for Bradley to ensure uniqueness of downstream input names, which are based on upstream output names). The names in the compiled workflow would then be my-task-3-mlpipeline-ui-metadata, and this will prevent the frontend and backend from recognizing those artifacts.

So, adding those two artifacts to file_outputs is not possible at this moment.
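
To make problem 1 concrete, here is a minimal sketch, not from the thread, assuming the kfp v1 DSL; the component names, image, command, and paths are placeholders:

from kfp import dsl

@dsl.pipeline(name='single-output-sample')
def sample_pipeline():
    producer = dsl.ContainerOp(
        name='producer',            # placeholder name
        image='alpine',             # placeholder image
        command=['sh', '-c', 'echo hello > /tmp/out.txt'],
        file_outputs={'result': '/tmp/out.txt'},
    )
    # With exactly one declared output, the singular shortcut works:
    dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['echo', producer.output],
    )
    # If 'mlpipeline-ui-metadata' were implicitly added to file_outputs,
    # producer.output would no longer resolve, because there would be
    # more than one output.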

@Ark-kun Ark-kun changed the title from "build_python_component does not automatically configure the file outputs in the containerop" to "The mlpipeline-ui-metadata output artifact is missing from ContainerOp.outputs" on Oct 12, 2019
@Ark-kun
Contributor

Ark-kun commented Oct 12, 2019

Changed the title since this is not limited to components; if you try to do the same with ContainerOp directly, you'll get the same result:

from kfp.dsl import ContainerOp

task = ContainerOp(
    name='my-task',    # name and image are placeholder values; ContainerOp requires them
    image='alpine',
    file_outputs={
        'mlpipeline-ui-metadata': '/tmp/mlpipeline-ui-metadata.json',
    },
)
# The mlpipeline-* entries are not added to task.outputs:
assert not task.outputs

@gaoning777
Contributor Author

I discovered two problems:

  1. This is a breaking change that will break some of the existing samples: there are some components (especially the GCP ones) that have only one output, so in the samples we refer to that output as task.output. Adding a mlpipeline-ui-metadata output makes task.output stop working, because there are then multiple outputs.

It will not be a breaking change, because the outputs are only injected if mlpipeline-ui-metadata is written in the Python code. The current (Python component build) samples, which do not output metrics/visualizations, are not affected.

  2. Artifact names. The DSL compiler prepends the task name to every output name: task_name-output_name (probably the easiest way for Bradley to ensure uniqueness of downstream input names, which are based on upstream output names). The names in the compiled workflow would then be my-task-3-mlpipeline-ui-metadata, and this will prevent the frontend and backend from recognizing those artifacts.

AFAIK, the frontend/backend look for the metrics/visualizations based on the keys (see the sketch after this comment): https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/

So, adding those two artifacts to file_outputs is not possible at this moment.
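
As a reference for "the keys", a sketch of what the output-viewer docs linked above describe a step writing (based on those docs and the sample later in this thread; the GCS path is illustrative):

import json

# Write the UI metadata to the conventional path so the frontend can find it.
metadata = {
    'outputs': [{
        'type': 'tensorboard',
        'source': 'gs://ml-pipeline-dataset/tensorboard-train',
    }]
}
with open('/mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(metadata, f)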

@gaoning777
Contributor Author

Changed the title since this is not limited to components; if you try to do the same with ContainerOp directly, you'll get the same result:

from kfp.dsl import ContainerOp

task = ContainerOp(
    name='my-task',    # name and image are placeholder values; ContainerOp requires them
    image='alpine',
    file_outputs={
        'mlpipeline-ui-metadata': '/tmp/mlpipeline-ui-metadata.json',
    },
)
# The mlpipeline-* entries are not added to task.outputs:
assert not task.outputs

Are you sure? I tried on my machine and it worked fine.

@Ark-kun
Contributor

Ark-kun commented Oct 14, 2019

The current (Python component build) samples, which do not output metrics/visualizations, are not affected.

The XGBoost sample and some other e2e samples might be affected. Many of our 1st-party components produce the mlpipeline-ui-metadata artifact, and some samples use task.output (singular).

AFAIK, the frontend/backend look for the metrics/visualizations based on the keys:

I'm not fully sure about that. In any case, the keys depend on the artifact name.

Are you sure? I tried on my machine and it worked fine.

It's expected. That snippet demonstrates that the mlpipeline-* artifacts are missing from task.outputs even if you just use ContainerOp (not build_python_component or func_to_container_op or load_component).

@Ark-kun
Contributor

Ark-kun commented Oct 14, 2019

Let's take a step back and understand the high-level CUJ (critical user journey) here.

The current state of the issue is: the "mlpipeline-ui-metadata" and "mlpipeline-metrics" entries are never added to the task.outputs dictionary, which is used to pass output references to dependent tasks.

What CUJs are affected by this issue? What scenarios require passing "mlpipeline-ui-metadata" to another step?
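
For context, a sketch of the downstream passing that task.outputs normally enables (assumed, not from the thread; names, image, and paths are placeholders):

from kfp import dsl

@dsl.pipeline(name='output-passing-sample')
def passing_pipeline():
    producer = dsl.ContainerOp(
        name='producer',
        image='alpine',
        command=['sh', '-c', 'echo data > /tmp/data.txt'],
        file_outputs={'data': '/tmp/data.txt'},
    )
    # task.outputs maps output name -> reference; the mlpipeline-ui-metadata
    # and mlpipeline-metrics entries never show up in this dictionary.
    dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['echo', producer.outputs['data']],
    )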

@gaoning777 gaoning777 changed the title from "The mlpipeline-ui-metadata output artifact is missing from ContainerOp.outputs" to "build_python_component does not automatically configure the file outputs such as mlpipeline-ui-metadata in the containerop" on Oct 15, 2019
@gaoning777
Contributor Author

Let's take a step back and understand the high-level CUJ here.

The current state of the issue is: the "mlpipeline-ui-metadata" and "mlpipeline-metrics" entries are never added to the task.outputs dictionary, which is used to pass output references to dependent tasks.

What CUJs are affected by this issue? What scenarios require passing "mlpipeline-ui-metadata" to another step?

You might've misunderstood the issue. I was not referring to task.outputs, and I wasn't expecting the "mlpipeline-ui-metadata" and "mlpipeline-metrics" entries to be used by the downstream components.
The issue is that when I author a Python function and turn it into a component using build_python_component (or the other functions, if this one is deprecated), the outputs section in the Argo YAML is not generated. The resulting effect is that the metrics are not displayed in the UI.

@Ark-kun
Contributor

Ark-kun commented Oct 16, 2019

The issue is that when I author a Python function and turn it into a component using build_python_component (or the other functions, if this one is deprecated), the outputs section in the Argo YAML is not generated. The resulting effect is that the metrics are not displayed in the UI.

Oh. This changes everything. I was confused by this part of the description: "contains an output such as mlpipeline-metrics.json or mlpipeline-ui-metadata.json, the generated ContainerOp (by build_python_component) does not contain the corresponding file_output."

Does your component function have explicit outputs named 'mlpipeline_ui_metadata' and 'mlpipeline_metrics'? See the lightweight sample: https://github.com/kubeflow/pipelines/blob/master/samples/core/lightweight_component/lightweight_component.ipynb
All outputs need to be explicitly specified now. (See #1422 and #1250 (comment) )

The metrics and visualizations should then work.

@gaoning777
Contributor Author

Makes sense. Similar to the code below, if I'm understanding it correctly.

from typing import NamedTuple
def func() -> NamedTuple('output', [('mlpipeline_ui_metadata', 'UI_metadata'), ('mlpipeline_metrics', 'Metrics')]):
    import json

    # Placeholder computation so the sample metrics below have values:
    quotient, remainder = divmod(10, 3)

    # Exports a sample tensorboard:
    metadata = {
      'outputs' : [{
        'type': 'tensorboard',
        'source': 'gs://ml-pipeline-dataset/tensorboard-train',
      }]
    }

    # Exports two sample metrics:
    metrics = {
      'metrics': [{
          'name': 'quotient',
          'numberValue':  float(quotient),
        },{
          'name': 'remainder',
          'numberValue':  float(remainder),
        }]}

    from collections import namedtuple
    divmod_output = namedtuple('output', ['mlpipeline_ui_metadata', 'mlpipeline_metrics'])
    return divmod_output(json.dumps(metadata), json.dumps(metrics))

@Ark-kun
Contributor

Ark-kun commented Oct 19, 2019

Makes sense. Similar to the code below, if I'm understanding it correctly.

Yes. This code should work. Let me know if it does not.
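
For completeness, a sketch of the typical wiring with the kfp SDK of that era (an assumption on my part; the pipeline name and output filename are placeholders, and func refers to the function in the previous comment):

import kfp
from kfp import components, dsl

# Turn the Python function above into a pipeline component.
produce_metrics_op = components.func_to_container_op(func)

@dsl.pipeline(name='metrics-sample')
def metrics_pipeline():
    # Because the NamedTuple return annotation declares the outputs explicitly,
    # the compiled workflow gets mlpipeline-ui-metadata and mlpipeline-metrics
    # output artifacts for this step.
    produce_metrics_op()

kfp.compiler.Compiler().compile(metrics_pipeline, 'metrics_pipeline.yaml')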

I think I should put a notice about the mlpipeline_ui_metadata change in the README.md (e.g. Notable changes in v0.1.29). WDYT?

@Ark-kun
Contributor

Ark-kun commented Feb 11, 2020

The issue is resolved.

@Ark-kun Ark-kun closed this as completed Feb 11, 2020