This repository was archived by the owner on Dec 2, 2021. It is now read-only.

Future of metadata development and deployment - standalone? Only as part of KFP #225

Closed
jlewi opened this issue Apr 28, 2020 · 21 comments

Comments

@jlewi
Contributor

jlewi commented Apr 28, 2020

/kind feature

What is the future of metadata deployment?

There are currently at least two variants of metadata

  • There is one version of metadata that is tightly integrated and deployed as part of KFP
  • There is a second version of metadata which can be deployed independently

I think the differences might pertain mostly to the UI. I think KFP ships a UI for metadata integrated into the KFP UI but I think the backend might be the same.

I think the net effect is that a lot of development is happening in the KFP UI and the generic metadata UI is lagging behind; e.g. #217 is tracking upstreaming changes for lineage that are in KFP UI but not metadata UI.

I think metadata is largely based on mlmd which is developed in google/ml-metadata

What's the path forward for providing a metadata story?

/cc @neuromage @Bobgy @rmgogogo @avdaredevil @zhenghuiwang

@rmgogogo

Quick question: does "metadata deployment" here mean google/ml-metadata or KF metadata?

If it means google/ml-metadata, then it's already in KFP (google/ml-metadata provides a gRPC server). I don't think we have plans or resources to continue KF metadata, right? I may be missing some of the context.

@Bobgy
Contributor

Bobgy commented Apr 28, 2020

I think #217 (comment) is the only planned future feature. Other than that, keeping the current status and asking for community help is the best I can foresee.

@animeshsingh

If we don't plan to extend it, and it becomes redundant with google/ml-metadata, which is where KFP is focused, it's best to bring it up in a community meeting to decide on its future.

@jlewi
Contributor Author

jlewi commented Apr 29, 2020

If #217 is the only planned feature then what does this mean for creating a generic metadata story?

When people deploy Kubeflow pipelines do they get:

  1. A metadata backend that can be used to record metadata from arbitrary services (not just KFP)
  2. A UI for displaying metadata even if it wasn't created by KFP.

/cc @paveldournov

@Bobgy
Contributor

Bobgy commented May 5, 2020

With current status, both 1. and 2. mentioned above are true.

If we had the bandwidth, it would be better to keep maintaining the metadata UI separately, but I don't think that's the case now.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/front-end 0.63


@jlewi
Contributor Author

jlewi commented May 11, 2020

I believe when you install the full KFP there are two different URLs for the UI for the metadata store

  1. https://${KFENDPOINT}/_/metadata
  2. https://${KFENDPOINT}/_/pipeline/#/artifacts

Are these both pointing at the same UI service or are they two different servers?

I suspect they are two different servers but I could be wrong.
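
One way to see how the two URLs differ is to split them into path and fragment. The sketch below uses only the Python standard library; the host `kf.example.com` is a placeholder standing in for `${KFENDPOINT}`, and `describe` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

# Placeholder host standing in for ${KFENDPOINT}; an assumption for illustration.
KFENDPOINT = "kf.example.com"

urls = [
    f"https://{KFENDPOINT}/_/metadata",
    f"https://{KFENDPOINT}/_/pipeline/#/artifacts",
]


def describe(url):
    """Return the (path, fragment) pair of a URL."""
    parts = urlsplit(url)
    return parts.path, parts.fragment


for url in urls:
    path, fragment = describe(url)
    print(f"path={path!r} fragment={fragment!r}")
```

The part after `#` is a fragment that the browser keeps to itself and does not send to the server, so any routing to separate backends would have to key on the path prefix (`/_/metadata` versus `/_/pipeline/`).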

@Bobgy
Contributor

Bobgy commented May 11, 2020

They are two different servers. Some code is reused via kubeflow/frontend, but the codebases are built and distributed completely separately.

@avdaredevil
Contributor

avdaredevil commented May 11, 2020 via email

@jlewi
Contributor Author

jlewi commented May 18, 2020

Here's my understanding:

  • Google Metadata provides low level libraries for dealing with metadata

    • It defines a data model but not particular schemas
    • It also provides client libraries for talking to the database but not a server
  • This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata

    • I believe KFP might have originally been using the client libraries to talk directly to the DB, but it is now using this gRPC? REST? server
    • @rmgogogo @Bobgy can one of you confirm the answer to that?
  • This repository is also providing a python SDK to make it easy to log data to MLMD.

  • This repository is also providing a front end for visualizing the data

    • Per @avdaredevil 's comments above there are two versions of the front end

      • The front end developed in this repository
      • The front end baked into the KFP UI
    • I believe the plan was to refactor the front end code into shared components that would live in kubeflow/frontend and be reused by both UIs.

  • This repository is also defining some specific schemas that are defined using the ML Metadata data model.

    • I think @animeshsingh pointed out that KFP is also defining schemas.

So to summarize, I think there are several components:

  • ML Metadata server
  • ML Metadata Frontend
  • Python SDK
  • Schemas
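
To make the distinction between a data model and a schema concrete, here is a toy sketch in plain Python. It does not use the ml-metadata library; `PROPERTY_KINDS`, `ArtifactType`, `Artifact`, and the `example.org/Model` schema with its `framework` and `version` properties are all hypothetical names chosen for illustration. The data model only says that artifacts have a type with typed properties; a schema is one particular type definition written in terms of that model.

```python
from dataclasses import dataclass, field

# Generic data model: the property kinds the model understands (hypothetical names).
PROPERTY_KINDS = {"INT": int, "DOUBLE": float, "STRING": str}


@dataclass
class ArtifactType:
    """Data-model concept: a named type with typed properties."""
    name: str
    properties: dict  # property name -> one of the keys of PROPERTY_KINDS


@dataclass
class Artifact:
    """Data-model concept: an instance of some ArtifactType."""
    type: ArtifactType
    uri: str
    properties: dict = field(default_factory=dict)

    def validate(self):
        """Check each property is declared by the type and has the declared kind."""
        for key, value in self.properties.items():
            if key not in self.type.properties:
                raise ValueError(f"undeclared property: {key}")
            expected = PROPERTY_KINDS[self.type.properties[key]]
            if not isinstance(value, expected):
                raise TypeError(f"{key} should be {expected.__name__}")
        return True


# A schema: one specific type expressed in the generic model (hypothetical).
MODEL_SCHEMA = ArtifactType(
    name="example.org/Model",
    properties={"framework": "STRING", "version": "INT"},
)

model = Artifact(
    type=MODEL_SCHEMA,
    uri="gs://example-bucket/model",
    properties={"framework": "tensorflow", "version": 3},
)
print(model.validate())
```

Under this reading, two projects can share the same server and data model and still disagree on schemas, which is why the schema definitions are listed as a separate component.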

@jlewi
Contributor Author

jlewi commented May 18, 2020

I think a major feature, lineage tracking, was introduced with 1.0.

  • I think at 1.0 the front end changes were only in the KFP UI but have since been migrated into the standalone UI.

  • Do we have examples illustrating lineage tracking with and without pipelines?
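
As a rough illustration of what lineage tracking records, independent of any pipeline system, here is a toy in-memory sketch. It is not the ml-metadata or kubeflow/metadata API; `LineageStore`, `log_execution`, and `upstream`, as well as the file names, are hypothetical. The idea is that each execution records its input and output artifacts, and lineage is recovered by walking those links backwards.

```python
from collections import defaultdict


class LineageStore:
    """Toy in-memory store linking executions to input/output artifacts."""

    def __init__(self):
        # artifact -> list of (execution name, inputs) that produced it
        self._producers = defaultdict(list)

    def log_execution(self, name, inputs, outputs):
        """Record that execution `name` read `inputs` and wrote `outputs`."""
        for out in outputs:
            self._producers[out].append((name, list(inputs)))

    def upstream(self, artifact):
        """Return every artifact that `artifact` transitively depends on."""
        seen = set()
        stack = [artifact]
        while stack:
            current = stack.pop()
            for _, inputs in self._producers.get(current, []):
                for inp in inputs:
                    if inp not in seen:
                        seen.add(inp)
                        stack.append(inp)
        return seen


store = LineageStore()
# These calls could come from a notebook or a training job, not only a pipeline step.
store.log_execution("preprocess", inputs=["raw.csv"], outputs=["clean.csv"])
store.log_execution("train", inputs=["clean.csv"], outputs=["model.pb"])
store.log_execution("evaluate", inputs=["model.pb", "clean.csv"], outputs=["metrics.json"])

print(sorted(store.upstream("metrics.json")))
```

Nothing in the sketch depends on a pipeline orchestrator: whoever runs the step is responsible for logging it, which is the "without pipelines" case asked about above.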

@Bobgy
Contributor

Bobgy commented May 19, 2020

Some clarifications on current status

This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata

I believe KFP might have originally been using the client libraries to talk directly to the DB, but it is now using this gRPC? REST? server

The Google Metadata repo also provides a gRPC server for interacting with metadata: https://github.com/kubeflow/manifests/blob/master/metadata/base/metadata-deployment.yaml#L73
KFP standalone is also using this gRPC server.

This repository built a REST server on top of it (I'm not sure about the technical details; it could be a wrapper around the metadata client or around the gRPC server).

KFP UI and this repository already reuse shared components in kubeflow/frontend for the lineage view, but not yet for lists. Both repos agree on the same schema.

@Jeffwan
Member

Jeffwan commented May 19, 2020

We do use metadata for some metrics tracking in non-KFP projects. The reason this feels more like a project for KFP is that we don't have an experiment concept for other workloads. For example, users have to call the SDK manually in their distributed training operator or notebook to log params or metrics. Visualization is limited as well. Even though adoption of this project is not high at the moment, I hope to keep it separate and well designed.

It will become more important once we have a generic experiment concept across Kubeflow projects.
I would say it's a key project for MLOps.
See related issue kubeflow/kubeflow#4955

@rmgogogo

This repository (kubeflow/metadata) started as a way of providing a server (gRPC? REST? both) interface for metadata

  • I believe KFP might have originally been using the client libraries to talk directly to the DB, but it is now using this gRPC? REST? server
  • @rmgogogo @Bobgy can one of you confirm the answer to that?

KFP currently uses the google/ml-metadata repo for MLMD:
https://github.com/google/ml-metadata

It's deployed as a gRPC server which connects to the DB. Pipeline tasks/steps call the gRPC server to access data.

@jlewi
Contributor Author

jlewi commented Jun 3, 2020

/cc @zhitaoli

@jlewi
Contributor Author

jlewi commented Jul 6, 2020

/cc @aronchick

@jlewi
Contributor Author

jlewi commented Jul 6, 2020

If I recall correctly this repository might have originally been providing the following functionality on top of TFX-metadata

  • A python SDK for logging to the metadata server
  • Higher level schemas than what TFX metadata provides
  • Some tools (rest API?) for dynamically managing schemas in TFX metadata
  • A REST API for TFX metadata
  • Standalone (not bundled with KFP) UI for metadata

Some of this functionality might no longer be needed; I think tfx-metadata might now support gRPC.

Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.

/cc @zhenghuiwang

@Bobgy
Contributor

Bobgy commented Jul 7, 2020

I think we don't have a replacement for these items:

  • A python SDK for logging to the metadata server
  • Higher level schemas than what TFX metadata provides
  • Standalone (not bundled with KFP) UI for metadata

The following might not be required any more

  • (not sure on this one) Some tools (rest API?) for dynamically managing schemas in TFX metadata
  • A REST API for TFX metadata -- can be replaced by auto generated grpc http client over envoy

Regarding the UI; I believe as @avdaredevil mentioned above some of the frontend code has been refactored into reusable libraries in kubeflow/frontend. I'm unclear to what extent the KFP UI and metadata UIs have been updated to use those shared libraries.

Current status: kubeflow/pipelines uses those shared libraries entirely; I'm not sure about kubeflow/metadata.

@jlewi
Contributor Author

jlewi commented Jul 7, 2020

@neuromage @Bobgy I thought KFP was defining some higher level schemas?

@jlewi
Contributor Author

jlewi commented Nov 1, 2020

I filed #250 to get rid of the standalone metadata UI. It's lagging behind the KFP metadata UI and no one seems to be maintaining it.

Regarding the SDK; I stumbled upon
https://www.tensorflow.org/tfx/ml_metadata/api_docs/python/mlmd/metadata_store/MetadataStore


8 participants