Define a set of metadata specifications for v0.6 #4
Comments
Some thoughts on what format or tool should be used to define the spec. We need to handle customized types at runtime. Protobuf is a poor fit for this case, because it requires compiling and generating code. JSON Schema seems to be a good fit:
Following this JSON Schema guideline on defining complex schemas, it would be good to extract common fields and reuse them. It is also important to categorize metadata by semantic meaning. Here is my attempt to define a type hierarchy for the metadata of ML workflows. A child is a subtype of its parent. For example, we have a generic
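To make the idea of reusing common fields and modeling subtypes concrete, here is a minimal sketch. It assumes a single spec document that uses the JSON Schema keywords `definitions`, `allOf`, and `$ref`; the type names (`entity`, `artifact`, `data_set`, `image_data_set`) and their fields are hypothetical, not the definitions proposed in this thread. The `flatten` and `missing_fields` helpers are also illustrative; they only check required fields and are not a full JSON Schema validator.

```python
# Hypothetical spec shaped like a JSON Schema document. The type names and
# fields are illustrative assumptions, not the project's actual definitions.
SPEC = {
    "definitions": {
        "entity": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "description": {"type": "string"},
            },
            "required": ["name"],
        },
        "artifact": {
            # A child reuses its parent's fields through allOf + $ref.
            "allOf": [{"$ref": "#/definitions/entity"}],
            "properties": {"uri": {"type": "string"}},
            "required": ["uri"],
        },
        "data_set": {
            "allOf": [{"$ref": "#/definitions/artifact"}],
            "properties": {"version": {"type": "string"}},
        },
    }
}


def flatten(type_name, spec=SPEC):
    """Return (properties, required) for a type, including inherited ones."""
    schema = spec["definitions"][type_name]
    properties, required = {}, []
    for parent in schema.get("allOf", []):
        # Only local references of the form "#/definitions/<name>" are handled.
        parent_name = parent["$ref"].rsplit("/", 1)[-1]
        parent_props, parent_required = flatten(parent_name, spec)
        properties.update(parent_props)
        required.extend(f for f in parent_required if f not in required)
    properties.update(schema.get("properties", {}))
    required.extend(f for f in schema.get("required", []) if f not in required)
    return properties, required


def missing_fields(type_name, instance, spec=SPEC):
    """List the required fields (own and inherited) absent from an instance."""
    _, required = flatten(type_name, spec)
    return [f for f in required if f not in instance]


# A customized type can be added at runtime, with no code generation step.
SPEC["definitions"]["image_data_set"] = {
    "allOf": [{"$ref": "#/definitions/data_set"}],
    "properties": {"resolution": {"type": "string"}},
    "required": ["resolution"],
}

print(missing_fields("image_data_set", {"name": "mnist", "uri": "file:///data/mnist"}))
# prints ['resolution']
```

Because the spec is plain data, registering a new subtype is just adding another entry under `definitions`, which is the property that a compile-and-generate workflow lacks.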
+1 for using JSON schema. Two questions:
I don't have a preference. Based on the wiki and Google Trends, it seems "data set" is more widely used than "dataset".
I meant to use the data transformation from raw data to training data as an example. I made a mistake in the type hierarchy: an executable should be defined as an artifact that has input and output. So What do you think?
Sounds good then. That makes more sense. Yes, it depends on which way we look at it. It should be fine as long as we clarify our definition of a model. Users can also define their own customized metadata if needed.
I think these should be two separate concepts. An artifact represents anything produced by an ML pipeline or run. An executable represents the pipeline/run itself.
Absolutely, these two should be treated as separate concepts. I don't mean to go into taxonomy, only to point out that a model is viewed as derived data in the training phase but can be viewed as an executable in the serving phase.
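The separation discussed above can be sketched as two plain data classes. This is only an illustration under assumed names: `Artifact` and `Execution` and their fields are hypothetical and are not taken from the project's SDK. It shows the raw-data-to-training-data transformation mentioned earlier, and a model that is the output of a training run and the input of a serving run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    """Anything produced or consumed by a pipeline or run (hypothetical type)."""
    name: str
    uri: str


@dataclass
class Execution:
    """A pipeline or run itself, linking input artifacts to output artifacts."""
    name: str
    inputs: List[Artifact] = field(default_factory=list)
    outputs: List[Artifact] = field(default_factory=list)


# Data transformation: raw data in, training data out.
raw_data = Artifact("raw_data", "file:///data/raw")
training_data = Artifact("training_data", "file:///data/train")
transform = Execution("transform", inputs=[raw_data], outputs=[training_data])

# The model is derived data in the training phase ...
model = Artifact("model", "file:///models/m1")
training = Execution("training", inputs=[training_data], outputs=[model])

# ... and is consumed by the run that serves it.
serving = Execution("serving", inputs=[model])
```

With this shape, the model stays a single artifact record; which role it plays depends on whether it appears in an execution's `outputs` or `inputs`.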
@neuromage @zhenghuiwang How is this coming along? What is the remaining work for 0.6?
The metadata definitions for v0.6 were completed in #17. They are preloaded in the backend service during startup and exposed in the Python SDK. We can add more fields to these definitions for future use cases.
These metadata specifications should
These metadata specifications are preloaded during server start, while customized ones are registered via endpoints.
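As a rough sketch of the split between preloaded and customized specifications, the registry below loads a fixed list at construction time and accepts further types afterwards. Everything here is an assumption for illustration: `TypeRegistry`, `handle_register_request`, the `name` key, and the contents of `PRELOADED_SPECS` are hypothetical, and the handler is an in-process stand-in for an endpoint rather than the backend service's actual API.

```python
import json

# Hypothetical predefined specs; the real v0.6 definitions are not shown here.
PRELOADED_SPECS = [
    {"name": "data_set", "properties": {"uri": {"type": "string"}}},
    {"name": "model", "properties": {"uri": {"type": "string"}}},
]


class TypeRegistry:
    """Holds type specs by name; predefined ones are loaded at construction."""

    def __init__(self, preloaded_specs):
        self._types = {}
        for spec in preloaded_specs:
            self.register(spec)

    def register(self, spec):
        name = spec["name"]
        if name in self._types:
            raise ValueError(f"type already registered: {name}")
        self._types[name] = spec

    def names(self):
        return sorted(self._types)


def handle_register_request(registry, body):
    """Stand-in for an endpoint handler: parse a JSON body and register it."""
    spec = json.loads(body)
    registry.register(spec)
    return {"registered": spec["name"]}


# "Server start": the predefined specs are available immediately.
registry = TypeRegistry(PRELOADED_SPECS)

# Later, a client registers a customized type by sending a JSON document.
response = handle_register_request(
    registry, json.dumps({"name": "my_metrics", "properties": {}})
)
```

Rejecting duplicate names is one possible policy; a service could equally choose to version or overwrite existing types.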