GTO docs #199
Link Check Report
2/46 links failed.
Perhaps my first question would be: why write the GTO docs as part of the MLEM documentation? Where am I going with this question? I guess the reason you are writing this is to enable users to build a full-featured model registry. This comes from the combined use of three tools: MLEM, DVC, and GTO. If I am right and this is the goal, then I would suggest not trying to write a general introduction to GTO here; that should be in its own documentation. Rather, write directly: "How do I build a full-featured model registry using DVC, MLEM, and GTO?"
@@ -0,0 +1,10 @@
# Using GTO Commands

GTO is a command line tool. Here, we provide the specifications, complete
Thanks @francesco086. This is a good point of view. I'm opening a thread based on your comment to keep the discussion about it in a single place.
Minor: If you check out DVC docs, you'll see there are docs for Studio and DVClive. We can put this "GTO documentation" there or here (I used mlem.ai because it was easier for me). Or on a separate website, like iterative.ai/doc maybe? Not sure. We need some place to keep GTO docs anyway.
Major: explaining how to build a registry with DVC+GTO+MLEM. Good question where to put that. In this PR you can see I was going to put answers in /doc/gto/user-guide. I guess the Tutorial format would be the best for this, and we could add it to each product involved under Use Cases (e.g. here it can be next to or instead of "Pure MLEM Model registry"):
The other option is to create a GS with this, but that would be way too heavy for Get Started. I guess a Tutorial or blog post serves the purpose better.
Another place to have this is the Model Registry page in Studio docs. But I'm not sure yet how UI (Studio) and CLI (GTO+DVC+MLEM Tutorial) could co-exist there. Maybe cross-links are a better approach than having this in Studio docs.
Again, good topic to think about 🤔 We also leave CML out of the picture above; it can also be a part of a MR...
@tapadipti, have you had any discussion about setting up a DVC+GTO+MLEM Tutorial to complement Studio docs? Looks like it's much needed, but I can't see that we ever created something like that.
I think it's a good place to put GTO docs if we want docs beyond a CMD/API ref (otherwise we could do with a README and possibly a site like https://docs.iterative.ai/dvc-task/reference/dvc_task/).
Major: explaining how to build a registry with DVC+GTO+MLEM. Good question where to put that...
Tutorial format would be the best for this
We mention it very high-level in https://mlem.ai/doc/use-cases/model-registry now. And there's the https://iterative.ai/model-registry solution page separately. I'm not sure how much we want to go into the details of this 3-way integration. May be a good blog topic indeed. Let's create a separate issue to discuss that, though?
https://iterative.ai/model-registry should have links to all relevant docs pages. But since the docs can't reside there, Studio docs look like the next best place to me for explaining how to build a registry with DVC+GTO+MLEM. We could create a Use cases section. But depending on how much and what content we need, a blog post may also suffice. And docs specific to the GTO CLI should definitely be separate.
If you check out DVC docs, you'll see there are docs for Studio and DVClive.
This is to be changed. We will host Studio docs separately in its own docs site (like CML) - although we don't have dates for this yet.
Ok, so I'm trying to draft that blog post - please see https://www.notion.so/iterative/Tutorial-Model-Registry-in-Git-with-DVC-MLEM-and-GTO-af124368ce9f4523a568a7e1875c7af3 - high-level feedback would be appreciated.
Thanks @aguschin. I've left some comments in the draft blog post.
Prob still need to update the README as well? 🙂 (no rush)
If you check out DVC docs, you'll see there are docs for Studio and DVClive
I think we should follow something similar to https://dvc.org/doc/dvclive here:
- Short docs home page (links to installation in README, for example);
- Get Started (single page);
- Technical reference (commands in the case of GTO)
Everything else may be overkill here. Please let's avoid the situation we have in MLEM in general with too many docs we can't properly finish 🙂
content/docs/gto/get-started.md
Outdated
by creating Git tags of [special format](/doc/gto/user-guide) and managing
[`artifacts.yaml` metafile](/doc/gto/user-guide). Since committing large files
Instead of a big User Guide, for now we can have a single guide page explaining these formats and their mechanics (again, similar to DVCLive's Folder Structure doc).
Thanks for all your feedback. We were releasing MLEM 0.3.0, so I was a bit away from this. In general:
- I processed your feedback - thanks! - feel free to bring more
- Let's focus on GS, but I'll work on other things while I wait for you
- Let's see if UG can fit a single page. Don't want to complicate things and write extra things, but it may be required to split in subpages.
Get Started review 👇🏼 Let's focus on this first? In fact, splitting the PR would be ideal IMO.
content/docs/gto/get-started.md
Outdated
This repo represents a simple example of Machine Learning Model Registry. Let's
review it:

```cli
$ gto show
╒══════════╤══════════╤════════╤═════════╤════════════╕
│ name     │ latest   │ #dev   │ #prod   │ #staging   │
╞══════════╪══════════╪════════╪═════════╪════════════╡
│ churn    │ v3.1.1   │ v3.1.1 │ v3.0.0  │ v3.1.0     │
│ segment  │ v0.4.1   │ v0.4.1 │ -       │ -          │
│ cv-class │ v0.1.13  │ -      │ -       │ -          │
```
👍🏼 👍🏼 👍🏼
I kind of like that we start by showing the end-result! It's a good way to deliver the value proposition quickly in here (main purpose of this doc).
# Why GTO?

**GTO** is a tool for creating an Artifact Registry in your Git repository. One
of the special cases we would like to highlight is creating a
[Machine Learning Model Registry](/doc/use-cases/model-registry).
This whole page is also not in sidebar.json (minor?)
yes
Still secretly published in https://mlem.ai/doc/gto/why-gto.
content/docs/gto/install.md
Outdated
To create an Artifact Registry with GTO, you only need a Git repo and GTO
package installed. There's no need to set up any services or databases, compared
to many other Model Registry offerings.
- To create an Artifact Registry with GTO, you only need a Git repo and GTO
- package installed. There's no need to set up any services or databases, compared
- to many other Model Registry offerings.
+ You'll need [Python](https://www.python.org/) to install GTO, and
+ [Git](https://git-scm.com/) to use it.
Not clear now why you need DB/Services at all - if we talk about GTO installation, let's remove all mentions of MR.
I removed the part about DBs because it didn't seem too relevant to mention in the installation page, but it may make sense in other docs.
Not sure I understood your suggestion wrt MR mentions.
A couple of questions on whether we want to expose Git technicalities or not (this applies to all docs, mainly the cmd ref, but I'm only commenting on a couple of pages).
UPDATE: Please ignore this for now...
- We can address this later. There is lower-hanging fruit here.
Create an artifact version to signify an important, published or released
iteration.
Maybe we should be more specific on what it does, like:
- Create an artifact version to signify an important, published or released
- iteration.
+ Create a Git tag containing the artifact's name and version.
Not sure, e.g. in DVC sometimes we keep it general (https://dvc.org/doc/command-reference/commit) and sometimes specific (https://dvc.org/doc/command-reference/checkout).
I guess it depends on whether we expect users to be familiar enough with Git and/or whether we want them to keep the mechanics in mind. But if we consider these implementation details, then let's keep it general but also remove/hide Git tag details in general.
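To make the "specific" option concrete, here is a minimal sketch of what "a Git tag containing the artifact's name and version" could look like. The `name@version` tag format and both helper names are assumptions for illustration only; they are not taken from this PR and may not match what GTO actually writes.

```python
import re
from typing import NamedTuple


class VersionTag(NamedTuple):
    artifact: str
    version: str


# ASSUMPTION: tags look like "<artifact>@<version>", e.g. "churn@v3.1.1".
# This format is hypothetical and used only to illustrate the idea.
_TAG_RE = re.compile(
    r"^(?P<artifact>[A-Za-z0-9][A-Za-z0-9_-]*)@(?P<version>v\d+\.\d+\.\d+)$"
)


def make_version_tag(artifact: str, version: str) -> str:
    """Compose a tag name from an artifact name and a version string."""
    tag = f"{artifact}@{version}"
    if _TAG_RE.match(tag) is None:
        raise ValueError(f"cannot build a valid tag from {artifact!r}, {version!r}")
    return tag


def parse_version_tag(tag: str) -> VersionTag:
    """Split a tag name back into its artifact name and version."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a version tag: {tag!r}")
    return VersionTag(match["artifact"], match["version"])
```

If the docs stay general, none of this needs to be shown to users; if they go specific, something as small as `make_version_tag("churn", "v3.1.1")` is about all the mechanics a reader would need to keep in mind.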
Since the pages are auto-generated, this requires changing the code. Will do that later.
If we're mainly concerned with CI/CD let's be specific instead of saying just "downstream"? More concrete
Maybe we should dissolve the relatively long intro (reusing some of the text in each section). That way we get to something actionable faster:
Last batch of suggestions on Get Started:
Co-authored-by: Jorge Orpinel <[email protected]>
merging the current docs to improve them later
Check out at https://mlem-ai-gto-docs-pzdfnkadkdwtv.herokuapp.com/doc/gto
close iterative/gto#293
First version of GTO docs.