iterative · aguschin · Nov 23, 2022 · Oct 24, 2022 · Oct 24, 2022 · Oct 24, 2022
diff --git a/content/docs/gto/command-reference/index.md b/content/docs/gto/command-reference/index.md
@@ -0,0 +1,10 @@
+# Using GTO Commands
+
+GTO is a command line tool. Here, we provide the specifications, complete
+descriptions, and comprehensive usage examples for different `gto` commands.
+
+For a list of all commands, run `gto -h`.
+
+## Typical GTO workflow
+
+...
diff --git a/content/docs/gto/get-started.md b/content/docs/gto/get-started.md
@@ -0,0 +1,171 @@
+---
+description:
+  'Learn how you can use GTO to create an Artifact Registry in a Git repository'
+---
+
+# Get Started
+
+GTO helps you build an Artifact Registry on top of a Git repository (a Machine
+Learning Model Registry is one special case). You can register relevant
+versions of your files (e.g. ML model releases) and assign them to different
+deployment environments (testing, shadow, production, etc.). Git-native
+mechanisms are used, so you can automate the delivery of your ML project with
+CI/CD, and adopt a GitOps approach in general.
+
+This guide walks you through the basic GTO concepts and the actions you are
+likely to perform in the Artifact Registry.
+
+## Showing the current state
+
+Assuming GTO is already [installed](/doc/gto/install) in your active Python
+environment, let's clone the example repo:
+
+```cli
+$ git clone https://github.com/iterative/example-gto
+$ cd example-gto
+```
+
+This repo is a simple example of a Machine Learning Model Registry. Let's
+review it:
+
+```cli
+$ gto show
+╒══════════╤══════════╤════════╤═════════╤════════════╕
+│ name     │ latest   │ #dev   │ #prod   │ #staging   │
+╞══════════╪══════════╪════════╪═════════╪════════════╡
+│ churn    │ v3.1.1   │ v3.1.1 │ v3.0.0  │ v3.1.0     │
+│ segment  │ v0.4.1   │ v0.4.1 │ -       │ -          │
+│ cv-class │ v0.1.13  │ -      │ -       │ -          │
+╘══════════╧══════════╧════════╧═════════╧════════════╛
+```
+
+Here we have 3 models: `churn`, `segment` and `cv-class`. The `latest` column
+shows the latest version of each, selected as the one with the greatest
+[SemVer](https://semver.org).
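
Why "greatest SemVer" picks `v0.1.13` rather than a string that sorts higher alphabetically can be sketched in Python. This is only an illustration, not GTO's implementation: `parse_version` and `latest` are hypothetical helpers, and the sketch assumes plain `vMAJOR.MINOR.PATCH` versions without pre-release or build suffixes.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for illustration: "v" followed by three numeric parts.
SEMVER = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn 'v0.1.13' into (0, 1, 13) so that parts compare as numbers."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def latest(versions: List[str]) -> str:
    """Return the version with the greatest (major, minor, patch) tuple."""
    return max(versions, key=parse_version)


# As plain strings, "v0.1.9" would sort after "v0.1.13"; as numbers it does not.
print(latest(["v0.1.9", "v0.1.13", "v0.1.2"]))  # v0.1.13
```

Comparing integer tuples rather than raw strings is the point here: a string comparison would rank `v0.1.9` above `v0.1.13`.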
+
+Model versions can be promoted to different stages. Here we have 3 of them:
+`dev`, `prod` and `staging`. When a model was never promoted to a stage, we see
+`-` in the field.
+
+## Registering versions and assigning stages
+
+GTO can [register versions](/doc/gto/command-reference/register) of artifacts
+and [assign stages to them](/doc/gto/command-reference/assign). Both features
+work in a similar way, so let's walk through only one of them here.
+
+Let's assume the version `v0.1.13` of `cv-class` looks very promising, and now
+we want to promote it to `dev` to test it:
+
+```cli
+$ gto assign cv-class --version v0.1.13 --stage dev
+Created git tag 'cv-class#dev#1' that assigns stage to version 'v0.1.13'
+To push the changes upstream, run:
+    git push origin cv-class#dev#1
+```
+
+GTO created a Git tag with a special format that contains the instruction to
+assign a stage to a version. We can push it to the Git repository to start the
+CI, but let's first make sure that it changed our Registry.
+
+```cli
+$ gto show
+╒══════════╤══════════╤═════════╤═════════╤════════════╕
+│ name     │ latest   │ #dev    │ #prod   │ #staging   │
+╞══════════╪══════════╪═════════╪═════════╪════════════╡
+│ churn    │ v3.1.1   │ v3.1.1  │ v3.0.0  │ v3.1.1     │
+│ segment  │ v0.4.1   │ v0.4.1  │ -       │ -          │
+│ cv-class │ v0.1.13  │ v0.1.13 │ -       │ -          │
+│ awesome  │ v0.0.1   │ -       │ -       │ -          │
+╘══════════╧══════════╧═════════╧═════════╧════════════╛
+```
+
+The `gto show` output confirms our expectation.
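
The tag name printed above, `cv-class#dev#1`, can be read mechanically. Below is a minimal Python sketch of splitting such a name into its parts. It is not GTO's own parser: `StageAssignment` and `parse_assignment_tag` are hypothetical names, and treating the trailing number as a counter is an assumption made for illustration.

```python
from typing import NamedTuple


class StageAssignment(NamedTuple):
    artifact: str
    stage: str
    counter: int  # assumed meaning of the trailing number


def parse_assignment_tag(tag: str) -> StageAssignment:
    """Split a tag shaped like '<artifact>#<stage>#<number>' into its parts."""
    parts = tag.split("#")
    if len(parts) != 3 or not all(parts) or not parts[2].isdigit():
        raise ValueError(f"not a stage assignment tag: {tag!r}")
    artifact, stage, counter = parts
    return StageAssignment(artifact, stage, int(counter))


print(parse_assignment_tag("cv-class#dev#1"))
# StageAssignment(artifact='cv-class', stage='dev', counter=1)
```

Note that the version itself is not part of the tag name; the command output reports it separately (`v0.1.13`).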
+
+## Acting downstream
+
+The power of using Git tags to register versions and assign stages is simple: we
+can act upon them in a well-known way, in CI/CD.
+
+To see how it works, let's fork the
+[example-gto repo](https://github.com/iterative/example-gto/fork) and push the
+tag we just created to GitHub. For CI/CD to start, you'll need to enable
+workflows on the "Actions" page of your fork.
+
+<details>
+
+### Step-by-step instructions
+
+Fork the repo first. Make sure you uncheck "Copy the `main` branch only" to copy
+Git tags as well:
+<img width="877" alt="image" src="https://user-images.githubusercontent.com/6797716/199275275-439335f4-6f54-4cd7-910d-fc29ad3c095c.png">
+
+Then enable workflows in your repo, for a Git tag to trigger CI:
+<img width="869" alt="image" src="https://user-images.githubusercontent.com/6797716/199272682-dfd628bf-9599-4e85-a623-bf4a10c3d7e1.png">
+
+</details>
+
+Let's do the same thing we did locally, but for your remote repo. Don't forget
+to replace the URL:
+
+```cli
+$ gto assign cv-class --version v0.1.13 --stage dev \
+    --repo https://github.com/aguschin/example-gto
+Created git tag 'cv-class#dev#1' that assigns stage to version 'v0.1.13'
+Running `git push origin cv-class#dev#1`
+Successfully pushed git tag cv-class#dev#1 on remote.
+```
+
+Now the CI/CD run should start. In it, you should see that the workflow worked
+out what happened: version `v0.1.13` of the `cv-class` artifact was assigned to
+the `dev` stage. Using this information, the step
+`Deploy (act on assigning a new stage)` was executed (while
+`Publish (act on registering a new version)` was skipped):
+
+<details>
+
+### CI/CD execution example
+
+<img width="875" alt="image" src="https://user-images.githubusercontent.com/6797716/199276636-bf996ad3-7d9c-4100-9f3c-6444730e4d19.png">
+
+If you want to see more CI examples, check out
+[the example-repo](https://github.com/iterative/example-gto/actions).
+
+</details>
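
The choice between the two steps can be sketched as a small Python function that inspects the pushed Git ref. This is a hedged illustration, not the workflow used in the example repo: `action_for_ref` is a hypothetical helper, the `refs/tags/` prefix is how Git names tag refs, and the `name@version` shape for version registration tags is an assumption here.

```python
def action_for_ref(ref: str) -> str:
    """Map a pushed Git ref to the CI step that should act on it."""
    prefix = "refs/tags/"
    if not ref.startswith(prefix):
        return "skip"  # branch pushes carry no registry event
    tag = ref[len(prefix):]
    if "#" in tag:
        # e.g. 'cv-class#dev#1': a stage was assigned
        return "deploy"
    if "@" in tag:
        # assumed shape 'name@version': a version was registered
        return "publish"
    return "skip"  # an ordinary tag, unrelated to the registry


print(action_for_ref("refs/tags/cv-class#dev#1"))  # deploy
print(action_for_ref("refs/heads/main"))           # skip
```

With the tag pushed above, this logic selects the deploy branch, which matches the run where `Deploy` was executed and `Publish` was skipped.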
+
+## Next steps
+
+Thanks for completing this Get Started!
+
+- If you want to learn how to specify an artifact's metainformation, like
+  `path`, `type` and `description`, check out the
+  [User Guide](/doc/gto/user-guide).
+- If you want to learn about using DVC to keep your artifact binaries in remote
+  storages, check out [DVC docs](https://dvc.org/doc).
+- If you want to learn more about Studio, check out
+  [Studio docs](https://dvc.org/doc/studio).
+- If you want to learn about using MLEM to deploy your model upon GTO stage
+  assignments, check out [MLEM docs](/doc/).
+
+<!-- Adding a new artifact
+
+We just saw how to commit a new ML model to the repo. It's saved under
+`models/awesome.pkl`. Let's register the very first version of it.
+
+```cli
+$ gto register awesome
+Created git tag 'awesome@v0.0.1' that registers version
+To push the changes upstream, run:
+    git push origin awesome@v0.0.1
+```
+
+Nice! Let's see the registry state now:
+
+```cli
+$ gto show
+╒══════════╤══════════╤════════╤═════════╤════════════╕
+│ name     │ latest   │ #dev   │ #prod   │ #staging   │
+╞══════════╪══════════╪════════╪═════════╪════════════╡
+│ churn    │ v3.1.1   │ v3.1.1 │ v3.0.0  │ v3.1.0     │
+│ segment  │ v0.4.1   │ v0.4.1 │ -       │ -          │
+│ cv-class │ v0.1.13  │ -      │ -       │ -          │
+│ awesome  │ v0.0.1   │ -      │ -       │ -          │
+╘══════════╧══════════╧════════╧═════════╧════════════╛
+``` -->
diff --git a/content/docs/gto/index.md b/content/docs/gto/index.md
@@ -0,0 +1,39 @@
+# GTO Documentation
+
+**GTO** is a tool for creating an Artifact Registry in your Git repository. One
+of the special cases we would like to highlight is creating a **Machine Learning
+Model Registry**.
+
+Such a registry serves as a centralized place to store and operationalize your
+artifacts along with their metadata; manage model life-cycle, versions &
+releases, and easily automate tests and deployments using GitOps.
+
+<cards>
+
+  <card href="/doc/gto/get-started" heading="Get Started">
+    A step-by-step introduction to basic GTO features
+  </card>
+
+  <card href="/doc/gto/user-guide" heading="User Guide">
+    Study the detailed inner-workings of GTO in its user guide.
+  </card>
+
+  <card href="/doc/gto/use-cases" heading="Use Cases">
+    Non-exhaustive list of scenarios GTO can help with
+  </card>
+
+  <card href="/doc/gto/command-reference" heading="Command Reference">
+    See all of GTO's commands
+  </card>
+
+</cards>
+
+✅ Please join our [community](https://dvc.org/community) or use the
+[support](https://dvc.org/support) channels if you have any questions or need
+specific help. We are very responsive ⚡.
+
+✅ Check out our [GitHub repository](https://github.com/iterative/gto) and give
+us a ⭐ if you like the project!
+
+✅ Contribute to GTO [on GitHub](https://github.com/iterative/gto) or help us
+improve this [documentation](https://github.com/iterative/mlem.ai) 🙏.
diff --git a/content/docs/gto/install.md b/content/docs/gto/install.md
@@ -0,0 +1,33 @@
+# Installation
+
+To create an Artifact Registry with GTO, you only need a Git repo and GTO
+package installed. There's no need to set up any services or databases, compared
+to many other Model Registry offerings.
-To create an Artifact Registry with GTO, you only need a Git repo and GTO
-package installed. There's no need to set up any services or databases, compared
-to many other Model Registry offerings.
+You'll need [Python](https://www.python.org/) to install GTO, and
+[Git](https://git-scm.com/) to use it.
+
+To check whether GTO is installed in your environment, run `which gto`. To check
+which version is installed, run `gto --version`.
+
+## Install as a Python library
+
+GTO is a Python library. It works on any OS. You can install it with a package
+manager like [pip](https://pypi.org/project/pip/) or
+[Conda](https://docs.conda.io/en/latest/), or as a Python
+[requirement](https://pip.pypa.io/en/latest/user_guide/#requirements-files).
+
+<admon type="info">
+
+We **strongly** recommend creating a [virtual environment] or using [pipx] to
+encapsulate your local environment.
+
+[virtual environment]: https://python.readthedocs.io/en/stable/library/venv.html
+[pipx]:
+  https://packaging.python.org/guides/installing-stand-alone-command-line-tools/
+
+</admon>
+
+```cli
+$ pip install gto
+```
+
+This will install the `gto` command-line interface (CLI) and make the Python API
+available for use in code.
diff --git a/content/docs/gto/use-cases/index.md b/content/docs/gto/use-cases/index.md
@@ -0,0 +1,72 @@
+# Use Cases
+
+**GTO** is a tool for creating an Artifact Registry in your Git repository. One
+of the special cases we would like to highlight is creating a
+[Machine Learning Model Registry](/doc/gto/use-cases/model-registry).
+
+Such a registry serves as a centralized place to store and operationalize your
+artifacts along with their metadata; manage model life-cycle, versions &
+releases, and easily automate tests and deployments using GitOps.
+
+Usually, Artifact Registry usage follows these three steps:
+
+- **Registry**. Track new artifacts and their versions for releases and
+  significant changes. Usually this is needed for keeping track of lineage.
+- **Lifecycle Management**. Create actionable stages for versions, marking the
+  status of an artifact or its readiness to be consumed by a specific
+  environment.
+- **Downstream Usage**. Signal CI/CD automation or other downstream systems to
+  act upon these new versions and lifecycle updates.
+
+GTO helps you achieve all of them in a [GitOps](https://www.gitops.tech) way. If
+you would like to see an example, please follow
+[Get Started](/doc/gto/get-started).
+
+## Why GTO?
+
+In Software Engineering, Git is at the heart of the software system. The code is
+committed to Git, and CI/CD triggers on new commits to take the necessary
+downstream actions. Approaches such as [GitOps](https://www.gitops.tech) have
+made huge steps towards automating development cycles, reducing errors and
+helping maintain productive software development.
+
+Artifact Registries (and Model Registries in particular) usually introduce a
+separate service or infrastructure, as well as a new set of APIs to integrate
+with. This often makes it necessary to maintain two different systems, which
+is a significant overhead. For example, if you work in Machine Learning, you
+often need two teams (Data Science specialists and Software Engineers), each
+responsible for maintaining their part of the system.
+
+![](https://i.imgur.com/GTcrytE.png)
+
+GTO builds the registry on top of a Git repository, using Git tags to register
+versions and assign stages, and an `artifacts.yaml` file to keep the
+metainformation about artifacts, such as `path`, `type` and `description`. If
+your artifact development is built around Git, you won't need to introduce new
+things for your team to manage.
+
+One example (although specific to a Model Registry) demonstrates well this
+problem of handling two worlds at the same time. When you train your Machine
+Learning models, you have to know what code and data were used to do it. If the
+Model Registry lives in a separate system, you (or the code you've written) have
+to record the code and data snapshots (or just a Git commit hexsha). Now if you
+forget to record the hexsha when you register a new model version in the Model
+Registry, or use an incorrect hexsha, no one can reproduce your training
+process. Keeping track of both models and their versions in Git solves that
+problem.
+
+![](https://i.imgur.com/gViAnOu.png)
+
+## Limitations
+
+There are a few limitations to the GTO approach to building an Artifact
+Registry:
+
+- You shouldn't commit artifact binaries to Git itself. Use Git LFS, DVC, or
+  similar tools instead.
+- Some teams develop artifacts (models) in a single monorepository, others in
+  many separate ones. Since GTO operates on Git tags and files in a single Git
+  repository, it can't handle multiple repositories at once.
+- GTO is a command-line and Python API tool. That makes it friendly for
+  engineers, although less technical folks may need a visual UI.
+
+If you hit the last two limitations, you may find
+[Studio](https://dvc.org/doc/studio) useful.
diff --git a/content/docs/gto/use-cases/model-registry.md b/content/docs/gto/use-cases/model-registry.md
@@ -0,0 +1,62 @@
+# Machine Learning Model Registry
+
+A **model registry** is a tool to catalog ML models and their versions. Models
+from your data science projects can be discovered, tested, shared, deployed, and
+audited from there. [DVC](https://github.com/iterative/dvc), GTO, and [MLEM]
+enable these capabilities on top of Git, so you can stick to an existing
+software engineering stack. No more divide between ML engineering and
+operations!
+
+[mlem]: /doc
+
+ML model registries give your team key capabilities:
+
+- Collect and organize model [versions] from different sources effectively,
+  preserving their data provenance and lineage information.
+- Share metadata including [metrics and plots][mp] to help use and evaluate
+  models.
+- A standard interface to access all your ML artifacts, from early-stage
+  [experiments] to production-ready models.
+- Deploy specific models on different environments (dev, shadow, prod, etc.)
+  without touching the applications that consume them.
+- For security, control who can manage models, and audit their usage trails.
+
+[versions]: https://dvc.org/doc/use-cases/versioning-data-and-model-files
+[mp]: https://dvc.org/doc/start/metrics-parameters-plots
+[experiments]: https://dvc.org/doc/user-guide/experiment-management
+
+Many of these benefits are built into DVC: Your [modeling process] and
+[performance data][mp] become **codified** in Git-based <abbr>DVC
+repositories</abbr>, making it possible to reproduce and manage models with
+standard Git workflows (along with code). Large model files are stored
+separately and efficiently, and can be pushed to [remote storage] -- a scalable
+access point for [sharing].
+
+<admon type="info">
+
+See also [Data Registry](https://dvc.org/doc/use-cases/data-registry).
+
+</admon>
+
+To make a Git-native registry (on top of DVC or not), one option is to use GTO
+(Git Tag Ops). It tags ML model releases and promotions, and links them to
+artifacts in the repo using versioned annotations. This creates abstractions for
+your models, which lets you **manage their lifecycle** freely and directly from
+Git.
+
+And to **productionize** the models, you can save and build them with the [MLEM]
+Python API or CLI, which automagically captures all the context needed to
+distribute them. It can store model files on the cloud (by itself or with DVC),
+list and transfer them within locations, wrap them as a local REST server, or
+even containerize and deploy them to cloud providers!
+
+This ecosystem of tools from [Iterative](https://iterative.ai/) brings your ML
+process into [GitOps]. This means you can manage and deliver ML models with
+software engineering methods such as continuous integration (CI/CD), which can
+sync with the state of the artifacts in your registry.
+
+[modeling process]: https://dvc.org/doc/start/data-pipelines
+[remote storage]: https://dvc.org/doc/command-reference/remote
+[sharing]: https://dvc.org/doc/start/data-and-model-access
+[via cml]: https://cml.dev/doc/cml-with-dvc
+[gitops]: https://www.gitops.tech/