This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

Misc. docs improvements #68

Merged
merged 9 commits into from
May 30, 2022
Changes from 8 commits
4 changes: 2 additions & 2 deletions content/docs/api-reference/import_object.md
@@ -58,8 +58,8 @@ command.
'pandas']. Defaults to auto-infer.
- `copy_data` (optional) - Whether to create a copy of file in target location
or just link existing file. Defaults to True.
- `external` (optional) - Save result not in `.mlem`, but directly in repo
- `index` (optional) - Whether to index output in `.mlem` directory
- `external` (optional) - Save result directly in repo (not in `.mlem/`)
Contributor

Suggested change
- `external` (optional) - Save result directly in repo (not in `.mlem/`)
- `external` (optional) - Save result directly to `target` (not in `.mlem/`)

This is more correct, since with external=True it doesn't take into account the MLEM project at all. Also, it's not repo, it's project now.

Contributor Author

Ah you should've felt free to commit your fixes too 🙂 but anyway, we'll get to it... ⌛

- `index` (optional) - Whether to index output in `.mlem/` directory

## Exceptions

2 changes: 1 addition & 1 deletion content/docs/api-reference/init.md
@@ -1,6 +1,6 @@
# mlem.api.init()

Creates `.mlem/` directory in `path`
Creates and populates the `.mlem/` directory in `path`.

```py
def init(path: str = ".") -> None
2 changes: 1 addition & 1 deletion content/docs/api-reference/save.md
@@ -40,7 +40,7 @@ systems (eg: `S3`). The function returns and saves the object as a
- `repo` (optional) - path to mlem repo
- `sample_data` (optional) - If the object is a model or function, you can
provide input data sample, so MLEM will include its schema in the model's
metadata
metafile
- `fs` (optional) - FileSystem for the `path` argument
- `index` (optional) - Whether to add object to mlem repo index
- `external` (optional) - if obj is saved to repo, whether to put it outside of
13 changes: 6 additions & 7 deletions content/docs/command-reference/create.md
@@ -1,7 +1,7 @@
# create

Creates a new [MLEM Object](/doc/user-guide/basic-concepts#mlem-objects)
metafile from conf args and config files.
metafile from config args and config files.

## Synopsis

@@ -16,10 +16,9 @@ PATH Where to save object [required]

## Description

Metadata files (with `.mlem` file extension) can be created for
`.mlem` metafiles can be created for
[MLEM Objects](/doc/user-guide/basic-concepts#mlem-objects) using this command.
This is particularly useful in filling up configuration values for environments
and deployments.
This is particularly useful for configuring environments and deployments.

Each MLEM Object, along with its subtype (which represents a particular
implementation), will accept different configuration arguments. The list of
@@ -38,18 +37,18 @@ check out the last example [here](/doc/command-reference/types#examples)

## Examples

Create an environment metafile with a config key
Create an environment object metafile with a config key:

```cli
# Fetch all config arguments which can be passed for a heroku env
# Fetch all available config args for a heroku env
$ mlem types env heroku
[not required] api_key: str = None

# Create the heroku env
$ mlem create env heroku staging --conf api_key="mlem_heroku_staging"
💾 Saving env to .mlem/env/staging.mlem

# print the contents of the saved metafile for the heroku env
# Print the contents of the new heroku env metafile
$ cat .mlem/env/staging.mlem
api_key: mlem_heroku_staging
object_type: env
2 changes: 1 addition & 1 deletion content/docs/command-reference/deploy/index.md
@@ -25,7 +25,7 @@ serving a specific model, using a specific environment definition, and running
on a target platform.

MLEM deployments allow `applying` methods and even whole datasets on models.
Each model lists its supported methods in its metafile, and those are
Each model lists its supported methods in its `.mlem` metafile, and those are
automatically used by MLEM to wire and expose endpoints on the application
server upon deployment. Applying datasets on the deployment is a very handy
shortcut of bulk inferring data on the served model.
10 changes: 5 additions & 5 deletions content/docs/command-reference/import.md
@@ -1,6 +1,6 @@
# import

Create a MLEM model or dataset metadata from a file/directory.
Create a `.mlem` metafile for a model or dataset in any file or directory.

## Synopsis

@@ -14,10 +14,10 @@ TARGET Path to save MLEM object [required]

## Description

Use `import` on an existing datasets or model files (or directories) to
auto-generate the necessary MLEM metadata (`.mlem`) files for them. This is
useful to quickly make existing datasets and model files compatible with MLEM,
which can then be used in future operations such as `mlem apply`.
Use `import` on existing datasets or model files (or directories) to generate
the necessary `.mlem` metafiles for them. This is useful to quickly make
existing datasets and model files compatible with MLEM, which can then be used
Contributor

Suggested change
existing datasets and model files compatible with MLEM, which can then be used
existing datasets and model files compatible with MLEM, which then can be used

?

in future operations such as `mlem apply`.

This command provides a quick and easy alternative to writing python code to
load those models/datasets into object for subsequent usage in MLEM context.
2 changes: 1 addition & 1 deletion content/docs/command-reference/init.md
@@ -13,7 +13,7 @@ arguments: [PATH] Target path to workspace
## Description

The `init` command (without given `path`) defaults to the current directory for
the path argument. This creates a `.mlem` directory and an empty `config.yaml`
the path argument. This creates a `.mlem/` directory and an empty `config.yaml`
file inside it.
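
As a rough illustration of the layout described above, the following sketch reproduces only the documented effect (a `.mlem/` directory containing an empty `config.yaml`) with the standard library. It is not MLEM's implementation; `init_sketch` is a hypothetical name, and the real command may do more.

```python
import tempfile
from pathlib import Path


def init_sketch(path: str = ".") -> Path:
    """Mimic the documented result of init: `.mlem/` with an empty config.yaml."""
    mlem_dir = Path(path) / ".mlem"
    mlem_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the config starts out empty, as the description above states.
    (mlem_dir / "config.yaml").touch(exist_ok=True)
    return mlem_dir


# Try it in a throwaway directory so nothing is written to a real project.
with tempfile.TemporaryDirectory() as workspace:
    created = init_sketch(workspace)
    layout = sorted(p.name for p in created.iterdir())
    config_size = (created / "config.yaml").stat().st_size
    print(layout, config_size)
```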

Although we recommend using MLEM within a Git repository to track changes using
4 changes: 2 additions & 2 deletions content/docs/command-reference/pprint.md
@@ -15,8 +15,8 @@ arguments: PATH Path to object [required]
## Description

All MLEM objects can be printed to view their metadata. This includes generic
metadata information such as requirements, type of object, hash, size, as well
as object specific information such as `methods` for a `model` or `reader` for a
information such as requirements, type of object, hash, size, as well as object
specific information such as `methods` for a `model` or `reader` for a
`dataset`.

Since only one specific object is printed, a `PATH` to the specific MLEM object
6 changes: 3 additions & 3 deletions content/docs/get-started/saving.md
@@ -55,9 +55,9 @@ $ tree .mlem/model/
> changed, see [project structure](/doc/user-guide/project-structure) for
> reference.

What we see here is that model was saved along with some metadata about it: `rf`
containing the model binary and `.mlem` file containing metadata. Let's take a
look at it:
The model was saved along with some metadata about it: `rf` containing the model
binary and a `.mlem` metafile containing information about it. Let's take a look
Contributor

Suggested change
binary and a `.mlem` metafile containing information about it. Let's take a look
binary and a `rf.mlem` metafile containing information about it. Let's take a look

at it:

<details>

10 changes: 0 additions & 10 deletions content/docs/sidebar.json
@@ -74,21 +74,11 @@
"label": "Basic concepts",
"source": "user-guide/basic-concepts.md"
},
{
"slug": "datasets",
"label": "Working with datasets",
"source": "user-guide/datasets.md"
},
{
"slug": "project-structure",
"label": "Project structure",
"source": "user-guide/project-structure.md"
},
{
"slug": "remote-repos",
"label": "Working with repositories and remote objects",
"source": "user-guide/remote-repos.md"
},
{
"slug": "configuration",
"label": "Configuration",
35 changes: 18 additions & 17 deletions content/docs/use-cases/dvc.md
@@ -49,7 +49,7 @@ $ mlem config set default_storage.type dvc
```

Also, let’s add `.mlem` files to `.dvcignore` so that metafiles are ignored by
DVC
Contributor

This needs to be updated due to #79

DVC.

```cli
$ echo "/**/?*.mlem" > .dvcignore
@@ -66,15 +66,18 @@ $ git rm -r --cached .mlem/
$ python train.py
```
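
To see why the `.dvcignore` rule `/**/?*.mlem` targets metafiles but not the `.mlem/` directory itself, here is a small sketch. It is only an approximation: it checks the last path component against `?*.mlem` using `fnmatch`, and does not reproduce the full gitignore-style matching that DVC applies. The example paths and the helper name `looks_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Last segment of the `/**/?*.mlem` rule: at least one character before `.mlem`.
PATTERN = "?*.mlem"


def looks_ignored(path: str) -> bool:
    """Approximate check: match only the final path component."""
    return fnmatchcase(PurePosixPath(path).name, PATTERN)


# Hypothetical paths, chosen to mirror the layout used in this guide.
candidates = [".mlem/model/rf.mlem", "data.csv.mlem", ".mlem", ".mlem/model/rf"]
ignored = {path: looks_ignored(path) for path in candidates}
for path, flag in ignored.items():
    print(f"{path}: {'ignored' if flag else 'kept'}")
```

The leading `?` is what keeps a bare `.mlem` name from matching, so artifacts under `.mlem/` remain visible to DVC.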

Finally, let’s add new metafiles to Git and artifacts to DVC respectively,
commit and push them
Finally, let’s add and commit new metafiles to Git and artifacts to DVC,
respectively:

```cli
$ dvc add .mlem/model/rf .mlem/dataset/*.csv
$ git add .mlem
$ git commit -m "Switch to dvc storage"
...

$ dvc push -r myremote
$ git push
...
```

Now, you can load MLEM objects from your repo even though there are no actual
@@ -89,18 +92,16 @@ DVC pipelines are the useful DVC mechanism to build data pipelines, in which you
can process your data and train your model. You may be already training your ML
models in them and want to start using MLEM to save those models.

MLEM could be easily plug in into existing DVC pipelines. If you already added
`.mlem` files to `.dvcignore`, you are good to go for most of the cases. Since
DVC will ignore `.mlem` files, you don't need to add them as outputs and mark
them with `cache: false`.
MLEM can be easily plugged into existing DVC pipelines. If you already added
`.mlem` files to `.dvcignore`, you are good to go in most cases.
jorgeorpinel marked this conversation as resolved.

It becomes a bit more complicated when you need to add them as outputs, because
you want to use them as inputs to next stages. The case may be when model binary
doesn't change for you, but model metadata does. That may happen if you change
things like model description or labels.
It becomes a bit more complicated when you need to add them as inputs to
pipeline stages. For example, when a model binary doesn't change, but its
metadata (e.g. model description or labels) does.

To work with that, you'll need to remove `.mlem` files from `.dvcignore` and
jorgeorpinel marked this conversation as resolved.
mark your outputs in DVC Pipeline with `cache: false`.
make them `cache: false` outputs in the pipeline.

jorgeorpinel marked this conversation as resolved.
## Example

@@ -118,7 +119,8 @@ stages:
```

Next step would be to start saving your models with MLEM. Since MLEM saves both
**binary** and **metadata** you need to have both of them in DVC pipeline:
the binary and corresponding `.mlem` metafile, you need to have both of them in
the DVC pipeline:

```yaml
# dvc.yaml
@@ -133,9 +135,8 @@ stages:
cache: false
```

Since binary was already captured before, we don't need to add anything for it.
For metadata, we've added two rows to capture it and specify `cache: false`
since we want the metadata to be committed to Git, and not be pushed to DVC
remote.
The binary was already in, so there's no need to add it again. For the metafile,
we've added two rows and specify `cache: false` to track it with DVC while
storing it in Git.

Now MLEM is ready to be used in your DVC pipeline!
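
The "two rows" for the metafile can be sketched as data. The snippet below builds a `dvc.yaml`-style `outs` list for one model: the binary as a plain entry, and the `.mlem` metafile as an entry with `cache: false`. The path `models/rf` and the helper `stage_outs` are assumptions for illustration; the stage in the diff above is only partially visible.

```python
import json


def stage_outs(model_path: str) -> list:
    """Build dvc.yaml-style `outs` for a model saved with MLEM."""
    return [
        model_path,  # binary: cached by DVC as before
        {f"{model_path}.mlem": {"cache": False}},  # metafile: tracked by DVC, stored in Git
    ]


outs = stage_outs("models/rf")  # hypothetical output path
print(json.dumps({"outs": outs}, indent=2))
```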
12 changes: 6 additions & 6 deletions content/docs/user-guide/basic-concepts.md
@@ -12,13 +12,13 @@ datasets and other types you can read about below.
> Also, MLEM Objects can be created with
> [`mlem create`](/doc/command-reference/create) CLI command

MLEM Objects are saved as `.mlem` files in `yaml` format. Sometimes they can
have other files attached to them, in that case we call `.mlem` file as a
"metadata file" or "metafile" and all the other files we call "artifacts".
MLEM Objects are saved as special _metafiles_ in YAML format with the `.mlem`
extension. These may or may not have _artifacts_ (other files or directories)
associated.

Typically, if **MLEM Object** have only one artifact, it will have the same name
without `.mlem` extension, for example `model.mlem` + `model`, or `data.csv` +
`data.csv.mlem`.
Typically, if a **MLEM Object** has only one artifact, it will have the same file
name without `.mlem` extension, for example `model.mlem` and `model`, or
`data.csv` and `data.csv.mlem`.
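
The single-artifact naming rule can be sketched in a few lines. This assumes only what the paragraph above states (the metafile is the artifact name plus `.mlem`); `metafile_for` and `artifact_for` are hypothetical helpers, not part of MLEM's API.

```python
from pathlib import PurePosixPath


def metafile_for(artifact: str) -> str:
    """Append `.mlem` to the artifact name, keeping any existing extension."""
    return artifact + ".mlem"


def artifact_for(metafile: str) -> str:
    """Strip the trailing `.mlem` extension to recover the artifact name."""
    path = PurePosixPath(metafile)
    if path.suffix != ".mlem":
        raise ValueError(f"not a .mlem metafile: {metafile}")
    return str(path.with_suffix(""))


for name in ("model", "data.csv"):
    meta = metafile_for(name)
    print(f"{name} <-> {meta}")
    assert artifact_for(meta) == name
```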

If a **MLEM Object** has multiple artifacts, they will be stored in a directory
with the same name, for example `model.mlem` + `model/data.pkl` +
113 changes: 0 additions & 113 deletions content/docs/user-guide/datasets.md

This file was deleted.

2 changes: 1 addition & 1 deletion content/docs/user-guide/mlem-abcs.md
@@ -123,7 +123,7 @@ will be pickled, and NN will be saved using `torch_io`

## DatasetType

Hold metadata about dataset, like type, dimensions, column names etc.
Holds metadata about dataset, like type, dimensions, column names etc.

**Base class**: `mlem.core.dataset_type.DatasetType`

3 changes: 2 additions & 1 deletion content/docs/user-guide/project-structure.md
@@ -8,7 +8,8 @@ To create one, use [`mlem init`](/doc/command-reference/init) or
`config.yaml` (see [Configuration](/doc/user-guide/configuration)).

> Some API and CLI commands like `mlem ls` and `mlem config` require this
> execution context. But in general, MLEM can work with `.mlem` files anywhere.
> execution context. But in general, MLEM can work with `.mlem` metafiles
> anywhere.

A common place to initialize MLEM is a data science Git repository. _MLEM
repositories_ help you better structure and easily address existing data