diff --git a/content/docs/api-reference/apply.md b/content/docs/api-reference/apply.md new file mode 100644 index 00000000..e099bb35 --- /dev/null +++ b/content/docs/api-reference/apply.md @@ -0,0 +1,68 @@ +# mlem.api.apply() + +Apply provided model against provided data + +```py +def apply( + model: Union[str, MlemModel], + *data: Union[str, MlemDataset, Any], + method: str = None, + output: str = None, + target_repo: str = None, + index: bool = None, + external: bool = None, + batch_size: Optional[int] = None, +) -> Optional[Any] +``` + +### Usage: + +```py +from mlem.api import apply + +y_pred = apply("rf", "data", method="predict_proba") +``` + +## Description + +This API is the underlying mechanism for the +[mlem apply](/doc/command-reference/apply) command and facilitates running +inferences on entire datasets. The API applies i.e. calls a model's method (eg: +`predict`) and returns the output (as a MLEM object) while also saving it if +required. + +## Parameters + +- **`model`** (required) - MLEM model (a MlemModel object). +- **`data`** (required) - Input to the model. +- `method` (optional) - Which model method to use. If None, use the only method + model has. If more than one is available, will fail. +- `output` (optional) - If value is provided, assume its path and save output + there. +- `target_repo` (optional) - The path to repo to save the results to. +- `index` (optional) - Whether to index saved output in MLEM root folder. +- `external` (optional) - Whether to save result outside mlem dir. +- `batch_size` (optional) - If data is to be loaded and applied in batches. + +## Exceptions + +- `WrongMethodError` - Thrown if wrong method name for model is provided +- `NotImplementedError` - Saving several input data objects is not implemented + yet + +## Examples + +```py +from numpy import ndarray +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier +from mlem.core.objects import MlemDataset, MlemModel +from mlem.api import apply + +train, target = load_iris(return_X_y=True) +model = DecisionTreeClassifier().fit(train, target) +d = MlemDataset.from_data(train) +m = MlemModel.from_obj(model) +res = apply(m, d, method="predict") +assert isinstance(res, ndarray) +``` diff --git a/content/docs/api-reference/apply_remote.md b/content/docs/api-reference/apply_remote.md new file mode 100644 index 00000000..2ca41e04 --- /dev/null +++ b/content/docs/api-reference/apply_remote.md @@ -0,0 +1,67 @@ +# mlem.api.apply_remote() + +Apply deployed model (possibly remote) against provided data. + +```py +def apply_remote( + client: Union[str, BaseClient], + *data: Union[str, MlemDataset, Any], + method: str = None, + output: str = None, + target_repo: str = None, + index: bool = False, + **client_kwargs, +) -> Optional[Any] +``` + +### Usage: + +```py +from mlem.api import apply_remote + +res = apply_remote(client_obj, data, method="predict") +``` + +## Description + +This API is the underlying mechanism for the +[mlem apply-remote](/doc/command-reference/apply-remote) command and facilitates +running inferences on entire datasets for models which are deployed remotely or +are being served locally. The API requires an explicit client object, which +knows how to make requests to the deployed model. + +## Parameters + +- **`client`** (required) - The client to access methods of deployed model. +- **`data`** (required) - Input to the model. +- `method` (optional) - Which model method to use. If None, use the only method + model has. 
If more than one is available, will fail. +- `output` (optional) - If value is provided, assume it's a path and save output + there. +- `target_repo` (optional) - The path to repo to save the results to. +- `index` (optional) - Whether to index saved output in MLEM root folder. +- `client_kwargs` (optional) - Keyword arguments for the underlying client + implementation being used. + +## Exceptions + +- `WrongMethodError` - Thrown if a wrong method name is provided for the model +- `InvalidArgumentError` - Thrown if arguments are invalid, e.g. when `method` + is None but must be provided +- `NotImplementedError` - Saving several input data objects is not implemented + yet + +## Examples + +```py +from numpy import ndarray +from sklearn.datasets import load_iris +from mlem.api import apply_remote +from mlem.runtime.client.base import HTTPClient + +train, _ = load_iris(return_X_y=True) +client = HTTPClient(host="0.0.0.0", port=8080) + +res = apply_remote(client, train, method="predict") +assert isinstance(res, ndarray) +``` diff --git a/content/docs/api-reference/clone.md b/content/docs/api-reference/clone.md new file mode 100644 index 00000000..adbbec7d --- /dev/null +++ b/content/docs/api-reference/clone.md @@ -0,0 +1,64 @@ +# mlem.api.clone() + +Clones a MLEM object from `path` to `target` and returns a Python representation +of the created object. + +```py +def clone( + path: str, + target: str, + repo: Optional[str] = None, + rev: Optional[str] = None, + fs: Optional[AbstractFileSystem] = None, + target_repo: Optional[str] = None, + target_fs: Optional[str] = None, + follow_links: bool = True, + load_value: bool = False, + index: bool = None, + external: bool = None, +) -> MlemObject +``` + +### Usage: + +```py +from mlem.api import clone + +cloned_obj = clone(path="rf", target="mymodel", repo="https://github.com/iterative/example-mlem-get-started", rev="main") +``` + +## Description + +This API is the underlying mechanism for the +[mlem clone](/doc/command-reference/clone) command and facilitates copying of a +[MLEM Object](/doc/user-guide/basic-concepts#mlem-objects) from source to +target. + +## Parameters + +- **`path`** (required) - Path to the object. Could be a local path or a path + inside a Git repo. +- **`target`** (required) - Path to save the copy of the initial object to. +- `repo` (optional) - URL to repo if object is located there. +- `rev` (optional) - revision, could be Git commit SHA, branch name or tag. +- `fs` (optional) - filesystem to load object from +- `target_repo` (optional) - path to repo to save cloned object to +- `target_fs` (optional) - target filesystem +- `follow_links` (optional) - If object we read is a MLEM link, whether to load + the actual object link points to. Defaults to True. +- `load_value` (optional) - Load actual python object incorporated in MlemMeta + object. Defaults to False. +- `index` (optional) - Whether to index output in .mlem directory +- `external` (optional) - whether to save the object outside the mlem dir in the + target repo + +## Exceptions + +None + +## Example: Clone a remote model to a remote repo + +```py +from mlem.api import clone + +cloned_obj = clone(path="rf", target="mymodel", 
repo="https://github.com/iterative/example-mlem-get-started", rev="main", target_repo="s3://mybucket/mymodel", load_value=True) +``` diff --git a/content/docs/api-reference/deploy.md b/content/docs/api-reference/deploy.md new file mode 100644 index 00000000..8de4d721 --- /dev/null +++ b/content/docs/api-reference/deploy.md @@ -0,0 +1,58 @@ +# mlem.api.deploy() + +Deploy a model to target environment. Can use existing deployment declaration or +create a new one on-the-fly. + +```py +def deploy( + deploy_meta_or_path: Union[MlemDeploy, str], + model: Union[MlemModel, str] = None, + env: Union[MlemEnv, str] = None, + repo: Optional[str] = None, + fs: Optional[AbstractFileSystem] = None, + external: bool = None, + index: bool = None, + **deploy_kwargs, +) -> MlemDeploy +``` + +### Usage: + +```py +from mlem.api import deploy + +#TODO +``` + +## Description + +This API is the underlying mechanism for the +[mlem deploy create](/doc/command-reference/deploy/create) command and provides +a programmatic way to create deployments for a target environment. + +## Parameters + +- **`deploy_meta_or_path`** (required) - Path to deployment meta (will be + created if it does not exist) +- `model` (optional) - Path to model +- `env` (optional) - Path to target environment +- `repo` (optional) - Path to MLEM repo +- `fs` (optional) - filesystem to load deploy meta from. If not provided, will + be inferred from `deploy_meta_or_path` +- `external` (optional) - Save result not in mlem dir, but directly in repo +- `index` (optional) - Whether to index output in .mlem directory +- `deploy_kwargs` (optional) - Configuration for new deployment meta if it does + not exist + +## Exceptions + +- `MlemObjectNotFound` - Thrown if we can't find MLEM object +- `ValueError` - Please provide model and env args for new deployment + +## Examples + +```py +from mlem.api import deploy + +#TODO +``` diff --git a/content/docs/api-reference/import_object.md b/content/docs/api-reference/import_object.md new file mode 100644 index 00000000..a9a320c9 --- /dev/null +++ b/content/docs/api-reference/import_object.md @@ -0,0 +1,79 @@ +# mlem.api.import_object() + +Try to load an object as MLEM model (or dataset) and return it, optionally +saving to the specified target location. + +```py +def import_object( + path: str, + repo: Optional[str] = None, + rev: Optional[str] = None, + fs: Optional[AbstractFileSystem] = None, + target: Optional[str] = None, + target_repo: Optional[str] = None, + target_fs: Optional[AbstractFileSystem] = None, + type_: Optional[str] = None, + copy_data: bool = True, + external: bool = None, + index: bool = None, +) +``` + +### Usage: + +```py +import os +from mlem.api import import_object +from mlem.core.objects import MlemDataset +from mlem.contrib.pandas import DataFrameType + +path = os.path.join(os.getcwd(), "data.csv") +target_path = os.path.join(os.getcwd(), "imported_data") +meta = import_object(path, target=target_path, type_="pandas[csv]", copy_data=True) + +assert isinstance(meta, MlemDataset) +dt = meta.dataset +assert isinstance(dt, DataFrameType) +``` + +## Description + +Existing datasets and model files are imported as +[MLEM Objects](/doc/user-guide/basic-concepts#mlem-objects). Specifically, they +are tried to be loaded as `MlemModel` or `MlemDataset`. The function also +supports saving these objects for future use within the MLEM context. This API +is the underlying mechanism for the [mlem import](/doc/command-reference/import) +command. 
+ +## Parameters + +- **`path`** (required) - Path of file to import. +- `repo` (optional) - Path to MLEM repo. +- `rev` (optional) - revision, could be Git commit SHA, branch name or tag. +- `fs` (optional) - FileSystem for the `path` argument +- `target` (optional) - Path to save MLEM object into. +- `target_repo` (optional) - Path to MLEM repo for `target`. +- `target_fs` (optional) - FileSystem for the `target` argument +- `type_` (optional) - Specify how to read file. Available types: ['pickle', + 'pandas']. Defaults to auto-infer. +- `copy_data` (optional) - Whether to create a copy of file in target location + or just link existing file. Defaults to True. +- `external` (optional) - Save result not in `.mlem`, but directly in repo +- `index` (optional) - Whether to index output in `.mlem` directory + +## Exceptions + +None + +## Example: Import a saved model as MlemModel + +```py +import os +from mlem.core.objects import MlemModel +from mlem.api import import_object + +path = os.path.join(os.getcwd(), "mymodel") +target_path = os.path.join(os.getcwd(), "mlem_model") +meta = import_object(path, target=target_path, type_="pickle", copy_data=True) +assert isinstance(meta, MlemModel) +``` diff --git a/content/docs/api-reference/index.md b/content/docs/api-reference/index.md new file mode 100644 index 00000000..ed771549 --- /dev/null +++ b/content/docs/api-reference/index.md @@ -0,0 +1,15 @@ +# Python API + +MLEM can be used as a python library, simply [install](/doc/install) with `pip` +or `conda`. This reference provides the details about the functions in the API +module `mlem.api`, which can be imported in any regular way, for example: + +```py +import mlem.api +``` + +The purpose of this API is to provide programmatic access to operate on models +and datasets from Python code. + +Please choose a function from the navigation sidebar to the left, or click the +`Next` button below to jump into the first one β†˜ diff --git a/content/docs/api-reference/init.md b/content/docs/api-reference/init.md new file mode 100644 index 00000000..d4ec2ce5 --- /dev/null +++ b/content/docs/api-reference/init.md @@ -0,0 +1,29 @@ +# mlem.api.init() + +Creates `.mlem/` directory in `path` + +```py +def init(path: str = ".") -> None +``` + +### Usage: + +```py +from mlem.api import init + +init(path) +``` + +## Description + +Initializes a MLEM repository by creating a `.mlem/` directory inside the given +path. A new and empty `config.yaml` is also created inside it. + +## Parameters + +- **`path`** (required) - location of the target where a MLEM repository has to + be initialized i.e. a `.mlem/` folder has to be created. `.` by default + +## Exceptions + +None diff --git a/content/docs/api-reference/link.md b/content/docs/api-reference/link.md new file mode 100644 index 00000000..e3523c3a --- /dev/null +++ b/content/docs/api-reference/link.md @@ -0,0 +1,76 @@ +# mlem.api.link() + +Creates MlemLink for an `source` object and dumps it if `target` is provided. 
+ +```py +def link( + source: Union[str, MlemObject], + source_repo: Optional[str] = None, + rev: Optional[str] = None, + target: Optional[str] = None, + target_repo: Optional[str] = None, + external: Optional[bool] = None, + follow_links: bool = True, + absolute: bool = False, +) -> MlemLink +``` + +### Usage: + +```py +import os +from mlem.api import link + +model_path = os.path.join(os.getcwd(), "mymodel") +link_name = os.path.join(os.getcwd(), "latest") +link_obj = link( + model_path, + target=link_name, + target_repo=os.getcwd(), + external=False, +) +``` + +## Description + +This API is the underlying mechanism for the +[mlem link](/doc/command-reference/link) command and explicitly creates a +`MlemLink` object from a `source`. This `MlemLink` object is dumped to a +`target` (if provided). This allows us to refer objects (even remote ones) using +their aliases for all future purposes. + +## Parameters + +- **`source`** (required) - The object to create link from. +- `source_repo` (optional) - Path to mlem repo where to load obj from. +- `rev` (optional) - Revision if object is stored in Git repo. +- `target` (optional) - Where to store the link object. +- `target_repo` (optional) - If provided, treat `target` as link name and dump + link in MLEM DIR. +- `external` (optional) - Whether to save link outside mlem dir. +- `follow_links` (optional) - Whether to make link to the underlying object if + `source` is itself a link. Defaults to True. +- `absolute` (optional) - Whether to make link absolute or relative to mlem + repo. Defaults to False. + +## Exceptions + +- `MlemObjectNotSavedError` - Thrown if we can't do something before we save + MLEM object. + +## Examples + +```py +import os +from mlem.api import link, load_meta +from mlem.core.objects import MlemLink, MlemModel + +model_path = os.path.join(os.getcwd(), "mymodel") +link_path = os.path.join(os.getcwd(), "latest.mlem") +link(model_path, target=link_path, external=True) +assert os.path.exists(link_path) +link_object = load_meta(link_path, follow_links=False) +assert isinstance(link_object, MlemLink) +model = load_meta(link_path) +assert isinstance(model, MlemModel) +``` diff --git a/content/docs/api-reference/load.md b/content/docs/api-reference/load.md new file mode 100644 index 00000000..1cf9d377 --- /dev/null +++ b/content/docs/api-reference/load.md @@ -0,0 +1,58 @@ +# mlem.api.load() + +Load python object saved by MLEM + +```py +def load( + path: str, + repo: Optional[str] = None, + rev: Optional[str] = None, + batch_size: Optional[int] = None, + follow_links: bool = True, +) -> Any +``` + +### Usage: + +```py +import os +from mlem.api import load + +out_path = os.path.join(os.getcwd(), "saved-model") +loaded = load(out_path) +``` + +## Description + +Loads a python object from a given path. The path can belong to different file +systems (eg: `S3`). The function returns the underlying python object saved by +MLEM. + +## Parameters + +- **`path`** (required) - Path to the object. Could be local path or path inside + a Git repo. +- `repo` (optional) - URL to repo if object is located there. +- `rev` (optional) - revision, could be Git commit SHA, branch name or tag. +- `follow_links` (optional) - If object we read is a MLEM link, whether to load + the actual object link points to. Defaults to True. 
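+- `batch_size` (optional) - If provided, the data is loaded in batches of this
+  size (mirrors the same option in [apply](/doc/api-reference/apply)).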
+ +## Exceptions + +None + +## Example: Load a trained model saved with MLEM + +```py +import os +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier +from mlem.api import load + +path = os.path.join(os.getcwd(), "saved-model") + +model = load(path) +assert isinstance(model, DecisionTreeClassifier) +train, _ = load_iris(return_X_y=True) +model.predict(train) +``` diff --git a/content/docs/api-reference/load_meta.md b/content/docs/api-reference/load_meta.md new file mode 100644 index 00000000..709eaacd --- /dev/null +++ b/content/docs/api-reference/load_meta.md @@ -0,0 +1,73 @@ +# mlem.api.load_meta() + +Loads MlemObject from a given path + +```py +def load_meta( + path: str, + repo: Optional[str] = None, + rev: Optional[str] = None, + follow_links: bool = True, + load_value: bool = False, + fs: Optional[AbstractFileSystem] = None, + *, + force_type: Optional[Type[T]] = None, +) -> MlemObject +``` + +### Usage: + +```py +import os +from mlem.api import load_meta + +out_path = os.path.join(os.getcwd(), "saved-model") +loaded = load_meta(out_path) +``` + +## Description + +Loads a [MlemObject](/doc/user-guide/basic-concepts#mlem-objects) from a given +path. This differs from [load](/doc/api-reference/load) since the latter loads +the actual python object incorporated within MlemObject. In fact, `load` uses +`load_meta` beneath and uses its `get_value()` method to get the underlying +python object. + +## Parameters + +- **`path`** (required) - Path to the object. Could be local path or path inside + a Git repo. +- `repo` (optional) - URL to repo if object is located there. +- `rev` (optional) - revision, could be Git commit SHA, branch name or tag. +- `follow_links` (optional) - If object we read is a MLEM link, whether to load + the actual object link points to. Defaults to True. +- `load_value` (optional) - Load actual python object incorporated in + MlemObject. Defaults to False. +- `fs` (optional) - filesystem to load from. If not provided, will be inferred + from path +- `force_type` (optional) - type of meta to be loaded. 
Defaults to MlemObject + (any mlem meta) + +## Exceptions + +- `WrongMetaType` - Thrown if the loaded meta object has a different type than + what is expected (force_type or MlemObject) + +## Examples + +```py +import os +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier + +from mlem.core.objects import MlemModel +from mlem.api import load_meta + +train, _ = load_iris(return_X_y=True) +out_path = os.path.join(os.getcwd(), "saved-model") +meta = load_meta(out_path, load_value=True, force_type=MlemModel) + +model = meta.get_value() +assert isinstance(model, DecisionTreeClassifier) +model.predict(train) +``` diff --git a/content/docs/api-reference/ls.md b/content/docs/api-reference/ls.md new file mode 100644 index 00000000..3468ef9a --- /dev/null +++ b/content/docs/api-reference/ls.md @@ -0,0 +1,53 @@ +# mlem.api.ls() + +Get a view of the MLEM repository by listing all of its MLEM Objects + +```py +def ls( + repo: str = ".", + rev: Optional[str] = None, + fs: Optional[AbstractFileSystem] = None, + type_filter: Union[ + Type[MlemObject], Iterable[Type[MlemObject]], None + ] = None, + include_links: bool = True, +) -> Dict[Type[MlemObject], List[MlemObject]] +``` + +### Usage: + +```py +from mlem.api import ls + +objects = ls(".", rev=None, type_filter=None, include_links=True) +``` + +## Description + +Populates a dictionary where keys are different `types` of +[MlemObjects](/doc/user-guide/basic-concepts#mlem-objects) and values are a +collection of MlemObjects of that type. This API is internally used by the CLI +command [list](/doc/command-reference/list). + +## Parameters + +- **`repo`** (required) - Path or URL to repo +- `rev` (optional) - revision, could be Git commit SHA, branch name or tag. +- `fs` (optional) - filesystem to load from. If not provided, will be inferred + from repo +- `type_filter` (optional) - type of objects to be listed (eg: models / dataset + / etc.) +- `include_links` (optional) - whether to include links while fetching the list + of MlemObjects. Defaults to True + +## Exceptions + +None + +## Examples + +```py +from mlem.api import ls + +objects = ls(".") +``` diff --git a/content/docs/api-reference/mlem-object.md b/content/docs/api-reference/mlem-object.md new file mode 100644 index 00000000..ea86d4f5 --- /dev/null +++ b/content/docs/api-reference/mlem-object.md @@ -0,0 +1,22 @@ +# MlemObject API + +- MlemMeta.read +- MLemMeta.load_value +- MlemMeta.get_value +- MlemMeta.dump +- MlemMeta.make_link +- MlemMeta.clone +- MlemMeta.update + +- MlemLink.load_link +- MlemLink.parse_link +- MlemLink.from_location + +- ModelMeta.from_obj +- DatasetMeta.from_data + +- TargetEnvMeta.deploy +- TargetEnvMeta.destroy +- TargetEnvMeta.get_status + +Same for DeployMeta diff --git a/content/docs/api-reference/pack.md b/content/docs/api-reference/pack.md new file mode 100644 index 00000000..4bfa4806 --- /dev/null +++ b/content/docs/api-reference/pack.md @@ -0,0 +1,82 @@ +# mlem.api.pack() + +Package a [MLEM model](/doc/user-guide/mlem-abcs#modeltype) in pip-ready format, +a built package using whl, docker-build-ready folder or directly build a docker +image. 
+ +```py +def pack( + packager: Union[str, Packager], + model: Union[str, MlemModel], + **packager_kwargs, +) +``` + +### Usage: + +```py +from mlem.api import pack + +pack("pip", "rf", target="build", package_name="example_mlem_get_started") +``` + +> The extra kwargs supplied above can be seen from the output of +> `mlem types packager pip` which gives us +> +> ```py +> [required] package_name: str +> [required] target: str +> [not required] templates_dir: str = [] +> [not required] python_version: str = None +> [not required] short_description: str = "" +> [not required] url: str = "" +> [not required] email: str = "" +> [not required] author: str = "" +> [not required] version: str = "0.0.0" +> [not required] additional_setup_kwargs: typing.Any = {} +> ``` + +## Description + +This API is the underlying mechanism for the +[mlem pack](/doc/command-reference/pack) command and allows us to +programmatically create ship-able assets from MlemModels such as pip-ready +packages, docker images, etc. + +## Parameters + +- **`packager`** (required) - Packager to use. Out-of-the-box supported string + values are ['whl', 'pip', 'docker_dir', 'docker']. +- **`model`** (required) - The model to pack. +- `packager_kwargs` (optional) - Keyword arguments for the underlying packager + being used. + +## Exceptions + +None + +## Examples + +```py +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier + +from mlem.contrib.docker import DockerImagePackager +from mlem.contrib.docker.base import DockerImage +from mlem.contrib.fastapi import FastAPIServer + +from mlem.api import pack + +train, target = load_iris(return_X_y=True) +model = DecisionTreeClassifier().fit(train, target) +model_meta = MlemModel.from_obj(model) + +packed = pack( + DockerImagePackager( + server=FastAPIServer(), + image=DockerImage(name="pack_docker_test_image"), + force_overwrite=True, + ), + model_meta, +) +``` diff --git a/content/docs/api-reference/save.md b/content/docs/api-reference/save.md new file mode 100644 index 00000000..4422e47c --- /dev/null +++ b/content/docs/api-reference/save.md @@ -0,0 +1,74 @@ +# mlem.api.save() + +Saves given object to a given path + +```py +def save( + obj: Any, + path: str, + repo: Optional[str] = None, + sample_data=None, + fs: Union[str, AbstractFileSystem] = None, + index: bool = None, + external: Optional[bool] = None, + description: str = None, + params: Dict[str, str] = None, + labels: List[str] = None, + update: bool = False, +) -> MlemObject +``` + +### Usage: + +```py +from mlem.api import save + +save(obj, path, index=False, external=True) +``` + +## Description + +Saves a given object to a given path. The path can belong to different file +systems (eg: `S3`). The function returns and saves the object as a +[MLEM Object](/doc/user-guide/basic-concepts#mlem-objects). 
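+For example, the same call can write straight to remote storage (a sketch,
+assuming an existing S3 bucket and credentials configured for the S3
+filesystem):
+
+```py
+from mlem.api import save
+
+# "s3://mybucket/saved-model" is a placeholder bucket/path
+save(obj, "s3://mybucket/saved-model")
+```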
+ +## Parameters + +- **`obj`** (required) - Object to dump +- **`path`** (required) - If not located on LocalFileSystem, then should be uri + or `fs` argument should be provided +- `repo` (optional) - path to mlem repo +- `sample_data` (optional) - If the object is a model or function, you can + provide input data sample, so MLEM will include it's schema in the model's + metadata +- `fs` (optional) - FileSystem for the `path` argument +- `index` (optional) - Whether to add object to mlem repo index +- `external` (optional) - if obj is saved to repo, whether to put it outside of + .mlem dir +- `description` (optional) - description for object +- `params` (optional) - arbitrary params for object +- `labels` (optional) - labels for object +- `update` (optional) - whether to keep old description/labels/params if new + values were not provided + +## Exceptions + +- `MlemObjectNotFound` - Thrown if we can't find MLEM object + +## Example: Save a trained model with MLEM + +```py +import os +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier +from pandas import DataFrame +from mlem.api import save + +train, target = load_iris(return_X_y=True) +train = DataFrame(train) +train.columns = train.columns.astype(str) +model = DecisionTreeClassifier().fit(train, target) +path = os.path.join(os.getcwd(), "saved-model") + +save(model, path, sample_data=train, index=False) +``` diff --git a/content/docs/api-reference/serve.md b/content/docs/api-reference/serve.md new file mode 100644 index 00000000..08858f16 --- /dev/null +++ b/content/docs/api-reference/serve.md @@ -0,0 +1,58 @@ +# mlem.api.serve() + +Serve a model by exposing its methods as endpoints. + +```py +def serve( + model: MlemModel, + server: Union[Server, str], + **server_kwargs +) +``` + +### Usage: + +```py +from mlem.api import serve + +serve(model, server_obj) +``` + +## Description + +This API is the underlying mechanism for the +[mlem serve](/doc/command-reference/serve) command and allows us to locally +serve a model by exposing its methods as endpoints. This makes it possible to +easily make requests (for inference or otherwise) against the served model. + +## Parameters + +- **`model`** (required) - The model (a MlemModel object) to serve. +- **`server`** (required) - Which server implementation to use. Out-of-the-box + supported ones are ['fastapi', 'rmq', 'heroku'] +- `server_kwargs` (optional) - Keyword arguments for the underlying server + implementation being used. + +## Exceptions + +None + +## Examples + +```py +from sklearn.datasets import load_iris +from sklearn.tree import DecisionTreeClassifier +from mlem.core.objects import MlemModel +from mlem.runtime.interface.base import ModelInterface +from mlem.contrib.fastapi import FastAPIServer + +from mlem.api import serve + +train, target = load_iris(return_X_y=True) +model = DecisionTreeClassifier().fit(train, target) +m = MlemModel.from_obj(model, sample_data=train) +interface = ModelInterface.from_model(m) + +server_obj = FastAPIServer().app_init(interface) +serve(m, server_obj) +``` diff --git a/content/docs/command-reference/apply-remote.md b/content/docs/command-reference/apply-remote.md new file mode 100644 index 00000000..ef8782fd --- /dev/null +++ b/content/docs/command-reference/apply-remote.md @@ -0,0 +1,52 @@ +# apply-remote + +Apply a deployed-model (possibly remotely) to a dataset. The resulting dataset +will be saved as a MLEM object to `output` if provided. Otherwise, it will be +printed to `stdout`. 
+ +## Synopsis + +```usage +usage: mlem apply-remote [options] [subtype] data + +arguments: +[SUBTYPE] Type of client. Choices: ['http', 'rmq'] +DATA Path to dataset object [required] +``` + +## Description + +Models which are deployed somewhere remotely or are being +[served](/doc/get-started/serving) locally, can have their methods called using +the `apply-remote` command. This command is similar to +[apply](/doc/command-reference/apply), with the only difference being the model +is deployed remotely using a deployment, or served locally. To access the +methods of the `served` model, a `client` is needed. Currently, the available +clients are `http` and `rmq` - which are used to launch requests against the +`fastapi` and `rmq` server types, correspondingly. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `-o, --output TEXT`: Where to store the outputs. +- `--target-repo, --tr TEXT`: Repo to save target to [default: (none)] +- `-m, --method TEXT`: Which model method is to be applied [default: predict] +- `--index / --no-index`: Whether to index output in .mlem directory +- `--json`: Output as json +- `-l, --load TEXT`: File to load client config from +- `-c, --conf TEXT`: Options for client in format `field.name=value` +- `-f, --file_conf TEXT`: File with options for client in format + `field.name=path_to_config` +- `-h, --help`: Show this message and exit. + +## Example: Apply a locally hosted model to a local dataset + +Given a hosted model server (see +[serve example](/doc/command-reference/serve#examples) as a way to easily do +this) and a local MLEM dataset `mydataset`, run the following command to infer +the entire dataset with the model and save the output dataset to `myprediction` + +```cli +$ mlem apply-remote http mydataset --conf host="127.0.0.1" --conf port=3000 --output myprediction +``` diff --git a/content/docs/command-reference/apply.md b/content/docs/command-reference/apply.md new file mode 100644 index 00000000..6be0c2e6 --- /dev/null +++ b/content/docs/command-reference/apply.md @@ -0,0 +1,69 @@ +# apply + +Apply a model to a dataset. The resulting dataset will be saved as a MLEM object +to `output` if provided. Otherwise, it will be printed to `stdout`. + +## Synopsis + +```usage +usage: mlem apply [options] model data + +arguments: +MODEL Path to model object [required] +DATA Path to dataset object [required] +``` + +## Description + +Applying a model to a dataset means calling a model's method (e.g. `predict`) +with all the data points in the dataset, and returning the output as a MLEM +Object. + +This command addresses a very common workflow, replacing the need to write a +python script to load models & datasets, apply the datasets on the models, and +save the resulting dataset. + +Models and Datasets, which represent +[MLEM objects](/doc/user-guide/basic-concepts#mlem-objects), can be used +directly through command line together to easily run inferences on entire +datasets. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `-o, --output TEXT`: Where to store the outputs. +- `-m, --method TEXT`: Which model method is to be applied [default: predict] +- `--data-repo, --dr TEXT`: Repo with dataset +- `--data-rev TEXT`: Revision of dataset +- `-i, --import`: Try to import data on-the-fly +- `--import-type, --it TEXT`: Specify how to read data file for import. 
+ Available types: ['pandas', 'pickle'] +- `-b, --batch_size INTEGER`: Batch size for reading data in batches. +- `--index / --no-index`: Whether to index output in .mlem directory +- `-e, --external`: Save result not in .mlem, but directly in repo +- `--json`: Output as json +- `-h, --help`: Show this message and exit. + +## Examples + +Apply a local MLEM model to a local MLEM dataset + +```cli +$ mlem apply mymodel mydatset --method predict --output myprediction +``` + +Apply a local MLEM model to a dataset imported from a local data file + +```cli +$ mlem apply mymodel data.csv --method predict --import --import-type pandas[csv] --output myprediction +``` + +Apply a version of a remote model (from HEAD of `main` branch) to a version of a +remote dataset (again, HEAD of `main` branch) + +```cli +$ mlem apply rf --repo https://github.com/iterative/example-mlem-get-started --rev main + iris.csv --data-repo https://github.com/iterative/example-mlem-get-started --data-rev main + --method predict --output myprediction +``` diff --git a/content/docs/command-reference/clone.md b/content/docs/command-reference/clone.md new file mode 100644 index 00000000..82a89fd9 --- /dev/null +++ b/content/docs/command-reference/clone.md @@ -0,0 +1,46 @@ +# clone + +Copy a [MLEM Object](/doc/user-guide/basic-concepts#mlem-objects) from `uri` and +saves a copy of it to `target` path. + +## Synopsis + +```usage +usage: mlem clone [options] uri target + +arguments: +URI URI to object you want to clone [required] +TARGET Path to store the downloaded object. [required] +``` + +## Description + +Cloning a [MLEM Object](/doc/user-guide/basic-concepts#mlem-objects) from source +to target destination creates an independent copy of the original object. This +can be useful in cases where you need the model without cloning the whole +repository. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `--target-repo, --tr TEXT`: Repo to save target to [default: (none)] +- `-e, --external`: Save result not in .mlem, but directly in repo +- `--link / --no-link`: Whether to create link for output in .mlem directory +- `--help`: Show this message and exit. + +## Examples + +Copy a remote model (in GitHub) to a local directory + +```cli +$ mlem clone rf --repo https://github.com/iterative/example-mlem-get-started --rev main mymodel +... +``` + +Copy a remote model from a GitHub repo, to a different, remote, S3 MLEM repo + +```cli +$ mlem clone rf --repo https://github.com/iterative/example-mlem-get-started --rev main mymodel --tr s3://mybucket/mymodel +... +``` diff --git a/content/docs/command-reference/create.md b/content/docs/command-reference/create.md new file mode 100644 index 00000000..6d15c131 --- /dev/null +++ b/content/docs/command-reference/create.md @@ -0,0 +1,57 @@ +# create + +Creates a new [MLEM Object](/doc/user-guide/basic-concepts#mlem-objects) +metafile from conf args and config files. + +## Synopsis + +```usage +usage: mlem create [options] object_type [subtype] path + +arguments: +OBJECT_TYPE Type of metafile to create [required] +[SUBTYPE] Subtype of MLEM object [default: ] +PATH Where to save object [required] +``` + +## Description + +Metadata files (with `.mlem` file extension) can be created for +[MLEM Objects](/doc/user-guide/basic-concepts#mlem-objects) using this command. +This is particularly useful in filling up configuration values for environments +and deployments. 
+ +Each MLEM Object, along with its subtype (which represents a particular +implementation), will accept different configuration arguments. The list of +configuration arguments per type can be fetched by running the +[`mlem types`](/doc/command-reference/types) command. For an example output, +check out the last example [here](/doc/command-reference/types#examples) + +## Options + +- `-c, --conf TEXT`: Values for object fields in format + `field.nested.name=value` +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `-e, --external`: Save result not in .mlem, but directly in repo +- `--index / --no-index`: Whether to index output in .mlem directory +- `-h, --help`: Show this message and exit. + +## Examples + +Create an environment metafile with a config key + +```cli +# Fetch all config arguments which can be passed for a heroku env +$ mlem types env heroku +[not required] api_key: str = None + +# Create the heroku env +$ mlem create env heroku production --conf api_key="mlem_heroku_staging" +πŸ’Ύ Saving env to .mlem/env/staging.mlem + +# print the contents of the saved metafile for the heroku env +$ cat .mlem/env/staging.mlem +api_key: mlem_heroku_staging +object_type: env +type: heroku +``` diff --git a/content/docs/command-reference/deploy/apply.md b/content/docs/command-reference/deploy/apply.md new file mode 100644 index 00000000..e2e2bc3f --- /dev/null +++ b/content/docs/command-reference/deploy/apply.md @@ -0,0 +1,39 @@ +# deploy apply + +Apply a deployed model to a dataset. + +## Synopsis + +```usage +usage: mlem deploy apply [options] path data + +arguments: +PATH Path to deployment meta [required] +DATA Path to dataset object [required] +``` + +## Description + +The `deploy apply` command lets us apply MLEM deployments (deployed models) to a +dataset (MLEM object). This means the server's method endpoints (such as +`predict` by default) will be called with the given dataset and the outputs +gathered and returned, also as a MLEM Object. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `--data-repo, --dr TEXT`: Repo with dataset +- `--data-rev TEXT`: Revision of dataset +- `-o, --output TEXT`: Where to store the outputs. +- `--target-repo, --tr TEXT`: Repo to save target to [default: (none)] +- `-m, --method TEXT`: Which model method is to be applied [default: predict] +- `--index / --no-index`: Whether to index output in .mlem directory +- `--json`: Output as json +- `-h, --help`: Show this message and exit. + +## Example: Apply a dataset on a deployed model + +```cli +$ mlem deploy apply service_name mydatset --method predict +``` diff --git a/content/docs/command-reference/deploy/create.md b/content/docs/command-reference/deploy/create.md new file mode 100644 index 00000000..1d83e29e --- /dev/null +++ b/content/docs/command-reference/deploy/create.md @@ -0,0 +1,61 @@ +# deploy create + +Deploy a model to a target environment. You can use an existing deployment +declaration or create a new one on-the-fly. + +## Synopsis + +```usage +usage: mlem deploy create [options] path + +arguments: +PATH Path to deployment meta (will be created if it does not exist) [required] +``` + +## Description + +The `deploy create` command creates a new deployment for a target environment. +One can either use an existing deployment declaration (created with +`mlem create deployment`) or create a new one on-the-fly with various available +options (see below). 
+ +## Options + +- `-m, --model TEXT`: Path to model +- `-t, --env TEXT`: Path to target environment +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `-e, --external`: Save result not in .mlem, but directly in repo +- `--index / --no-index`: Whether to index output in .mlem directory +- `-c, --conf TEXT`: Configuration for new deployment meta if it does not exist +- `-h, --help`: Show this message and exit. + +## Example: Create a new deployment from scratch + +Here, we define an environment and then create a deployment on it, providing the +deployment configuration on-the-fly + +```cli +$ mlem create env heroku staging --conf api_key=... +... + +$ mlem deploy create service_name --model model --env staging --conf name=my_service +... +``` + +## Example: Create a deployment from a pre-configured deployment + +Here, we define an environment, configure a deployment declaration on it using +[`mlem create deployment`](/doc/command-reference/create), and then create our +deployment with a simple concise command which uses the existing pre-configured +deployment declaration + +```cli +$ mlem create env heroku staging --conf api_key=... +... + +$ mlem create deployment heroku service_name --conf app_name=my_service --conf model=model --conf env=staging +... + +$ mlem deploy create service_name +... +``` diff --git a/content/docs/command-reference/deploy/index.md b/content/docs/command-reference/deploy/index.md new file mode 100644 index 00000000..74bee919 --- /dev/null +++ b/content/docs/command-reference/deploy/index.md @@ -0,0 +1,38 @@ +# deploy + +A set of commands to set up and manage deployments. + +## Synopsis + +```usage +usage: mlem deploy [options] COMMAND [ARGS]... + +arguments: +COMMAND + apply Apply method of deployed service + create Deploy a model to target environment + status Print status of deployed service + teardown Stop and destroy deployed instance +``` + +## Description + +The `deploy` commands are used to manage the lifecycle of deployments along with +giving access to methods of the deployed model. + +A "deployment" is an application/service instance consisting of a server, +serving a specific model, using a specific environment definition, and running +on a target platform. + +MLEM deployments allow `applying` methods and even whole datasets on models. +Each model lists its supported methods in its metafile, and those are +automatically used by MLEM to wire and expose endpoints on the application +server upon deployment. Applying datasets on the deployment is a very handy +shortcut of bulk inferring data on the served model. + +> Currently, only `heroku` is supported as a target but more platforms will be +> added soon! + +## Options + +- `-h, --help`: Show this message and exit. diff --git a/content/docs/command-reference/deploy/status.md b/content/docs/command-reference/deploy/status.md new file mode 100644 index 00000000..9ba0d318 --- /dev/null +++ b/content/docs/command-reference/deploy/status.md @@ -0,0 +1,39 @@ +# deploy status + +Show the status of a deployment. + +## Synopsis + +```usage +usage: mlem deploy status [options] path + +arguments: +PATH Path to deployment meta [required] +``` + +## Description + +The `deploy status` command lets us check the status of the deployment, which is +the deployed app/service serving the model. 
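+Running it prints one of the states listed in the subsection below (sample
+output for a hypothetical deployment named `service_name`):
+
+```cli
+$ mlem deploy status service_name
+running
+```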
+ +### Heroku + +The possible statuses for deployments using the `heroku` target platform is: + +- unknown +- not_deployed +- starting +- crashed +- stopped +- running + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `-h, --help`: Show this message and exit. + +## Example: Get the status of a deployment + +```cli +$ mlem deploy status service_name +``` diff --git a/content/docs/command-reference/deploy/teardown.md b/content/docs/command-reference/deploy/teardown.md new file mode 100644 index 00000000..ff6bb7f9 --- /dev/null +++ b/content/docs/command-reference/deploy/teardown.md @@ -0,0 +1,29 @@ +# deploy teardown + +Stop and destroy a deployment. + +## Synopsis + +```usage +usage: mlem deploy teardown [options] path + +arguments: +PATH Path to deployment meta [required] +``` + +## Description + +The `deploy teardown` destroys the deployment by first setting its state to +`not_deployed` before proceeding to actually destroying the deployed service, +deleting its associated runtime resources. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `-h, --help`: Show this message and exit. + +## Example: Stop and destroy a deployment + +```cli +$ mlem deploy teardown service_name +``` diff --git a/content/docs/command-reference/import.md b/content/docs/command-reference/import.md new file mode 100644 index 00000000..9dc47a72 --- /dev/null +++ b/content/docs/command-reference/import.md @@ -0,0 +1,59 @@ +# import + +Create a MLEM model or dataset metadata from a file/directory. + +## Synopsis + +```usage +usage: mlem import [options] uri target + +arguments: +URI File to import [required] +TARGET Path to save MLEM object [required] +``` + +## Description + +Use `import` on an existing datasets or model files (or directories) to +auto-generate the necessary MLEM metadata (`.mlem`) files for them. This is +useful to quickly make existing datasets and model files compatible with MLEM, +which can then be used in future operations such as `mlem apply`. + +This command provides a quick and easy alternative to writing python code to +load those models/datasets into object for subsequent usage in MLEM context. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `--target-repo, --tr TEXT`: Repo to save target to [default: (none)] +- `--copy / --no-copy`: Whether to create a copy of file in target location or + just link existing file [default: copy] +- `--type TEXT`: Specify how to read file Available types: ['pandas', 'pickle'] + [default: (auto infer)] +- `--index / --no-index`: Whether to index output in .mlem directory +- `-e, --external`: Save result not in .mlem, but directly in repo +- `-h, --help`: Show this message and exit. + +## Examples + +Create a MLEM dataset from a local `.csv` file + +```cli +$ mlem import data/data.csv data/imported_data --type pandas[csv] +... +``` + +Create a MLEM model from local `.pkl` (pickle) file + +```cli +$ mlem import data/model.pkl data/imported_model +... 
+``` + +Create a MLEM model from remote `.pkl` (pickle) file + +```cli +$ mlem import .mlem/model/rf --repo https://github.com/iterative/example-mlem-get-started --rev simple data/imported_model --type pickle +πŸ’Ύ Saving model to .mlem/model/data/imported_model.mlem +``` diff --git a/content/docs/command-reference/index.md b/content/docs/command-reference/index.md new file mode 100644 index 00000000..c25cc54b --- /dev/null +++ b/content/docs/command-reference/index.md @@ -0,0 +1,19 @@ +# Using MLEM Commands + +MLEM is a command line tool. Here, we provide the specifications, complete +descriptions, and comprehensive usage examples for different `mlem` commands. + +For a list of all commands, type `mlem -h` + +## Typical MLEM workflow + +- Initialize a MLEM project in a Git Repo with + [mlem init](/doc/command-reference/init). +- Save Models and Datasets with MLEM. +- Load and Apply models with [mlem apply](/doc/command-reference/apply). +- Package models into python packages or docker images with + [mlem pack](/doc/command-reference/pack). +- Serve your models by exposing their methods as endpoints using + [mlem serve](/doc/command-reference/serve). +- Deploy your models to various target platforms in the cloud with + [mlem deploy](/doc/command-reference/deploy). diff --git a/content/docs/command-reference/init.md b/content/docs/command-reference/init.md new file mode 100644 index 00000000..0c11cdf6 --- /dev/null +++ b/content/docs/command-reference/init.md @@ -0,0 +1,46 @@ +# init + +Initialize a MLEM working directory. + +## Synopsis + +```usage +usage: mlem init [options] [path] + +arguments: [PATH] Target path to workspace +``` + +## Description + +The `init` command (without given `path`) defaults to the current directory for +the path argument. This creates a `.mlem` directory and an empty `config.yaml` +file inside it. + +Although we recommend using MLEM within a Git repository to track changes using +the standard Git workflows, this is not required. The existence of a `.mlem/` +directory in any path (including remote) constitutes a MLEM project, and MLEM +will be fully functional even without incorporating Git in one's workflow. + +## Options + +- `-h, --help`: Show this message and exit. + +## Examples + +Default Initialization (current working directory) + +```cli +$ mlem init +``` + +Initialization to a specified local path + +```cli +$ mlem init some/local/path +``` + +Initialization in a remote S3 bucket + +```cli +$ mlem init s3://bucket/path/in/cloud +``` diff --git a/content/docs/command-reference/link.md b/content/docs/command-reference/link.md new file mode 100644 index 00000000..12111d50 --- /dev/null +++ b/content/docs/command-reference/link.md @@ -0,0 +1,54 @@ +# link + +Create a link (read alias) for an existing +[MLEM Object](/doc/user-guide/basic-concepts#mlem-objects), including from +remote MLEM projects. + +## Synopsis + +```usage +usage: mlem link [options] source target + +arguments: +SOURCE URI of the object you are creating a link to [required] +TARGET Path to save link object [required] +``` + +## Description + +This command is used to create links to existing +[MLEM objects](/doc/user-guide/basic-concepts#mlem-objects), which in turn +allows you to refer to the object using the `TARGET` path in all future +operations. + +A common use-case is to create links for objects present in remote MLEM projects +to incorporate them in the local workspace. 
+ +## Options + +- `--source-repo, --sr TEXT`: Repo for source object +- `--rev TEXT`: Repo revision to use [default: (none)] +- `--target-repo, --tr TEXT`: Repo to save target to [default: (none)] +- `-e, --external`: Save result not in .mlem, but directly in repo +- `--follow-links, --f / --no-follow-links, --nf`: If True, first follow links + while reading {source} before creating this link. [default: follow-links] +- `--absolute, --abs / --relative, --rel`: Which path to linked object to + specify: absolute or relative. [default: relative] +- `-h, --help`: Show this message and exit. + +## Examples + +Add a remote object to your local workspace (aka repo) without copying it + +```cli +$ mlem link rf --source-repo https://github.com/iterative/example-mlem-get-started remote_model +``` + +> The remote model can now be served with the link created above, using the +> command `mlem serve remote_model fastapi`. + +Alias a local object with a different name + +```cli +$ mlem link my_model latest +``` diff --git a/content/docs/command-reference/list.md b/content/docs/command-reference/list.md new file mode 100644 index 00000000..1336e62a --- /dev/null +++ b/content/docs/command-reference/list.md @@ -0,0 +1,50 @@ +# list + +List [MLEM objects](/doc/user-guide/basic-concepts#mlem-objects) inside a MLEM +workspace (location should be [initialized](/doc/command-reference/init)). + +> Aliased to `mlem ls` + +## Synopsis + +```usage +usage: mlem list [options] [repo] + +arguments: [REPO] Repo to list from [default: (current directory)] +``` + +## Description + +Produces a view of the MLEM repository listing +[MLEM objects](/doc/user-guide/basic-concepts#mlem-objects) like models, +datasets, and links. + +Running the command without an explicit `repo` argument defaults to the current +working directory. The `repo` argument can take a local path, or point to a +remote repository (e.g. GitHub). + +This command also supports additional options, allowing filtering of MLEM +Objects by type, producing JSON output, selectively displaying +[links](/doc/user-guide/linking) and choosing a particular revision in case of +remote repositories. + +## Options + +- `-t, --type [all|link|model|dataset|env|deployment|packager]`: Type of objects + to list [default: all] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `+l, --links / -l, --no-links`: Whether to include links [default: +l] +- `--json`: Output as json +- `--help`: Show this message and exit. + +## Examples + +List MLEM objects on a remote GitHub repository + +```cli +$ mlem list https://github.com/iterative/example-mlem-get-started +Models: +- rf +Datasets: +- iris.csv +``` diff --git a/content/docs/command-reference/pack.md b/content/docs/command-reference/pack.md new file mode 100644 index 00000000..30ab7bcd --- /dev/null +++ b/content/docs/command-reference/pack.md @@ -0,0 +1,52 @@ +# pack + +Package models to create re-usable, ship-able entities such as a Docker image or +Python package. + +## Synopsis + +```usage +usage: mlem pack [options] model [subtype] + +arguments: +MODEL Path to model [required] +[SUBTYPE] Type of packing. Choices: ['whl', 'pip', 'docker_dir', 'docker'] +``` + +## Description + +This command provides flexible options to create various distribution-ready +release assets from your models, like `pip`-ready Python packages or Docker +images. 
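+As a sketch, the pip flow from the [mlem.api.pack()](/doc/api-reference/pack)
+usage example translates to the CLI as follows (the model name, package name,
+and target folder are illustrative):
+
+```cli
+$ mlem pack rf pip --conf target=build --conf package_name=example_mlem_get_started
+```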
+ +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `-l, --load TEXT`: File to load packing config from +- `-c, --conf TEXT`: Options for packing in format `field.name=value` +- `-f, --file_conf TEXT`: File with options for packing in format + `field.name=path_to_config` +- `-h, --help`: Show this message and exit. + +## Examples + +Build a docker image from a model + +```cli +$ mlem pack mymodel docker --conf server.type=fastapi --conf image.name=myimage +``` + +Create a `docker_dir` packager config called `pack_dock`, and use it to package +a model + +```cli +$ mlem create packager docker_dir --conf server=fastapi --conf target=build pack_dock +... + +$ mlem pack mymodel --load pack_dock +... +``` + +For a detailed example using python-package, see the get-started guide +[packaging example](/doc/get-started/packaging). diff --git a/content/docs/command-reference/pprint.md b/content/docs/command-reference/pprint.md new file mode 100644 index 00000000..abce41b0 --- /dev/null +++ b/content/docs/command-reference/pprint.md @@ -0,0 +1,75 @@ +# pprint + +Display all details about a specific +[MLEM object](/doc/user-guide/basic-concepts#mlem-objects) from an existing MLEM +project. + +## Synopsis + +```usage +usage: mlem pprint [options] path + +arguments: PATH Path to object [required] +``` + +## Description + +All MLEM objects can be printed to view their metadata. This includes generic +metadata information such as requirements, type of object, hash, size, as well +as object specific information such as `methods` for a `model` or `reader` for a +`dataset`. + +Since only one specific object is printed, a `PATH` to the specific MLEM object +is always required. + +> You can use [`mlem list`](/doc/command-reference/list) to list MLEM objects. + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `-f, --follow-links`: If specified, follow the link to the actual object. +- `--json`: Output as json +- `--help`: Show this message and exit. + +## Example: Showing local model + +```cli +$ mlem pprint rf +⏳️ Loading meta from .mlem/model/rf.mlem +{'artifacts': {'data': {'hash': 'a61a1fa54893dcebe6fa448df81a1418', + 'size': 163651, + 'type': 'dvc', + 'uri': 'rf'}}, + 'description': 'Random Forest Classifier', + 'model_type': {'methods': {'predict': {'args': [{'name': 'data', + 'type_': {'columns': ['sepal ' + 'length ' + '(cm)', +... 
+``` + +## Example: Showing remote dataset + +```cli +$ mlem pprint https://github.com/iterative/example-mlem-get-started/iris.csv +⏳️ Loading meta from https://github.com/iterative/example-mlem-get-started/tree/main/.mlem/dataset/iris.csv.mlem +{'artifacts': {'data': {'hash': '45109f850511f9474665f2c26f4c79f3', + 'size': 2470, + 'type': 'dvc', + 'uri': 'iris.csv'}}, + 'object_type': 'dataset', + 'reader': {'dataset_type': {'columns': ['sepal length (cm)', + 'sepal width (cm)', + 'petal length (cm)', + 'petal width (cm)'], + 'dtypes': ['float64', + 'float64', + 'float64', + 'float64'], + 'index_cols': [], + 'type': 'dataframe'}, + 'format': 'csv', + 'type': 'pandas'}, + 'requirements': [{'module': 'pandas', 'version': '1.4.2'}]} +``` diff --git a/content/docs/command-reference/serve.md b/content/docs/command-reference/serve.md new file mode 100644 index 00000000..521576b2 --- /dev/null +++ b/content/docs/command-reference/serve.md @@ -0,0 +1,58 @@ +# serve + +Locally deploy the model using a server implementation and expose its methods as +endpoints. + +## Synopsis + +```usage +usage: mlem serve [options] model [subtype] + +arguments: +MODEL Model to create service from [required] +[SUBTYPE] Server type. Choices: ['fastapi', 'heroku', 'rmq'] [default: ] +``` + +## Description + +An [MLEM Model](/doc/user-guide/basic-concepts#model) can be served via a server +implementation (e.g. `fastapi`) and its methods exposed as API endpoints. This +allows us to easily make requests (inference and others) against the served +model. + +For the common `fastapi` server implementation, the OpenAPI spec is available on +the `/docs` endpoint. + +HTTP Requests to the model-server can be made either with the corresponding +built-in client, or common HTTP clients, such as [`curl`](https://curl.se/) and +[`httpie`](https://httpie.io/) CLIs, or the +[`requests` python library](https://docs.python-requests.org/). + +## Options + +- `-r, --repo TEXT`: Path to MLEM repo [default: (none)] +- `--rev TEXT`: Repo revision to use [default: (none)] +- `-l, --load TEXT`: File to load server config from +- `-c, --conf TEXT`: Options for server in format `field.name=value` +- `-f, --file_conf TEXT`: File with options for server in format + `field.name=path_to_config` +- `--help`: Show this message and exit. + +## Example: FastAPI HTTP server + +Easily serve a model from a remote GitHub repository on a local FastAPI HTTP +server + +```cli +$ mlem serve https://github.com/iterative/example-mlem-get-started/rf fastapi --conf port=3000 +Starting fastapi server... +πŸ’… Adding route for /predict +πŸ’… Adding route for /predict_proba +πŸ’… Adding route for /sklearn_predict +πŸ’… Adding route for /sklearn_predict_proba +Checkout openapi docs at +INFO: Started server process [6083] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit) +``` diff --git a/content/docs/command-reference/types.md b/content/docs/command-reference/types.md new file mode 100644 index 00000000..bca7192d --- /dev/null +++ b/content/docs/command-reference/types.md @@ -0,0 +1,56 @@ +# types + +List different implementations available for a particular MLEM type. If a +subtype is not provided, list all available MLEM types. + +## Synopsis + +```usage +usage: mlem types [options] [abc] [sub_type] + +arguments: +[ABC] Subtype to list implementations. 
List subtypes if not provided
+[SUB_TYPE]  Type of `meta` subtype
+```
+
+## Description
+
+This command can be used to see all available MLEM classes, or to list the
+different implementations available for a specific `SUB_TYPE` (argument).
+
+This can be useful, for example, to see which types of servers are supported for
+hosting and serving a model (see [Examples](#examples)).
+
+Check out [MLEM ABCs](/doc/user-guide/mlem-abcs) for a list of abstract base
+classes that subclass `mlem.core.base.MlemABC`. These constitute the building
+blocks of MLEM, and can be subclassed to add new functionalities and
+capabilities.
+
+## Options
+
+- `-h, --help`: Show this message and exit.
+
+## Examples
+
+List MLEM abstract base classes
+
+```cli
+# List ABCs
+$ mlem types
+...
+```
+
+List available server implementations
+
+```cli
+$ mlem types server
+['rmq', 'heroku', 'fastapi']
+```
+
+List configuration for a particular implementation
+
+```cli
+$ mlem types server fastapi
+[not required] host: str = "0.0.0.0"
+[not required] port: int = 8080
+```
diff --git a/content/docs/contributing/core.md b/content/docs/contributing/core.md
new file mode 100644
index 00000000..142192ea
--- /dev/null
+++ b/content/docs/contributing/core.md
@@ -0,0 +1,176 @@
+# Contributing to MLEM
+
+We welcome contributions to [MLEM](https://github.com/iterative/mlem) by the
+community. See the
+[Contributing to the Documentation](/doc/user-guide/contributing/docs) guide if
+you want to fix or update the documentation or this website.
+
+## How to report a problem
+
+Please search the [issue tracker](https://github.com/iterative/mlem/issues)
+before creating a new issue (a problem report or an improvement request). Feel
+free to add issues related to the project.
+
+For problems with the [mlem.ai](https://mlem.ai/) site, please use its
+[GitHub repository](https://github.com/iterative/mlem.ai/) instead.
+
+If you feel that you can fix or implement it yourself, please read the next few
+paragraphs to learn how to submit your changes.
+
+## Submitting changes
+
+- Open a new issue in the
+  [issue tracker](https://github.com/iterative/mlem/issues).
+- Set up the [development environment](#development-environment) if you need to
+  run tests or [run](#running-development-version) MLEM with your changes.
+- Fork [MLEM](https://github.com/iterative/mlem.git) and prepare the necessary
+  changes.
+- [Add tests](#writing-tests) for your changes to `tests/`. You can skip this
+  step if the effort to create tests for your change is unreasonable. Changes
+  without tests are still going to be considered by us.
+- [Run tests](#running-tests) and make sure all of them pass.
+- Submit a pull request, referencing any issues it addresses.
+
+We will review your pull request as soon as possible. Thank you for
+contributing!
+
+## Development environment
+
+Get the latest development version. Fork and clone the repo:
+
+```cli
+$ git clone git@github.com:<username>/mlem.git
+```
+
+Make sure that you have Python 3.7 or higher installed. On macOS, we recommend
+using `brew` to install Python. For Windows, we recommend an official
+[python.org release](https://www.python.org/downloads/windows/).
+
+> ℹ️ Note that `pip` version 20.3+ is required.
+
+Install MLEM in editable mode with `pip install -e ".[tests]"`. 
But before we do +that, we **strongly** recommend creating a +[virtual environment](https://python.readthedocs.io/en/stable/library/venv.html): + +```cli +$ cd mlem +$ python3 -m venv .env +$ source .env/bin/activate +$ pip install -e ".[tests]" +``` + +Install coding style pre-commit hooks with: + +```cli +$ pip install pre-commit +$ pre-commit install +``` + +That's it. You should be ready to make changes, run tests, and make commits! If +you experience any problems, please don't hesitate to ping us in our +[chat](/chat). + +## Writing tests + +We have unit tests in `tests/unit/` and functional tests in `tests/func/`. +Consider writing the former to ensure complicated functions and classes behave +as expected. + +For specific functionality, you will need to use functional tests alongside +[pytest](https://docs.pytest.org/en/latest/) fixtures to create a temporary +directory, Git and/or MLEM repo, and bootstrap some files. See the +[`dir_helpers` module](https://github.com/iterative/dvc/blob/master/tests/conftest.py) +for some usage examples. + +## Running tests + +The simplest way to run tests: + +```cli +$ cd mlem +$ python -m tests +``` + +This uses `pytest` to run the full test suite and report the result. At the very +end you should see something like this: + +```cli +$ python -m tests + +... + +============= 434 passed, 6 skipped, 8 warnings in 131.43 seconds ============== +``` + +Otherwise, for each failed test you should see the following output, to help you +identify the problem: + +```cli +... +[gw2] [ 84%] FAILED tests/unit/test_progress.py::TestProgressAware::test +tests/unit/test_prompt.py::TestConfirm::test_eof +tests/test_updater.py::TestUpdater::test +... +=================================== FAILURES =================================== +____________________________ TestProgressAware.test ____________________________ +... +======== 1 failed, 433 passed, 6 skipped, 8 warnings in 137.49 seconds ========= +``` + +You can pass any additional arguments to the script that will override the +default `pytest`'s scope: + +To run a single test case: + +```cli +$ python -m tests tests/func/test_metrics.py::TestCachedMetrics +``` + +To run a single test function: + +```cli +$ python -m tests tests/unit/utils/test_fs.py::test_get_inode +``` + +To pass additional arguments: + +```cli +$ python -m tests --pdb +``` + +## Code style guidelines (Python) + +We are using [PEP8](https://www.python.org/dev/peps/pep-0008/?) and checking +that our code is formatted with [black](https://github.com/ambv/black). + +For [docstrings](https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring), +we try to adhere by the +[Google Python Style Guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#38-comments-and-docstrings). + +## Commit message format guidelines + +Format: + +``` +(component): (short description) + +(long description) + +Fixes #(GitHub issue id). 
+``` + +Message types: + +- _component_: If applicable, comma-separated list of affected component(s) +- _short description_: Short description of the patch +- _long description_: If needed, longer message describing the patch in more + details +- _github issue id_: ID of the GitHub issue that this patch is addressing + +Example: + +``` +remote: add support for Amazon S3 + +Fixes #123 +``` diff --git a/content/docs/contributing/docs.md b/content/docs/contributing/docs.md new file mode 100644 index 00000000..1bc0da16 --- /dev/null +++ b/content/docs/contributing/docs.md @@ -0,0 +1,221 @@ +# Contributing to the Documentation + +We welcome any contributions to our documentation repository, +[mlem.ai](https://github.com/iterative/mlem.ai). Contributions can be updates to +the documentation content, or (rare) changes to the JS engine we use to run the +website. + +In case of a minor change, you can use the **Edit on GitHub** button to open the +source code page. Use the Edit button (pencil icon) to edit the file in-place, +and then **Commit changes** from the bottom of the page. + +> Please see our +> [Writing a Blog Post guide](https://dvc.org/doc/user-guide/contributing/blog) +> for more details on how to write and submit a new blog post. + +## Structure of the project + +To contribute documentation, these are the relevant locations: + +- [Content](https://github.com/iterative/mlem.ai/tree/master/content/docs) + (`content/docs/`): + [Markdown](https://guides.github.com/features/mastering-markdown/) files. One + file β€” one page of the documentation. +- [Images](https://github.com/iterative/mlem.ai/tree/master/static/img) + (`static/img/`): Add new images (`.png`, `.svg`, etc.) here. Use them in + Markdown files like this: `![](/img/.gif)`. +- [Navigation](https://github.com/iterative/mlem.ai/tree/master/content/docs/sidebar.json) + (`content/docs/sidebar.json`): Edit it to add or change entries in the + navigation sidebar. + +Merging the appropriate changes to these files into the master branch is enough +to update the docs and redeploy the website. + +## Submitting changes + +- Find or open a new issue in the + [issue tracker](https://github.com/iterative/mlem.ai/issues) to let us know + that you are working on this. + +- Format the source code by following the + [style guidelines](#doc-style-guidelines-javascript-and-markdown) below. We + highly recommend setting up a + [development environment](#development-environment) as explained below. Among + other things, it can help format the documentation and JS code automatically. + +- Push the changes to your fork of + [mlem.ai](https://github.com/iterative/mlem.ai.git) and submit a PR to the + upstream repo. + +We will review your PR as soon as possible. Thank you for contributing! + +## Development environment + +We highly recommend running this web app locally to check documentation or blog +changes before submitting them, and it's quite necessary when making changes to +the website engine itself. Source code and content files need to be properly +formatted and linted as well, which is also ensured by the full setup below. + +Make sure you have [Python](https://www.python.org/downloads/) 3.7+, a recent +LTS version of [Node.js](https://nodejs.org/en/) (`>=14.0.0`, `<=16.x`), and +install [Yarn](https://yarnpkg.com/): + +> In Windows, you may need to install [Visual Studio Build Tools], and the +> [Windows SDK] first. 
+
+[windows sdk]:
+  https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk/
+[visual studio build tools]:
+  https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2019
+
+```cli
+$ npm install -g yarn
+```
+
+Having cloned this project locally, navigate into the directory and install the
+project dependencies with Yarn:
+
+```cli
+$ yarn
+```
+
+Launch the server locally with:
+
+```cli
+$ yarn develop
+```
+
+This will start the server on the default port, `8000`. Visit
+`http://localhost:8000/` and navigate to the page in question. This will also
+enable the pre-commit Git hook that will be formatting and linting your code and
+documentation files automatically.
+
+### All commands
+
+These Node scripts are specified in the docs repo's `package.json` file.
+
+To build the project and run it:
+
+- `yarn develop` - run development server with hot reload.
+- `yarn build` - build assets in the `public` directory.
+- `yarn start` - run production static server over the `public` directory.
+
+> All the tests, formatting, and linters below will be enforced automatically
+> upon [submitting PRs](#submitting-changes).
+
+If you change source code files, run tests:
+
+- `yarn test` - run tests.
+
+We use [Prettier](https://prettier.io/) to format our source code. Below is a
+set of wrapper commands for your convenience:
+
+- `yarn format-check` - check that all source and content files are properly
+  formatted. This script does not fix any issues it finds, it only reports
+  them.
+- `yarn format-all` - fix all found problems.
+- `yarn format-staged` - same, but only on staged files.
+- `yarn format <file>` - format a specific file.
+
+We use linters (e.g. [ESLint](https://eslint.org/)) to check source code style
+and detect different errors:
+
+- `yarn lint-ts` - lint source code files (`.ts`, `.js`, `.tsx`, etc.).
+- `yarn lint-css` - lint `.css` files.
+
+> Note that you can always use the formatter or linter directly (e.g.
+> `yarn eslint <path>` or `yarn prettier --check <path>`).
+
+### ENV variables
+
+Some environment variables are required to deploy this project to production,
+others can be used to debug the project. Please check the production system
+settings to see all the variables that the production and deployment systems
+depend on.
+
+Some available variables:
+
+- `GA_ID` - ID of the Google Analytics counter.
+- `ANALYZE` - boolean property to run
+  [webpack-analyzer](https://www.gatsbyjs.org/packages/gatsby-plugin-webpack-bundle-analyzer/).
+- `SENTRY_DSN` - [Sentry](https://sentry.io/) URL for error tracking.
+
+## Doc style guidelines (JavaScript and Markdown)
+
+Some of the following rules are applied automatically by a pre-commit Git hook
+that is installed when `yarn` runs (see [dev env](#development-environment)).
+
+- No trailing white spaces are allowed.
+
+- Text content must be properly formatted at 80 symbols width.
+
+  > 💡 We recommend using Visual Studio Code with the
+  > [Rewrap](https://marketplace.visualstudio.com/items?itemName=stkb.rewrap)
+  > plugin for help with this.
+
+- You can see the configuration of our formatter tool (Prettier)
+  [here](https://github.com/iterative/mlem.ai/blob/master/.prettierrc). You may
+  also run the formatting [commands](#all-commands) manually.
+  ([Advanced usage](https://prettier.io/docs/en/cli.html) of Prettier is
+  available through `yarn prettier ...`)
+
+- Markdown: Using `mlem <command>`, the docs engine will create a link to that
+  command automatically. 
(No need to use `[]()` explicitly to create them.)
+
+- Markdown: Using `mlem.api.<method>()` or `mlem.api`, the docs engine will
+  create a link to that API method automatically. (No need to use `[]()`
+  explicitly to create them.)
+
+- Markdown: Bullet lists shouldn't be too long (5-7 items max., ideally).
+
+- Markdown: The text in each bullet item also shouldn't be too long (3 sentence
+  paragraphs max.) Full sentence bullets should begin with a capital letter and
+  end in period `.`. Otherwise, they can be all lower case and have no ending
+  punctuation. Bullets can be separated by an empty line if they contain several
+  paragraphs, but this is discouraged: try to keep items short.
+
+- Markdown: Syntax highlighting in fenced code blocks should use the `usage`,
+  `dvc`, `dvctable`, `yaml`, or `diff` custom languages.
+  - `usage` is employed to show the `mlem --help` output for each command
+    reference.
+  - `dvc` can be used to show examples of commands and their output in a
+    terminal session.
+  - `dvctable` is used for creating colored, bold, or italic table cells. (You
+    can see an [example](https://dvc.org/doc/start/experiments) of `dvctable` in
+    our "Get Started" section.)
+  - `yaml` is used to show samples of MLEM files, or other YAML contents.
+  - `diff` is used mainly for examples of `git diff` output.
+
+> Check out the `.md` source code of any command reference to get a better idea,
+> for example in
+> [this very file](https://raw.githubusercontent.com/iterative/mlem.ai/master/content/docs/user-guide/contributing/docs.md).
+
+## General language guidelines
+
+We try to use a casual and fun tone in our docs. We also avoid authoritative
+language such as "As you can see, clearly this is what happened, of course"
+etc., which, while good-intentioned, may scare readers off.
+
+We prefer general, human-friendly language rather than exact jargon as long as
+it's correct. Example: avoid Git jargon such as _revision_ or _reference_,
+preferring the more basic terms _commit_ or _version_.
+
+The [command reference](/doc/command-reference) contains some of our most
+technical documents where specialized language is used the most, but even there,
+we use expandable sections for complex implementation details.
+
+Start by writing the essence in simple terms, and complete it with
+clarifications, edge cases, or other precisions in a separate iteration.
+
+We use **bold** text for emphasis, and _italics_ for special terms.
+
+We also use "emoji" symbols sparingly for visibility on certain notes. Mainly:
+
+- 📖 For notes that link to other related documentation
+- ⚠️ Important warnings or disclaimers related to advanced MLEM usage
+- 💡 Useful notes and tips, often related to external tools and integrations
+
+> Some other emojis currently in use here and there: ⚡✅🙏🐛⭐⚙️(ℹ️) (among
+> others). 
diff --git a/content/docs/get-started/applying.md b/content/docs/get-started/applying.md new file mode 100644 index 00000000..3048ff4a --- /dev/null +++ b/content/docs/get-started/applying.md @@ -0,0 +1,102 @@ +# Applying models + +## Evaluating the model + +Now, we can use MLEM to apply the model against a dataset and calculate some +metrics: + +```py +# evaluate.py +import json + +from sklearn import metrics +from sklearn.datasets import load_iris + +from mlem.api import apply + + +def main(): + data, y_true = load_iris(return_X_y=True, as_frame=True) + y_pred = apply("rf", data, method="predict_proba") + roc_auc = metrics.roc_auc_score(y_true, y_pred, multi_class="ovr") + + with open("metrics.json", "w") as fd: + json.dump({"roc_auc": roc_auc}, fd, indent=4) + + + +if __name__ == "__main__": + main() + +``` + +Here we use the `apply` function that handles loading of the model for us. But +you can always load your model with [`mlem.api.load`](/doc/api-reference/load) +and call any method manually. + +Now, let's run the script + +```cli +$ python evaluate.py +$ cat metrics.json +{ + "roc_auc": 1.0 +} +``` + +
+ +### β›³ [Evaluation](https://github.com/iterative/example-mlem-get-started/tree/4-eval) + +```cli +$ git add metrics.json +$ git commit -m "Evaluate model" +$ git diff 4-eval +``` + +
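+
+As mentioned above, you can also load the model yourself and call any of its
+methods directly. A minimal sketch of the same evaluation via
+[`mlem.api.load`](/doc/api-reference/load), assuming the `rf` model saved
+earlier in this tutorial:
+
+```py
+from sklearn import metrics
+from sklearn.datasets import load_iris
+
+from mlem.api import load
+
+# load() returns the original Python object saved under .mlem/model/rf
+model = load("rf")
+
+data, y_true = load_iris(return_X_y=True, as_frame=True)
+y_pred = model.predict_proba(data)  # call any model method manually
+print(metrics.roc_auc_score(y_true, y_pred, multi_class="ovr"))
+```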
+ +## Applying from CLI + +You can also apply your models directly from CLI. For that to work, your data +should be in a file that is supported by +[MLEM import](/doc/user-guide/importing) or you should have your +[dataset saved with MLEM ](/doc/user-guide/datasets). + +Let's create an example file and run `mlem apply` + +```cli +$ echo "sepal length (cm),sepal width (cm),petal length (cm),petal width (cm) +0,1,2,3" > new_data.csv +$ mlem apply rf new_data.csv -i --it pandas[csv] -o prediction +⏳️ Importing object from new_data.csv +⏳️ Loading model from .mlem/model/rf.mlem +🍏 Applying `predict` method... +πŸ’Ύ Saving dataset to .mlem/dataset/prediction.mlem +``` + +Or, if you save your dataset like this: + +```py +from sklearn.datasets import load_iris +from mlem.api import save + + +def main(): + data, _ = load_iris(return_X_y=True, as_frame=True) + save(data, "iris.csv") + + +if __name__ == '__main__': + main() +``` + +You can just reference it by name: + +```cli +$ mlem apply rf iris.csv -o prediction +⏳️ Loading dataset from .mlem/dataset/iris.csv.mlem +⏳️ Loading model from .mlem/model/rf.mlem +🍏 Applying `predict` method... +πŸ’Ύ Saving dataset to .mlem/dataset/prediction.mlem +``` diff --git a/content/docs/get-started/deploying.md b/content/docs/get-started/deploying.md new file mode 100644 index 00000000..229e55d7 --- /dev/null +++ b/content/docs/get-started/deploying.md @@ -0,0 +1,167 @@ +# Deploying models + +You can also create deployments in cloud from your models. + +> ⚠️ This functionality is experimental and is subject to change. We’ll add more +> target platforms in upcoming releases. + +Deployment often uses packaging and serving functionalities. For example, Heroku +deployment that is showcased in this section actually uses docker image +packaging with FastAPI serving. + +## Defining target environment + +To deploy something somewhere, we need to define this β€œsomewhere” first, or in +MLEM terms, create a `target environment` object. It will contain all the +information needed to access it. In the case of Heroku, all we need is an API +key. + +
+
+### ⚙️ How to obtain a Heroku API key
+
+- Go to [heroku.com](http://heroku.com)
+- Sign up or log in with an existing account
+- Go to account settings by clicking your profile picture on the main page
+- Find the API Key section and reveal the existing key or re-generate it
+
+</details>
+
+To create a new target env, run
+
+```cli
+$ mlem create env heroku staging -c api_key=<your-api-key>
+💾 Saving env to .mlem/env/staging.mlem
+```
+
+> Note that the `api_key` argument is optional and MLEM will use the
+> `HEROKU_API_KEY` env variable if you don't provide it via config.
+
+## Defining deployment
+
+Now, as we have defined our target env, we can deploy our model there.
+Deployments are also MLEM objects, which means that they need to have their own
+definition. To create one for Heroku, we will once again use the `create`
+command to configure our deployment.
+
+```cli
+$ mlem create deployment heroku myservice -c app_name=example-mlem-get-started -c model=rf -c env=staging
+💾 Saving deployment to .mlem/deployment/myservice.mlem
+```
+
+> 💡 We use `example-mlem-get-started` for `app_name`, but you should change it
+> to something unique.
+
+<details>
+ +### β›³ [Create deployment definition](https://github.com/iterative/example-mlem-get-started/tree/5-deploy-meta) + +```cli +$ git add .mlem/env/staging.mlem .mlem/deployment/myservice.mlem +$ git commit -m "Add env and deploy meta" +$ git diff 5-deploy-meta +``` + +
+ +Now we can actually run the deployment process (this can take a while): + +```cli +$ mlem deploy create myservice +⏳️ Loading deployment from .mlem/deployment/myservice.mlem +πŸ”— Loading link to .mlem/env/staging.mlem +πŸ”— Loading link to .mlem/model/rf.mlem +πŸ’Ύ Updating deployment at .mlem/deployment/myservice.mlem +πŸ› Creating Heroku App example-mlem-get-started +πŸ’Ύ Updating deployment at .mlem/deployment/myservice.mlem +πŸ›  Creating docker image for heroku + πŸ’Ό Adding model files... + πŸ›  Generating dockerfile... + πŸ’Ό Adding sources... + πŸ’Ό Generating requirements file... + πŸ›  Building docker image registry.heroku.com/example-mlem-get-started/web... + βœ… Built docker image registry.heroku.com/example-mlem-get-started/web + πŸ”Ό Pushed image registry.heroku.com/example-mlem-get-started/web to remote registry at host registry.heroku.com +πŸ’Ύ Updating deployment at .mlem/deployment/myservice.mlem +πŸ›  Releasing app my-mlem-service formation +πŸ’Ύ Updating deployment at .mlem/deployment/myservice.mlem +βœ… Service example-mlem-get-started is up. You can check it out at https://my-mlem-service.herokuapp.com/ +``` + +> πŸ’‘ You can also create and configure deployment on-the-fly using `-c` options +> for `mlem deploy create` command: +> +> `$ mlem deploy create service_name -m model -t staging -c app_name=example-mlem-get-started` + +
+
+### ⛳ [Service deployed](https://github.com/iterative/example-mlem-get-started/tree/8-deploy-create)
+
+```cli
+$ git add .mlem/deployment/myservice.mlem
+$ git commit -m "Deploy service"
+$ git diff 8-deploy-create
+```
+
+</details>
+ +## Making requests + +You can go [here](http://example-mlem-get-started.herokuapp.com) and see the +same OpenAPI documentation. For details on it, refer to the **Serving** section. +You can also try to do some requests: + +```py +from mlem.api import load +from mlem.runtime.client.base import HTTPClient + +client = HTTPClient(host="http://example-mlem-get-started.herokuapp.com", port=80) +res = client.predict(load("test_x.csv")) +``` + +Also, you can create a client using deployment meta object: + +```py +from mlem.api import load + +service = load("myservice") +client = service.state.get_client() +res = client.predict(load("test_x.csv")) +``` + +There is also the remote counterpart of `apply` command. It will send requests +to your service instead of loading model into memory. There are two options to +achieve this in CLI: using the service address or the deploy meta. + +```cli +$ mlem apply-remote http test_x.csv -c host=http://my-mlem-service.herokuapp.com -c port=80 --json +[1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0] + +$ mlem deploy apply myservice test_x.csv --json +[1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0] +``` + +> πŸ’‘ As always, you don’t need to have deployment meta locally: +> +> `$ mlem deploy apply https://github.com/iterative/example-mlem-get-started/myservice https://github.com/iterative/example-mlem-get-started/test_x.csv --json` + +## Managing deployment + +Finally, you can check the status of your service with: + +```cli +$ mlem deploy status myservice +running +``` + +And stop your service with + +```cli +$ mlem deploy teardown myservice +⏳️ Loading deployment from .mlem/deployment/myservice.mlem +πŸ”— Loading link to .mlem/env/staging.mlem +πŸ”» Deleting my-mlem-service heroku app +πŸ’Ύ Updating deployment at .mlem/deployment/myservice.mlem +``` + +Note, that it will not delete the deployment definition, just update its state. diff --git a/content/docs/get-started/index.md b/content/docs/get-started/index.md new file mode 100644 index 00000000..d8f60170 --- /dev/null +++ b/content/docs/get-started/index.md @@ -0,0 +1,100 @@ +--- +description: 'Learn how you can use MLEM to easily manage and deploy models' +--- + +# Get Started + +Assuming MLEM is already [installed](/doc/install) in your active python +environment, let's initialize it by running `mlem init` inside a Git project: + +
+
+### ⚙️ Expand for setup instructions
+
+If you want to follow along with this tutorial and try MLEM, you can use our
+[example repo](https://github.com/iterative/example-mlem-get-started). You'll
+need to [fork] it first (so you can push models). Then clone it locally:
+
+[fork]: https://docs.github.com/en/get-started/quickstart/fork-a-repo
+
+```cli
+$ git clone <your-fork-url>
+$ cd example-mlem-get-started
+```
+
+Next let's create an isolated virtual environment to cleanly install all the
+requirements (including MLEM) there:
+
+```cli
+$ python3 -m venv .venv
+$ source .venv/bin/activate
+$ pip install -r requirements.txt
+```
+
+</details>
+
+```cli
+$ mlem init
+```
+
+A few [internal files](/doc/user-guide/project-structure) will be created:
+
+```cli
+$ tree .mlem
+.mlem
+└─── config.yaml
+```
+
+Now you're ready to MLEM!
+
+In our
+[example repository](https://github.com/iterative/example-mlem-get-started),
+you'll find tags for each step we take in the different sections of this
+tutorial. You can simply look at what happens there, or reproduce everything
+yourself and compare. In the different `Get Started` sections, those tags will
+be marked with a ⛳ emoji. Click on it to expand the section and see the `git`
+commands to run if you are following along. Just like this Git tag that
+concludes this section:
+
+<details>
+
+### ⛳ MLEM init
+
+Tag:
+[1-mlem-init](https://github.com/iterative/example-mlem-get-started/tree/1-mlem-init)
+
+```cli
+$ git add .mlem
+$ git status
+Changes to be committed:
+        new file:   .mlem/config.yaml
+        ...
+$ git commit -m "Initialize MLEM"
+```
+
+To compare your results with the tag, you can also run the following:
+
+```cli
+$ git diff 1-mlem-init
+```
+
+The output will be empty if you have the same files staged/committed.
+
+</details>
+
+MLEM's features can be grouped around these common functional use cases. We'll
+explore them one by one in the next few pages:
+
+- **[Saving models](/doc/get-started/saving)** (try this next) is the base layer
+  of MLEM for machine learning models and datasets.
+- **[Applying models](/doc/get-started/applying)** explains how to load and
+  apply models.
+- **[Packaging models](/doc/get-started/packaging)** describes how models can be
+  built into python packages, docker images, etc.
+- **[Serving models](/doc/get-started/serving)** shows how to create a service
+  from your model.
+- **[Deploying models](/doc/get-started/deploying)** shows how you can deploy
+  your model with MLEM.
+
+More examples on how to use MLEM in different scenarios can be found in the
+[Use Cases](/doc/use-cases) section.
diff --git a/content/docs/get-started/packaging.md b/content/docs/get-started/packaging.md
new file mode 100644
index 00000000..6ffb708a
--- /dev/null
+++ b/content/docs/get-started/packaging.md
@@ -0,0 +1,124 @@
+# Packaging models
+
+Saving and loading models is fun, but the real value of a model is how you can
+use it. To make it easier to get models to production, MLEM has 3 related
+functionalities: packaging, serving, and deploying. We'll start with packaging.
+
+Packaging is a way to "bake" your model into something usable in production like
+a Docker image, or export your model into another format. For this tutorial we
+will create a pip-ready package from our model. You can see the full list of
+available packagers [here](/doc/user-guide/mlem-abcs#packager).
+
+## Creating python package
+
+To create a `build/` directory with a pip package, run this command:
+
+```cli
+$ mlem pack rf pip -c target=build/ -c package_name=example_mlem_get_started
+⏳️ Loading model from .mlem/model/rf.mlem
+💼 Written `example_mlem_get_started` package data to `build`
+```
+
+In this command, we specified that we want to build the `rf` model with the
+`pip` packager, and then provided two arguments: `target`, the directory where
+the packager will write all the files, and `package_name`, the name of our
+package.
+
+<details>
+
+### ⚙️ About packagers and arguments
+
+There are more types of packagers and each one has its own set of available
+arguments. They are listed [here](/doc/user-guide/mlem-abcs#packager), but for
+quick reference you can run `mlem types packager` for a list of packagers and
+`mlem types packager pip` for a list of available arguments.
+
+</details>
+
+## Exploring python package
+
+Let's see what we've got:
+
+```cli
+$ tree build/
+build/
+├── MANIFEST.in
+├── example_mlem_get_started
+│   ├── __init__.py
+│   ├── model
+│   └── model.mlem
+├── requirements.txt
+└── setup.py
+```
+
+As you can see, the packager generated all the files necessary for a python
+package. This includes sources, requirements,
+[setup.py](https://docs.python.org/3/distutils/setupscript.html), and the model
+itself.
+
+## Using python package
+
+Now you can distribute and install the package. Its code declares all the same
+methods our model had, so you can try to use it like this:
+
+```py
+import example_mlem_get_started
+
+example_mlem_get_started.predict(df)
+```
+
+## Pre-configured packagers
+
+Alternatively, you can pre-configure your packager in the form of a YAML file,
+either manually or via the `mlem create` command, which uses the same interface
+with multiple `-c` options like this:
+
+```cli
+$ mlem create packager pip pip_config \
+    -c target=build/ -c package_name=example_mlem_get_started
+💾 Saving packager to .mlem/packager/pip_config.mlem
+$ cat .mlem/packager/pip_config.mlem
+object_type: packager
+package_name: example_mlem_get_started
+target: build/
+type: pip
+```
+
+Now you can use this config as a value for the `--load` option in `mlem pack`:
+
+```cli
+$ mlem pack rf -l pip_config
+⏳️ Loading packager from .mlem/packager/pip_config.mlem
+⏳️ Loading model from .mlem/model/rf.mlem
+💼 Written `example_mlem_get_started` package data to `build`
+```
+
+<details>
+ +### β›³ [Add packager config](https://github.com/iterative/example-mlem-get-started/tree/5-pack) + +```cli +$ git add .mlem/packager/pip_config.mlem +$ git commit -m "Add package config" +$ git diff 5-pack +``` + +
+ +Also, you can do all of this programmatically via Python API: + +```py +from mlem.api import pack, load_meta + +pack("pip", "rf", target="build", package_name="example_mlem_get_started") +pack(load_meta("pip_config"), "rf") +``` + +
+ +### βš™οΈ Remote packager config + +Like every other MLEM object, packagers can be read from remote repos. Try + +`mlem pack rf -l https://github.com/iterative/example-mlem-get-started/pip_config` + +
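+
+To sanity-check the package built above, you could install it straight from the
+`build/` directory and call the model. A minimal sketch, assuming the `build/`
+target and package name from the examples above:
+
+```cli
+$ pip install ./build
+```
+
+```py
+from sklearn.datasets import load_iris
+
+import example_mlem_get_started
+
+# the package exposes the same methods the model had
+data, _ = load_iris(return_X_y=True, as_frame=True)
+print(example_mlem_get_started.predict(data))
+```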
diff --git a/content/docs/get-started/saving.md b/content/docs/get-started/saving.md
new file mode 100644
index 00000000..7b4dccd6
--- /dev/null
+++ b/content/docs/get-started/saving.md
@@ -0,0 +1,212 @@
+# Saving models
+
+After initializing MLEM we have an empty repository (except for the config
+file), but soon we'll save something with MLEM to fill it up.
+
+## Training the model
+
+To save models with MLEM, you just need to use the
+[`mlem.api.save`](/doc/api-reference/save) method instead of whatever you used
+to save your model before. Let's take a look at the following python script:
+
+```py
+# train.py
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+
+from mlem.api import save
+
+
+def main():
+    data, y = load_iris(return_X_y=True, as_frame=True)
+    rf = RandomForestClassifier(
+        n_jobs=2,
+        random_state=42,
+    )
+    rf.fit(data, y)
+
+    save(
+        rf,
+        "rf",
+        sample_data=data,
+        description="Random Forest Classifier",
+    )
+
+
+if __name__ == "__main__":
+    main()
+```
+
+Here we load the well-known iris dataset with sklearn and train a simple
+classifier. But instead of pickling the model, we saved it with MLEM.
+
+Now let's run this script and see what was saved:
+
+```cli
+$ python train.py
+$ tree .mlem/model/
+.mlem/model
+├── rf
+└── rf.mlem
+```
+
+> By default, MLEM saves your files to the `.mlem/` directory, but that can be
+> changed; see [project structure](/doc/user-guide/project-structure) for
+> reference.
+
+What we see here is that the model was saved along with some metadata about it:
+`rf`, containing the model binary, and `rf.mlem`, containing metadata. Let's
+take a look at the metafile:
+
+<details>
+
+### `$ cat .mlem/model/rf.mlem`
+
+```yaml
+artifacts:
+  data:
+    hash: a61a1fa54893dcebe6fa448df81a1418
+    size: 163651
+    uri: rf
+model_type:
+  methods:
+    predict:
+      args:
+        - name: data
+          type_:
+            columns:
+              - ''
+              - sepal length (cm)
+              - sepal width (cm)
+              - petal length (cm)
+              - petal width (cm)
+            dtypes:
+              - int64
+              - float64
+              - float64
+              - float64
+              - float64
+            index_cols:
+              - ''
+            type: dataframe
+      name: predict
+      returns:
+        dtype: int64
+        shape:
+          - null
+        type: ndarray
+    predict_proba:
+      args:
+        - name: data
+          type_:
+            columns:
+              - ''
+              - sepal length (cm)
+              - sepal width (cm)
+              - petal length (cm)
+              - petal width (cm)
+            dtypes:
+              - int64
+              - float64
+              - float64
+              - float64
+              - float64
+            index_cols:
+              - ''
+            type: dataframe
+      name: predict_proba
+      returns:
+        dtype: float64
+        shape:
+          - null
+          - 3
+        type: ndarray
+    sklearn_predict:
+      args:
+        - name: X
+          type_:
+            columns:
+              - ''
+              - sepal length (cm)
+              - sepal width (cm)
+              - petal length (cm)
+              - petal width (cm)
+            dtypes:
+              - int64
+              - float64
+              - float64
+              - float64
+              - float64
+            index_cols:
+              - ''
+            type: dataframe
+      name: predict
+      returns:
+        dtype: int64
+        shape:
+          - null
+        type: ndarray
+    sklearn_predict_proba:
+      args:
+        - name: X
+          type_:
+            columns:
+              - ''
+              - sepal length (cm)
+              - sepal width (cm)
+              - petal length (cm)
+              - petal width (cm)
+            dtypes:
+              - int64
+              - float64
+              - float64
+              - float64
+              - float64
+            index_cols:
+              - ''
+            type: dataframe
+      name: predict_proba
+      returns:
+        dtype: float64
+        shape:
+          - null
+          - 3
+        type: ndarray
+  type: sklearn
+object_type: model
+requirements:
+  - module: sklearn
+    version: 1.0.2
+  - module: pandas
+    version: 1.4.1
+  - module: numpy
+    version: 1.22.3
+```
+ +It's a bit long, but we can see all that we need to use the model later: + +1. Model methods: `predict` and `predict_proba` +2. Input data schema that describes the DataFrame with the iris dataset +3. Requirements: `sklearn`, `numpy`, `pandas` with particular versions we need + to run this model. + +> Note that we didn't specify requirements: MLEM investigates the object you're +> saving (even if it's a complex one) and finds out all requirements needed. + +
+ +### β›³ Train + +Tag: +[2-train](https://github.com/iterative/example-mlem-get-started/tree/2-train) + +```cli +$ git add .mlem/model +$ git commit -m "Train the model" +$ git diff 2-train +``` + +
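+
+Once the model is saved, getting it back is a single call. A minimal sketch
+using [`mlem.api.load`](/doc/api-reference/load), assuming the `rf` model saved
+above:
+
+```py
+from sklearn.datasets import load_iris
+
+from mlem.api import load
+
+# load() reads rf.mlem, restores the artifacts and returns the
+# original RandomForestClassifier object
+rf = load("rf")
+
+data, _ = load_iris(return_X_y=True, as_frame=True)
+print(rf.predict(data))
+```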
diff --git a/content/docs/get-started/serving.md b/content/docs/get-started/serving.md new file mode 100644 index 00000000..de96b429 --- /dev/null +++ b/content/docs/get-started/serving.md @@ -0,0 +1,81 @@ +# Serving models + +For online serving, you can create a server from your model. We will try out +FastAPI server. All available server implementations are listed +[here](/doc/user-guide/mlem-abcs#server). + +## Running server + +To start up FastAPI server simply run: + +```cli +$ mlem serve rf fastapi +⏳️ Loading model from .mlem/model/rf.mlem +Starting fastapi server... +πŸ’… Adding route for /predict +πŸ’… Adding route for /predict_proba +πŸ’… Adding route for /sklearn_predict +πŸ’… Adding route for /sklearn_predict_proba +Checkout openapi docs at +INFO: Started server process [2917] +INFO: Waiting for application startup. +INFO: Application startup complete. +INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) +``` + +Servers automatically create endpoints from model methods with payload schemas +corresponding to serialized dataset types. + +## Making requests + +You can open Swagger UI (OpenAPI) at +[http://localhost:8080/docs](http://localhost:8080/docs) to check out OpenAPI +spec and query examples. + +Each server implementation also has its client implementation counterpart, in +the case of FastAPI server it’s HTTPClient. Clients can be used to make requests +to servers. Since a server also exposes the model interface description, the +client will know what methods are available and handle serialization and +deserialization for you. You can use them via CLI: + +```cli +$ mlem apply-remote http test_x.csv -c host="0.0.0.0" -c port=8080 --json +[1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0] +``` + +or via Python API: + +```py +from mlem.api import load +from mlem.runtime.client.base import HTTPClient + +client = HTTPClient(host="localhost", port=8080) +res = client.predict(load("test_x.csv")) +``` + +
+ +### πŸ’‘ Or query the model directly with curl + +```cli +$ curl -X 'POST' \ + 'http://localhost:8080/predict_proba' \ + -H 'accept: application/json' \ + -H 'Content-Type: application/json' \ + -d '{ + "data": { + "values": [ + { + "": 0, + "sepal length (cm)": 0, + "sepal width (cm)": 0, + "petal length (cm)": 0, + "petal width (cm)": 0 + } + ] + } + }' +[[0.92,0.04,0.04]] +``` + +
diff --git a/content/docs/index.md b/content/docs/index.md index 7af2064a..19216928 100644 --- a/content/docs/index.md +++ b/content/docs/index.md @@ -1,11 +1,41 @@ # MLEM Documentation -**MLEM** is a tool to help you version and deploy your Machine Learning models. -It turns your Git repository into an easy-to-use model registry. +**MLEM** is a tool to easily package, deploy and serve Machine Learning models. +It seamlessly supports a variety of scenarios like real-time serving and batch +processing. -Have a centralized place to store your models along with all metadata and easily -turn them into python packages, docker images or services and deploy them to the -cloud. +> πŸ’‘ When combined with [GTO](https://github.com/iterative/gto), MLEM allows you +> to create a powerful Model Registry out of your Git repository! Such a +> registry serves as a centralized place to store and operationalize your models +> along with their metadata; manage model life-cycle, versions & releases, and +> easily automate tests and deployments using GitOps. -This section is currently just a stub, but keep an eye on this page as we're -getting a full set of docs out very soon! + + + + A step-by-step introduction into basic MLEM features + + + + Study the detailed inner-workings of MLEM in its user guide. + + + + Non-exhaustive list of scenarios MLEM can help with + + + + See all of MLEM's commands. + + + + +βœ… Please join our [community](/community) or use the [support](/support) +channels if you have any questions or need specific help. We are very responsive +⚑. + +βœ… Check out our [GitHub repository](https://github.com/iterative/mlem) and give +us a ⭐ if you like the project! + +βœ… Contribute to MLEM [on GitHub](https://github.com/iterative/mlem) or help us +improve this [documentation](https://github.com/iterative/mlem.ai) πŸ™. diff --git a/content/docs/install.md b/content/docs/install.md new file mode 100644 index 00000000..36521154 --- /dev/null +++ b/content/docs/install.md @@ -0,0 +1,30 @@ +# Installation + +To check whether MLEM is installed in your environment, run `which mlem`. To +check which version is installed, run `mlem --version`. + +## Install as a Python library + +MLEM is a Python library. You can install it with a package manager like +[pip](https://pypi.org/project/pip/) or +[Conda](https://docs.conda.io/en/latest/), or as a Python +[requirement](https://pip.pypa.io/en/latest/user_guide/#requirements-files). + + + +We **strongly** recommend creating a [virtual environment] or using [pipx] to +encapsulate your local environment. 
+ +[virtual environment]: https://python.readthedocs.io/en/stable/library/venv.html +[pipx]: + https://packaging.python.org/guides/installing-stand-alone-command-line-tools/ + + + +```cli +$ pip install mlem +``` + + + + diff --git a/content/docs/sidebar.json b/content/docs/sidebar.json index 2dfc2440..f2c7a376 100644 --- a/content/docs/sidebar.json +++ b/content/docs/sidebar.json @@ -2,7 +2,293 @@ { "slug": "", "label": "MLEM Documentation", - "source": "index.md", "icon": "house" + }, + { + "slug": "install", + "label": "Installation" + }, + { + "slug": "get-started", + "label": "Get Started", + "children": [ + { + "slug": "saving", + "label": "Saving models", + "source": "get-started/saving.md" + }, + { + "slug": "applying", + "label": "Applying models", + "source": "get-started/applying.md" + }, + { + "slug": "packaging", + "label": "Packaging models", + "source": "get-started/packaging.md" + }, + { + "slug": "serving", + "label": "Serving models", + "source": "get-started/serving.md" + }, + { + "slug": "deploying", + "label": "Deploying models", + "source": "get-started/deploying.md" + } + ] + }, + { + "slug": "use-cases", + "label": "Use Cases", + "children": [ + { + "slug": "dvc", + "label": "Versioning MLEM objects with DVC", + "source": "use-cases/dvc.md" + }, + { + "slug": "mlem-mr", + "label": "Pure Mlem Model Registry", + "source": "use-cases/mlem-mr.md" + }, + { + "slug": "cicd", + "label": "Using in CI/CD", + "source": "use-cases/cicd.md" + }, + { + "slug": "model-registry", + "label": "Model Registry", + "source": "use-cases/model-registry.md" + } + ] + }, + { + "slug": "user-guide", + "label": "User Guide", + "children": [ + { + "slug": "basic-concepts", + "label": "Basic concepts", + "source": "user-guide/basic-concepts.md" + }, + { + "slug": "project-structure", + "label": "Project structure", + "source": "user-guide/project-structure.md" + }, + { + "slug": "configuration", + "label": "Configuration", + "source": "user-guide/configuration.md" + }, + { + "slug": "importing", + "label": "Importing existing files", + "source": "user-guide/importing.md" + }, + { + "slug": "linking", + "label": "Links", + "source": "user-guide/linking.md" + }, + { + "slug": "mlem-abcs", + "label": "MLEM ABCs", + "source": "user-guide/mlem-abcs.md" + }, + { + "slug": "extending", + "label": "Extending", + "source": "user-guide/extending.md" + }, + { + "slug": "analytics", + "label": "Anonymized Usage Analytics", + "source": "user-guide/analytics.md" + } + ] + }, + { + "slug": "command-reference", + "label": "Command Reference", + "children": [ + { + "slug": "init", + "label": "init", + "source": "command-reference/init.md" + }, + { + "slug": "list", + "label": "list", + "source": "command-reference/list.md" + }, + { + "slug": "pprint", + "label": "pprint", + "source": "command-reference/pprint.md" + }, + { + "slug": "create", + "label": "create", + "source": "command-reference/create.md" + }, + { + "slug": "serve", + "label": "serve", + "source": "command-reference/serve.md" + }, + { + "slug": "deploy", + "label": "deploy", + "children": [ + { + "slug": "apply", + "label": "deploy apply", + "source": "command-reference/deploy/apply.md" + }, + { + "slug": "create", + "label": "deploy create", + "source": "command-reference/deploy/create.md" + }, + { + "slug": "status", + "label": "deploy status", + "source": "command-reference/deploy/status.md" + }, + { + "slug": "teardown", + "label": "deploy teardown", + "source": "command-reference/deploy/teardown.md" + } + ] + }, + { + "slug": "types", + 
"label": "types", + "source": "command-reference/types.md" + }, + { + "slug": "link", + "label": "link", + "source": "command-reference/link.md" + }, + { + "slug": "clone", + "label": "clone", + "source": "command-reference/clone.md" + }, + { + "slug": "import", + "label": "import", + "source": "command-reference/import.md" + }, + { + "slug": "pack", + "label": "pack", + "source": "command-reference/pack.md" + }, + { + "slug": "apply", + "label": "apply", + "source": "command-reference/apply.md" + }, + { + "slug": "apply-remote", + "label": "apply-remote", + "source": "command-reference/apply-remote.md" + } + ] + }, + { + "slug": "api-reference", + "label": "Python API Reference", + "children": [ + { + "slug": "init", + "label": "init()", + "source": "api-reference/init.md" + }, + { + "slug": "save", + "label": "save()", + "source": "api-reference/save.md" + }, + { + "slug": "load", + "label": "load()", + "source": "api-reference/load.md" + }, + { + "slug": "load_meta", + "label": "load_meta()", + "source": "api-reference/load_meta.md" + }, + { + "slug": "ls", + "label": "ls()", + "source": "api-reference/ls.md" + }, + { + "slug": "import_object", + "label": "import_object()", + "source": "api-reference/import_object.md" + }, + { + "slug": "link", + "label": "link()", + "source": "api-reference/link.md" + }, + { + "slug": "clone", + "label": "clone()", + "source": "api-reference/clone.md" + }, + { + "slug": "apply", + "label": "apply()", + "source": "api-reference/apply.md" + }, + { + "slug": "apply_remote", + "label": "apply_remote()", + "source": "api-reference/apply_remote.md" + }, + { + "slug": "pack", + "label": "pack()", + "source": "api-reference/pack.md" + }, + { + "slug": "serve", + "label": "serve()", + "source": "api-reference/serve.md" + }, + { + "slug": "deploy", + "label": "deploy()", + "source": "api-reference/deploy.md" + } + ] + }, + { + "slug": "contributing", + "label": "Contributing", + "source": false, + "children": [ + { + "slug": "core", + "label": "MLEM Core Project", + "source": "contributing/core.md" + }, + { + "slug": "docs", + "label": "Docs and Website", + "source": "contributing/docs.md" + } + ] } ] diff --git a/content/docs/use-cases/cicd.md b/content/docs/use-cases/cicd.md new file mode 100644 index 00000000..a0499fda --- /dev/null +++ b/content/docs/use-cases/cicd.md @@ -0,0 +1,101 @@ +# Continuous Integration and Deployment for Machine Learning + +Applying DevOps methodologies to machine learning (MLOps) and data management +(DataOps) is increasingly common. This means resource orchestration +(provisioning servers for model training), model testing (validating model +inference), and model deployment to production, as well as monitoring & +feedback. MLEM provides you a simple way to publish or deploy your machine +learning models in CI/CD pipelines. + +- **Packaging and publishing models**: It is a common case when you need to wrap + your machine learning model into some specific format and publish it in some + registry. Examples include turning your ML model into a Python package and + publishing it on PyPi, or building a docker image and pushing it to DockerHub, + or just exporting your model to ONNX and publishing it as a artifact to + Artifactory. + +- **Deploying models**: Another common scenario is when you want to deploy your + model in your CI/CD pipeline. MLEM can help you with that by providing a + number of ready-to-use integrations with popular deployment platforms. 
+
+## Examples
+
+### Package and publish
+
+To trigger the publishing or deploying of a new version, you usually create a
+Git tag that kicks off the CI process. To make the packaging and deployment
+process consistent, you can create and commit a MLEM declaration:
+
+```cli
+$ mlem create packager pip -c package_name=mypackagename -c target=package pack-to-pip
+💾 Saving packager to pack-to-pip.mlem
+```
+
+And then use that declaration in CI:
+
+```yaml
+# .github/workflows/publish.yml
+name: publish-my-model
+
+on:
+  push:
+    tags:
+      - '*'
+
+jobs:
+  run:
+    runs-on: [ubuntu-latest]
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - uses: actions/setup-python@v2
+
+      - name: pack
+        run: |
+          pip3 install -r requirements.txt
+          mlem pack my-model --load pack-to-pip.mlem
+
+      - name: publish
+        run: |
+          sh upload_to_pypi.sh package
+```
+
+Learn more about packaging in [Get Started](/doc/get-started/packaging).
+
+### Deploy
+
+The deployment example is quite similar. First, you need to create an
+environment and a deployment declaration, and commit them to Git:
+
+```cli
+$ mlem create env heroku staging
+💾 Saving env to staging.mlem
+
+$ mlem create deployment heroku myservice -c app_name=mlem-deployed-in-ci -c model=my-model -c env=staging
+💾 Saving deployment to myservice.mlem
+```
+
+Then create and commit a CI pipeline, e.g. in GH Actions:
+
+```yaml
+# .github/workflows/publish.yml
+name: publish-my-model
+
+on:
+  push:
+    tags:
+      - '*'
+
+jobs:
+  run:
+    runs-on: [ubuntu-latest]
+
+    steps:
+      - uses: actions/checkout@v2
+
+      - uses: actions/setup-python@v2
+
+      - name: deploy
+        run: |
+          pip3 install -r requirements.txt
+          mlem deploy my-model --load myservice.mlem
+```
+
+Learn more about deploying in [Get Started](/doc/get-started/deploying).
diff --git a/content/docs/use-cases/dvc.md b/content/docs/use-cases/dvc.md
new file mode 100644
index 00000000..b05abba8
--- /dev/null
+++ b/content/docs/use-cases/dvc.md
@@ -0,0 +1,143 @@
+# Versioning MLEM objects with DVC
+
+<details>
+
+### ⚙️ Expand for setup instructions
+
+If you want to follow along with this tutorial and try MLEM, you can use our
+[example repo](https://github.com/iterative/example-mlem-get-started).
+
+```shell
+$ git clone https://github.com/iterative/example-mlem-get-started
+$ cd example-mlem-get-started
+$ git checkout 1-dvc-mlem-init
+```
+
+Next let's create an isolated virtual environment to cleanly install all the
+requirements (including MLEM) there:
+
+```shell
+$ python3 -m venv .venv
+$ source .venv/bin/activate
+$ pip install -r requirements.txt
+```
+
+</details>
+
+Often it's a bad idea to store binary files in Git, especially big ones. To
+solve this, MLEM can utilize DVC capabilities to connect external cloud storage
+for model and dataset versioning.
+
+> You can learn more about DVC [here](https://dvc.org/doc).
+
+We will reorganize our example repo to use DVC.
+
+## Setting up the repo
+
+First, let's initialize DVC and add a remote (we will use Azure, but you can use
+whatever is available to you):
+
+```cli
+$ dvc init
+$ dvc remote add myremote -d azure://example-mlem
+$ git add .dvc/config
+```
+
+Now, we also need to set up MLEM so it knows to use DVC:
+
+```cli
+$ mlem config set default_storage.type dvc
+✅ Set `default_storage.type` to `dvc` in repo .
+```
+
+Also, let's add `.mlem` files to `.dvcignore` so that metafiles are ignored by
+DVC:
+
+```cli
+$ echo "/**/?*.mlem" > .dvcignore
+$ git add .dvcignore
+```
+
+## Saving objects
+
+Next, let's remove artifacts from Git and re-save them, so MLEM can use the new
+storage for them. You don't need to change a single line of code:
+
+```cli
+$ git rm -r --cached .mlem/
+$ python train.py
+```
+
+Finally, let's add the new metafiles to Git and the artifacts to DVC
+respectively, then commit and push them:
+
+```cli
+$ dvc add .mlem/model/rf .mlem/dataset/*.csv
+$ git add .mlem
+$ git commit -m "Switch to dvc storage"
+$ dvc push -r myremote
+$ git push
+```
+
+Now, you can load MLEM objects from your repo even though there are no actual
+binaries stored in Git. MLEM will know to use DVC to load them.
+
+⛳
+[Switch to DVC](https://github.com/iterative/example-mlem-get-started/tree/4-dvc-save-models)
+
+## Using MLEM in DVC Pipeline
+
+DVC pipelines are a useful DVC mechanism to build data pipelines, in which you
+can process your data and train your model. You may already be training your ML
+models in them and want to start using MLEM to save those models.
+
+MLEM can easily be plugged into existing DVC pipelines. If you already added
+`.mlem` files to `.dvcignore`, you are good to go in most cases. Since DVC will
+ignore `.mlem` files, you don't need to add them as outputs and mark them with
+`cache: false`.
+
+It becomes a bit more complicated when you need to add them as outputs, because
+you want to use them as inputs to the next stages. This may be the case when the
+model binary doesn't change, but the model metadata does. That may happen if you
+change things like the model description or labels.
+
+To work with that, you'll need to remove `.mlem` files from `.dvcignore` and
+mark your outputs in the DVC pipeline with `cache: false`.
+
+## Example
+
+You may have a simple pipeline in which you train your model, like this:
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: python train.py models/rf
+    deps:
+      - train.py
+    outs:
+      - models/rf
+```
+
+The next step would be to start saving your models with MLEM. Since MLEM saves
+both the **binary** and the **metadata**, you need to have both of them in the
+DVC pipeline:
+
+```yaml
+# dvc.yaml
+stages:
+  train:
+    cmd: python train.py models/rf
+    deps:
+      - train.py
+    outs:
+      - models/rf
+      - models/rf.mlem:
+          cache: false
+```
+
+Since the binary was already captured before, we don't need to add anything for
+it. For the metadata, we've added two lines to capture it and specify
+`cache: false`, since we want the metadata to be committed to Git, and not be
+pushed to the DVC remote.
+
+Now MLEM is ready to be used in your DVC pipeline!
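+
+For reference, the `train.py` stage above could save the model with MLEM along
+these lines. A minimal sketch; the command-line argument handling is an
+assumption, not part of the original pipeline:
+
+```py
+# train.py
+import sys
+
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+
+from mlem.api import save
+
+
+def main(path):
+    data, y = load_iris(return_X_y=True, as_frame=True)
+    rf = RandomForestClassifier(n_jobs=2, random_state=42)
+    rf.fit(data, y)
+    # save() writes both the binary (models/rf) and the metafile (models/rf.mlem)
+    save(rf, path, sample_data=data)
+
+
+if __name__ == "__main__":
+    main(sys.argv[1])
+```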
diff --git a/content/docs/use-cases/index.md b/content/docs/use-cases/index.md
new file mode 100644
index 00000000..9c2d0445
--- /dev/null
+++ b/content/docs/use-cases/index.md
@@ -0,0 +1,36 @@
+# Use Cases
+
+We provide short articles on common data science scenarios that MLEM can help
+with or improve. You can combine different scenarios for even more awesomeness.
+
+Our use cases are not written to be run end-to-end like tutorials. For more
+general, hands-on experience with MLEM, please see
+[Get Started](/doc/get-started) instead.
+
+## Why MLEM?
+
+Even with all the success we've seen today in machine learning, data scientists
+and machine learning engineers still lack a simple way to deploy their models in
+a fast and easily manageable way. This is a critical challenge: while ML
+algorithms and methods are no longer tribal knowledge, they are still difficult
+to serve, scale and maintain in production.
+
+## Basic uses of MLEM
+
+If you train Machine Learning models and you want to
+
+- save machine learning models along with all meta-information that is required
+  to run them;
+- pack your models into ready-to-use formats like Python packages or Docker
+  Images;
+- deploy your models, easily switching between different providers when you need
+  to;
+- adopt engineering tools and best practices in data science projects;
+
+MLEM is for you!
+
+> We keep reviewing our docs and will include interesting scenarios that surface
+> in the community. Please, contact us if you need help or have suggestions!
+
+Please choose from the navigation sidebar to the left, or click the Next button
+below ↘
diff --git a/content/docs/use-cases/mlem-mr.md b/content/docs/use-cases/mlem-mr.md
new file mode 100644
index 00000000..2adfe9e4
--- /dev/null
+++ b/content/docs/use-cases/mlem-mr.md
@@ -0,0 +1,106 @@
+# Pure MLEM Model Registry
+
+If your Data Science team has a lot of different projects, it doesn't make
+sense to develop them in a single repository. But for production it's good to
+have a single source of truth to know what is deployed.
+
+[MLEM Links](/doc/user-guide/linking) can be used to create a separate Model
+Registry repository, which will consist only of links to objects in developers'
+repositories.
+
+This way your deployment system doesn't need to know about every developer
+repository.
+
+Also, you can use different branches in the MR repo to employ Git flow
+processes.
+
+And by configuring permissions for this repo, you can approve new model versions
+for production.
+
+## Example
+
+Let's build an example using the
+[repository from Get Started](https://github.com/iterative/example-mlem-get-started).
+
+That repo already has some models in it:
+
+```cli
+$ mlem ls https://github.com/iterative/example-mlem-get-started
+```
+
+```yaml
+Datasets:
+  - test_x.csv
+  - test_y.csv
+  - train.csv
+Models:
+  - rf
+Deployments:
+  - myservice
+Packagers:
+  - pip_config
+Envs:
+  - staging
+```
+
+Let's create the new repo first:
+
+```cli
+$ mkdir links-mr
+$ cd links-mr
+$ git init
+$ mlem init
+```
+
+Let's create some links to these models:
+
+```cli
+$ mlem link --sr https://github.com/iterative/example-mlem-get-started --rev main rf first-model
+⏳️ Loading meta from https://github.com/iterative/example-mlem-get-started/tree/main/.mlem/model/rf.mlem
+💾 Saving link to .mlem/link/first-model.mlem
+
+$ mlem link --sr https://github.com/iterative/example-mlem-get-started --rev 7-deploy-meta rf second-model
+⏳️ Loading meta from https://github.com/iterative/example-mlem-get-started/tree/7-deploy-meta/.mlem/model/rf.mlem
+💾 Saving link to .mlem/link/second-model.mlem
+```
+
+We've just linked two models from the other repo. You can see both if you run:
+
+```cli
+$ mlem ls
+```
+
+```yaml
+Models:
+  - first-model -> .mlem/model/rf
+  - second-model -> .mlem/model/rf
+```
+
+Let's check out each link:
+
+```cli
+$ cat .mlem/link/first-model.mlem
+link_type: model
+object_type: link
+path: .mlem/model/rf.mlem
+repo: https://github.com/iterative/example-mlem-get-started/
+rev: main
+
+$ cat .mlem/link/second-model.mlem
+link_type: model
+object_type: link
+path: .mlem/model/rf.mlem
+repo: https://github.com/iterative/example-mlem-get-started/
+rev: 7-deploy-meta
+```
+
+Now you can commit those links, push the repo and use it as a model registry:
+
+```cli
+$ git add .mlem/link/first-model.mlem .mlem/link/second-model.mlem
+$ git commit -m "Add links to models"
+```
diff --git a/content/docs/use-cases/model-registry.md b/content/docs/use-cases/model-registry.md
new file mode 100644
index 00000000..efaa00a5
--- /dev/null
+++ b/content/docs/use-cases/model-registry.md
@@ -0,0 +1,64 @@
+# Machine Learning Model Registry
+
+A **model registry** is a tool to catalog ML models and their versions. Models
+from your data science projects can be discovered, tested, shared, deployed, and
+audited from there. [DVC](/doc), [GTO], and [MLEM] enable these capabilities on
+top of Git, so you can stick to an existing software engineering stack. No more
+divide between ML engineering and operations!
+
+[gto]: https://github.com/iterative/gto
+[mlem]: https://mlem.ai/
+
+ML model registries give your team key capabilities:
+
+- Collect and organize model [versions] from different sources effectively,
+  preserving their data provenance and lineage information.
+- Share metadata including [metrics and plots][mp] to help use and evaluate
+  models.
+- A standard interface to access all your ML artifacts, from early-stage
+  [experiments] to production-ready models.
+- Deploy specific models on different environments (dev, shadow, prod, etc.)
+  without touching the applications that consume them.
+- For security, control who can manage models, and audit their usage trails. 
+ +[versions]: /doc/use-cases/versioning-data-and-model-files +[mp]: /doc/start/metrics-parameters-plots +[experiments]: /doc/user-guide/experiment-management + +Many of these benefits are built into DVC: your [modeling process] and +[performance data][mp] become **codified** in Git-based DVC +repositories, making it possible to reproduce and manage models with +standard Git workflows (along with code). Large model files are stored +separately and efficiently, and can be pushed to [remote storage] -- a scalable +access point for [sharing]. + + + +See also [Data Registry](/doc/use-cases/data-registry). + + + +To make a Git-native registry (on top of DVC or not), one option is to use [GTO] +(Git Tag Ops). It tags ML model releases and promotions, and links them to +artifacts in the repo using versioned annotations. This creates abstractions for +your models, which lets you **manage their lifecycle** freely and directly from +Git. + +And to **productionize** the models, you can save and package them with the +[MLEM] Python API or CLI, which automagically captures all the context needed to +distribute them. It can store model files on the cloud (by itself or with DVC), +list and transfer them within locations, wrap them as a local REST server, or +even containerize and deploy them to cloud providers! + +This ecosystem of tools from [Iterative](https://iterative.ai/) brings your ML +process into [GitOps]. This means you can manage and deliver ML models with +software engineering methods such as continuous integration (CI/CD), which can +sync with the state of the artifacts in your registry. + +[modeling process]: /doc/start/data-pipelines +[remote storage]: /doc/command-reference/remote +[sharing]: /doc/start/data-and-model-access +[gitops]: https://www.gitops.tech/ diff --git a/content/docs/user-guide/analytics.md b/content/docs/user-guide/analytics.md new file mode 100644 index 00000000..582861f0 --- /dev/null +++ b/content/docs/user-guide/analytics.md @@ -0,0 +1,50 @@ +# Anonymized Usage Analytics + +To help us better understand how MLEM is used and improve it, MLEM captures and +reports _anonymized_ usage statistics. You will be notified the first time you +run `mlem init`. + +## Motivation + +Analytics help us decide how best to design future features and prioritize +current work. Anonymous aggregates of user analytics allow us to prioritize +fixes and improvements based on how, where and when people use MLEM. + +## Retention period + +User and event data have a 14-month retention period. + +## What + +MLEM's analytics record the following information per event: + +- MLEM version (e.g., `0.1.2+5fb5a3.mod`) and OS version (e.g., `MacOS 10.16`) +- Command name and exception type (e.g., `ls, ValueError` or + `get, MLEMRootNotFound`) +- Country, city (e.g., `RU, Moscow`) +- A random user_id (e.g. `8ca59a29-ddd9-4247-992a-9b4775732aad`), generated with + [`uuid`](https://docs.python.org/3/library/uuid.html) + +This _does not allow us to track individual users_ but does enable us to +accurately measure user counts vs. event counts. + +## Implementation + +The code is viewable in +[analytics.py](https://github.com/iterative/mlem/blob/main/mlem/analytics.py). +Analytics requests are sent in a separate background process and fail fast to +avoid delaying any execution. They fail immediately and silently if you have no +network connection. + +MLEM's analytics are sent through Iterative's proxy to Google BigQuery over +HTTPS.
+ +## Opting out + +MLEM analytics help the entire community, so leaving it on is appreciated. +However, if you want to opt out of MLEM's analytics, you can disable it by +setting the environment variable `MLEM_NO_ANALYTICS=true` or by adding +`no_analytics: true` to `.mlem/config.yaml`. + +This will disable it for the project. We'll add an option to opt out globally +soon. diff --git a/content/docs/user-guide/basic-concepts.md b/content/docs/user-guide/basic-concepts.md new file mode 100644 index 00000000..ee6fba26 --- /dev/null +++ b/content/docs/user-guide/basic-concepts.md @@ -0,0 +1,121 @@ +# Basic concepts + +## MLEM Objects + +The most important concept in MLEM is the **MLEM Object**. Basically, MLEM is a +library to create, manage and use different **MLEM Objects**, such as models, +datasets and the other types you can read about below. + +> So, when you use the `save` API method, you create a MLEM Object from an +> arbitrary supported Python object. + +> Also, MLEM Objects can be created with the +> [`mlem create`](/doc/command-reference/create) CLI command + +MLEM Objects are saved as `.mlem` files in `yaml` format. Sometimes they can +have other files attached to them; in that case we call the `.mlem` file a +"metadata file" (or "metafile") and all the other files "artifacts". + +Typically, if a **MLEM Object** has only one artifact, it will have the same name +without the `.mlem` extension, for example `model.mlem` + `model`, or `data.csv` + +`data.csv.mlem`. + +If a **MLEM Object** has multiple artifacts, they will be stored in a directory +with the same name, for example `model.mlem` + `model/data.pkl` + +`model/data2.pkl`.
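To make this concrete, here is a minimal sketch (the model and the name `model` are just for illustration) showing the metafile-plus-artifact pair that a single `save` call produces:

```py
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from mlem.api import save

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier().fit(X, y)

# One artifact -> same name: this writes the artifact "model"
# and the YAML metafile "model.mlem" next to it
save(model, "model")
```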
+ +### Implementation details + +From a developer's perspective, MLEM Objects are instances of one of the +subclasses of the `MlemMeta` class. MLEM uses extended +[pydantic](https://pydantic-docs.helpmanual.io/) functionality to save and load +them from files. + +You can get a `MlemMeta` instance by using the `load_meta` API method instead of +plain `load`. + +See also [MLEM Object API](/doc/api-reference/mlem-object).
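As a quick sketch of the difference (assuming a model saved under the name `rf`, as in the Get Started examples):

```py
from mlem.api import load, load_meta

model = load("rf")      # the actual Python object, e.g. a sklearn model
meta = load_meta("rf")  # the MlemMeta instance that describes it
```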
+ +## Common fields + +Each MLEM Object has an `object_type` field which determines the type of the +object. Different types have different additional fields and methods, but all +MLEM Objects have the following fields: + +- `description` - for storing a user-provided description +- `params` - an arbitrary object with additional parameters +- `tags` - a list of string tags + +> Also, when you load a MLEM Object via the API, it will have a `location` field +> that holds information about where the object was loaded from + +You can check out what methods MLEM Objects have in the +[API Reference](/doc/api-reference/mlem-object). + +## MLEM Object Types + +Here are all the builtin MLEM Object types. + +Models and Datasets are special types that can have artifacts, so they have two +additional fields: + +- `artifacts` - a string-to-artifacts mapping, where an artifact is an instance + of [`Artifact`](/doc/user-guide/mlem-abcs#artifact) which represents a file + stored somewhere (local, cloud, DVC cache, etc.) +- `requirements` - a list of + [`Requirement`](/doc/user-guide/mlem-abcs#requirement)s which are needed to use + that object at runtime + +### Model + +Represents an ML model, but can be generalized to any model or even any +"function" or "transformation", thanks to the `callable` +[ModelType](/doc/user-guide/mlem-abcs#modeltype). + +**Base class**: `mlem.core.objects.ModelMeta` + +**Fields** (in addition to inherited): + +- `model_type` (_lazy_) - [ModelType](/doc/user-guide/mlem-abcs#modeltype), + which is polymorphic and holds metadata about the model's framework, methods + and IO. + +### Dataset + +Represents a dataset, which can be used as an input to one of a Model's methods. + +**Base class**: `mlem.core.objects.DatasetMeta` + +**Fields** (in addition to inherited): + +- `reader` (_lazy_) - [DatasetReader](/doc/user-guide/mlem-abcs#datasetreader) - + how to read saved files and the resulting dataset metadata +- `dataset` (_transient_) - + [`DatasetType`](/doc/user-guide/mlem-abcs#datasettype) with the dataset value + and metadata (available once data is read) + +### Link + +Represents a link (pointer) to another MLEM Object. More on that +[here](/doc/user-guide/linking). + +**Base class**: `mlem.core.objects.MlemLink` + +**Fields** (in addition to inherited): + +- `path` - path to the MLEM Object +- `repo` - location of the MLEM Repo with the referenced object +- `rev` - revision of the object +- `link_type` - type of the referenced object + +### Other types + +Some of the `MLEM ABCs` are also MLEM Objects. + +- [Packager](/doc/user-guide/mlem-abcs#packager) +- [Target Environment](/doc/user-guide/mlem-abcs#targetenvmeta) +- [Deployment](/doc/user-guide/mlem-abcs#deploymeta) diff --git a/content/docs/user-guide/configuration.md b/content/docs/user-guide/configuration.md new file mode 100644 index 00000000..5597d084 --- /dev/null +++ b/content/docs/user-guide/configuration.md @@ -0,0 +1,34 @@ +# Configuration + +## Ways to set + +MLEM loads configuration from the `.mlem/config.yaml` file, but any option can be +overridden (or set) via the corresponding env variable with the `MLEM_` prefix. + +Also, [`mlem config`](/doc/command-reference/config) allows you to manipulate +the config. + +## Options + +- `log_level` - logging level to use. Default `INFO` +- `debug` - whether to run MLEM in debug mode. Sets `log_level` to `DEBUG`. + Default `False` +- `no_analytics` - whether to stop collecting usage telemetry. Default `False` +- `default_storage` - where to store saved artifacts by default. Should be a + serialized `Storage` instance.
Defaults to `LocalStorage`, which means artifacts are saved +locally. +- `default_external` - whether to save objects as + [external](/doc/user-guide/project-structure#external-objects) by default. + Default is `False` +- `emojis` - whether to show πŸ’…πŸ¦‰πŸ€©πŸ‡ͺπŸ‡²πŸ…ΎοΈπŸ‡―β„ΉοΈπŸ‡ΈπŸ€©πŸ¦‰πŸ’… in CLI output. Default βœ… +- `additional_extensions` - comma-separated list of extension modules to + force-load on MLEM import. +- `autoload_exts` - turn on + [dynamic extension loading](/doc/user-guide/extending#extension-dynamic-loading). + Default `True` + +## Extension config + +Different MLEM extensions can provide additional options that you can also set +via the `.mlem/config.yaml` file. Please refer to the corresponding extension's +documentation. diff --git a/content/docs/user-guide/datasets.md b/content/docs/user-guide/datasets.md new file mode 100644 index 00000000..413013e4 --- /dev/null +++ b/content/docs/user-guide/datasets.md @@ -0,0 +1,115 @@ +# WIP Working with datasets + +## Getting the data + +The first step is to get some data. For this tutorial, we'll just generate it. +Let's take a look at this Python script: + +```py +# prepare.py +from mlem.api import save +from sklearn.datasets import load_iris +from sklearn.model_selection import train_test_split + +def main(): + data, y = load_iris(return_X_y=True, as_frame=True) + data["target"] = y + train_data, test_data = train_test_split(data, random_state=42) + save(train_data, "train.csv") + save(test_data.drop("target", axis=1), "test_x.csv") + save(test_data[["target"]], "test_y.csv") + +if __name__ == "__main__": + main() +``` + +Here we load the well-known iris dataset with sklearn, and then save parts of it +with MLEM. For now, we just save them locally; we'll push them to Git later. + +Let's execute this script and see what was produced: + +```cli +$ python prepare.py +$ tree .mlem/dataset/ +.mlem/dataset/ +β”œβ”€β”€ test_x.csv +β”œβ”€β”€ test_x.csv.mlem +β”œβ”€β”€ test_y.csv +β”œβ”€β”€ test_y.csv.mlem +β”œβ”€β”€ train.csv +└── train.csv.mlem +``` + +What we see here is that every DataFrame was saved along with some metadata +about it. Let's see one example: + +```cli +$ head -5 .mlem/dataset/train.csv +,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target +4,5.0,3.6,1.4,0.2,0 +32,5.2,4.1,1.5,0.1,0 +142,5.8,2.7,5.1,1.9,2 +85,6.0,3.4,4.5,1.6,1 +```
+ +### `$ cat .mlem/dataset/train.csv.mlem` + +```yaml +artifacts: + data: + hash: add43029d2b464d0884a7d3105ef0652 + size: 2459 + uri: train.csv +object_type: dataset +reader: + dataset_type: + columns: + - '' + - sepal length (cm) + - sepal width (cm) + - petal length (cm) + - petal width (cm) + - target + dtypes: + - int64 + - float64 + - float64 + - float64 + - float64 + - int64 + index_cols: + - '' + type: dataframe + format: csv + type: pandas +requirements: + - module: pandas + version: 1.4.2 +```
+ +We can see what was saved: the dataset schema and the requirements for the +libraries that were used to save the dataset. That doesn't mean you can't read +`train.csv` any other way, but if you use MLEM to load it, MLEM will know that +it needs pandas to do that for you.
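For example, loading the dataset back through MLEM (a sketch; it assumes pandas is installed, per the requirements above):

```py
from mlem.api import load

# MLEM reads train.csv.mlem, picks the pandas reader declared there,
# and reconstructs the DataFrame with the saved schema
train_df = load("train.csv")
print(train_df.head())
```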
+ +### β›³ [Data prepared](https://github.com/iterative/example-mlem-get-started/tree/2-prepare) + +```cli +$ git add .mlem +$ git commit -m "Added data" +$ git diff 2-prepare +```
+ +> Note that we didn't specify whether the saved dataset was `pd.DataFrame`, +> `np.array` or `tf.Tensor`. MLEM figures that out for you, and this handy magic +> extends to ML models πŸ‘‹ diff --git a/content/docs/user-guide/extending.md b/content/docs/user-guide/extending.md new file mode 100644 index 00000000..3f62fd03 --- /dev/null +++ b/content/docs/user-guide/extending.md @@ -0,0 +1,82 @@ +# Extending + +MLEM can be extended to support more model types, datasets, servers, packagers +and basically everything listed [here](/doc/user-guide/mlem-abcs). Most of the +builtin implementations are themselves extensions located in the `mlem.contrib` +package. This allows MLEM to avoid loading their code unless it is used, which +is especially handy because it means their requirements are optional. + +## Implementing MlemABC + +You can start extending MLEM by subclassing any `MlemABC` subclass that +you need.
+ +### You can even try to add a new `MlemObject` type or a new `MlemABC` interface + +But no one has tried it so far ;)
+ +Your subclass should implement all the abstract methods of the base class. + +Also, it needs to define a `type: ClassVar[str]` class field, which will be used +as an alias for your implementation (see the sketch after the note below).
+ +### Default `type` value + +By default, `type` will have the `<module_path>.<class_name>` value, but that's +not very handy to type in the CLI, e.g. you'll need to run +`mlem serve model my_awesome_package.submodule_of_my_awesome_package.abstract.bean.factory.MyAwesomeServerImplementation` +instead of `mlem serve model ΡŠΡƒΡŠ` if you don't set `type: ClassVar = "ΡŠΡƒΡŠ"` for +your class.
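Here is a minimal, hypothetical sketch of such a subclass (`MyServer` and its alias are made up; the base class path is taken from the [MLEM ABCs](/doc/user-guide/mlem-abcs#server) guide, and the abstract methods to fill in depend on the interface you pick):

```py
from typing import ClassVar

from mlem.runtime.server.base import Server  # base class path per the MLEM ABCs guide


class MyServer(Server):
    """A hypothetical custom server implementation."""

    # the alias users will type, e.g. `mlem serve model my_server`
    type: ClassVar[str] = "my_server"

    # ... implement the abstract methods of Server here ...
```

The next section shows how to register such a class so MLEM can discover it.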
+ +## Entry points + +For MLEM to know about your implementations, you need to register them via +[entry points](https://packaging.python.org/en/latest/specifications/entry-points/) +in your `setup.py`. + +You should list all of them in the form +`{abs_name}.{type} = {module_path}:{class_name}` under the `mlem.contrib` entry +point key, where + +- `abs_name` is the `MlemABC.abs_name` of the interface you are implementing +- `type` is the value of the `type` field of your class +- `module_path` is the full path to the Python module +- `class_name` is the name of your class + +You can see examples in MLEM's +[setup.py](https://github.com/iterative/mlem/blob/main/setup.py) + +## Extension dynamic loading + +By default, when you import MLEM or run MLEM CLI commands, MLEM will not load +any extensions, to minimize overhead. But that would mean users have to +import them manually, and we don't want that. Instead, MLEM can load extensions +dynamically, depending on what is imported in the user's environment. For +example, the `sklearn` extension will be loaded in one of the following cases: + +1. When MLEM was imported, the `sklearn` module was already imported +2. After importing MLEM, the user imported `sklearn` +3. The user loaded any object that uses one of the `sklearn` extension's + implementations. + +> Note that some of the fields in MlemObjects are lazy, which means they will be +> loaded only when a user accesses them. + +## Subclassing MlemConfig + +As part of your extension, you can also have some configuration options. For +that, you can subclass the `MlemConfig` class and list your options there just +like in any `pydantic` +[BaseSettings](https://pydantic-docs.helpmanual.io/usage/settings/) class. In +the inner `Config` class you should set the `section` option, and after that +values for your configuration will be loaded from the corresponding section of +`.mlem/config.yaml`. See +[`PandasConfig`](https://github.com/iterative/mlem/blob/main/mlem/contrib/pandas.py) +for an example. diff --git a/content/docs/user-guide/importing.md b/content/docs/user-guide/importing.md new file mode 100644 index 00000000..9ca7b311 --- /dev/null +++ b/content/docs/user-guide/importing.md @@ -0,0 +1,13 @@ +# Importing existing files + +If you already have your models/datasets saved, but want to use them as MLEM +Objects, you can use the [`mlem import`](/doc/command-reference/import) or +[`mlem.api.import_object`](/doc/api-reference/import_object) commands. + +They will try to load the path you provided and analyze the object saved there. + +> Obviously, importing is more limited than the `save` API, since MLEM does not +> have a live Python object to analyze and tries to recreate it, which may fail. + +You can see the list of available import implementations +[here](/doc/user-guide/mlem-abcs#importhook). diff --git a/content/docs/user-guide/index.md b/content/docs/user-guide/index.md new file mode 100644 index 00000000..f357ad75 --- /dev/null +++ b/content/docs/user-guide/index.md @@ -0,0 +1,11 @@ +# User Guide + +Our guides describe the major concepts in MLEM and how it works comprehensively, +explaining when and how to use what, as well as the inter-relationships between +them. + +The topics here range from more foundational (impacting many parts of MLEM) to +more specific and advanced things you can do. We also include a few misc. +guides, for example related to [contributing to MLEM](/doc/contributing) itself.
+ +Please choose from the navigation sidebar to the left, or click the `Next` +button below β†˜ diff --git a/content/docs/user-guide/linking.md b/content/docs/user-guide/linking.md new file mode 100644 index 00000000..9097e6a6 --- /dev/null +++ b/content/docs/user-guide/linking.md @@ -0,0 +1,35 @@ +# Links + +Another powerful feature of MLEM is MLEM Links. Links are special lightweight +MLEM Objects that represent MLEM Objects in different locations. That means you +can [reference](/doc/user-guide/project-structure#referencing-mlem-objects) +links everywhere you need to specify a MLEM Object. + +> Since MLEM Links are also a type of MLEM Object, they share the same logic; +> for example, they are saved under the `.mlem/link` directory. To load an +> instance of `MlemLink` (and not the object it references), provide +> `follow_links=False` to the `load_meta` method. + +## Link structure + +The contents of a link are very lightweight and consist of the following +fields: + +- `link_type` - type of the referenced object +- location fields (except `fs`) as described + [here](/doc/user-guide/project-structure#referencing-mlem-objects) +- [Common MLEM Object fields](/doc/user-guide/basic-concepts#common-fields), + including `object_type="link"` + +## Using links + +Links can be created via the [`mlem link`](/doc/command-reference/link) or +[`mlem.api.link`](/doc/api-reference/link) commands, as well as the +`MlemMeta.make_link` method. + +> You can create relative links inside the same repository, which will basically +> create an alias for that object. + +Also, since links can target specific commits, tags or branches in a versioned +repository, they can be used in a variety of different scenarios, for example to +create a [centralized Model Registry](/doc/use-cases/mlem-mr). diff --git a/content/docs/user-guide/mlem-abcs.md b/content/docs/user-guide/mlem-abcs.md new file mode 100644 index 00000000..60ef56b6 --- /dev/null +++ b/content/docs/user-guide/mlem-abcs.md @@ -0,0 +1,323 @@ +# MLEM ABCs + +MLEM has a number of abstract base classes that anyone can implement to +[extend](/doc/user-guide/extending) MLEM with new capabilities.
+ +### Internal details + +Each abstract base class in this list is a subclass of the `mlem.core.base.MlemABC` +class, which is itself a subclass of pydantic's `BaseModel` with additional +polymorphic magic. + +That means all subclasses are also `BaseModel`s and should be serializable. +This way MLEM can save/load them as part of other objects or configure them +via the `-c` notation in the CLI.
+ +> Fields marked as **transient** are used to hold some operational object and +> are not saved when you dump the objects. After loading objects with such +> fields, they will be empty until you somehow "load" the object. + +> Fields marked as **lazy** are used to hold implementation-related objects and +> are not deserialized right away when you load the parent object. This helps +> avoid an `ImportError` if you do not have the dependencies required for the +> underlying implementation, or just avoids unnecessary imports. The field value +> will be loaded when you try to access it. If you don't want to load it, you +> can access the unserialized data in the `_raw` field. + +Here is the list of all MLEM ABCs. + +# General + +## MlemMeta + +Represents a **[MLEM Object](/doc/user-guide/basic-concepts)** + +**Base class**: `mlem.core.objects.MlemMeta` + +For more info and a list of subtypes, look +[here](/doc/user-guide/basic-concepts#mlem-object-types) + +## Requirement + +Represents different types of requirements for a MLEM Object. + +**Base class**: `mlem.core.requirements.Requirement` + +Implementations: + +- `installable` - a Python requirement typically installed through `pip`. Can + have a specific version and an alternative package name. Default type +- `custom` - a Python requirement in the form of a local `.py` file or a Python + package. Contains the name and source code of the module/package +- `unix` - a Unix package typically installed through `apt` or `yum` + +## ImportHook + +Represents some file format that MLEM can try to +[import](/doc/user-guide/importing). + +**Base class**: `mlem.core.import_objects.ImportHook` + +Implementations: + +- `pickle` - simply unpickle the contents of the file and use the default MLEM + object analyzer. Works with pickle files +- `pandas` - try to read a file into a `pandas.DataFrame`. Works with files + saved with pandas in formats like + `csv, json, excel, parquet, feather, stata, html`. Some formats + require additional dependencies. + +# Models + +## ModelType + +This class is basically a wrapper for all Model classes of different libraries. +Yes, yet another standard. If you want to add support for your ML Model in MLEM, +this is what you implement! + +**Base class**: `mlem.core.model.ModelType` + +> This class is polymorphic, which means it can have more fields depending on +> the implementation. + +**Fields**: + +- `io` - an instance of [`ModelIO`](#modelio), a way to save and load the model +- `method` - a string-to-signature mapping which holds information about the + available model methods +- `model` (_transient_) - will hold the actual model object, if it was loaded + +There are implementations of this class for all supported libraries: `xgboost`, +`catboost`, `lightgbm`, `torch`, `sklearn`. + +The one notable implementation is `callable`: it treats any Python callable +object as a model with a single method `__call__`. That means you can turn +functions and class methods into MLEM Models as well! + +## ModelIO + +Represents a way that a model can be saved and loaded. A required field of the +`ModelType` class. If an ML library has its own way to save and load models, it +goes here. + +**Base class**: `mlem.core.model.ModelIO` + +There are implementations for all supported libraries: `torch_io`, `xgboost_io`, +`lightgbm_io`, `catboost_io`. + +Also, a universal `simple_pickle` is available, which simply pickles the model +(used by sklearn, for example). + +There is also a separate `pickle` implementation, which can detect other model +types inside your object and use their IOs for them.
This is very handy when, +for example, you wrap your torch NN in a Python function: the function part +will be pickled, and the NN will be saved using `torch_io`. + +# Datasets + +## DatasetType + +Holds metadata about a dataset, like type, dimensions, column names, etc. + +**Base class**: `mlem.core.dataset_type.DatasetType` + +**Fields**: + +- `data` (_transient_) - the underlying dataset object, if it was read + +**Implementations**: + +Python: + +- `primitive` - any of the Python primitives +- `tuple` - a tuple of objects; each can have a different type +- `list` - a list of objects, but they should all be the same type +- `tuple_like_list` - a list of objects; each can have a different type +- `dict` - a dictionary; the value under each key can have a different type + +Pandas: + +- `dataframe` - `pd.DataFrame`. Holds info about columns, their types and + indexes +- `series` - `pd.Series`. Holds info about columns, their types and indexes + +Numpy: + +- `ndarray` - `np.ndarray`. Holds info about type and dimensions +- `number` - `np.number`. Holds info about type + +ML Libraries: + +- `xgboost_dmatrix` - `xgboost.DMatrix`. Holds info about feature names and + their types +- `lightgbm` - `lightgbm.Dataset`. Holds information about the inner data + object (dataframe or ndarray) +- `torch` - `torch.Tensor`. Holds information about type and dimensions + +Special: + +- `unspecified` - a special dataset type used when no dataset info was provided + +## DatasetReader + +Holds all the information needed to read a dataset. + +**Base class**: `mlem.core.dataset_type.DatasetReader` + +**Fields**: + +- `dataset_type` - the resulting dataset type + +**Implementations**: + +- `pandas` +- `numpy` + +## DatasetWriter + +Writes datasets to files, producing a list of `Artifact`s and a corresponding +[`DatasetReader`](#datasetreader) + +**Base class**: `mlem.core.dataset_type.DatasetWriter` + +**Implementations**: + +- `pandas` +- `numpy` + +# Storage + +## Artifact + +Represents a file saved in some storage. + +**Base class**: `mlem.core.artifacts.Artifact` + +**Implementations**: + +- `local` - a local file +- `fsspec` - a file in a remote file system +- `dvc` - a file in the DVC cache + +## Storage + +Defines where the artifacts will be written. Produces corresponding `Artifact` +instances. + +**Base class**: `mlem.core.artifacts.Storage` + +**Implementations**: + +- `local` - store files on the local file system +- `fsspec` - store files in some remote file system +- `dvc` - store files locally, but try to read them from the DVC cache if they + are absent + +# Runtime + +## Interface + +Represents an interface for service runtime. Provides a mapping from method +name to its signature. Also provides executor functions for those methods. + +**Base class**: `mlem.runtime.interface.base.Interface` + +**Implementations**: + +- `simple` - base class for interfaces created manually. Will expose subclass + methods marked with the `@expose` decorator. +- `model` - dynamically create an interface from a [`ModelType`](#modeltype) + +## Server + +Runs a configured interface, exposing its methods as endpoints.
+ +**Base class**: `mlem.runtime.server.base.Server` + +**Implementations**: + +- `fastapi` - starts a `FastAPI` server +- `rmq` - creates a queue in a `RabbitMQ` instance and a consumer for each + interface method + +## Client + +Clients for the corresponding servers. + +**Base class**: `mlem.runtime.client.base.BaseClient` + +**Implementations**: + +- `http` - makes requests to HTTP servers like `fastapi` +- `rmq` - client for the `rmq` server + +# Packing + +## Packager + +Declaration for creating a `Package` from a model. You can learn more about +packaging [here](/doc/get-started/packaging) + +**Base class**: `mlem.pack.base.Packager` + +Related commands: [API](/doc/api-reference/pack), +[CLI](/doc/command-reference/pack) + +**Implementations**: + +Python packages: + +- `pip` - create a directory with a Python package built from the model +- `whl` - create a `.whl` file with the Python package + +Docker: + +- `docker_dir` - create a directory with the context for building a Docker + image +- `docker` - build a Docker image from the model + +# Deployment + +## TargetEnvMeta + +Declaration of a target environment for deploying models. + +**Base class**: `mlem.core.objects.TargetEnvMeta` + +**Implementations**: + +- `heroku` - an account on the Heroku platform + +## DeployMeta + +Declaration and state of a deployed model. + +**Base class**: `mlem.core.objects.DeployMeta` + +Related commands: [API](/doc/api-reference/deploy), +[CLI](/doc/command-reference/deploy) + +**Fields**: + +- `env_link` - link to the target environment +- `env` (_transient_) - the loaded target environment +- `model_link` - link to the deployed model object +- `model` (_transient_) - the loaded model object +- `state` - deployment state + +**Implementations**: + +- `heroku` - an app deployed to the Heroku platform + +## DeployState + +Represents the state of a deployment. + +**Base class**: `mlem.core.objects.DeployState` + +**Implementations**: + +- `heroku` - the state of a deployed Heroku app diff --git a/content/docs/user-guide/project-structure.md b/content/docs/user-guide/project-structure.md new file mode 100644 index 00000000..febac298 --- /dev/null +++ b/content/docs/user-guide/project-structure.md @@ -0,0 +1,71 @@ +# Project structure + +## MLEM Repo + +MLEM can work with any `.mlem` files anywhere, but if you are using Git it is +worth turning your repo into a **MLEM Repo**. + +Having a **MLEM Repo** will allow you to save config options and index your +objects. Also, it will bring some structure to your project and help you address +objects more easily. + +> Of course, you can create a MLEM Repo even without Git: any path with a +> `.mlem` directory is considered a **MLEM Repo**, whether it is local, on +> GitHub or on some cloud file storage. + +Once you have a **MLEM Repo**, you will be able to use the API and CLI commands +that require one, like `mlem ls` and `mlem config`. + +## mlem init + +To create a **MLEM Repo**, simply run [`mlem init`](/doc/command-reference/init) +or [`mlem.api.init`](/doc/api-reference/init). It accepts a path as an argument, +which defaults to the current directory. + +It will create a `.mlem` directory and an empty `config.yaml` file inside. You +can learn more about configuration [here](/doc/user-guide/configuration). + +## External objects + +By default, any objects that you save into a repo will be **internal**, which +means they will be saved under `.mlem/{object_type}/`. + +If you don't want this behavior, you can specify the `external` flag when saving +or set `default_external` to `True` via configuration.
After that, saved +objects will be **external**, and they will be saved under the path you specify. + +Also, they will be indexed via links under `.mlem/link/`. +That is needed for MLEM to keep track of all MLEM Objects in the repo. + +> You can also turn this off via the `link=False` flag, but in that case your +> object will not appear in `mlem ls` output, for example. + +## Referencing MLEM Objects + +Everywhere you need to reference a saved MLEM Object, you can do so by +providing these arguments: + +- `path` is the path to the object +- `repo` is the repository to look in. This is optional +- `rev` is the revision of the repository, also optional +- `fs` (API-only) is the fsspec FileSystem implementation to use + +All of those are saved in the `location` field of a MLEM Object. + +If you didn't provide `repo` and/or `rev`, MLEM will try to deduce them from +`path`. `fs` can also be deduced from `repo` or `path`. Also, if you are +referencing an object in a **MLEM Repo**, you can omit `.mlem/{object_type}` +from `path`. + +Here is an example of how the same object can be referenced: + +- `path = rf, repo = https://github.com/iterative/example-mlem-get-started, rev=main` - + classic +- `path = .mlem/model/rf, repo = https://github.com/iterative/example-mlem-get-started, rev=main` - + you can also provide the full path +- `path = https://github.com/iterative/example-mlem-get-started/tree/main/rf` - + everything can be provided via the path (depends on the implementation) +- `path = https://github.com/iterative/example-mlem-get-started/.mlem/model/rf` - + you can also omit `tree/main` since `main` is the default +- `path = rf, fs = GithubFileSystem(org="iterative", repo="example-mlem-get-started", sha="main")` - + API-only; you can provide a pre-configured fs diff --git a/content/docs/user-guide/remote-repos.md b/content/docs/user-guide/remote-repos.md new file mode 100644 index 00000000..4d014796 --- /dev/null +++ b/content/docs/user-guide/remote-repos.md @@ -0,0 +1,128 @@ +# WIP Working with repositories and remote objects
+ +### 🧳 Requirements + +We need to install DVC, since model binaries in the remote example repo are +stored in a cloud remote with DVC's help. In another section we'll show how +MLEM works with DVC in more detail. + +`pip install dvc[s3]`
+ +## Listing objects + +Since we've saved the data and the model in the repository, let's list them: + +```cli +$ mlem ls +``` + +```yaml +Datasets: + - test_x.csv + - test_y.csv + - train.csv +Models: + - rf +``` + +Note that we are actually listing the models and data saved in the +repository we're in. + +But what if they are stored in a remote Git repository, and we don't want to +clone it? MLEM can also work with remote repositories: + +```cli +$ mlem ls https://github.com/iterative/example-mlem-get-started --type model +``` + +```yaml +Models: + - rf +``` + +We can also use URLs to load models from remote repositories directly: + +```py +from mlem.api import load + +model = load("https://github.com/iterative/example-mlem-get-started/rf") +# or +model = load( + "rf", + repo="https://github.com/iterative/example-mlem-get-started", + rev="main" +) +``` + +If we just want to download the model to a local disk to use it later, we can +run the `clone` command: + +```cli +$ mlem clone https://github.com/iterative/example-mlem-get-started/rf ml_model +``` + +The other way to do it is to run: + +```cli +$ mlem clone rf --repo https://github.com/iterative/example-mlem-get-started --rev main ml_model +```
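The same can be done from Python with `mlem.api.clone`, mirroring the CLI call above (a sketch; the positional arguments are the source path and the local target):

```py
from mlem.api import clone

# Downloads the "rf" model from the remote repo into "ml_model" locally
cloned = clone(
    "rf",
    "ml_model",
    repo="https://github.com/iterative/example-mlem-get-started",
    rev="main",
)
```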
+ +### πŸ’‘ Expand to use your own repo + +We use the [example repo](https://github.com/iterative/example-mlem-get-started) in +the commands, but you can create your own repo and use it if you want. + +To push your models and datasets to the repo, add them to Git and commit: + +```cli +$ git add .mlem *.py +$ git commit -am "committing mlem objects and code" +$ git push +```
+ +## Cloud remotes + +If you don't need to version your models, but you want to store your +objects in some remote location, you can use MLEM with any cloud/remote +supported by +[fsspec](https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations), +e.g. S3. + +To do so, use paths with the corresponding file system protocol, like +`s3://<bucket>/<path>` + +```cli +$ mlem init s3://example-mlem-get-started +$ mlem clone rf s3://example-mlem-get-started/rf +⏳️ Loading meta from .mlem/model/rf.mlem +🐏 Cloning .mlem/model/rf.mlem +πŸ’Ύ Saving model to s3://example-mlem-get-started/.mlem/model/rf.mlem +``` + +Now you can load this model via the API or use it in CLI commands just as if it +were local: + +```py +from mlem.api import load +model = load("rf", repo="s3://example-mlem-get-started") +``` + +```cli +$ mlem apply rf --repo s3://example-mlem-get-started test_x.csv --json +[1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0] +``` + +TL;DR: we've just + +1. Listed all MLEM models in the Git repo, +2. Loaded a model from a Git repo directly, +3. Initialized MLEM in a remote bucket and worked with it just like with a + regular folder. diff --git a/content/docs/user-guide/what-is-mlem.md b/content/docs/user-guide/what-is-mlem.md new file mode 100644 index 00000000..b81d33b7 --- /dev/null +++ b/content/docs/user-guide/what-is-mlem.md @@ -0,0 +1,13 @@ +# What is MLEM + +**MLEM** is an open-source Python tool providing an easy and flexible way to +package and serve Machine Learning models. + +**MLEM** allows you to transform your models into Python modules to use +programmatically, or into fully deployable services packaged in easily shippable +Docker images. + +**MLEM** defines standard interfaces and formats for datasets and models, and a +modular, flexible design. This allows supporting a wide variety of model types +and deployment targets, enabling you to easily serve models locally or on any +cloud infrastructure. diff --git a/src/components/NavBar/OtherToolsPopup/index.tsx b/src/components/NavBar/OtherToolsPopup/index.tsx index c94f26bd..3fee3e4b 100644 --- a/src/components/NavBar/OtherToolsPopup/index.tsx +++ b/src/components/NavBar/OtherToolsPopup/index.tsx @@ -42,8 +42,7 @@ const otherToolsPopupData: Array<{ { title: 'MLEM', icon: MlemSVG, - description: - 'Open-source model registry and deployment tool for ML projects', + description: 'Open-source tool to simplify ML model deployment', href: '/' } ] diff --git a/static/img/ml_model_registry.jpg b/static/img/ml_model_registry.jpg new file mode 100644 index 00000000..248f72e5 Binary files /dev/null and b/static/img/ml_model_registry.jpg differ