We need a process to keep track of the model. It's not recommendable to use just the file manager.
We need to use this command to disable auto-logging
mlflow.xgboost.autolog(disable=True)
Using mlflow logging utilities:
# Log the model as an artifact
mlflow.log_artifact("models/my_best_model.b", artifact_path="my_best_model")
# Log the model as and mlflow model
mlflow.xgboost.log_model(booster, artifact_path="models_mlflow")
we can obtain the following information.
# Where the model is stored
artifact_path: models_mlflow
flavors:
# We can use our model as a python function
python_function:
data: model.xgb
env: conda.yaml
loader_module: mlflow.xgboost
# python version
python_version: 3.9.12
# We can also use it as a xgboost object
xgboost:
code: null
data: model.xgb
model_class: xgboost.core.Booster
# xbg library version
xgb_version: 1.6.1
mlflow_version: 1.26.0
model_uuid: cb835e881b7a410a9f4e1332a5cd4433
run_id: 99a8073207d14d0f86bae40f3489cc44
utc_time_created: '2022-06-04 11:09:59.226073'
Apart from the above we also have access to a python_env.yaml
and conda.yaml
to recreate the environments that were used to train and test the model.
Clearly this is very powerful and useful to keep track of our model and reproduce it in a controlled environment.
By logging our model as an mlflow model we can load our saved model in two different ways like this:
logged_model = 'runs:/99a8073207d14d0f86bae40f3489cc44/models_mlflow'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
#Load model as an xgboost model.
xgboost_model = mlflow.xgboost.load_model(logged_model)
This is very important because we can save a model from any framework into the mlflow format and then load them easily. Mlflow serves as an interface between the ML frameworks. This then facilitates deploying models to different services like Spark, Sagemaker, etc.