Skip to content

Latest commit

 

History

History
54 lines (43 loc) · 2.94 KB

EXPERIMENT_TRACKING_NOTES.md

File metadata and controls

54 lines (43 loc) · 2.94 KB

Experiment Tracking and MLflow

Module Overview:

  • This module focuses on experiment tracking.
  • Introduction to MLflow, a tool used for managing the ML lifecycle.

Key Concepts:

  1. Experiment Tracking:

    • Experiment tracking involves keeping track of all relevant information from an ML experiment.
    • Relevant information can include data sources, preprocessing steps, hyperparameters, metrics, source code, environment, etc.
  2. ML Experiments vs. Experiment Runs:

    • An ML Experiment is the process of building an ML model, including all trials and variations.
    • An Experiment Run is each trial within the larger experiment, testing different models or hyperparameters.
  3. Artifacts and Metadata:

    • Artifacts: Any file associated with an experiment run, such as datasets, model weights, visualizations.
    • Metadata: Information related to the experiment, such as parameters, metrics, tags, and more.
  4. Importance of Experiment Tracking:

    • Reproducibility: Ensures experiments can be replicated.
    • Organization: Helps manage and find information efficiently, whether working alone or in a team.
    • Optimization: Facilitates the optimization of ML models by providing clear records of what has been tried and the results.

Challenges with Basic Experiment Tracking:

  • Manual methods, like spreadsheets, are error-prone and lack standardization.
  • Difficult to understand and collaborate on experiments due to inconsistent documentation.

Introduction to MLflow:

  • MLflow is an open-source platform for managing the ML lifecycle.
  • Four main modules:
    1. Tracking: Organizes experiments into runs and tracks parameters, metrics, and artifacts.
    2. Models: Manages and deploys ML models.
    3. Model Registry: Manages model versions.
    4. Projects: Manages ML projects (out of scope for this course).

Setting Up and Using MLflow:

  • MLflow can be installed as a Python package with pip install mlflow or conda install conda-forge::mlflow
  • Can be used locally or with a server for collaboration.
  • Organizes experiments into runs, tracking parameters, metrics, metadata, and artifacts.

Demonstration:

  • Launch MLflow UI with mlflow ui or mlflow ui --backend-store-uri sqlite:///mlflow.db
  • UI provides a way to explore experiments, runs, and models.
  • Experiments can be created and managed through the UI.
  • Model Registry feature requires a backend store (e.g., PostgreSQL, MySQL, SQLite).

Next Steps:

  • The next session will cover installing MLflow and integrating experiment tracking into Jupyter notebooks.

Summary:

  • Experiment tracking is crucial for reproducibility, organization, and optimization.
  • MLflow provides a structured and efficient way to manage ML experiments, track performance, and collaborate on ML projects.
  • Future lessons will include practical implementation of MLflow in your ML workflows.