boukepostma/example-project

New QuickStart ML Project

This is your new Kedro project, configured according to QuickStart ML principles. Modify this README as you develop your project; for now it contains the basic information you need to get started. For more detailed assistance, refer to the Kedro documentation and QuickStart ML Blueprints.

In addition to a blank Kedro template, it includes the technology stack used in the QuickStart ML approach.

Apart from that, there are no pre-implemented nodes or pipelines here. For blueprints showing different machine learning use cases, please go to the main QuickStart ML Blueprints repo and feel free to take as much as you need from our examples.

Rules and guidelines

In order to get the best out of the template:

  • Don't remove any lines from the .gitignore file we provide
  • Make sure your results can be reproduced by following a data engineering convention
  • Don't commit data to your repository
  • Don't commit any credentials or your local configuration to your repository. Keep all your credentials and local configuration in conf/local/
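The last rule can be checked mechanically. The following is a minimal sketch, assuming a `.gitkeep` placeholder is the only file that should ever be tracked under conf/local/ (an assumption, not something this template states); the function names and example paths are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def tracked_local_config(tracked_paths):
    """Return tracked files under conf/local/, ignoring a .gitkeep placeholder."""
    offenders = []
    for raw in tracked_paths:
        path = PurePosixPath(raw)
        # Flag anything inside conf/local/ except the assumed placeholder file.
        if path.parts[:2] == ("conf", "local") and path.name != ".gitkeep":
            offenders.append(raw)
    return offenders


def git_tracked_files():
    # `git ls-files` lists the paths currently tracked in the repository.
    result = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


# Example with hypothetical paths; in a repository, pass git_tracked_files().
example = [
    "conf/base/catalog.yml",
    "conf/local/credentials.yml",
    "conf/local/.gitkeep",
]
print(tracked_local_config(example))
```

An empty result means nothing from conf/local/ is tracked; any listed path should be removed from version control before pushing.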

Setting up the project

The short instructions below show how to get the environment for your new project up and running. A more detailed version, with remarks and specific cases, is available in the QuickStart ML Blueprints documentation.

Setting up cloud infrastructure (Terraform)

  1. Create a service principal with the Contributor role and write down the appid, password and tenant
az ad sp create-for-rbac --role="Contributor" --scopes="/subscriptions/b3427a92-a6bb-4354-9822-98b9a61bbe58"
  2. Use these values to fill in arm_client_id, arm_secret_id and arm_tenant_id in secret.tfvars respectively
  3. With terraform as the current directory, initialize Terraform
terraform init
  4. From the same directory, apply the Terraform script
terraform apply --var-file secret.tfvars
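
The mapping from the service principal values to secret.tfvars described above can be sketched in Python. This is a minimal sketch, assuming the JSON printed by `az ad sp create-for-rbac` contains the keys `appId`, `password` and `tenant`; the function name `render_tfvars` and the sample values are hypothetical placeholders, not part of the template.

```python
import json


def render_tfvars(sp_json: str) -> str:
    """Map service principal output to the variables named in secret.tfvars."""
    sp = json.loads(sp_json)
    # Variable names follow the steps above; the JSON key names are assumed
    # to match the appid, password and tenant values printed by the az command.
    mapping = {
        "arm_client_id": sp["appId"],
        "arm_secret_id": sp["password"],
        "arm_tenant_id": sp["tenant"],
    }
    # json.dumps yields a double-quoted, escaped string, which is also a valid
    # string literal in a .tfvars file for simple values like these.
    return "".join(
        f"{name} = {json.dumps(value)}\n" for name, value in mapping.items()
    )


# Placeholder values for illustration only; never commit real credentials.
sample = '{"appId": "<appid>", "password": "<password>", "tenant": "<tenant>"}'
tfvars_text = render_tfvars(sample)
print(tfvars_text)
```

The resulting text would be saved as secret.tfvars next to the Terraform script and kept out of version control, in line with the rules above.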

Local Setup using VSCode devcontainers (recommended)

This approach uses VSCode devcontainers and is the easiest way to set up the development environment.

Prerequisites:

Setting up:

  1. Clone this repository and open it in a container.
  2. You're good to go!

Local Manual Setup

The project uses pyenv for Python version management. It lets you easily install and switch between multiple versions of Python. To install pyenv, follow these steps for your operating system.

To install a specific Python version and activate it in the current shell, use these commands:

pyenv install 3.8.16
pyenv shell 3.8.16

Virtual environment

It is recommended to create a virtual environment in your project:

python -m venv venv
source ./venv/bin/activate

Installing dependencies with Poetry

To install the libraries declared in pyproject.toml, you need to have Poetry installed. Install it by following the official Poetry installation instructions, then run this command:

poetry install

To add and install dependencies, use:

# dependencies
poetry add <package_name>

# dev dependencies
poetry add -D <package_name>

How to run Kedro

You can run your Kedro project with:

kedro run

To run a specific pipeline:

kedro run -p "<PIPELINE_NAME>"

Kedro plugins

kedro-viz

  • visualizes Kedro pipelines in an informative way
  • to run, use kedro viz --autoreload inside the project's directory
  • this will run a server on http://127.0.0.1:4141

kedro-mlflow

  • lightweight integration of MLflow inside Kedro projects
  • configuration can be specified inside the conf/<ENV>/mlflow.yml file
  • by default, experiments are saved inside the local mlruns directory
  • to see all the local experiments, run kedro mlflow ui

Configuring kedro-mlflow:

  1. Log in and configure the workspace
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>
  2. Get the tracking URI using the az ml workspace command
az ml workspace show --query mlflow_tracking_uri
  3. Place this URI in mlflow.yml under server.mlflow_tracking_uri
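
The nesting implied by server.mlflow_tracking_uri can be sketched as follows. This is a minimal sketch, assuming the az CLI uses its default JSON output format, in which case the queried URI is printed wrapped in double quotes; the helper name and the sample URI are hypothetical.

```python
import json


def mlflow_server_fragment(az_output: str) -> str:
    """Render the mlflow.yml fragment for a tracking URI printed by the az CLI."""
    # Decode the JSON-quoted string printed by the az command (assumed format).
    uri = json.loads(az_output.strip())
    if not isinstance(uri, str) or not uri:
        raise ValueError("expected a non-empty tracking URI string")
    # The nesting mirrors the key path server.mlflow_tracking_uri; json.dumps
    # produces a double-quoted value, which YAML also accepts.
    return f"server:\n  mlflow_tracking_uri: {json.dumps(uri)}\n"


# Hypothetical value; a real one comes from `az ml workspace show`.
fragment = mlflow_server_fragment('"azureml://example-tracking-uri"\n')
print(fragment)
```

The printed fragment shows where the value belongs; in an existing mlflow.yml, only the mlflow_tracking_uri entry under the server key needs to change.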

TODO: Things to automate in Terraform:

  • Keyvault access policy
  • storage container with project name (container name with - not _)
  • set up az login inside the container
  • ask for subscription in starter to fill in README and setup.sh script

Setup flow:

  1. build devcontainer
  2. .devcontainer/setup.sh
  3. azure is logged in
  4. azure subscription activated
  5. azure ml settings activated for mlflow

Need from starter:

  • ask for subscription & location
  • auto-fill terraform: use naming convention for workspace and resource group
  • auto-align workspace/rg name in .devcontainer/setup.sh
