Valohai TensorFlow Examples

This repository serves as an example for the Valohai MLOps platform. It implements handwritten digit recognition using TensorFlow, based on TensorFlow's beginner example.

The project demonstrates a complete machine learning pipeline consisting of five distinct steps (a sketch of a typical step script follows the list):

  1. Preprocess data: Prepare the input data for training.

  2. Train model: Train the TensorFlow model using the preprocessed data.

  3. Batch inference: Perform inference on a batch of data using the trained model.

  4. Compare predictions: Analyze and compare the predictions generated by the model.

  5. Online inference deployment: Deploy the trained model for online inference.
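
Each step is an ordinary Python script that declares its step name, parameters, and inputs with the valohai-utils helper library (installed later in this README). The snippet below is an illustrative sketch rather than code copied from this repository; the step, parameter, input, and file names are all assumptions:

import valohai

# Declare the step; `vh yaml step <script.py>` can generate the matching
# valohai.yaml entry from this call. All names here are illustrative.
valohai.prepare(
    step="train-model",
    default_parameters={"epochs": 5},
    default_inputs={"dataset": "https://example.com/preprocessed_mnist.npz"},  # placeholder URL
)

epochs = valohai.parameters("epochs").value      # resolved from defaults, the UI, or the CLI
dataset_path = valohai.inputs("dataset").path()  # local path to the downloaded input file

# ... load the data and train or process it here ...

# Metrics logged this way appear as execution metadata in Valohai.
logger = valohai.logger()
logger.log("epochs", epochs)
logger.flush()

# Files written to paths from valohai.outputs() are stored as execution outputs.
model_path = valohai.outputs().path("model.h5")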

Within the Valohai platform, you can explore and test four different pipelines (a sketch of a pipeline definition in Python follows the list):

  1. Basic training pipeline: This pipeline covers the essential steps for training the model.
  2. Three parallel trainings pipeline: This pipeline demonstrates parallel training of the model using three different configurations.
  3. Three parallel trainings with deployment pipeline: This pipeline showcases parallel training with deployment and includes a human approval block for manual verification.
  4. Additional 'Broken' pipeline: This pipeline highlights a distinctive Valohai feature that lets you reuse the successful nodes of a failed pipeline run.
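
A pipeline chains these steps and routes one node's outputs into the next node's inputs. The sketch below is only an assumption of how a basic training pipeline could be written with valohai-utils' Python pipeline helper; the node and file names are illustrative, not the repository's actual definitions:

from valohai import Pipeline


def main(config) -> Pipeline:
    # Node names refer to steps defined in valohai.yaml; these are illustrative.
    pipe = Pipeline(name="training-pipeline", config=config)
    preprocess = pipe.execution("preprocess-dataset")
    train = pipe.execution("train-model")

    # Route the preprocessed dataset from the first node into the trainer.
    preprocess.output("preprocessed_mnist.npz").to(train.input("dataset"))
    return pipe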

If you are just starting out, it is recommended to follow the learning path in the Valohai documentation. This learning path recreates the Train model step of this example.

Running on Valohai

Log in to the Valohai app and create a new project.

Configure the repository:

Configure this repository as the project's repository by following these steps:

  1. Go to your project's page.
  2. Navigate to the Settings tab.
  3. Under the Repository section, locate the URL field.
  4. Enter the URL of this repository.
  5. Click on the Save button to save the changes.

Now you are ready to run executions, tasks and pipelines.

Running Executions:

  1. Go to the Executions tab in your project.
  2. Create a new execution by selecting one of the predefined steps: preprocess-dataset, train-model, batch-inference, or compare-predictions.
  3. Customize the execution parameters if needed.
  4. Start the execution to run the selected step.

Running Pipelines:

  1. Navigate to the Pipelines tab.
  2. Create a new pipeline by selecting one of the four available options. Note: To run the pipeline with deployment, you must first create a deployment named 'deployment-test'. (A sketch of what an online inference endpoint might look like follows these steps.)
  3. Configure the pipeline settings.
  4. Start the pipeline.
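
The deployment pipeline publishes the trained model behind an HTTP endpoint. The following is only a sketch of what such an endpoint script might look like, assuming a FastAPI app that accepts an uploaded digit image; the route, file names, and preprocessing are illustrative rather than this repository's actual code:

from io import BytesIO

import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
# The model file is assumed to be packaged with the deployment; the path is illustrative.
model = tf.keras.models.load_model("model.h5")


@app.post("/predict")
async def predict(image: UploadFile = File(...)):
    # Convert the uploaded image into a normalized 28x28 grayscale array.
    img = Image.open(BytesIO(await image.read())).convert("L").resize((28, 28))
    batch = np.expand_dims(np.asarray(img) / 255.0, axis=0)
    probabilities = model.predict(batch)[0]
    return {
        "digit": int(np.argmax(probabilities)),
        "confidence": float(np.max(probabilities)),
    }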

Running on Valohai with Terminal

Configure the repository:

To run your code on Valohai using the terminal, follow these steps:

  1. Install Valohai on your machine by running the following command:
pip install valohai-cli valohai-utils
  2. Log in to Valohai from the terminal using the command:
vh login
  3. Create a project for your Valohai workflow.

Start by creating a directory for your project:

mkdir valohai-tensorflow-example
cd valohai-tensorflow-example

Then, create the Valohai project:

vh project create
  4. Clone the repository to your local machine:
git clone https://github.com/valohai/tensorflow-example.git .

Congratulations! You have successfully cloned the repository, and you can now modify the code and run it using Valohai.

Running Executions:

To run individual steps, execute the following command:

vh execution run <step-name> --adhoc

For example, to run the preprocess-dataset step, use the command:

vh execution run preprocess-dataset --adhoc

Running Pipelines:

To run pipelines, use the following command:

vh pipeline run <pipeline-name> --adhoc

For example, to run the three-trainings-pipeline-w-deployment pipeline, use the command:

vh pipeline run three-trainings-pipeline-w-deployment --adhoc

These commands send your local code to Valohai and run it on the platform.

Running Locally

You can run all the steps of the pipeline locally. This requires Python 3.9 and the packages listed in requirements.txt, which you can install with:

pip install -r requirements.txt

The steps require different inputs to run, so you need to run them in order.

Preprocess data has all the required inputs defined as defaults and can be run with:

python preprocess_dataset.py
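
Roughly speaking, this step loads the raw MNIST archive, scales the pixel values into the [0, 1] range, and writes a preprocessed .npz file as an output. The sketch below is an assumption of how such a script can look with valohai-utils, not the repository's exact code; locally, the output lands under .valohai/outputs/{local_run_id}/preprocess-dataset/:

import numpy as np
import valohai

# "dataset" is an assumed input name; the default points at the standard
# TensorFlow-hosted MNIST archive.
valohai.prepare(
    step="preprocess-dataset",
    default_inputs={"dataset": "https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz"},
)

with np.load(valohai.inputs("dataset").path(), allow_pickle=True) as f:
    x_train, y_train = f["x_train"], f["y_train"]
    x_test, y_test = f["x_test"], f["y_test"]

# Scale 0-255 pixel values into the 0-1 range.
x_train, x_test = x_train / 255.0, x_test / 255.0

np.savez_compressed(
    valohai.outputs().path("preprocessed_mnist.npz"),
    x_train=x_train, y_train=y_train, x_test=x_test, y_test=y_test,
)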

Train model requires the preprocessed dataset, but that is also defined as a default, so you can run:

python train_model.py
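
In the spirit of TensorFlow's beginner MNIST example, the training step loads the preprocessed arrays, trains a small classifier, and saves it as an .h5 output. The sketch below assumes the input name "dataset" (defaulting to the preprocessed .npz) and a simple dense architecture; the repository's actual model and names may differ:

import numpy as np
import tensorflow as tf
import valohai

# "dataset" is assumed to default to the preprocessed .npz from the previous step.
with np.load(valohai.inputs("dataset").path(), allow_pickle=True) as f:
    x_train, y_train = f["x_train"], f["y_train"]
    x_test, y_test = f["x_test"], f["y_test"]

# A small dense classifier, as in the TensorFlow beginner tutorial.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

# Save the trained model as a Valohai output (.h5).
model.save(valohai.outputs().path("model.h5"))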

Batch inference requires both a model and some new data. The new data has default values, but the model needs to be provided, for example from an earlier train model run:

python batch_inference.py --model .valohai/outputs/{local_run_id}/train-model/model-{suffix}.h5
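
A sketch of what batch inference can look like under these assumptions: the step takes a "model" input and an "images" input (both names assumed), runs the model on each image, and writes a predictions JSON output. When a script declares its inputs with valohai-utils, a local run can override an input from the command line, which is presumably what the --model flag above does:

import json
import os

import numpy as np
import tensorflow as tf
import valohai
from PIL import Image

# Load the trained model passed in as the "model" input (or via --model locally).
model = tf.keras.models.load_model(valohai.inputs("model").path())

results = {}
for image_path in valohai.inputs("images").paths():
    # Convert each image into the 28x28 grayscale format the model expects.
    img = Image.open(image_path).convert("L").resize((28, 28))
    batch = np.expand_dims(np.asarray(img) / 255.0, axis=0)
    probabilities = model.predict(batch)[0]
    results[os.path.basename(image_path)] = {
        "predicted_digit": int(np.argmax(probabilities)),
        "confidence": float(np.max(probabilities)),
    }

# Write the predictions as a Valohai output.
with open(valohai.outputs().path("predictions.json"), "w") as f:
    json.dump(results, f, indent=2)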

Compare predictions requires two or more batch inference results and, optionally, the corresponding models. You can run it, for example, like this:

python compare_predictions.py --predictions .valohai/outputs/{local_run_id}/batch-inference/predictions-{suffix}.json .valohai/outputs/{local_run_id}/batch-inference/predictions-{suffix}.json
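
The comparison step then boils down to reading each predictions file and deciding which run did best. The scoring criterion the repository uses is not shown here, so the average-confidence rule in the sketch below is purely an illustrative assumption (the optional --models flag is omitted), and the JSON layout follows the batch inference sketch above:

import argparse
import json

# --predictions accepts one or more JSON files produced by batch inference.
parser = argparse.ArgumentParser()
parser.add_argument("--predictions", nargs="+", required=True)
args = parser.parse_args()

best_file, best_score = None, -1.0
for path in args.predictions:
    with open(path) as f:
        predictions = json.load(f)
    # Illustrative criterion: mean prediction confidence across the batch.
    score = sum(p["confidence"] for p in predictions.values()) / len(predictions)
    print(f"{path}: average confidence {score:.4f}")
    if score > best_score:
        best_file, best_score = path, score

print(f"Best predictions: {best_file}")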