# Automating Visual Inspections with AI on Red Hat OpenShift Data Science - Hands-on tutorial
A GPU worker node is not mandatory, but it is recommended if you would like to train the model yourself.

- Red Hatters can order the "NVIDIA GPU Operator Red Hat OpenShift Container Platform 4 Workshop". Please be aware of the costs and shut down the service when you are done.
- Upgrade OpenShift to 4.11.x.
- Install the Node Feature Discovery (NFD) Operator.
- Install the NVIDIA GPU Operator.
- Create the ClusterPolicy instance.
- Verify the successful installation of the NVIDIA GPU Operator.
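As an additional sanity check later on, once your CUDA workbench (created further below) is running, you can confirm from Python that the GPU is actually visible. This is only a convenience check on top of the GPU Operator's own verification steps:

```python
# Sanity check (run later, inside a CUDA workbench): confirm that the NVIDIA
# GPU is visible to PyTorch. This complements, but does not replace, the GPU
# Operator's own verification steps.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```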
Follow "Installing OpenShift Data Science on OpenShift Container Platform"
Model Serving requires an S3 bucket with an ACCESS_KEY and SECRET_KEY. In case you don't have S3 storage available already (e.g. via ODF or on AWS), you can deploy Minio on your OpenShift cluster:
```sh
oc apply -f https://raw.githubusercontent.com/mamurak/os-mlops/master/manifests/minio/minio.yaml
```
- Minio is deployed to the project/namespace `minio`.
- Launch the Minio web UI (see the Route) and create a bucket (e.g. `manu-vi`).
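If you prefer to create the bucket from code instead of the Minio web UI, here is a minimal sketch using boto3. It assumes the default Minio credentials and the cluster-internal endpoint used in the data connection below, so run it from inside the cluster (e.g. from a workbench or pod):

```python
# Minimal sketch: create the S3 bucket with boto3 instead of the Minio web UI.
# Endpoint and credentials assume the default Minio deployment from the
# manifest above; run this from inside the cluster (the endpoint is internal).
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio-service.minio.svc.cluster.local:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)
s3.create_bucket(Bucket="manu-vi")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```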
- Log in to the OpenShift web console.
- Launch RHODS via the application launcher (nine dots) -> `Red Hat OpenShift Data Science`.
- Create a new Data Science project -> `Create data science project`. If you have your own OpenShift cluster, you can name the project 'manuela-visual-inspection'. If not, add your initials, e.g. 'manu-vi-stb'. Don't choose names that are too long, because project and model server names are concatenated internally, which can lead to problems.
  - Name: `manu-vi`
  - Resource name: `manu-vi`
  - -> `Create`
- Create a data connection with your S3 configuration
  - Data connections -> `Add data connection`
  - Name: `manu-vi`
  - AWS_ACCESS_KEY_ID: `minio`
  - AWS_SECRET_ACCESS_KEY: `minio123`
  - AWS_S3_ENDPOINT: `http://minio-service.minio.svc.cluster.local:9000`
  - AWS_DEFAULT_REGION: does not matter, it is ignored
  - AWS_S3_BUCKET: `manu-vi`
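Once this data connection is attached to a workbench (next step), RHODS injects the values as environment variables into the workbench pod. A minimal sketch of picking them up from a notebook instead of hard-coding credentials:

```python
# Minimal sketch: read the data connection that RHODS injects into the
# workbench pod as environment variables and list the bucket contents.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
bucket = os.environ["AWS_S3_BUCKET"]
objects = s3.list_objects_v2(Bucket=bucket).get("Contents", [])
print([obj["Key"] for obj in objects])
```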
- Create a new RHODS workbench
  - Workbenches -> `Create workbench`
  - Name: `manu-vi`
  - Image: `CUDA` (assuming you have a cluster with an NVIDIA GPU)
  - Deployment size: `Small`
  - Number of GPUs: 1
  - Cluster storage: `Create new cluster storage`
  - Data connection: `Use existing data connection` -> `manu-vi`
  - -> `Create workbench`
- Note: in case the workbench does not start, check whether a LimitRange blocks the pod. Find the created project in the OpenShift console, navigate to Administration -> LimitRanges, and delete the LimitRange that was auto-created.
- Open the workbench and clone https://github.com/stefan-bergstein/manuela-visual-inspection.git (there are at least four ways to do this - find the approach you like).
PyTorch uses shared memory (/dev/shm) internally to exchange data between its worker processes. However, default container engine configurations limit this memory to a bare minimum, which can cause the training process to exhaust it and crash. The solution is to increase this memory manually by mounting an emptyDir volume, or to run the model training without PyTorch workers (which slows down the training; a small sketch of this fallback follows the two steps below).
- Patch the Notebook resource as described here: README.md. You might have to adapt the namespace/name to match your setup.
- If the workbench is not restarted automatically, stop and start your workbench in your Data Science project.
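If you cannot patch the workbench, the fallback mentioned above is to avoid PyTorch worker processes altogether. A minimal, self-contained sketch of the idea follows; a dummy dataset stands in for the tutorial's images, and YOLOv5's training script exposes a similar knob (typically its `--workers` option):

```python
# Minimal sketch: avoid /dev/shm pressure by loading data in the main process.
# With num_workers=0, PyTorch spawns no worker processes, so shared memory is
# not used for inter-process data exchange (at the cost of slower training).
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the tutorial's defect images.
dataset = TensorDataset(
    torch.randn(64, 3, 640, 640),       # fake images
    torch.zeros(64, dtype=torch.long),  # fake labels
)

loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)
for images, labels in loader:
    pass  # the training step would go here
```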
- Navigate to `manuela-visual-inspection/ml/pytorch` and open `Manuela_Visual_Inspection_Yolov5_Model_Training.ipynb`
- Explore or explain, and run the cells step by step:
  - Set up and test the Ultralytics YOLOv5 toolkit
  - Inspect the training dataset (images and labels)
  - Model training
    - Model training can take ~30 minutes or more, even with GPUs. You could jump to Model Serving, use a pre-trained model, and come back later.
  - Model validation
  - Convert the model to ONNX format and upload it to S3
Note: the notebook's cells contain output from a previous successful run. This lets you explain the demo without actually running anything (i.e., no GPUs required). But in order to run the demo yourself, you need to run every cell successfully once, even if the outputs suggest it has already run.
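For context, the final upload step follows a pattern like the sketch below, using the data connection's environment variables. The local weights path is illustrative and the notebook remains the authoritative version:

```python
# Minimal sketch: upload the exported ONNX weights to the S3 bucket configured
# in the data connection. The local weights path is illustrative; the object
# key matches the folder path used later when deploying the model.
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ["AWS_S3_ENDPOINT"],
    aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
)
s3.upload_file(
    "yolov5/runs/train/exp/weights/best.onnx",  # illustrative local path
    os.environ["AWS_S3_BUCKET"],
    "manu-vi-best.onnx",
)
```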
In case you did not have the time or resources to train the model yourself, you can download a pre-trained manu-vi model and upload it to your S3 bucket.
- Open your workbench (with your manu-vi data connection)
- Navigate to `manuela-visual-inspection/ml/pytorch` and open `Upload_pretrained_model.ipynb`
- Run the notebook to upload the model
- Create a model server in your data science project
  - Models and model servers -> `Configure server`
  - Number of model server replicas to deploy: `1`
  - Model server size: `Small`
  - Model route: Check/Enable 'Make deployed models available through an external route'
  - Token authorization: Uncheck/Disable 'Require token authentication'
  - -> `Configure`
- Deploy the trained model -> `Deploy model`
  - Model name: `manu-vi`
  - Model framework: `onnx - 1`
  - Model location: `Existing data connection`
    - Name: `manu-vi`
    - Folder path: `manu-vi-best.onnx`
  - -> `Deploy`
- Wait until the status is green / loaded
- Copy and save the inference URL
Show how an ML REST call could be integrated into your 'intelligent' Python application.
- Return to the workbench
- Navigate to `manuela-visual-inspection/ml/pytorch` and open `Manuela_Visual_Inspection_Yolov5_Infer_Rest.ipynb`
- Study or explain, and run the cells step by step
- Please don't forget to update the inferencing URL
- Demonstrate cool inferencing with RHODS :-)
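For reference, the REST call made in that notebook targets the KServe v2 inference protocol exposed by the RHODS model server. The sketch below is a hedged illustration: the input tensor name ("images"), the 640x640 size, and the post-processing depend on the exported YOLOv5 model, so check the notebook for the authoritative pre- and post-processing:

```python
# Minimal sketch: call the deployed ONNX model over REST (KServe v2 protocol).
# The input name "images" and the 640x640 shape are typical for a YOLOv5 ONNX
# export, but verify them against the notebook and your model's metadata.
import numpy as np
import requests
from PIL import Image

INFER_URL = "<paste the inference URL you copied above>"

# Load and preprocess one image: RGB, 640x640, scaled to [0, 1], NCHW layout.
img = Image.open("sample.jpg").convert("RGB").resize((640, 640))
arr = np.asarray(img, dtype=np.float32) / 255.0
arr = np.transpose(arr, (2, 0, 1))[np.newaxis, ...]  # shape (1, 3, 640, 640)

payload = {
    "inputs": [
        {
            "name": "images",            # assumed YOLOv5 ONNX input name
            "shape": list(arr.shape),
            "datatype": "FP32",
            "data": arr.flatten().tolist(),
        }
    ]
}

# verify=False only for demo clusters with self-signed route certificates.
response = requests.post(INFER_URL, json=payload, verify=False)
response.raise_for_status()
output = response.json()["outputs"][0]
print(output["name"], output["shape"])  # raw YOLOv5 predictions, still need NMS
```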