Training a Custom YOLO v4 Darknet Model on Azure and Running with Azure Live Video Analytics on IoT Edge
- Train a custom YOLO v4 model
- TensorFlow Lite conversion for fast inferencing
- Azure Live Video Analytics on IoT Edge
- Links/References
- SSH client or command line tool - for Windows try putty.exe
- SCP client or command line tool - for Windows try pscp.exe
- Azure Subscription - a Free Trial available for new customers.
- Familiarity with Unix commands - e.g. `vim`, `nano`, `wget`, `curl`, etc.
- Visual Object Tagging Tool - VoTT
- Set up an N-series Virtual Machine by using the michhar/darknet-azure-vm-ubuntu-18.04 project VM setup.
- SSH into the Ubuntu DSVM with your username and password (or, if you used an SSH key, use that)
- If this is a corporate subscription, you may need to delete an inbound port rule under "Networking" in the Azure Portal (delete Cleanuptool-Deny-103)
- Test the Darknet executable by running the following.
  - Get the YOLO v4 tiny weights

    ```
    wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
    ```
  - Run a test on a static image from the repository. Run the following command and then give the path to a test image (look in the `data` folder for sample images, e.g. `data/giraffe.jpg`). The `coco.data` file gives the links to other necessary files. The `yolov4-tiny.cfg` file specifies the architecture and settings for tiny YOLO v4.

    ```
    ./darknet detector test ./cfg/coco.data ./cfg/yolov4-tiny.cfg ./yolov4-tiny.weights
    ```
  - Check `predictions.jpg` for the bounding boxes overlaid on the image. You may secure copy (SCP) this file down to your machine to view it, or alternatively remote desktop into the machine with a program like X2Go.
- Label some test data locally (aim for about 500-1000 bounding boxes drawn, noting that fewer will result in less accurate results for those classes)
  - Label data with VoTT and export as `json`
  - Convert the `json` files to YOLO `.txt` files by running the following script (`vott2.0_to_yolo.py`). In this script, a change must be made: update line 13 (`LABELS = {'helmet': 0, 'no_helmet': 1}`) to reflect your classes. Running this script should result in one `.txt` file per `.json` VoTT annotation file. The `.txt` files are the YOLO format that `darknet` can use (this format is sketched just after this step). Run this conversion script as follows, for example.

    ```
    python vott2.0_to_yolo.py --annot-folder path_to_folder_with_json_files --out-folder new_folder_for_txt_annotations
    ```
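For reference, each line of a YOLO-format `.txt` annotation file holds a class index followed by the box center and size, normalized to the image dimensions. A minimal sketch of that calculation (the pixel values and class index here are illustrative, not taken from the script):

```python
# Illustrative only: convert one pixel-space box to a YOLO-format annotation line.
img_w, img_h = 1280, 720                        # image size in pixels
left, top, right, bottom = 100, 80, 300, 260    # example bounding box in pixels
class_id = 0                                    # index into your LABELS mapping, e.g. 'helmet'

x_center = ((left + right) / 2) / img_w
y_center = ((top + bottom) / 2) / img_h
width = (right - left) / img_w
height = (bottom - top) / img_h

# One line per object: "<class> <x_center> <y_center> <width> <height>"
print(f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
```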
  - Darknet will need a specific folder structure. Structure the data folder as follows, where in the `data/img` folder each image is placed along with its `.txt` annotation file.

    ```
    data/
      img/
        image1.jpg
        image1.txt
        image2.jpg
        image2.txt
        ...
      train.txt
      valid.txt
      obj.data
      obj.names
    ```

    `obj.data` is a general file to direct `darknet` to the other data-related files and model folder. It looks similar to the following, with the necessary change to `classes` for your scenario.

    ```
    classes = 2
    train = build/darknet/x64/data/train.txt
    valid = build/darknet/x64/data/valid.txt
    names = build/darknet/x64/data/obj.names
    backup = backup/
    ```

    `obj.names` contains the class names, one per line. `train.txt` and `valid.txt` should look as follows, for example. Note, `train.txt` lists the training images and is a different subset from the smaller list found in `valid.txt`. As a general rule, 5-10% of the image paths should be placed in `valid.txt`, and these should be randomly distributed (a small script for creating this split is sketched after this list).

    ```
    build/darknet/x64/data/img/image1.jpg
    build/darknet/x64/data/img/image5.jpg
    ...
    ```

  - These instructions may also be found in How to train on your own data.
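To produce the randomly distributed `train.txt` and `valid.txt` lists described above, a short script helps. A minimal sketch, assuming the images sit in `build/darknet/x64/data/img/` and a 10% validation split (the paths and split ratio are illustrative):

```python
# Minimal sketch: write train.txt / valid.txt with a random ~10% validation split.
import random
from pathlib import Path

img_dir = Path("build/darknet/x64/data/img")            # images (+ .txt labels) live here
paths = sorted(str(p) for p in img_dir.glob("*.jpg"))   # adjust extension(s) as needed

random.seed(42)
random.shuffle(paths)
n_valid = max(1, int(0.1 * len(paths)))                 # 5-10% is the general rule

Path("build/darknet/x64/data/valid.txt").write_text("\n".join(paths[:n_valid]) + "\n")
Path("build/darknet/x64/data/train.txt").write_text("\n".join(paths[n_valid:]) + "\n")
```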
- Upload data to the DSVM as follows.
  - Zip the `data` folder (`zip -r data.zip data` if using the command line) and copy the data up to the VM with `scp data.zip <username>@<public IP or DNS name>:~/darknet/build/darknet/x64/` (use `pscp.exe` on Windows; you may need to delete the networking rule Cleanuptool-Deny-103 again if this gives a timeout error). Note that `data.zip` is placed in the `darknet/build/darknet/x64` folder. This is where `darknet` will look for the data.
  - Log in to the DSVM with SSH
  - On the DSVM, unzip the compressed `data.zip` found, now, in the folder `darknet/build/darknet/x64`.
- Read through How to train on your own data from the Darknet repo, mainly on updating the `.cfg` file. We will be using the tiny architecture of YOLO v4, so we will calculate anchors and update the config accordingly (the `cfg/yolov4-tiny-custom.cfg`). The following summarizes the changes for reference, but please refer to the Darknet repo for more information/clarification.
  - Calculate anchor boxes (especially important if you have very big or very small objects on average). We use `-num_of_clusters 6` because of the tiny architecture configuration. IMPORTANT: make note of these anchors (darknet creates a file for you called `anchors.txt`) because you will need them later on, in the section on converting the model to TFLite.

    ```
    ./darknet detector calc_anchors build/darknet/x64/data/obj.data -num_of_clusters 6 -width 416 -height 416
    ```

  - Configure the cfg file (you will see a file called `cfg/yolov4-tiny-custom.cfg`). Open the file with an editor like `vim` or `nano`. Modify the following to your scenario. For example, this header (`net` block):

    ```
    [net]
    # Testing
    #batch=1
    #subdivisions=1
    # Training
    batch=16
    subdivisions=2
    ...
    learning_rate=0.00261
    burn_in=1000
    max_batches = 4000
    policy=steps
    steps=3200,3600
    ...
    ```

    - Info for the `yolo` blocks (in each YOLO block or just before - there are two blocks in the tiny architecture); a small helper for calculating these values is sketched after this list:
      - Class number – change to your number of classes (each YOLO block)
      - Filters – (5 + num_classes)*3 (neural net layer before each YOLO block)
      - Anchors – these are also known as anchor boxes (each YOLO block) - use the calculated anchors from the previous step.
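The `max_batches`, `steps`, and `filters` values above follow simple rules of thumb from the Darknet documentation (`max_batches` around `classes*2000`, `steps` at 80% and 90% of `max_batches`, and `filters = (5 + num_classes)*3` in the convolutional layer before each YOLO block). A minimal sketch for computing them, assuming two classes as in the helmet example:

```python
# Hypothetical helper: compute cfg values for a tiny YOLO v4 custom config.
# Formulas follow the Darknet "how to train" guidance; adjust to your scenario.
num_classes = 2  # e.g. helmet, no_helmet

max_batches = num_classes * 2000               # Darknet suggests classes*2000 (it also advises not less than 6000 in general)
steps = (int(max_batches * 0.8), int(max_batches * 0.9))
filters = (5 + num_classes) * 3                # filters in the [convolutional] layer before each [yolo] block

print(f"classes     = {num_classes}")
print(f"max_batches = {max_batches}")
print(f"steps       = {steps[0]},{steps[1]}")
print(f"filters     = {filters}")
```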
- Train the model with the following two commands.
  - This will download the base model weights:

    ```
    wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
    ```

  - This will run the training experiment (where `-clear` means it will start training from the base model just downloaded rather than from any weights already present in the `backup` folder; the `backup` folder is where the weights will show up after training).

    ```
    ./darknet detector train build/darknet/x64/data/obj.data cfg/yolov4-tiny-custom.cfg yolov4-tiny.conv.29 -map -dont_show -clear
    ```
- If using the michhar/darknet-azure-vm-ubuntu-18.04 GitHub VM setup as instructed above, the hunglc007/tensorflow-yolov4-tflite project will have already been cloned and the correct Python environment set up with TensorFlow 2.
- You can use an editor like VSCode for the following; any other text editor will also work.
  - Change `coco.names` to `obj.names` in `core/config.py`
  - Update the anchors on line 17 of `core/config.py` to match the anchor sizes used to train the model (a sketch for reading these values from darknet's `anchors.txt` follows this list), e.g.:

    ```
    __C.YOLO.ANCHORS_TINY = [ 81, 27, 28, 80, 58, 51, 76,100, 109, 83, 95,246]
    ```

  - Place the `obj.names` file from your Darknet project in the `data/classes` folder.
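If you kept the `anchors.txt` file that darknet wrote during the `calc_anchors` step, you can print it in the form needed here rather than copying numbers by hand. A minimal sketch, assuming `anchors.txt` is simply a comma-separated list of numbers (check your copy of the file):

```python
# Hypothetical helper: print darknet's calculated anchors in the form used by core/config.py.
# Assumes anchors.txt holds comma-separated numbers, e.g. "81, 27, 28, 80, 58, 51, ..."
with open("anchors.txt") as f:
    values = [int(float(v)) for v in f.read().replace("\n", ",").split(",") if v.strip()]

print(f"__C.YOLO.ANCHORS_TINY = {values}")
```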
- Convert from Darknet to TensorFlow Lite (with quantization) with the two steps as follows. Use the weights from your Darknet experiment (as found in the `~/darknet/backup/` folder).
  - In the `tensorflow-yolov4-tflite` folder, activate the Python environment with `source env/bin/activate`
  - Save the model to TensorFlow protobuf intermediate format.

    ```
    python save_model.py --weights yolov4-tiny-custom_best.weights --output ./checkpoints/yolov4-tiny-416-tflite2 --input_size 416 --model yolov4 --framework tflite --tiny
    ```

  - Convert the protobuf model weights to TFLite format with quantization (a quick sanity check of the converted model is sketched after these steps).

    ```
    python convert_tflite.py --weights ./checkpoints/yolov4-tiny-416-tflite2 --output ./checkpoints/yolov4-tiny-416-fp16.tflite --quantize_mode float16
    ```
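Before moving on, it can be worth a quick check that the converted `.tflite` model loads and reports the expected 416x416 input. A minimal sketch using the TensorFlow Lite interpreter, run in the same Python environment (the model path matches the output of the conversion step above):

```python
# Minimal sanity check: load the converted TFLite model and inspect its tensors.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="./checkpoints/yolov4-tiny-416-fp16.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("Input:", input_details[0]["shape"])            # expect [1, 416, 416, 3]
print("Outputs:", [o["shape"] for o in output_details])

# Run one forward pass on a dummy image to confirm inference works end to end.
dummy = np.random.random_sample(tuple(input_details[0]["shape"])).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
```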
- [Optional] Run the video test on remote desktop (recommended to use the X2Go client for Windows or Mac with your VM user and IP address) to check that everything is ok. Once X2Go has connected and you have a remote desktop instance running, open a terminal window ("terminal emulator" program).
  - Navigate to the project folder.

    ```
    cd tensorflow-yolov4-tflite
    ```

  - Run the video demo.

    ```
    python detectvideo.py --framework tflite --weights ./checkpoints/yolov4-tiny-416-fp16.tflite --size 416 --tiny --model yolov4 --video <name of your video file> --output <new name for output result video> --score 0.4
    ```

  - You can then navigate to the output video file and play it with VLC in the remote desktop environment, or download the video to play locally.
- If you wish to start from this point (you do not have a trained model), please refer to the releases (v0.1) for the `.tflite` model, `obj.names` file, anchors (in the notes) and a sample video (`.mkv` file) to create your RTSP server for simulation: https://github.com/michhar/yolov4-darknet-notes/releases/tag/v0.1.
On your development machine you will need the following.
- `git` command line tool or client such as GitHub Desktop
- SCP client or command line tool - for Windows try pscp.exe
- A sample video in `.mkv` format (only some audio formats are supported, so you may see an error regarding audio format - you may wish to strip audio in this case for the simulator)
- Your `.tflite` model, anchors and `obj.names` files
- Docker - such as Docker Desktop
- VSCode and the Azure IoT Tools extension (search "Azure IoT Tools" in extensions within VSCode)
- .NET Core 3.1 SDK - download
- Azure CLI - download and install
- `curl` command line tool - download curl
On Azure:
- Have gone through this Live Video Analytics quickstart and the Live Video Analytics cloud to device sample console app to set up the necessary Azure resources and learn how to use VSCode to see the results with the .NET app.
- OR have the following Azure resources provisioned:
- Create a custom RTSP simulator with your video for inferencing with LVA, using the live555 media server
  - Clone the official Live Video Analytics GitHub repo:

    ```
    git clone https://github.com/Azure/live-video-analytics.git
    ```

  - Open the repository folder in VSCode to make it easier to modify files
  - Go to the RTSP simulator instructions:

    ```
    cd utilities/rtspsim-live555/
    ```

  - Replace line 21 with your `.mkv` file (you can use the ffmpeg command line tool to convert from other formats like `.mp4` to `.mkv`)
  - Copy your `.mkv` video file to the same folder as the Dockerfile
  - Build the docker image according to the Readme
  - Push the docker image to your ACR according to the Readme
    - Login to ACR: `az acr login --name myregistry`
    - Use docker to push: `docker push myregistry.azurecr.io/my-rtsp-sim:latest`
- To prepare the ML model wrapper code, from the base of the live-video-analytics folder:
  - Go to the Docker container building instructions:

    ```
    cd utilities/video-analysis/yolov4-tflite-tiny
    ```

  - Copy your `.tflite` model into the `app` folder
  - Perform the following changes to files for your custom scenario:
    - In `app/core/config.py`:
      - Update the `__C.YOLO.ANCHORS_TINY` line to be the same as used when training with Darknet
      - Update `__C.YOLO.CLASSES` to be `./data/classes/obj.names`
    - In the `app/data/classes` folder:
      - Add your file called `obj.names` (with your class names, one per line)
    - In `app/yolov4-tf-tiny-app.py`:
      - Update line 31 to use the name of your model
      - Update line 45 to be `obj.names` instead of `coco.names`
    - In the `Dockerfile`:
      - We do not need to pull down the yolov4 base tflite model, so delete line 19
  - Follow the instructions here to build, test, and push the docker image to ACR (a quick local test of the scoring endpoint is sketched just below):
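Once the yolov4-tflite container is running locally (per the sample's build and test instructions), you can post a test image to its `/score` endpoint to confirm the model and `obj.names` changes took effect. A minimal sketch, assuming the container's HTTP port is published on localhost port 8080 and a local `test.jpg` exists (both are assumptions - adjust to how you ran the container):

```python
# Hypothetical local test of the yolov4-tflite module's /score endpoint.
# Assumes the container's web server port is published on localhost:8080.
import requests

with open("test.jpg", "rb") as f:
    image_bytes = f.read()

resp = requests.post(
    "http://127.0.0.1:8080/score",
    data=image_bytes,
    headers={"Content-Type": "image/jpeg"},
)
resp.raise_for_status()
print(resp.json())  # expect a JSON body listing detected objects and confidences
```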
- To run the sample app and view your inference results:
  - Clone the official Live Video Analytics CSharp sample app:

    ```
    git clone https://github.com/Azure-Samples/live-video-analytics-iot-edge-csharp.git
    ```
  - In the `src/edge` folder, update `yolov3.template.json` as follows.
    - Rename it to `yolov4.template.json`
    - Update the `runtime` section at the beginning of the file (or ensure this is the case) so that it looks like:

      ```
      "runtime": {
        "type": "docker",
        "settings": {
          "minDockerVersion": "v1.25",
          "loggingOptions": "",
          "registryCredentials": {
            "$CONTAINER_REGISTRY_USERNAME_myacr": {
              "username": "$CONTAINER_REGISTRY_USERNAME_myacr",
              "password": "$CONTAINER_REGISTRY_PASSWORD_myacr",
              "address": "$CONTAINER_REGISTRY_USERNAME_myacr.azurecr.io"
            }
          }
        }
      }
      ```

      - This section will ensure the deployment can find your custom `rtspsim` and `yolov4` images in your ACR.
    - Change the `yolov3` name to `yolov4` as in the following modules section, pointing the yolov4 module to the correct image location in your ACR (the image location is an example).

      ```
      "yolov4": {
        "version": "1.0",
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
        "settings": {
          "image": "myacr.azurecr.io/my-awesome-custom-yolov4:latest",
          "createOptions": {}
        }
      }
      ```

    - For the `rtspsim` module, ensure the image points to your image in ACR (the image location is an example) and ensure the `createOptions` look as follows:

      ```
      "rtspsim": {
        "version": "1.0",
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
        "settings": {
          "image": "myacr.azurecr.io/my-rtsp-sim:latest",
          "createOptions": {
            "PortBindings": {
              "554/tcp": [
                {
                  "HostPort": "5001"
                }
              ]
            }
          }
        }
      }
      ```

    - Also, in the `rtspsim` module `createOptions`, make sure to delete the folder bindings, i.e., delete any section like:

      ```
      "HostConfig": {
        "Binds": [
          "$INPUT_VIDEO_FOLDER_ON_DEVICE:/live/mediaServer/media"
        ]
      }
      ```

      - This will ensure that LVA looks in the `rtspsim` module for the video rather than on the IoT Edge device.
  - Make the appropriate changes to the `.env` file (this should be located in the `src/edge` folder):
    - Update the `CONTAINER_REGISTRY_USERNAME_myacr` and `CONTAINER_REGISTRY_PASSWORD_myacr` values
    - Recall the `.env` file (you can modify it in VSCode) should have the following format (fill in the missing parts for your Azure resources):

      ```
      SUBSCRIPTION_ID=
      RESOURCE_GROUP=
      AMS_ACCOUNT=
      IOTHUB_CONNECTION_STRING=
      AAD_TENANT_ID=
      AAD_SERVICE_PRINCIPAL_ID=
      AAD_SERVICE_PRINCIPAL_SECRET=
      INPUT_VIDEO_FOLDER_ON_DEVICE="/live/mediaServer/media"
      OUTPUT_VIDEO_FOLDER_ON_DEVICE="/var/media"
      APPDATA_FOLDER_ON_DEVICE="/var/lib/azuremediaservices"
      CONTAINER_REGISTRY_USERNAME_myacr=
      CONTAINER_REGISTRY_PASSWORD_myacr=
      ```

    - When you generate the deployment manifest from the template file in VSCode, it will use these values to create the actual deployment manifest file.
  - In the `src/cloud-to-device-console-app` folder, make the appropriate changes to `operations.json`.
    - In the `"opName": "GraphTopologySet"` operation, update the `topologyUrl` to be the http extension topology as follows.

      ```
      {
        "opName": "GraphTopologySet",
        "opParams": {
          "topologyUrl": "https://raw.githubusercontent.com/Azure/live-video-analytics/master/MediaGraph/topologies/httpExtension/topology.json"
        }
      }
      ```

    - In the `"opName": "GraphInstanceSet"` operation, update the `rtspUrl` value to have your video file name (here `my_video.mkv`) and the `inferencingUrl` with `"value": "http://yolov4/score"`, as in:

      ```
      {
        "opName": "GraphInstanceSet",
        "opParams": {
          "name": "Sample-Graph-1",
          "properties": {
            "topologyName" : "InferencingWithHttpExtension",
            "description": "Sample graph description",
            "parameters": [
              {
                "name": "rtspUrl",
                "value": "rtsp://rtspsim:554/media/my_video.mkv"
              },
              {
                "name": "rtspUserName",
                "value": "testuser"
              },
              {
                "name": "rtspPassword",
                "value": "testpassword"
              },
              {
                "name": "imageEncoding",
                "value": "jpeg"
              },
              {
                "name": "inferencingUrl",
                "value": "http://yolov4/score"
              }
            ]
          }
        }
      },
      ```
  - Make the appropriate changes to `appsettings.json`, a file that you may need to create if you haven't done the quickstarts. It should look as follows and be located in the `src/cloud-to-device-console-app` folder.

    ```
    {
      "IoThubConnectionString" : "connection_string_of_iothub",
      "deviceId" : "name_of_your_edge_device_in_iot_hub",
      "moduleId" : "lvaEdge"
    }
    ```

    - The IoT Hub connection string may be found in the Azure Portal under your IoT Hub -> Settings -> Shared access policies blade -> iothubowner Policy -> Connection string - primary key
  - Build the app with `dotnet build` from the `src/cloud-to-device-console-app` folder.
  - Run the app with `dotnet run`
- Darknet Azure DSVM
- Visual Object Tagging Tool (VoTT)
- Darknet on GitHub
- Python virtual environments
- Conversion of Darknet model to TFLite on GitHub
- Create a movie simulator docker container with a test video for LVA
- TensorFlow Lite Darknet Python AI container sample for LVA
- Run LVA sample app locally