DNN Bench is a library that lets you benchmark your deep learning models against various frameworks and backends, with a single command.
With DNN Bench you can answer questions like:
- to which hardware should I deploy my model?
- which backend should I use?
- should I apply an optimization technique, e.g. quantization, before deploying my model?

The goal is to make it easy for developers to choose the optimal deployment configuration (optimization on/off, backend, hardware) for their particular use case.
Side note: Models are benchmarked within docker containers.
Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance, measured in processed samples per second (higher is better).
See further analysis for more models benchmarked on different hardware.
| | PyTorch | TensorFlow | ONNX-Runtime | OpenVINO* | Nuphar* | CUDA* | TensorRT* |
|---|---|---|---|---|---|---|---|
| CPU | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| GPU | ✅ | ✅ | | | | ✅ | ✅ |
| ARM | | | ✅ | | | | |
*Marked backends are executed within the ONNX-Runtime framework.
```sh
./install_dependencies.sh cpu
```

Replace the `cpu` argument with `gpu` for nvidia-docker.
- Install docker.
- Install nvidia-docker (required for the `gpu` option).
- Add yourself to the docker group (`sudo usermod -aG docker $USER`) to run docker commands without sudo.
You can use pre-built images from Docker Hub. They will be downloaded automatically when running `./bench_model.sh`.
Optionally, prepare the docker images for the various deep learning backends locally:

```sh
./prepare_images.sh cpu
```

Replace the `cpu` argument with `gpu` for GPU backends or `arm` for ARM backends.
Benchmark an ONNX model against different backends:

```sh
./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
    --tf --onnxruntime --openvino --pytorch --nuphar
```
Possible backends:

- `--tf` (with `--device=cpu` or `gpu`)
- `--onnxruntime` (with `--device=cpu` or `arm`)
- `--openvino` (with `--device=cpu`)
- `--pytorch` (with `--device=cpu` or `gpu`)
- `--nuphar` (with `--device=cpu`)
- `--ort-cuda` (with `--device=gpu`)
- `--ort-tensorrt` (with `--device=gpu`)
Additional Parameters:

```
--output OUTPUT   Directory for benchmarking results. Default: ./results
--repeat REPEAT   Number of repeated experiments. Default: 1000
--number NUMBER   Number of inferences per experiment. Default: 1
--warmup WARMUP   Number of warmup experiments that are discarded. Default: 100
--device DEVICE   Device to benchmark on: cpu, gpu, or arm. Default: cpu
--quantize        Apply dynamic quantization in the corresponding backend.
```
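For context, dynamic quantization with ONNX Runtime's quantization API looks roughly like the sketch below. This is an illustrative, standalone example with placeholder paths; DNN Bench applies quantization internally per backend, and its exact mechanism may differ.

```python
# Illustrative sketch: dynamic quantization with ONNX Runtime.
# Paths are placeholders; DNN Bench's --quantize flag applies the
# corresponding backend's quantization internally and may differ.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="model.onnx",         # original FP32 model
    model_output="model-quant.onnx",  # dynamically quantized model
    weight_type=QuantType.QInt8,      # quantize weights to int8
)
```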
Results are stored by default in the `./results` directory. Each benchmarking result is stored in JSON format:
```json
{
  "model_path": "/models/efficientnet-lite4.onnx",
  "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
  "backend": "onnxruntime",
  "backend_meta": "openvino",
  "device": "cpu",
  "number": 1,
  "repeat": 100,
  "warmup": 10,
  "size": 51946641,
  "input_size": [[1, 224, 224, 3]],
  "min": 0.038544699986232445,
  "max": 0.05930669998633675,
  "mean": 0.04293907555596282,
  "std": 0.0039751552053260125,
  "data": [0.04748649999964982,
           0.05760759999975562, ... ]
}
```
- model_path: Path to the input model.
- output_path: Path to the results file.
- backend: Deep learning backend used to produce the results.
- backend_meta: Special parameters used with the backend, e.g. onnxruntime used with openvino.
- device: Device the model was benchmarked on (cpu, gpu, arm, etc.).
- number: Number of inferences in a single experiment.
- repeat: Number of repeated experiments.
- warmup: Number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs.
- size: Size of the model in bytes.
- input_size: Shapes of the model inputs.
- min: Minimum time of an experiment run.
- max: Maximum time of an experiment run.
- mean: Mean time of an experiment run.
- std: Standard deviation of the experiment run times.
- data: All measurements of the experiment runs.
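Since each experiment time is recorded as a single number, throughput (the samples-per-second metric used in the plot above) can be derived directly from a result file. A minimal sketch, assuming the recorded times are in seconds and a batch size of 1:

```python
# Minimal sketch: derive throughput (samples/sec) from a result file.
# Assumes the times are in seconds, batch size 1, and the JSON layout
# shown above; the file path matches the example output_path.
import json

with open("results/efficientnet-lite4-onnxruntime-openvino.json") as f:
    result = json.load(f)

# Each experiment runs `number` inferences and `mean` is the mean
# experiment time, so samples/sec = number / mean.
throughput = result["number"] / result["mean"]
print(f"{result['backend']} ({result['backend_meta']}): {throughput:.1f} samples/sec")
```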
A simple plotting utility to generate quick plots is available in `vis/plot_results.py`.

- Dependencies:

  ```sh
  pip install seaborn matplotlib pandas
  ```

- Usage:

  ```sh
  python vis/plot_results.py results_dir plots_dir
  ```
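If you need a custom view, the result files are easy to consume directly. A minimal sketch (not the actual `vis/plot_results.py` implementation) that plots mean experiment time per backend from a results directory:

```python
# Minimal sketch: bar plot of mean experiment time per backend.
# Not the actual vis/plot_results.py; field names follow the JSON
# layout documented above, and file paths are placeholders.
import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

rows = []
for path in Path("results").glob("*.json"):
    with open(path) as f:
        r = json.load(f)
    rows.append({"backend": r["backend"], "mean_time": r["mean"]})

df = pd.DataFrame(rows)
sns.barplot(data=df, x="backend", y="mean_time")
plt.ylabel("mean experiment time (s)")
plt.savefig("latency.png")
```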
- The `--quantize` flag is not supported for `--ort-cuda`, `--ort-tensorrt`, and `--tf`.
- The current version supports ONNX models only. To convert models from other frameworks, follow these examples (see also the export sketch after this list).
- The following docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
  - onnxruntime with openvino
  - pytorch
- onnxruntime with nuphar utilizes all CPUs but one on Linux EC2 instances.
- If running the tensorflow image fails due to onnx-tf conversion, re-build the image locally:

  ```sh
  docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  ```
- If you have permission errors when running docker, add yourself to the docker group (`sudo usermod -aG docker $USER`) and re-login (`su - $USER`).
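As one example of such a conversion, exporting a PyTorch model to ONNX with the standard `torch.onnx.export` API might look like the sketch below; the model choice and input shape are placeholders.

```python
# Minimal sketch: convert a PyTorch model to ONNX so DNN Bench can
# benchmark it. Model choice and input shape are placeholders.
import torch
import torchvision

model = torchvision.models.resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)  # batch size 1, 224x224 RGB

torch.onnx.export(
    model,
    dummy_input,
    "resnet18.onnx",  # path to pass to ./bench_model.sh
    input_names=["input"],
    output_names=["output"],
)
```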