DNN Bench

DNN Bench is a library that lets you benchmark your deep learning models against various frameworks and backends, with a single command.

With DNN Bench you can answer questions like:

  • which hardware should I deploy my model to?
  • which backend should I use?
  • should I apply an optimization technique, e.g. quantization, before deployment?

The goal is to make it easy for developers to choose the best deployment configuration (optimization on/off, backend, hardware) for their particular use case.

Side note: Models are benchmarked within docker containers.

Example

Performance of BERT-Squad and ResNet on c5a.4xlarge, an AWS EC2 CPU compute instance. The plots show the number of processed samples per second; higher is better.

[Plots: BERT-Squad on CPU and ResNet on CPU, samples per second]

See further analysis for more models benchmarked on different hardware.

Supported devices and backends

      PyTorch  TensorFlow  ONNX-Runtime  OpenVINO*  Nuphar*  CUDA*  TensorRT*
CPU      ✓         ✓            ✓            ✓         ✓
GPU      ✓         ✓                                            ✓       ✓
ARM                             ✓

*Marked backends are executed within ONNX-Runtime framework.

Installation

Dependencies

Ubuntu

./install_dependencies.sh cpu

Replace the cpu argument with gpu for nvidia-docker.

Other

Deep learning backends

You can use pre-compiled images from Docker Hub. They will be downloaded automatically when running ./bench_model.sh.

Optionally, prepare docker images for the various deep learning backends locally:

./prepare_images.sh cpu

Replace the cpu argument with gpu for GPU backends or arm for ARM backends.

Usage

Benchmark an ONNX model against different backends:

./bench_model.sh path_to_model --repeat=100 --number=1 --warmup=10 --device=cpu \
--tf --onnxruntime --openvino --pytorch --nuphar

Possible backends:

  --tf              (with --device=cpu or gpu)
  --onnxruntime     (with --device=cpu or arm)
  --openvino        (with --device=cpu)
  --pytorch         (with --device=cpu or gpu)
  --nuphar          (with --device=cpu)
  --ort-cuda        (with --device=gpu)
  --ort-tensorrt    (with --device=gpu)

Additional Parameters:

  --output   OUTPUT       Directory for benchmarking results. Default: ./results
  --repeat   REPEAT       Number of repeated experiments. Default: 1000
  --number   NUMBER       Number of inferences per experiment. Default: 1
  --warmup   WARMUP       Number of warmup experiments that are discarded. Default: 100
  --device   DEVICE       Device backend: cpu, gpu or arm. Default: cpu
  --quantize              Apply dynamic quantization in the corresponding backend (see the sketch below).
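
For context, dynamic quantization converts a model's weights to 8-bit integers to speed up inference, typically on CPU. The minimal sketch below shows what this looks like with ONNX Runtime's quantization tooling; it only illustrates the idea, is not necessarily how DNN Bench applies the flag internally, and uses hypothetical file paths.

# Minimal sketch of dynamic quantization with ONNX Runtime (illustrative only;
# not necessarily how DNN Bench applies --quantize, and the paths are hypothetical).
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="models/efficientnet-lite4.onnx",          # original FP32 model
    model_output="models/efficientnet-lite4.quant.onnx",   # quantized model
    weight_type=QuantType.QInt8,                           # quantize weights to int8
)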

Results

Results are stored by default in the ./results directory. Each benchmarking result is stored as a JSON file.

{
   "model_path": "/models/efficientnet-lite4.onnx",
   "output_path": "/results/efficientnet-lite4-onnxruntime-openvino.json",
   "backend": "onnxruntime",
   "backend_meta": "openvino",
   "device": "cpu",
   "number": 1,
   "repeat": 100,
   "warmup": 10,
   "size": 51946641,
   "input_size": [[1, 224, 224, 3]],
   "min": 0.038544699986232445,
   "max": 0.05930669998633675,
   "mean": 0.04293907555596282,
   "std": 0.0039751552053260125,
   "data": [0.04748649999964982,
            0.05760759999975562, ... ]
}
  • model_path: Path to the input model.
  • output_path: Path to the results file.
  • backend: Deep learning backend used to produce the results.
  • backend_meta: Special parameters used with the backend, e.g. onnxruntime used with openvino.
  • device: Device the model was benchmarked on: cpu, gpu, arm, etc.
  • number: Number of inferences in a single experiment.
  • repeat: Number of repeated experiments.
  • warmup: Number of discarded warmup experiments; inference might not reach its optimal performance in the first few runs.
  • size: Size of the model in bytes.
  • min: Minimum experiment run time.
  • max: Maximum experiment run time.
  • mean: Mean experiment run time.
  • std: Standard deviation of the experiment run times.
  • data: All measured run times.
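
As a small example of consuming these files, the sketch below (assuming Python and the field layout above; the file path is hypothetical) loads one result and converts the mean run time into processed samples per second, the metric used in the example plots above.

import json

# Load a single DNN Bench result file (hypothetical path).
with open("results/efficientnet-lite4-onnxruntime-openvino.json") as f:
    result = json.load(f)

# Each experiment runs `number` inferences and takes `mean` time on average.
# Assuming run times are recorded in seconds, number / mean gives samples per second.
throughput = result["number"] / result["mean"]
print(f"{result['backend']} ({result['backend_meta']}): {throughput:.1f} samples/s")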

Plotting

A simple plotting utility to generate quick plots is available in plot_results.py.

  • Dependencies:
    pip install seaborn matplotlib pandas
  • Usage:
    python vis/plot_results.py results_dir plots_dir
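
If you want a custom view of the results, a rough sketch along these lines (not the repository's plot_results.py; it assumes the JSON layout shown above and hypothetical directory names) aggregates the result files with pandas and plots mean run times with seaborn:

import json
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Collect all benchmarking results from the default output directory.
records = [json.loads(p.read_text()) for p in Path("results").glob("*.json")]
df = pd.DataFrame(records)

# One bar per backend showing the mean run time of the benchmarked model.
sns.barplot(data=df, x="backend", y="mean")
plt.ylabel("mean run time")
plt.tight_layout()

Path("plots").mkdir(exist_ok=True)   # hypothetical plots directory
plt.savefig("plots/mean_run_time.png")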

Limitations and known issues

  • The --quantize flag is not supported for --ort-cuda, --ort-tensorrt and --tf.
  • The current version supports ONNX models only. To convert models from other frameworks,
    follow these examples (a minimal PyTorch export sketch is shown after this list).
  • The following docker images for CPU execution utilize only half of the CPUs on Linux EC2 instances:
    • onnxruntime with openvino,
    • pytorch
  • onnxruntime with nuphar utilizes all but one of the CPUs on Linux EC2 instances.
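
For reference, the sketch below shows roughly how a PyTorch model can be exported to ONNX with torch.onnx.export; the model, input shape, and file name are placeholders, and the conversion examples linked above remain the authoritative reference.

import torch
import torchvision

# Placeholder model; substitute your own torch.nn.Module.
model = torchvision.models.resnet18().eval()

# Dummy input defining the exported graph's input shape.
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX so the model can be benchmarked with ./bench_model.sh.
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=13)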

Troubleshooting

  • If running the tensorflow image fails due to onnx-tf conversion, re-build the image locally: docker build -f dockerfiles/Dockerfile.tf -t toriml/tensorflow:latest .
  • If you get permission errors when running docker, add yourself to the docker group with sudo usermod -aG docker $USER and re-login with su - $USER.
