This crate contains example projects showing how to convert and run models for common tasks across various modalities. See Example descriptions for a summary of what each example does.
Each example has a main function with a comment above it describing the steps to fetch the ONNX model, convert it to the format used by this library and run the example.
The general steps to run an example are:
- Download the ONNX model. These usually come from Hugging Face, the ONNX Model Zoo or pre-created ONNX models by the model authors.
- Convert the ONNX model to this library's format using the rten-convert package:
  $ rten-convert <onnx_model> <output_model>
- Run the example using:
  $ cargo run -r --bin <example_name> <model_path> <...args>
  Note the -r flag to create a release build. This is required as the examples will run very slowly in debug builds.
  Where ...args refers to the example-specific arguments, such as input data. The syntax and flags for an individual example can be displayed using its --help command:
  $ cargo run -r --bin <example_name> -- --help
  Note the -- before --help. Without this, cargo will print its own help info.
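The convert and run steps above can be wrapped in a small shell helper. This is only a sketch: the function names run and convert_and_run, the DRY_RUN variable and the file names in the usage comment are hypothetical and not part of this crate. It assumes rten-convert and cargo are on the PATH, and that it is invoked from the directory containing the examples.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch,
# useful for checking the commands before running them).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Convert an ONNX model and then run an example with the converted model.
#   $1: path to the downloaded ONNX model
#   $2: path to write the converted model to
#   $3: name of the example binary
#   remaining arguments: example-specific arguments, passed through as-is
convert_and_run() {
    onnx_model="$1"
    output_model="$2"
    example_name="$3"
    shift 3

    # Convert the ONNX model to this library's format.
    run rten-convert "$onnx_model" "$output_model"

    # Run the example in a release build (-r), since debug builds are very slow.
    run cargo run -r --bin "$example_name" "$output_model" "$@"
}

# Example with placeholder file names:
#   DRY_RUN=1 convert_and_run model.onnx model.rten imagenet image.jpg
```

With DRY_RUN=1 the helper only prints the two commands, which makes it easy to confirm the argument order before downloading or converting anything. The arguments after the example name depend on the example, so check its --help output first.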
Some of the examples have reference implementations in Python using PyTorch and Transformers. These are found in src/{example_name}_reference.py and enable comparison of RTen outputs with the original models.
The examples have been chosen to cover common tasks and popular models.
- clip - Match images against text descriptions using CLIP
- imagenet - Classification of images using models trained on ImageNet. This example works with a wide variety of models, such as ResNet, MobileNet, ConvNeXt, ViT.
- deeplab - Semantic segmentation of images using DeepLabv3
- depth_anything - Monocular depth estimation using Depth Anything
- detr - Object detection using DETR
- distilvit - Image captioning using Mozilla's DistilViT
- nougat - Extract text from academic PDFs as Markdown using Nougat
- rmbg - Background removal using BRIA Background Removal
- segment_anything - Image segmentation using Segment Anything
- trocr - Recognize text using TrOCR
- yolo - Object detection using Ultralytics YOLO
- bert_qa - Extractive question answering using BERT-based models which have been fine-tuned on the SQuAD dataset
- gpt2 - Text generation using the GPT-2 language model
- jina_similarity - Sentence similarity using vector embeddings of sentences
- qwen2_chat - Chatbot using Qwen2
- piper - Text-to-speech using Piper models
- silero - Speech detection using Silero VAD
- wav2vec2 - Speech recognition of .wav audio using wav2vec2
- whisper - Speech recognition of .wav audio using OpenAI's Whisper