OnnxSlim support #1744

Open
inisis opened this issue Mar 5, 2024 · 0 comments
Labels
onnx Related to the ONNX export onnxruntime Related to ONNX Runtime

inisis commented Mar 5, 2024

Feature request

Hi, we have developed a tool called onnxslim, which can help slim exported ONNX models.

```bash
pip install onnxslim
```

```bash
onnxslim raw_onnx_model slimmed_onnx_model --skip_fusion_patterns FusionGelu  # older onnxruntime versions may not support Gelu
```

```python
import onnx
from onnxslim import slim

onnx_model = "your_onnx_model.onnx"
slimmed_model = slim(onnx_model)
onnx.save(slimmed_model, "slimmed_onnx_model.onnx")
```


Motivation

I want to slim ONNX models so we can achieve better performance. I have tested the provided case below, and after onnxslim we get roughly a 3% performance gain.

```python
import time

import requests
from PIL import Image
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoFeatureExtractor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

preprocessor = AutoFeatureExtractor.from_pretrained("optimum/vit-base-patch16-224")
model = ORTModelForImageClassification.from_pretrained("optimum/vit-base-patch16-224")
inputs = preprocessor(images=image, return_tensors="pt")

warmup_runs = 5
actual_runs = 100

# Warmup phase
for _ in range(warmup_runs):
    outputs = model(**inputs)

# Actual timing phase
start_time = time.time()
for _ in range(actual_runs):
    outputs = model(**inputs)
end_time = time.time()

# Calculate average time per run
total_time = end_time - start_time
average_time_per_run = total_time / actual_runs
print("Average time per run: {:.6f} seconds".format(average_time_per_run))

logits = outputs.logits
```

With the slimmed model: average time per run 0.246237 seconds
Without slimming: average time per run 0.253707 seconds
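For reference, the ~3% figure follows directly from the two timings above:

```python
# Relative speedup computed from the measured per-run times above.
before = 0.253707  # seconds per run, original model
after = 0.246237   # seconds per run, slimmed model
speedup_pct = (before - after) / before * 100
print(f"Speedup: {speedup_pct:.2f}%")  # → Speedup: 2.94%
```
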

Your contribution

I can submit a PR and help slim existing ONNX models.
