[Performance] Resize node shows huge performance drop on Windows #23430

MatteoPagliani · 2025-01-20T09:37:37Z

Describe the issue

Hi!

We observed a significant performance regression in the Resize operator when using CPUExecutionProvider on Windows. On Linux, performance is as expected. In particular, the profile traces show that the round operation has the largest performance gap between the two operating systems.

The mean latency of the Resize node over 50 runs after warmup is ~6ms on Windows and ~0.1ms on Linux.

The same slowdown ratio appears with profiling disabled and all optimizations enabled via GraphOptimizationLevel.ORT_ENABLE_ALL.

[Screenshots: model graph, Windows profile, Linux profile]
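
The per-operator comparison read off the profile traces can also be done programmatically. The following is a minimal sketch, not part of the original report. It assumes the profile file is a JSON list of Chrome-trace-style events in which kernel timings are `"cat": "Node"` events whose `name` ends in `_kernel_time`, with the duration in microseconds under `dur` and the operator type under `args.op_name`. The helper names `load_profile` and `mean_kernel_time_ms` are hypothetical, and the sample events are synthetic values chosen only to exercise the code, not measurements.

```python
import json
from collections import defaultdict
from statistics import mean


def load_profile(path):
    """Load a profile trace (assumed to be a JSON list of events)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def mean_kernel_time_ms(events):
    """Return {op_name: mean kernel duration in ms} over all node events."""
    durations = defaultdict(list)
    for event in events:
        # Assumed layout: only "Node" events named "*_kernel_time" carry kernel timings.
        if event.get("cat") != "Node":
            continue
        if not event.get("name", "").endswith("_kernel_time"):
            continue
        op_name = event.get("args", {}).get("op_name", "unknown")
        # "dur" is assumed to be in microseconds; convert to milliseconds.
        durations[op_name].append(event["dur"] / 1000.0)
    return {op: mean(values) for op, values in durations.items()}


# Synthetic events (illustrative only, not data from this issue).
sample_events = [
    {"cat": "Session", "name": "model_run", "dur": 7000},
    {"cat": "Node", "name": "Resize_0_kernel_time", "dur": 4000,
     "args": {"op_name": "Resize"}},
    {"cat": "Node", "name": "Resize_0_kernel_time", "dur": 2000,
     "args": {"op_name": "Resize"}},
    {"cat": "Node", "name": "Round_0_kernel_time", "dur": 500,
     "args": {"op_name": "Round"}},
]

summary = mean_kernel_time_ms(sample_events)
for op, ms in sorted(summary.items()):
    print(f"{op}: {ms:.3f} ms")
```

With the reproduction script, the path of the trace file should be available as the return value of `session.end_profiling()`, so running `mean_kernel_time_ms(load_profile(path))` on each OS would give two tables to compare side by side.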

To reproduce

Here is the code used to reproduce the issue:

import torch
import torchvision.transforms as T
import onnxruntime as ort
import numpy as np
import time

class ResizeModel(torch.nn.Module):
    def __init__(self, size):
        super().__init__()
        self.resize = T.Resize(size,
                               interpolation=T.InterpolationMode.BILINEAR,
                               antialias=False)

    def forward(self, x):
        return self.resize(x)


resize_model = ResizeModel(size=224)

dummy_input = torch.randint(0, 255, (1, 3, 390, 388), dtype=torch.uint8)

onnx_path = "resize_model.onnx"
resize_model.eval()
with torch.no_grad():
    torch.onnx.export(
        resize_model,
        (dummy_input,),
        onnx_path,
        input_names=["input"],
        output_names=["output"],
        opset_version=18
    )
print(f"Model exported to {onnx_path}")

print(ort.get_available_providers())
session_options = ort.SessionOptions()
session_options.enable_profiling = True
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession(onnx_path,
                               session_options,
                               providers=["CPUExecutionProvider"])
onnx_input = {session.get_inputs()[0].name: dummy_input.numpy()}

# Warm-up runs
for _ in range(5):
    session.run(None, onnx_input)

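# Benchmarking (perf_counter is a monotonic, high-resolution clock,
# better suited to sub-millisecond timings than time.time)
latencies = []
for _ in range(50):
    start_time = time.perf_counter()
    session.run(None, onnx_input)
    latencies.append((time.perf_counter() - start_time) * 1000)  # Convert to milliseconds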

# Compute mean latency
mean_latency = np.mean(latencies)
print(f"Mean latency over 50 runs: {mean_latency:.2f} ms")
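
The warm-up and timing loops above can be factored into a small reusable helper, which makes it easier to run the same measurement on both platforms. This is only a sketch: the name `benchmark` and its returned keys are hypothetical, and it reports the median and extremes alongside the mean on the assumption that a few outlier runs could otherwise skew a sub-millisecond average.

```python
import statistics
import time


def benchmark(fn, warmup=5, runs=50):
    """Call fn() `warmup` times untimed, then time `runs` calls in milliseconds."""
    for _ in range(warmup):
        fn()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Stand-in workload so the sketch runs without a model file.
calls = []
stats = benchmark(lambda: calls.append(1), warmup=5, runs=50)
print(f"Mean latency over 50 runs: {stats['mean_ms']:.4f} ms "
      f"(median {stats['median_ms']:.4f} ms)")
```

In the script above, the stand-in workload would be replaced with `lambda: session.run(None, onnx_input)`.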

Urgency

This slowdown severely impacts workloads that rely heavily on the Resize operator, particularly in image processing tasks.

Platform

Windows

OS Version

10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.20.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

MatteoPagliani added the "performance" label (issues related to performance regressions) on Jan 20, 2025

github-actions bot added the "platform:windows" label (issues related to the Windows platform) on Jan 20, 2025