
[Performance] InferenceSession with enable_profiling runs much slower #20238

Closed
MalekItani opened this issue Apr 9, 2024 · 1 comment

MalekItani commented Apr 9, 2024

Describe the issue

I have a network that runs fast enough for CPU deployment.
I would like to get reasonably accurate measurements of the runtime of the individual kernel calls within the network. However, when I enable profiling in the session options, the runtime of the entire network increases significantly.

For example, if I run the code below with the provided model without profiling, the network runs in about 810 µs on average. With profiling enabled, the average runtime of the overall network increases to about 2330 µs. The exact numbers differ between runs and platforms, but the increase is always there.

The increase also shows up when I look only at the sum of the durations of the individual kernel calls in the output JSON file. Profiling should obviously add some overhead, but I wouldn't expect it to be this drastic, and it makes me hesitant to draw any conclusions from the traced output. Is there a better way to measure the runtime of individual kernel calls without introducing such a significant delay?

To reproduce

import onnxruntime as ort
import time
import numpy as np

RUNS = 100

# Without profiling
sess_options = ort.SessionOptions()
ort_sess = ort.InferenceSession('my_model.onnx', providers=['CPUExecutionProvider'], sess_options=sess_options)

inputs = {}

# Build zero-filled inputs matching the model's declared input shapes
# (this assumes all input dimensions are static).
metadata = ort_sess.get_inputs()
for node in metadata:
    inputs[node.name] = np.zeros(node.shape).astype(np.float32)

# Warmup
ort_sess.run(None, inputs)

t1 = time.time()
for i in range(RUNS):
    output_list = ort_sess.run(None, inputs)
t2 = time.time()

t_no_prof = (t2 - t1) / RUNS

# With profiling
sess_options.enable_profiling = True
ort_sess = ort.InferenceSession('my_model.onnx', providers=['CPUExecutionProvider'], sess_options=sess_options)

# Warmup
ort_sess.run(None, inputs)

t1 = time.time()
for i in range(RUNS):
    output_list = ort_sess.run(None, inputs)
t2 = time.time()

t_prof = (t2 - t1) / RUNS


print(f"Time without profiling: {t_no_prof:.05f}s")
print(f"Time with profiling: {t_prof:.05f}s")

Urgency

No response

Platform

Linux

OS Version

6.5.0-15-generic #15~22.04.1-Ubuntu

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.2

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

my_model.onnx.zip

Is this a quantized model?

No

@skottmckay
Contributor

This is completely expected. Measuring individual kernel latency and overall latency in the same run isn't meaningful: the overhead of the former will always invalidate the latter.

Measure each separately.
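In practice that means something like the sketch below (a rough outline only, reusing the model path from the report and assuming static input shapes): time end-to-end latency on a session created without profiling, and use a second, profiling-enabled session only for the relative per-kernel breakdown, treating its absolute numbers as inflated.

import time
import numpy as np
import onnxruntime as ort

RUNS = 100

def make_session(profile):
    opts = ort.SessionOptions()
    opts.enable_profiling = profile
    return ort.InferenceSession('my_model.onnx',
                                providers=['CPUExecutionProvider'],
                                sess_options=opts)

# 1) End-to-end latency: no profiling, so the measurement is undisturbed.
latency_sess = make_session(profile=False)
inputs = {n.name: np.zeros(n.shape, dtype=np.float32)
          for n in latency_sess.get_inputs()}
latency_sess.run(None, inputs)  # warmup
t0 = time.time()
for _ in range(RUNS):
    latency_sess.run(None, inputs)
print(f"End-to-end: {(time.time() - t0) / RUNS * 1e6:.0f} us per run")

# 2) Per-kernel breakdown: a separate profiling session, used only to see
#    which ops dominate, not to reconstruct the end-to-end time.
profile_sess = make_session(profile=True)
for _ in range(RUNS):
    profile_sess.run(None, inputs)
print("Trace written to:", profile_sess.end_profiling())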
