Describe the issue
I have a network that runs fast enough for CPU deployment.
I would like to get reasonably accurate measurements of the runtime of the individual kernel calls within the network. However, when I enable profiling in the session options, the runtime of the entire network increases significantly.
For example, if I run the code below with the provided model without profiling enabled, the network runs in 810us on average. With profiling enabled, the average runtime of the overall network increases to 2330us. The exact numbers differ between runs and platforms, but the increase is consistent.
The increase also shows up when I look solely at the sum of the durations of the individual kernel calls in the output JSON file. Profiling is of course expected to add some overhead, but I wouldn't expect it to be this drastic, and it makes me hesitant to draw any conclusions from the traced output. Is there a better way to measure the runtime of individual kernel calls without introducing such a significant delay?
To reproduce
import onnxruntime as ort
import time
import numpy as np

RUNS = 100

# Without profiling
sess_options = ort.SessionOptions()
ort_sess = ort.InferenceSession('my_model.onnx', providers=['CPUExecutionProvider'], sess_options=sess_options)

# Build zero-valued inputs matching the model's input shapes
inputs = {}
metadata = ort_sess.get_inputs()
for node in metadata:
    inputs[node.name] = np.zeros(node.shape).astype(np.float32)

# Warmup
ort_sess.run(None, inputs)

t1 = time.time()
for i in range(RUNS):
    output_list = ort_sess.run(None, inputs)
t2 = time.time()
t_no_prof = (t2 - t1) / RUNS

# With profiling
sess_options.enable_profiling = True
ort_sess = ort.InferenceSession('my_model.onnx', providers=['CPUExecutionProvider'], sess_options=sess_options)

# Warmup
ort_sess.run(None, inputs)

t1 = time.time()
for i in range(RUNS):
    output_list = ort_sess.run(None, inputs)
t2 = time.time()
t_prof = (t2 - t1) / RUNS

# Stop profiling and write the trace; end_profiling() returns the JSON file name
prof_file = ort_sess.end_profiling()

print(f"Time without profiling: {t_no_prof:.05f}s")
print(f"Time with profiling: {t_prof:.05f}s")
Urgency
No response
Platform
Linux
OS Version
6.5.0-15-generic #15~22.04.1-Ubuntu
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.2
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
my_model.onnx.zip
Is this a quantized model?
No
This is completely expected. Mixing per-kernel latency measurement with overall latency measurement doesn't really make sense as a meaningful comparison; the overhead of the former will always invalidate the latter.
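For what it's worth, the per-kernel numbers in the trace are still comparable to each other: aggregating them by operator and looking at relative shares of the total traced kernel time, rather than comparing against the unprofiled wall-clock time, is one way to read the output. A minimal sketch, under the same assumptions about the trace layout as in the snippet above, with the operator type taken from the event's "args" (the exact key is an assumption and may differ between versions):

import json
from collections import defaultdict

with open(prof_file) as f:
    data = json.load(f)
events = data if isinstance(data, list) else data.get("traceEvents", [])

# Aggregate traced kernel time per operator type and report relative shares
per_op_us = defaultdict(int)
for e in events:
    if e.get("cat") == "Node" and e.get("name", "").endswith("_kernel_time"):
        op_type = e.get("args", {}).get("op_name", "unknown")  # assumed key; may vary
        per_op_us[op_type] += e.get("dur", 0)

total_us = sum(per_op_us.values()) or 1
for op, dur in sorted(per_op_us.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:<20} {dur:>10}us {100 * dur / total_us:5.1f}%")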