performance is poor when onnxruntime C++ run in intel cpu #12489
Comments
The first cold run of an ONNX Runtime session never shows the same performance as later runs. You always need a couple of warm-up runs after the session is first created. After activity stops, CPU caches grow cold, but they recover quickly. Do you have a real-time scenario where incoming requests depend on user activity? We have work to do in this area, but ONNX Runtime was originally optimized for continuous processing, so any suggestions may not provide the desired results at this time. A few things to try, depending on your model:
Ort::SessionOptions sessionOptions;
sessionOptions.DisableCpuMemArena();
I have tried all three suggestions, including (1) sessionOptions.DisableCpuMemArena(); sorry, the performance is the same as before.
Please provide the full code to reproduce and show how you are measuring performance. As you say you have two onnxruntime sessions, it's not clear how/when you are creating those sessions.
In the initial cycles the time taken is relatively small, and in later cycles it becomes much larger, such as:
It would be best to measure the ORT performance separately with no thread pools, and without the inline call to GetInputNames(). That way you're just measuring the cost of the Send one warmup query to each inference session, and measure performance for the following calls. Also, it's not clear what Timer is. Is that a high resolution timer or not? https://en.cppreference.com/w/cpp/chrono/high_resolution_clock/now would be preferable.
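The measurement advice above (one warm-up query per session, then time only the following calls with a proper clock) could be sketched roughly as below. This is not ORT code: `measure_latency_ms` is a hypothetical helper, `fn` stands in for whatever wraps your own `session.Run(...)` call, and `std::chrono::steady_clock` is used here in place of `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls `fn` `warmup` times without timing it,
// then times `iters` further calls individually.
// Returns one latency sample per timed call, in milliseconds.
template <typename Fn>
std::vector<double> measure_latency_ms(Fn&& fn, std::size_t warmup, std::size_t iters) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by wall-clock changes

    // Warm-up calls: results and timings are discarded.
    for (std::size_t i = 0; i < warmup; ++i) {
        fn();
    }

    std::vector<double> samples;
    samples.reserve(iters);
    for (std::size_t i = 0; i < iters; ++i) {
        const auto start = clock::now();
        fn();  // assumption: this wraps only the inference call, nothing else
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples;
}
```

Calling something like `measure_latency_ms([&] { /* session.Run(...) */ }, 1, 100)` once per session and logging every sample, rather than only a total, would show whether latency climbs steadily over the run or jumps at some point.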
Over a long run, memory and CPU usage do not change much; they are basically the same as before.
Yes, I am using high_resolution_clock::now().
I measured the ORT performance separately with no thread pools; the timing is the same as before. @skottmckay
Reply: can you give me more suggestions? Thank you!
By gdb:
I have two onnxruntime sessions running on an Intel CPU:
(1) At first, the total time is 200 ms.
(2) After running the test many times, the total time rises to 10 s.
(3) After several minutes of doing nothing, the total time is 200 ms again.
Why does it change so much? Thanks!
(1) try the multithread option
(2) try session_options.AddConfigEntry("session.set_denormal_as_zero", "1");
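For context on the second item: `session.set_denormal_as_zero` targets subnormal (denormal) floats, which many x86 CPUs process far more slowly than normal values. A rough way to check whether that is even relevant is to count subnormals in a buffer you already have on the host, for example an input or output tensor copied into a `std::vector<float>`. The helper name `count_subnormals` is made up for this sketch and is not part of ORT.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic: counts how many values in a float buffer
// are subnormal (denormal), i.e. non-zero but smaller in magnitude
// than the smallest normal float.
std::size_t count_subnormals(const std::vector<float>& values) {
    std::size_t count = 0;
    for (float v : values) {
        if (std::fpclassify(v) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}
```

If the count grows from one iteration to the next, that would be consistent with denormals contributing to the slowdown; if it stays at zero, the cause is probably elsewhere.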
System information
To Reproduce
(1) At first, the total time is 200 ms.
(2) After running the test many times, the total time rises to 10 s.
(3) After several minutes of doing nothing, the total time is 200 ms again.
Expected behavior
The first and last runs should take the same time.