Compare Inference Perf BTW CPU and MKLDNN #10651
ResNet Inference Results

Server: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz, 2 sockets, 6 cores per socket
Details: BatchSize: 64, Repeat: 100

MKLDNN multi-threads
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.20932 0.009643 0.062216 0.0177841
thread0::init_program 1 2.68691 2.68691 2.68691 2.68691
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.299036 0.002371 0.004543 0.00299036
thread0::conv2d 3300 9000.44 0.195592 30.1177 2.72741
thread0::elementwise_add 4900 4070.66 0.008136 20.9259 0.830748
thread0::relu 3100 606.868 0.074293 4.75193 0.195764
thread0::pool2d 100 20.8408 0.123946 1.13076 0.208408
thread0::mul 100 7.87092 0.072069 0.109052 0.0787092
thread0::softmax 100 8.86674 0.042954 1.37156 0.0886674
thread0::fetch 100 0.731889 0.00641 0.017074 0.00731889
thread0::run_inference 100 13828.2 115.068 402.65 138.282

CPU multi-threads
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.14652 0.009442 0.053964 0.0168606
thread0::init_program 1 2.61936 2.61936 2.61936 2.61936
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.322211 0.00269 0.008291 0.00322211
thread0::conv2d 3300 51572.6 1.13987 225.607 15.6281
thread0::elementwise_add 4900 4071.9 0.006965 11.3786 0.831
thread0::relu 3100 590.259 0.073982 4.52276 0.190406
thread0::pool2d 100 31.1838 0.306259 0.401829 0.311838
thread0::mul 100 3.02121 0.026938 0.076059 0.0302121
thread0::softmax 100 25.8391 0.254502 0.295943 0.258391
thread0::fetch 100 0.680716 0.005706 0.017241 0.00680716
thread0::run_inference 100 56405.6 454.744 1020.46 564.056

MKLDNN single thread
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.11259 0.009476 0.0491 0.0163616
thread0::init_program 1 2.58213 2.58213 2.58213 2.58213
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.308051 0.002513 0.004302 0.00308051
thread0::conv2d 3300 67372.5 1.09394 37.2228 20.4159
thread0::elementwise_add 4900 3785.25 0.00826 4.89141 0.772501
thread0::relu 3100 578.755 0.073902 0.583639 0.186695
thread0::pool2d 100 53.1591 0.518617 0.572609 0.531591
thread0::mul 100 8.33622 0.076171 0.103348 0.0833622
thread0::softmax 100 17.0244 0.167629 0.179503 0.170244
thread0::fetch 100 0.713337 0.006472 0.013599 0.00713337
thread0::run_inference 100 71924.9 711.922 762.468 719.249

CPU single thread
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.1284 0.009774 0.04971 0.0165941
thread0::init_program 1 2.63472 2.63472 2.63472 2.63472
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.304634 0.002511 0.00463 0.00304634
thread0::conv2d 3300 66386.6 1.1244 35.5913 20.1172
thread0::elementwise_add 4900 3774.94 0.006815 5.1618 0.770395
thread0::relu 3100 581.059 0.07387 0.830908 0.187438
thread0::pool2d 100 31.0233 0.306008 0.362504 0.310233
thread0::mul 100 2.73551 0.025045 0.036044 0.0273551
thread0::softmax 100 26.2478 0.258148 0.299739 0.262478
thread0::fetch 100 0.653839 0.005754 0.015972 0.00653839
thread0::run_inference 100 70905 700.366 750.701 709.05

How to reproduce

# 1. Compile
git clone https://github.com/tensor-tang/Paddle paddle
cd paddle
git checkout compare
mkdir build && cd build
cmake .. -DWITH_GPU=OFF -DWITH_TESTING=ON -DCMAKE_INSTALL_PREFIX=./tmp
make -j `nproc`
make install
# 2. Run the Python test; this will save the ResNet model
make test ARGS="-R test_image_classification -V"
# 3. Run the inference test
cd ..
./build/paddle/fluid/inference/tests/book/test_inference_image_classification_resnet \
  --dirname=./build/python/paddle/fluid/tests/book/image_classification_resnet.inference.model \
  --batch_size=64 --repeat=100

This gives the MKLDNN multi-thread result. For a single thread, pin the process to one core:

taskset -c 0 ./build/paddle/fluid/inference/tests/book/test_inference_image_classification_resnet \
  --dirname=./build/python/paddle/fluid/tests/book/image_classification_resnet.inference.model \
  --batch_size=64 --repeat=100

As for CPU performance, please change the …

The changes
…

Next move
Will give a PR to change the …
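To summarize the batch-size-64 reports above, the speedup from MKLDNN can be computed directly from the average `thread0::run_inference` latencies (the numbers below are copied verbatim from the profiling output; the dictionary keys are labels introduced here for convenience):

```python
# Average run_inference latency (ms) for batch size 64,
# copied from the profiling reports above.
avg_ms = {
    "mkldnn_multi": 138.282,
    "cpu_multi": 564.056,
    "mkldnn_single": 719.249,
    "cpu_single": 709.05,
}

# Speedup of MKLDNN over the plain CPU kernels in the multi-thread case.
speedup_multi = avg_ms["cpu_multi"] / avg_ms["mkldnn_multi"]
print(f"multi-thread speedup: {speedup_multi:.2f}x")   # ~4.08x

# In the single-thread case MKLDNN is actually marginally slower here.
speedup_single = avg_ms["cpu_single"] / avg_ms["mkldnn_single"]
print(f"single-thread ratio:  {speedup_single:.2f}x")  # ~0.99x
```

The multi-thread gain comes almost entirely from conv2d (9000.44 ms total under MKLDNN vs 51572.6 ms under plain CPU), which dominates the ResNet inference time.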
Could you give a PR URL, so we can easily see the changes between your branch and the develop branch?
As for batch size 1:
MKLDNN multi-threads
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.03365 0.009594 0.033669 0.0152007
thread0::init_program 1 2.51944 2.51944 2.51944 2.51944
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.319115 0.00236 0.00419 0.00319115
thread0::conv2d 3300 957.289 0.128773 2.30153 0.290088
thread0::elementwise_add 4900 98.6889 0.006231 0.487361 0.0201406
thread0::relu 3100 19.9029 0.004581 0.082508 0.00642029
thread0::pool2d 100 13.2406 0.082838 1.40292 0.132406
thread0::mul 100 6.07581 0.051283 0.127585 0.0607581
thread0::softmax 100 6.89751 0.030691 0.678422 0.0689751
thread0::fetch 100 0.661207 0.004258 0.018983 0.00661207
thread0::run_inference 100 1187.95 9.49811 20.9113 11.8795

MKLDNN single thread
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.01823 0.00948 0.032522 0.014974
thread0::init_program 1 2.5322 2.5322 2.5322 2.5322
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.210431 0.001891 0.007563 0.00210431
thread0::conv2d 3300 1417.37 0.088962 1.31571 0.429505
thread0::elementwise_add 4900 80.7512 0.00611 0.691986 0.0164798
thread0::relu 3100 18.3715 0.004308 0.025278 0.00592629
thread0::pool2d 100 6.167 0.059295 0.080151 0.06167
thread0::mul 100 4.34947 0.042111 0.051461 0.0434947
thread0::softmax 100 1.72373 0.016533 0.022925 0.0172373
thread0::fetch 100 0.334084 0.003117 0.010444 0.00334084
thread0::run_inference 100 1588.87 15.7445 17.2864 15.8887

CPU single thread
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.02967 0.009555 0.032814 0.0151422
thread0::init_program 1 2.5184 2.5184 2.5184 2.5184
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.186704 0.001724 0.002616 0.00186704
thread0::conv2d 3300 1085.46 0.033286 0.968292 0.328928
thread0::elementwise_add 4900 75.329 0.004157 0.046556 0.0153733
thread0::relu 3100 17.7532 0.004074 0.01435 0.00572683
thread0::pool2d 100 1.14296 0.010944 0.016306 0.0114296
thread0::mul 100 0.762121 0.00725 0.011288 0.00762121
thread0::softmax 100 0.94436 0.009064 0.014038 0.0094436
thread0::fetch 100 0.252252 0.002306 0.00582 0.00252252
thread0::run_inference 100 1236.44 12.2918 13.2701 12.3644

CPU multi-threads
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::load 68 1.05451 0.009667 0.034995 0.0155075
thread0::init_program 1 2.57237 2.57237 2.57237 2.57237
-------------------------> Profiling Report <-------------------------
Place: CPU
Time unit: ms
Sorted by event first end time in descending order in the same thread
Event Calls Total Min. Max. Ave.
thread0::feed 100 0.195629 0.001808 0.00283 0.00195629
thread0::conv2d 3300 855.584 0.034638 4.23663 0.259268
thread0::elementwise_add 4900 78.6017 0.004243 0.049165 0.0160412
thread0::relu 3100 18.6085 0.004336 0.013074 0.00600273
thread0::pool2d 100 1.18143 0.01134 0.015879 0.0118143
thread0::mul 100 0.813435 0.00767 0.012044 0.00813435
thread0::softmax 100 0.952499 0.009162 0.013097 0.00952499
thread0::fetch 100 0.260175 0.002331 0.005984 0.00260175
thread0::run_inference 100 1013.59 8.99383 17.3052 10.1359
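Every report above shares the same row layout (Event, Calls, Total, Min., Max., Ave. in ms), so the numbers can be extracted mechanically for comparisons like the ones in this thread. A minimal sketch (the helper name is hypothetical, not part of the Paddle profiler):

```python
def parse_profiler_row(line):
    """Parse one 'Event Calls Total Min. Max. Ave.' row of the
    profiling report into a dict keyed by the header columns."""
    parts = line.split()
    return {
        "event": parts[0],
        "calls": int(parts[1]),
        "total_ms": float(parts[2]),
        "min_ms": float(parts[3]),
        "max_ms": float(parts[4]),
        "avg_ms": float(parts[5]),
    }

# Example: the batch-size-1 MKLDNN multi-thread conv2d row from above.
row = parse_profiler_row(
    "thread0::conv2d 3300 957.289 0.128773 2.30153 0.290088")
print(row["event"], row["avg_ms"])  # thread0::conv2d 0.290088
```

Note that at batch size 1 the plain CPU kernels edge out MKLDNN (10.1359 ms vs 11.8795 ms multi-thread average), the opposite of the batch-size-64 result.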
Hello, this issue has had no updates for nearly a month, so we will close it within the day. If you still need to follow up after it is closed, you may reopen it and we will reply within 24 hours. We apologize for any inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
Give a quick comparison on test_inference_image_classification to have at least a baseline.
Based on dfdcb7e