This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
It seems #12380 causes a significant performance regression in SpMV: roughly a 3x slowdown on a p3.16xlarge instance. The main reason is that the PR leaves only a small number of OMP threads to perform the computation.
Here is the minimal code for reproducing the bug. The problem seems to occur only when a model is initialized with multiple GPUs.
Found the root cause of the issue. After PR #12380, `omp_thread_max_` is mutated in `set_reserve_cores`, so each GPU worker's call lowers `omp_thread_max_` further. With 8 GPU workers it keeps dropping until it reaches 1. After that, the dot operator's execution internally calls `GetRecommendedOMPThreadCount`, which returns `omp_thread_max_` (now 1), so the dot operator runs on a single thread. For now, reverting the PR to the old behavior is a good option. We should also try to better understand the segfault that motivated #12380 and come up with a different fix.
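The erosion described above can be illustrated with a minimal, hypothetical Python sketch. The class, member, and method names below mirror the C++ identifiers mentioned in this issue (`omp_thread_max_`, `set_reserve_cores`, `GetRecommendedOMPThreadCount`), but this is not the actual MXNet implementation; the initial cap of 32 threads and the 4 reserved cores per worker are made-up numbers for illustration.

```python
# Hypothetical sketch of the bug: a shared, process-wide thread cap is
# mutated on every reservation instead of being recomputed from the
# original maximum, so repeated calls from GPU workers erode it to 1.

class OMPState:
    def __init__(self, max_threads):
        self.omp_thread_max_ = max_threads  # shared cap, mutated in place

    def set_reserve_cores(self, cores):
        # Bug pattern: subtracts from the already-reduced cap each time,
        # rather than from the machine's true core count.
        self.omp_thread_max_ = max(1, self.omp_thread_max_ - cores)

    def get_recommended_omp_thread_count(self):
        # Stand-in for GetRecommendedOMPThreadCount: just returns the cap.
        return self.omp_thread_max_

state = OMPState(max_threads=32)   # assumed initial cap
for _gpu_worker in range(8):       # 8 GPU workers, as in the report
    state.set_reserve_cores(4)     # assumed cores reserved per worker

# After 8 reservations the cap has collapsed, so any operator that
# consults it (like dot) runs effectively single-threaded.
print(state.get_recommended_omp_thread_count())  # prints 1
```

A fix along the lines suggested above would recompute the cap from the original hardware thread count on each call, rather than mutating the shared value.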
Use the code below to run the reproduction.
The CSR file can be downloaded from