[metal] Performance Improvements for bitmasked
#678
Labels: c++ (C++ engineering related), enhancement (Make existing things or codebases better), mac (Mac OS X platform)
`listgen` will launch a group of threads whose size equals the total number of SNodes. This could be very inefficient. Try grid-stride loops to balance the load.

`struct_for` kernels. When `place` is a `bitmasked`, instead of appending the active `bitmasked` elements into `ListManager`, we might want to just loop through all the possible coordinates and check directly which ones are active (is this how the LLVM backends implement it?). This is because appending to `ListManager` is expensive, as it uses atomic operations.
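A minimal sketch of the two ideas combined, written in plain C++ rather than Metal and not taken from the Taichi codebase. It assumes a flat, one-bit-per-cell mask. `BitmaskedNode`, `grid_stride_visit`, and `count_active_per_thread` are hypothetical names, and the GPU threads are simulated with a sequential loop. Each simulated thread walks the coordinates in a grid-stride pattern and tests the bit directly, so nothing is appended to a shared list and no atomics are needed.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmasked SNode: one bit per cell.
struct BitmaskedNode {
  std::vector<uint32_t> mask;  // bit i set => cell i is active
  int num_cells;

  explicit BitmaskedNode(int n) : mask((n + 31) / 32, 0u), num_cells(n) {}

  void activate(int i) { mask[i / 32] |= (1u << (i % 32)); }
  bool is_active(int i) const { return ((mask[i / 32] >> (i % 32)) & 1u) != 0; }
};

// Work done by one "thread": a grid-stride loop over all possible cells.
// Thread `tid` visits tid, tid + num_threads, tid + 2 * num_threads, ...
// Activity is read straight from the bitmask instead of from a
// pre-generated list of active elements.
template <typename Body>
void grid_stride_visit(const BitmaskedNode &node, int tid, int num_threads,
                       Body body) {
  for (int i = tid; i < node.num_cells; i += num_threads) {
    if (node.is_active(i)) {
      body(i);
    }
  }
}

// Simulates a launch with a fixed thread count that is independent of the
// number of cells. On a GPU these iterations would run in parallel; here
// they run one after another for simplicity.
std::vector<int> count_active_per_thread(const BitmaskedNode &node,
                                         int num_threads) {
  std::vector<int> counts(num_threads, 0);
  for (int tid = 0; tid < num_threads; ++tid) {
    grid_stride_visit(node, tid, num_threads,
                      [&counts, tid](int /*cell*/) { ++counts[tid]; });
  }
  return counts;
}
```

For example, activating every third cell of a 1000-cell node and calling `count_active_per_thread(node, 4)` spreads the 334 active cells almost evenly across the four simulated threads. The trade-off is that every coordinate is inspected, including inactive ones, so this is most likely to pay off when the mask is reasonably dense or the atomic appends dominate the cost.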