[FEA] Allow to customize cooperative group size (`tile_size`) for `static_map`
#194
Comments
Probably a candidate for #110. We could also explore having dynamic CG sizes, e.g.,
@ttnghia I'm curious, what architecture did you run your benchmarks on?
I'm running on RTX Quadro 6000, SM75.
@sleeepyjack I'm guessing it's a difference of GDDR vs HBM. Larger tile_size is better on HBM vs GDDR.
@jrhemstad I was thinking the same thing. A long time ago, I dreamed about having a compile time lookup table for choosing the optimal (default) CG size for a given architecture in WarpCore. Sounds wild, but hey, why not?
That wouldn't be too hard. It would be similar to how CUB does its device specific tuning policies.
Completed in the new implementation
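For readers following the lookup-table idea from the comments above, here is a minimal sketch of what a compile-time default CG size per architecture could look like. Everything in it is hypothetical: `default_cg_size` and `probing_policy` are illustrative names, not part of cuCollections, WarpCore, or CUB. The only values used are the ones mentioned in this thread (2 for SM75, 4 as the current fixed default); they are not tuning results.

```cpp
// Hypothetical helper (not a cuCollections/WarpCore/CUB API).
// Maps a compute capability (e.g. 75 for SM75) to a default CG size.
constexpr int default_cg_size(int sm_arch)
{
  switch (sm_arch) {
    case 75: return 2;  // reported in this thread as faster than 4 on SM75 (GDDR)
    default: return 4;  // the fixed tile_size that static_map currently uses
  }
}

// Hypothetical policy type: the default is picked at compile time from the
// lookup above, but the user can still override it explicitly.
template <int SmArch, int CGSize = default_cg_size(SmArch)>
struct probing_policy {
  static_assert(CGSize > 0 && (CGSize & (CGSize - 1)) == 0,
                "CG size must be a power of two");
  static constexpr int cg_size = CGSize;
};

// Compile-time checks that double as usage examples.
static_assert(probing_policy<75>::cg_size == 2, "SM75 picks the table entry");
static_assert(probing_policy<80>::cg_size == 4, "unknown archs fall back to 4");
static_assert(probing_policy<80, 8>::cg_size == 8, "explicit override wins");
```

A real version would presumably key on `__CUDA_ARCH__` or a queried device property and be filled in from benchmarks; this sketch only shows the shape of the mechanism.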
Currently, `static_map` internally sets a fixed `tile_size = 4`. This `tile_size` value is used when calling the `insert` or `contains` APIs. The value `tile_size = 4` is not optimal and may cause a performance regression on some (if not most) systems, as I have tested myself. For example, setting `tile_size = 2` doubles the performance on my system.

It would be great if we could have a way to specify `tile_size` when constructing the `static_map` object, similar to when we construct a `static_multimap`.
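To make the request concrete, below is a rough host-side sketch of a map whose tile size is chosen by the user instead of being hard-coded. It is a toy under stated assumptions, not the `cuco::static_map` API: `toy_map` and its `TileSize` parameter are hypothetical, and the inner loop over lanes only stands in for what a cooperative group of `TileSize` threads would do in parallel on the GPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical host-side toy, NOT the cuco::static_map API.
// Each probing step inspects a window of TileSize adjacent slots.
template <typename Key, typename Value, int TileSize = 4>
class toy_map {
  static_assert(TileSize > 0 && (TileSize & (TileSize - 1)) == 0,
                "TileSize must be a power of two");
  static constexpr std::size_t tile = TileSize;

  struct slot {
    bool used = false;
    Key key{};
    Value value{};
  };

 public:
  // Capacity is rounded up to a non-zero multiple of TileSize.
  explicit toy_map(std::size_t capacity)
    : slots_(((std::max<std::size_t>(capacity, 1) + tile - 1) / tile) * tile)
  {
  }

  // Returns true if the pair was inserted, false on duplicate key or full table.
  bool insert(Key const& key, Value const& value)
  {
    std::size_t const num_windows = slots_.size() / tile;
    std::size_t window            = std::hash<Key>{}(key) % num_windows;
    for (std::size_t step = 0; step < num_windows; ++step) {
      for (std::size_t lane = 0; lane < tile; ++lane) {  // one "thread" per lane
        slot& s = slots_[window * tile + lane];
        if (s.used && s.key == key) { return false; }
        if (!s.used) {
          s.used  = true;
          s.key   = key;
          s.value = value;
          return true;
        }
      }
      window = (window + 1) % num_windows;  // linear probing over windows
    }
    return false;
  }

  bool contains(Key const& key) const
  {
    std::size_t const num_windows = slots_.size() / tile;
    std::size_t window            = std::hash<Key>{}(key) % num_windows;
    for (std::size_t step = 0; step < num_windows; ++step) {
      for (std::size_t lane = 0; lane < tile; ++lane) {
        slot const& s = slots_[window * tile + lane];
        if (!s.used) { return false; }  // an empty slot ends the probe sequence
        if (s.key == key) { return true; }
      }
      window = (window + 1) % num_windows;
    }
    return false;
  }

 private:
  std::vector<slot> slots_;
};
```

With something along these lines, `toy_map<int, int, 2> m(1024);` and `toy_map<int, int, 8> m(1024);` would differ only in how many slots are examined per probing step, which is the knob this issue asks to expose.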