IVF-PQ: tweak launch configuration #1069

Merged
8 changes: 7 additions & 1 deletion cpp/include/raft/spatial/knn/detail/ivf_pq_search.cuh
@@ -1173,7 +1173,13 @@ struct ivfpq_compute_similarity {
 // If we don't have enough repeating probes (locality_hint < tmp.blocks_per_sm),
 // the locality is not going to improve with increasing the number of blocks per SM.
 // Hence, the only metric here is the occupancy.
-select_it = tmp.occupancy > cur.occupancy;
+bool improves_occupancy = tmp.occupancy > cur.occupancy;
+// Otherwise, the performance still improves with a smaller block size,
+// given there is enough work to do
+bool improves_parallelism =
+  tmp.occupancy == cur.occupancy &&
+  7u * tmp.blocks_per_sm * dev_props.multiProcessorCount <= n_blocks;
+select_it = improves_occupancy || improves_parallelism;
 } else {
 // If we don't use shared memory for the lookup table, increasing the number of blocks
 // is very taxing on the global memory usage.
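The selection rule in this hunk can be tried in isolation with a minimal host-side sketch. Everything below is an illustration under stated assumptions, not the code in `ivf_pq_search.cuh`: `launch_config`, `prefers_candidate`, `sm_count`, and `min_waves` are hypothetical names, the field types are guessed, and the product is widened to 64 bits to avoid overflow (the diff multiplies with `7u` directly). The diff does not explain the constant 7. One plausible reading is that the grid must cover at least seven full waves of resident blocks (`blocks_per_sm * multiProcessorCount` each) before a smaller block size is preferred at equal occupancy.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-candidate data compared in the diff;
// the real struct and its field types may differ.
struct launch_config {
  std::uint32_t occupancy;      // occupancy metric of this configuration
  std::uint32_t blocks_per_sm;  // blocks resident on one SM
};

// Returns true when the candidate `tmp` should replace the current best `cur`,
// mirroring the branch where locality cannot improve any further.
// `min_waves` defaults to the constant 7 used in the diff.
constexpr bool prefers_candidate(const launch_config& cur,
                                 const launch_config& tmp,
                                 std::uint32_t sm_count,
                                 std::uint32_t n_blocks,
                                 std::uint32_t min_waves = 7u)
{
  return
    // improves_occupancy: strictly higher occupancy always wins.
    (tmp.occupancy > cur.occupancy) ||
    // improves_parallelism: at equal occupancy, accept the candidate only if
    // the grid is large enough to fill the device `min_waves` times over.
    (tmp.occupancy == cur.occupancy &&
     static_cast<std::uint64_t>(min_waves) * tmp.blocks_per_sm * sm_count <= n_blocks);
}
```

With an assumed device of 108 SMs and a candidate running 2 blocks per SM, the threshold is 7 * 2 * 108 = 1512 blocks: at equal occupancy the candidate is selected for `n_blocks >= 1512` and rejected below that, while a candidate with strictly higher occupancy is selected regardless of grid size.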