Load balance optimization for contiguous_split (#9755)
The existing `contiguous_split` implementation performed poorly when `number of columns N * number of splits M` was less than the number of SMs on the GPU, because some SMs were left without work. This PR implements a postprocessing step which attempts to distribute the bytes to be copied as evenly as possible across all available SMs.

The PR has been updated to repartition using a constant chunk size of 1 MB. This yields better results than the initial approach.

Before/after benchmarks for some particularly degenerate cases (T4):

```
Before (4 partitions)
4GB, 4 columns, no splits    43.3 ms    43.3 ms    8 bytes_per_second=46.1738G/s

After
4GB, 4 columns, no splits    10.1 ms    10.1 ms    8 bytes_per_second=198.642G/s
```

```
Before (2 partitions)
1GB, 1 column + validity, no splits    114 ms     114 ms     8 bytes_per_second=17.5212G/s

After
1GB, 1 column + validity, no splits    10.5 ms    10.6 ms    8 bytes_per_second=189.784G/s
```

Authors:
- https://github.com/nvdbaranec

Approvers:
- Jake Hemstad (https://github.com/jrhemstad)
- Vyas Ramasubramani (https://github.com/vyasr)
- Elias Stehle (https://github.com/elstehle)
- Mike Wilson (https://github.com/hyperbolic2346)

URL: #9755
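The repartitioning idea described in the commit message can be illustrated with a small host-side sketch. This is not the code from the PR: `copy_chunk`, `repartition`, and `chunk_bytes` are hypothetical names chosen for illustration, and the real implementation runs on the GPU. The sketch only shows how a handful of large buffers could be cut into constant 1 MB pieces so that there are enough independent copy tasks to keep every SM busy.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical descriptor for one copy task (not a cudf type).
struct copy_chunk {
  std::size_t src_buffer;  // index of the original buffer this chunk belongs to
  std::size_t offset;      // byte offset of the chunk within that buffer
  std::size_t size;        // number of bytes to copy for this chunk
};

// Constant chunk size of 1 MB, as stated in the commit message.
constexpr std::size_t chunk_bytes = std::size_t{1} << 20;

// Split every buffer into chunks of at most `chunk` bytes.
// Assumption: buffers are described only by their sizes in bytes.
std::vector<copy_chunk> repartition(std::vector<std::size_t> const& buffer_sizes,
                                    std::size_t chunk = chunk_bytes)
{
  std::vector<copy_chunk> out;
  for (std::size_t i = 0; i < buffer_sizes.size(); ++i) {
    for (std::size_t off = 0; off < buffer_sizes[i]; off += chunk) {
      // The last chunk of a buffer may be shorter than `chunk`.
      out.push_back({i, off, std::min(chunk, buffer_sizes[i] - off)});
    }
  }
  return out;
}
```

Under these assumptions, a 2.5 MB buffer followed by a 100-byte buffer yields four chunks: two full 1 MB chunks, one 0.5 MB chunk, and one 100-byte chunk. Each chunk can then be assigned to a separate unit of work, regardless of how few original buffers there were.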
1 parent dbe65f1, commit c07fdab
Showing 3 changed files with 277 additions and 63 deletions.