Commit
Speedup `make_blobs` by up to 2x by fixing inefficient kernel launch configuration (#1100)

The kernel generates two elements per iteration and attempts to write the second element with an offset equal to the grid stride. However, the grid stride is currently computed to be greater than the length of the generated array, so this second value is never used. By using a grid stride of half the array size, we speed up the kernel by nearly 2x in some cases (see perf charts in the PR comments).

_Note: this will effectively modify many test inputs, so be aware of that when comparing results prior to and following the change._

Authors:
- Louis Sugy (https://github.com/Nyrio)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1100
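The effect of the stride on wasted work can be illustrated with a small CPU model. This is only a sketch under stated assumptions, not the actual kernel from the PR: `simulate_fill`, `FillStats`, and the loop layout (one simulated thread per stride slot, advancing by twice the stride) are hypothetical names and choices made for illustration. It counts how many values a two-values-per-iteration generator produces versus how many end up stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for the model below (not part of any real API).
struct FillStats {
  std::size_t generated;  // values produced by the simulated generator
  std::size_t written;    // values actually stored in the output array
};

// CPU model of a kernel where each thread produces two values per iteration
// and stores them at index i and i + stride. The number of simulated threads
// is assumed to equal the stride. hits[k] counts how often slot k is written.
FillStats simulate_fill(std::size_t len, std::size_t stride, std::vector<int>& hits)
{
  assert(stride > 0);
  hits.assign(len, 0);
  FillStats stats{0, 0};
  for (std::size_t tid = 0; tid < stride; ++tid) {
    // Each iteration covers slots i and i + stride, so advance by 2 * stride.
    for (std::size_t i = tid; i < len; i += 2 * stride) {
      stats.generated += 2;  // the generator always yields a pair
      hits[i] += 1;          // the first value always lands in range
      stats.written += 1;
      if (i + stride < len) {  // the second value is dropped when out of range
        hits[i + stride] += 1;
        stats.written += 1;
      }
    }
  }
  return stats;
}
```

In this model, filling 1000 slots with a stride of 1024 generates 2000 values and discards half of them, while a stride of 500 generates 1000 values and stores all of them. Both configurations write every slot exactly once, which is consistent with the roughly 2x difference described in the commit message.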