[Feature Request] Fast prefix sum implementation #7235

RplusW · 2023-01-26T04:26:59Z

Concisely describe the proposed feature
Prefix sum is used extensively in particle simulation applications. The current Taichi implementation in taichi.algorithms is less optimized than the CuPy implementation, which uses the CUB backend. In a side-by-side comparison of cupy.cumsum and PrefixSumExecutor on a 10-million-element array, CuPy takes 300 µs while Taichi takes 550 µs. It would be beneficial to add a more optimized prefix sum algorithm (potentially with allocation of the output).

Additional comments
Here is the code I used for benchmarking.

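import cupy as cp
from cupyx.profiler import benchmark  # GPU-aware timing helper

test_array = cp.ones(10000000, dtype=cp.int32)

def test_cumsum(test_array):
    # Inclusive prefix sum over the flattened array
    out = cp.cumsum(test_array, axis=None, dtype=cp.int32)

print(benchmark(test_cumsum, (test_array,), n_repeat=100))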

import taichi as ti
from taichi.algorithms._algorithms import PrefixSumExecutor

ti.init(arch=ti.gpu, kernel_profiler=True)

array_size = 10000000
array = ti.field(dtype=ti.i32, shape=array_size)
array.fill(1)
PrefixSum = PrefixSumExecutor(array_size)

for i in range(10000):
    PrefixSum.run(array)

ti.profiler.print_kernel_profiler_info()
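
Before comparing timings, it may be worth confirming that both implementations produce the same result. The following is a minimal CPU reference sketch, assuming both cupy.cumsum and PrefixSumExecutor compute an inclusive prefix sum (this is documented for cupy.cumsum; for Taichi it is an assumption to verify). The helper names `inclusive_prefix_sum` and `matches_reference` are hypothetical and not part of either library.

```python
from itertools import accumulate

def inclusive_prefix_sum(values):
    """Return the inclusive prefix sum: out[i] = values[0] + ... + values[i]."""
    return list(accumulate(values))

def matches_reference(values, result):
    """Compare a scan result (e.g. copied back from the GPU) to the CPU reference."""
    return list(result) == inclusive_prefix_sum(values)

# For an all-ones input, as in the benchmarks above, the scan is 1, 2, ..., n.
ones = [1] * 8
print(inclusive_prefix_sum(ones))
```

A GPU result could be checked by copying it to the host first, for example with `cp.asnumpy(out)` for CuPy or `array.to_numpy()` for a Taichi field, and passing it as `result`.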


RplusW added the "feature request" label Jan 26, 2023

taichi-gardener added this to Taichi Lang Jan 26, 2023

github-project-automation bot moved this to Untriaged in Taichi Lang Jan 26, 2023

lin-hitonami assigned turbo0628 Feb 3, 2023

lin-hitonami moved this from Untriaged to Todo in Taichi Lang Feb 3, 2023

ailzhang moved this from Todo to Backlog in Taichi Lang Feb 10, 2023
