[Perf] Avoid unnecessary cuMemcpy in to_numpy() #2344

xumingkuan · 2021-05-15T11:33:53Z

Concisely describe the proposed feature
to_numpy() can be slow on GPUs. It calls tensor_to_ext_arr(), which has no way of knowing that the initial contents of the newly created numpy array are irrelevant. As a result, we do an unnecessary host-to-device copy here:

taichi/taichi/backends/cuda/codegen_cuda.cpp

Lines 71 to 72 in c824a8b

    
    CUDADriver::get_instance().memcpy_host_to_device(
        (void *)device_buffers[i], host_buffers[i], args[i].size);

We can remove this CUDA memcpy to accelerate it.

Describe the solution you'd like (if any)
When tensor_to_ext_arr() is called by to_numpy() (or to_torch()), add a compiler hint saying that the kernel will fully overwrite the external array. Then we can remove the unnecessary copy in codegen_cuda.cpp.
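To make the proposal concrete, here is a minimal pure-Python sketch of a launcher that skips the host-to-device copy when an argument carries such a hint. It is a toy model, not Taichi code: `ExtArrayArg`, `CopyStats`, `launch`, `fill_from_field` and the `fully_written` flag are all hypothetical names, and plain lists stand in for host and device buffers.

```python
from dataclasses import dataclass


@dataclass
class ExtArrayArg:
    host: list                   # stands in for the numpy buffer (hypothetical)
    fully_written: bool = False  # hypothetical hint: kernel overwrites every element


@dataclass
class CopyStats:
    h2d: int = 0  # number of host-to-device copies performed
    d2h: int = 0  # number of device-to-host copies performed


def launch(kernel, args, stats):
    """Toy launcher: copy in (unless hinted), run the kernel, copy out."""
    device_buffers = []
    for arg in args:
        if arg.fully_written:
            # Contents will be overwritten, so allocate without copying.
            device_buffers.append([None] * len(arg.host))
        else:
            device_buffers.append(list(arg.host))  # host -> device
            stats.h2d += 1
    kernel(*device_buffers)
    for arg, dev in zip(args, device_buffers):
        arg.host[:] = dev  # device -> host
        stats.d2h += 1


def fill_from_field(out):
    # Mimics what tensor_to_ext_arr() does for to_numpy(): writes every element.
    for i in range(len(out)):
        out[i] = i * i
```

Launching `fill_from_field` with `ExtArrayArg(host=[0] * 4, fully_written=True)` yields the same host array as launching it without the hint, but with one fewer copy. The hint is only safe if the kernel really writes every element; otherwise the copy back would overwrite host data with uninitialized values.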

Additional comments
See also ti.loop_unique(covers=...), but that one only supports SNodes now.

Shall we add the new hint to the CHI IR or to the kernel?


k-ye · 2021-05-15T13:29:25Z

Sounds great. Note that GLSL has type qualifiers like in, out and inout, which are a more generic approach than hinting tensor_to_ext_arr(). We can also analyze each kernel itself to see whether it only reads the external array, only writes it, or both reads and writes it.
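A rough sketch of what such an analysis could decide, written in plain Python. This is only an illustration under simplifying assumptions: it classifies an argument from a recorded trace of element accesses rather than from the IR, and `Access`, `infer_access` and `copies_needed` are hypothetical names, not existing Taichi APIs.

```python
from enum import Enum


class Access(Enum):
    IN = "in"        # kernel only reads the array
    OUT = "out"      # kernel overwrites every element before reading it
    INOUT = "inout"  # anything else


def infer_access(ops, size):
    """Classify an array argument from ("read" | "write", index) pairs
    given in program order. `size` is the number of elements."""
    written = set()
    reads_original = False  # some read observes the caller's original data
    for kind, index in ops:
        if kind == "write":
            written.add(index)
        elif kind == "read":
            if index not in written:
                reads_original = True
        else:
            raise ValueError(f"unknown access kind: {kind!r}")
    if not written:
        return Access.IN
    if not reads_original and len(written) == size:
        return Access.OUT
    return Access.INOUT


def copies_needed(access):
    """Return (host_to_device, device_to_host) for a given access mode."""
    return (access is not Access.OUT, access is not Access.IN)
```

With this classification, `copies_needed(Access.OUT)` returns `(False, True)`, which is the to_numpy() case from the issue. A partially written array falls back to `INOUT`, so both copies are kept and no host data is lost.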

bobcao3 · 2022-04-14T23:05:46Z

Merging with #4048

xumingkuan added the feature request label May 15, 2021

This was referenced May 28, 2021

Accessing 3D ti.field is slow #2374

Closed

One-Year Roadmap #2398

Closed

bobcao3 closed this as completed Apr 14, 2022
