Concisely describe the proposed feature
to_numpy() is sometimes too slow on GPUs. This is because to_numpy() calls tensor_to_ext_arr(), which has no way of knowing that the original contents of the newly created NumPy array are not needed. As a result, there is an unnecessary copy from host to device in taichi/taichi/backends/cuda/codegen_cuda.cpp (lines 71 to 72 in c824a8b). We can remove this CUDA memcpy to accelerate to_numpy().
Describe the solution you'd like (if any)
Add a compiler hint to tensor_to_ext_arr(), when it is called by to_numpy() (or to_torch()), stating that the kernel will fully overwrite the values in the external array. With that hint, the unnecessary copy in codegen_cuda.cpp can be removed.
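To make the proposal concrete, here is a minimal sketch in plain Python. It is not Taichi code: `ToyDevice`, `make_copy_kernel`, `to_numpy_like`, and the `fully_written` flag are all hypothetical names chosen for illustration, and the "device" only counts copies instead of performing real transfers. It assumes the hint is passed at launch time, which is just one of the possible designs.

```python
from dataclasses import dataclass


@dataclass
class ToyDevice:
    """Stand-in for a GPU that counts copies instead of doing real transfers."""
    h2d_copies: int = 0  # host-to-device copies performed so far
    d2h_copies: int = 0  # device-to-host copies performed so far

    def launch(self, kernel, host_array, fully_written=False):
        # Allocate a device-side buffer for the external array.
        device_buf = [0.0] * len(host_array)
        if not fully_written:
            # Without the hint we must upload the old host contents, because
            # the kernel might read them or leave some elements untouched.
            device_buf[:] = host_array
            self.h2d_copies += 1
        kernel(device_buf)
        # The results always have to be copied back to the host.
        host_array[:] = device_buf
        self.d2h_copies += 1


def make_copy_kernel(field_values):
    """Hypothetical analogue of tensor_to_ext_arr(): writes every element."""
    def kernel(arr):
        for i in range(len(arr)):
            arr[i] = field_values[i]
    return kernel


def to_numpy_like(device, field_values, hint):
    # The output array is freshly created, so its old contents are irrelevant.
    out = [0.0] * len(field_values)
    device.launch(make_copy_kernel(field_values), out, fully_written=hint)
    return out
```

With `hint=True` the result is identical but the upload is skipped; with `hint=False` the sketch behaves like the current code path and performs one host-to-device copy that nobody needs.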
Additional comments
See also ti.loop_unique(covers=...), although it currently supports only SNodes.
Shall we add the new hint to the CHI IR or to the kernel?
Sounds great. Note that GLSL has qualifiers like in, out, or inout, which is a more generic approach than hinting tensor_to_ext_arr(). We can also analyze each kernel itself to see whether it only reads the external array, only writes it, or both reads and writes it.
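The per-kernel analysis could look roughly like the sketch below. It is plain Python over a made-up statement list, not the real CHI IR: the `("load", name)` / `("store", name)` tuples, `Access`, `classify_ext_arr_access`, and the two `needs_*` helpers are all hypothetical. One caveat it tries to make explicit: a write-only array still needs the upload unless every element is known to be written, which is why the proposal says "fully write".

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()   # comparable to GLSL `in`
    WRITE = auto()  # comparable to GLSL `out`; READ | WRITE is like `inout`


def classify_ext_arr_access(stmts):
    """Map each external array name to the union of accesses in a kernel body.

    `stmts` is a list of (op, array_name) pairs in a toy IR (an assumption).
    """
    access = {}
    for op, arr_name in stmts:
        if op == "load":
            flag = Access.READ
        elif op == "store":
            flag = Access.WRITE
        else:
            continue  # other statements do not touch external arrays
        access[arr_name] = access.get(arr_name, Access.NONE) | flag
    return access


def needs_host_to_device(access, fully_written=False):
    # Upload if the kernel reads the array, or if it writes only part of it
    # (untouched elements must keep their old host values).
    reads = bool(access & Access.READ)
    writes = bool(access & Access.WRITE)
    return reads or (writes and not fully_written)


def needs_device_to_host(access):
    # Download only if the kernel may have modified the array.
    return bool(access & Access.WRITE)
```

Under these assumptions a read-only array skips the copy back to the host, and a fully written write-only array skips the upload, which covers the to_numpy() case as a special instance.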