[Perf] Avoid unnecessary cuMemcpy in to_numpy() #2344

Closed
xumingkuan opened this issue May 15, 2021 · 2 comments
Labels
feature request Suggest an idea on this project

Comments

@xumingkuan
Contributor

xumingkuan commented May 15, 2021

Concisely describe the proposed feature
to_numpy() can be slow on GPUs. It calls tensor_to_ext_arr(), which has no way of knowing that the initial contents of the newly created NumPy array are irrelevant. As a result, the array is copied from host to device before the kernel runs, even though every element is about to be overwritten:

CUDADriver::get_instance().memcpy_host_to_device(
    (void *)device_buffers[i], host_buffers[i], args[i].size);

Removing this CUDA memcpy would speed up to_numpy().

Describe the solution you'd like (if any)
Add a compiler hint to tensor_to_ext_arr(), set when it is called from to_numpy() (or to_torch()), stating that the kernel fully overwrites the external array. The unnecessary copy in codegen_cuda.cpp can then be skipped.
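
The effect of such a hint can be illustrated with a small host-side model of the launch path. This is only a sketch: `ExtArrayArg`, `write_only`, and `launch` are hypothetical names, not Taichi's API, and a `bytearray` stands in for device memory. It shows that when the kernel is known to overwrite every element, the upload can be skipped and only the download remains.

```python
from dataclasses import dataclass


@dataclass
class ExtArrayArg:
    """Hypothetical description of one external-array kernel argument."""
    host: bytearray
    write_only: bool = False  # assumed hint: the kernel overwrites every byte


def launch(args, kernel):
    """Model of a launch: upload, run, download.

    Returns (bytes_uploaded, bytes_downloaded).
    """
    uploaded = downloaded = 0
    device = []
    for a in args:
        buf = bytearray(len(a.host))  # stand-in for a device allocation
        if not a.write_only:
            buf[:] = a.host  # host-to-device copy, only when old values matter
            uploaded += len(a.host)
        device.append(buf)
    kernel(*device)
    for a, buf in zip(args, device):
        a.host[:] = buf  # device-to-host copy; this model downloads every argument
        downloaded += len(buf)
    return uploaded, downloaded


def fill_kernel(out):
    # Writes every element, as a kernel that fills the whole output array would.
    for i in range(len(out)):
        out[i] = i % 256


plain = ExtArrayArg(bytearray(1024))
hinted = ExtArrayArg(bytearray(1024), write_only=True)
stats_plain = launch([plain], fill_kernel)
stats_hinted = launch([hinted], fill_kernel)
```

In this model both calls leave identical data in the host buffer, but the hinted launch uploads nothing. The hint is only safe if the kernel really does write every element; otherwise unwritten elements would come back as uninitialized device memory.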

Additional comments
See also ti.loop_unique(covers=...), which currently supports only SNodes.

Shall we add the new hint to the CHI IR or to the kernel?

@xumingkuan xumingkuan added the feature request Suggest an idea on this project label May 15, 2021
@k-ye
Member

k-ye commented May 15, 2021

Sounds great. Note that GLSL has parameter qualifiers such as in, out, and inout, which are a more generic approach than hinting tensor_to_ext_arr(). We could also analyze each kernel to determine whether it only reads the external array or both reads and writes it.
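
The kernel-analysis idea can be sketched as a pass over the accesses a kernel makes to each external array. The model below rests on several assumptions and is not Taichi's IR: `Mode`, `Access`, `infer_modes`, and the `fully_written` flag are hypothetical, and accesses are supplied as a flat list instead of being collected from CHI IR statements. It also makes one point explicit: a write-only array still needs the upload unless full coverage is known, because partially written device memory would otherwise be copied back over the host data.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Mode(Flag):
    """Access mode of an external array, mirroring GLSL's in/out/inout."""
    NONE = 0
    IN = auto()
    OUT = auto()
    INOUT = IN | OUT


@dataclass(frozen=True)
class Access:
    """One hypothetical access record: which argument, and read or write."""
    arg_id: int
    is_write: bool


def infer_modes(num_args, accesses):
    """Fold all accesses into one mode per external-array argument."""
    modes = [Mode.NONE] * num_args
    for acc in accesses:
        modes[acc.arg_id] |= Mode.OUT if acc.is_write else Mode.IN
    return modes


def needs_upload(mode, fully_written=False):
    """Host-to-device copy is needed if old values are read or may survive."""
    if mode & Mode.IN:
        return True
    return bool(mode & Mode.OUT) and not fully_written


def needs_download(mode):
    """Device-to-host copy is needed only if the kernel wrote to the array."""
    return bool(mode & Mode.OUT)


# Argument 0 is only read, 1 is only written, 2 is read and written.
modes = infer_modes(3, [
    Access(0, False),
    Access(1, True),
    Access(2, False),
    Access(2, True),
])
```

Under this model, a read-only array skips the download, and a write-only array skips the upload only when `fully_written` is set, which is the information the proposed hint would supply.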

This was referenced May 28, 2021
@bobcao3
Collaborator

bobcao3 commented Apr 14, 2022

Merging with #4048

@bobcao3 bobcao3 closed this as completed Apr 14, 2022