Sign CUDA Kernel #17293

Merged
merged 7 commits into from
Aug 29, 2023

baijumeswani
Contributor

@baijumeswani baijumeswani commented Aug 25, 2023

l1_loss is defined as: mean(abs(y1 - y2))

If y = abs(x), dy/dx = sign(x).
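As a rough illustration of the two definitions above, here is a minimal C++ sketch of the loss and its gradient with respect to y1. The function names (`sign`, `l1_loss`, `l1_loss_grad`) are made up for this example and are not onnxruntime APIs. It assumes sign(0) = 0, which matches the ONNX Sign operator; mathematically, abs is not differentiable at 0, so this is a convention.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative helper (not an onnxruntime API): returns -1, 0, or +1.
inline float sign(float x) {
  return static_cast<float>((x > 0.0f) - (x < 0.0f));
}

// l1_loss = mean(abs(y1 - y2))
float l1_loss(const std::vector<float>& y1, const std::vector<float>& y2) {
  assert(y1.size() == y2.size() && !y1.empty());
  float total = 0.0f;
  for (std::size_t i = 0; i < y1.size(); ++i) {
    total += std::fabs(y1[i] - y2[i]);
  }
  return total / static_cast<float>(y1.size());
}

// Gradient of the loss w.r.t. y1: sign(y1[i] - y2[i]) / N for each element.
std::vector<float> l1_loss_grad(const std::vector<float>& y1,
                                const std::vector<float>& y2) {
  assert(y1.size() == y2.size() && !y1.empty());
  const float inv_n = 1.0f / static_cast<float>(y1.size());
  std::vector<float> grad(y1.size());
  for (std::size_t i = 0; i < y1.size(); ++i) {
    grad[i] = sign(y1[i] - y2[i]) * inv_n;
  }
  return grad;
}
```

This is why the backward graph of l1_loss contains a Sign node.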

In onnxruntime, Sign does not have a CUDA kernel, so the node falls back to the CPU. As a result, the execution graph looks like: MemcpyToHost -> Sign -> MemcpyFromHost

This PR implements the Sign CUDA kernel to avoid the memcpy.
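For readers unfamiliar with elementwise kernels, the idea can be sketched on the host in plain C++. This is not the code from this PR: `SignOp`, `unary_elementwise`, and `sign_of` are hypothetical names, and the loop stands in for a GPU launch in which each thread would handle one element. In an actual CUDA build the functor would be a device function and the data would stay in device memory, which is what removes the two memcpy nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical functor for illustration: maps x to -1, 0, or +1.
// In a CUDA kernel this would be marked as a device function.
template <typename T>
struct SignOp {
  T operator()(T x) const {
    return static_cast<T>((T(0) < x) - (x < T(0)));
  }
};

// Host-side stand-in for an elementwise kernel launch: each loop
// iteration corresponds to the work one GPU thread would do.
template <typename T, typename Op>
void unary_elementwise(const T* input, T* output, std::size_t count, Op op) {
  assert(count == 0 || (input != nullptr && output != nullptr));
  for (std::size_t i = 0; i < count; ++i) {
    output[i] = op(input[i]);
  }
}

// Convenience wrapper that applies SignOp to a whole vector.
template <typename T>
std::vector<T> sign_of(const std::vector<T>& input) {
  std::vector<T> output(input.size());
  unary_elementwise(input.data(), output.data(), input.size(), SignOp<T>{});
  return output;
}
```

Because the operation is a pure per-element map, it needs no shared memory or synchronization.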

@baijumeswani baijumeswani added the training issues related to ONNX Runtime training; typically submitted using template label Aug 25, 2023
@centwang
Contributor

If the percentage of the kernel time in the profile result is minor, I actually think that adding a CUDA kernel for Sign is much simpler, as it only requires changing a few lines in the unary elementwise ops, and it also helps ORT run inference or forward graphs with Sign on CUDA in the future...

@baijumeswani baijumeswani force-pushed the baijumeswani/abs-grad branch from 2ea9aa9 to 9220980 Compare August 25, 2023 21:15
@baijumeswani
Contributor Author

> If the percentage of the kernel time in the profile result is minor, I actually think that adding a CUDA kernel for Sign is much simpler, as it only requires changing a few lines in the unary elementwise ops, and it also helps ORT run inference or forward graphs with Sign on CUDA in the future...

Makes sense. I was initially contemplating whether to add the Sign CUDA kernel or the AbsGrad CUDA kernel.

I have now made the change to add the Sign CUDA kernel.

@baijumeswani baijumeswani changed the title AbsGrad CPU and CUDA Kernels Sign CUDA Kernel Aug 25, 2023
centwang previously approved these changes Aug 28, 2023
hariharans29 previously approved these changes Aug 28, 2023
@baijumeswani baijumeswani dismissed stale reviews from hariharans29 and centwang via 17a58d8 August 28, 2023 17:21
@baijumeswani baijumeswani merged commit 5d2c573 into main Aug 29, 2023
@baijumeswani baijumeswani deleted the baijumeswani/abs-grad branch August 29, 2023 04:03
@baijumeswani
Contributor Author

Thank you for the review @er3x3 @hariharans29

snnn added a commit that referenced this pull request Nov 2, 2023
Cherry-pick PRs: 
#18026 
#17912 
#17901 "2 lines added whitespace errors when cherry-picking"
#17293 
#17364 
#17505 
#17885

This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772 which has a merge conflict.

---------

Co-authored-by: Chi Lo
Co-authored-by: Scott McKay
Co-authored-by: Baiju Meswani
Co-authored-by: Kaz Nishimura
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024