Sign CUDA Kernel #17293
If the kernel's share of time in the profile result is minor, I think adding a CUDA kernel for Sign is much simpler: it only requires a few lines of change in the unary elementwise code, and it also helps ORT run inference or forward graphs containing Sign on CUDA in the future...
Makes sense. I was initially deciding between adding the Sign CUDA kernel and the AbsGrad CUDA kernel. I have now changed the PR to add the Sign CUDA kernel.
Thank you for the review @er3x3 @hariharans29
Cherry-pick PRs: #18026 #17912 #17901 "2 lines added whitespace errors when cherry-picking" #17293 #17364 #17505 #17885
This PR contains all the cherry-picks for the patch release except:
1. The PRs marked with sdxl_llama
2. #17772, which has a merge conflict.

Co-authored-by: Chi Lo
Co-authored-by: Scott McKay
Co-authored-by: Baiju Meswani
Co-authored-by: Kaz Nishimura
l1_loss is defined as:
mean(abs(y1 - y2))
If y = abs(x), dy/dx = sign(x).
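Written out per element, the chain rule gives the gradient of the loss as a scaled sign. This is a sketch under assumptions not stated in the PR: N denotes the number of elements, the subscript i indexes elements, and sign(0) is taken to be 0 (the value the ONNX Sign operator produces), which picks one subgradient at the point where abs is not differentiable.

```latex
\begin{aligned}
L &= \frac{1}{N}\sum_{i=1}^{N} \lvert y_{1,i} - y_{2,i} \rvert \\
\frac{\partial L}{\partial y_{1,i}} &= \frac{1}{N}\,\operatorname{sign}(y_{1,i} - y_{2,i}) \\
\frac{\partial L}{\partial y_{2,i}} &= -\frac{1}{N}\,\operatorname{sign}(y_{1,i} - y_{2,i})
\end{aligned}
```

This is why the backward graph of l1_loss contains a Sign node.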
In onnxruntime, `Sign` does not have a CUDA kernel. As a result, the execution graph looks like: MemcpyToHost -> Sign -> MemcpyFromHost
This PR implements the `Sign` CUDA kernel so as to avoid the memcpy.
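For illustration only, the per-element computation such a kernel performs can be sketched on the host in plain C++. This is not the code added by this PR: `SignOf` and `SignElementwise` are hypothetical names, and the loop stands in for what a CUDA kernel would do with one thread per element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an onnxruntime function): returns -1, 0, or +1
// in the same type as the input, following the ONNX Sign definition.
template <typename T>
T SignOf(T x) {
  // Each comparison yields a bool; their difference is -1, 0, or 1.
  return static_cast<T>((T(0) < x) - (x < T(0)));
}

// Hypothetical host-side stand-in for the elementwise kernel: every output
// element depends only on the input element at the same index.
template <typename T>
std::vector<T> SignElementwise(const std::vector<T>& input) {
  std::vector<T> output(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    output[i] = SignOf(input[i]);
  }
  return output;
}
```

Because each element is independent, the operation maps naturally onto a unary elementwise kernel, and keeping it on the device removes the need for the two memcpy nodes around `Sign`.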