Added wrapped C cuda code and runable examples #1
Sorry for the previous chaos; I thought these parts would not be published as part of the package. The following changes have been made:
Hi @maleadt, I am mentoring an Open Source Promotion Plan student who is implementing Tropical GEMM on GPUs. Following the recent update in GemmKernels.jl (JuliaGPU/GemmKernels.jl#101), I suggested that he try GemmKernels.jl so that the implementation is compatible with the Julia CUDA ecosystem. However, the above benchmark shows that its performance is not as good as that of the 600-line C code. We might need your help to decide which approach is technically more feasible:
Also, @ArrogantGao found
NOTE: All the benchmarks and implementations are included in this repo.
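For readers unfamiliar with the operation being benchmarked, here is a minimal CPU reference sketch of tropical GEMM. It assumes the max-plus semiring (addition replaced by `max`, multiplication replaced by `+`) and row-major storage. The function names `tropical_fill_zero` and `tropical_gemm_ref` are hypothetical and are not taken from the CUDA file in this PR or from GemmKernels.jl.

```c
#include <assert.h>
#include <math.h>

/* Fill a buffer with -INFINITY, the additive identity ("zero") of the
 * max-plus semiring. Hypothetical helper for this sketch only. */
void tropical_fill_zero(int count, float *c)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        c[i] = -INFINITY;
    }
}

/* Reference tropical GEMM on the CPU, row-major (an assumption):
 *   c[i][j] = max(c[i][j], max over p of (a[i][p] + b[p][j]))
 * a is m x k, b is k x n, c is m x n. */
void tropical_gemm_ref(int m, int n, int k,
                       const float *a, const float *b, float *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = c[i * n + j];
            for (int p = 0; p < k; ++p) {
                /* Tropical "multiply" is +, tropical "add" is max. */
                float s = a[i * k + p] + b[p * n + j];
                if (s > acc) {
                    acc = s;
                }
            }
            c[i * n + j] = acc;
        }
    }
}
```

With A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], and C initialized by `tropical_fill_zero`, the result is [[9, 10], [11, 12]]. A slow reference like this is mainly useful for checking the output of a GPU kernel, not for performance comparison.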
I think the changes look great, well done!
.vscode/settings.json
@@ -0,0 +1 @@
{}
VS Code configuration files should not be committed.
@@ -0,0 +1,627 @@
// This CUDA code is modified based on github repo https://github.com/Yinghan-Li/YHs_Sample, which is under GPL 3.0 License
Holy, the GPL3 license, that is sexy. If we decide to keep this version in our code base, we have to include the GPL3 license.
To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well.
I would recommend doing so. An all-Julia implementation is always preferable, for so many reasons: support for different datatypes, easier to tune using metaprogramming instead of the hard-coded 128x128x8 here, easier for other people to contribute to, etc. The code generated by GemmKernels.jl is generally pretty good, so it should be possible to compare the generated PTX code of both implementations, and/or use Nsight Compute to compare executions. Maybe it's something simple, like GemmKernels.jl not using
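To illustrate the point about hard-coded tile sizes, the following is a hedged CPU sketch of a blocked max-plus GEMM in which the tile shape is a set of compile-time parameters rather than literals inside the loops. The macro names `TILE_M`, `TILE_N`, `TILE_K` and the function `tropical_gemm_tiled` are hypothetical. The defaults merely echo the 128x128x8 shape mentioned above, and nothing here reflects how the CUDA file in this PR or GemmKernels.jl is actually structured.

```c
#include <assert.h>
#include <math.h>

/* Tile sizes can be overridden at compile time, e.g. -DTILE_M=64.
 * The defaults are illustrative, not tuned. */
#ifndef TILE_M
#define TILE_M 128
#endif
#ifndef TILE_N
#define TILE_N 128
#endif
#ifndef TILE_K
#define TILE_K 8
#endif

static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* Blocked max-plus GEMM on the CPU, row-major (an assumption):
 *   c[i][j] = max(c[i][j], max over p of (a[i][p] + b[p][j]))
 * Because max is associative and commutative, accumulating one k-tile
 * at a time gives the same result as the unblocked triple loop. */
void tropical_gemm_tiled(int m, int n, int k,
                         const float *a, const float *b, float *c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i0 = 0; i0 < m; i0 += TILE_M) {
        int i1 = min_int(i0 + TILE_M, m);
        for (int j0 = 0; j0 < n; j0 += TILE_N) {
            int j1 = min_int(j0 + TILE_N, n);
            for (int p0 = 0; p0 < k; p0 += TILE_K) {
                int p1 = min_int(p0 + TILE_K, k);
                /* Update one TILE_M x TILE_N block of c from one k-tile. */
                for (int i = i0; i < i1; ++i) {
                    for (int j = j0; j < j1; ++j) {
                        float acc = c[i * n + j];
                        for (int p = p0; p < p1; ++p) {
                            float s = a[i * k + p] + b[p * n + j];
                            if (s > acc) {
                                acc = s;
                            }
                        }
                        c[i * n + j] = acc;
                    }
                }
            }
        }
    }
}
```

The caller is expected to initialize `c` with `-INFINITY` (or with a previous result to accumulate into). In C, changing the tile shape means recompiling with different macro values. In Julia the same parameters could be ordinary type or value parameters, which is the tuning advantage the comment above refers to.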
Removed the .vscode file and changed the license to GPL 3.0 (indeed, I also like that better).
@maleadt Thank you for your prompt reply. @ArrogantGao Let us do some profiling to understand the performance issues. Let me merge the PR first and move the discussion to #2; we can post the profiling results and the generated PTX code there.