Kernel tuning on mobile GPU #18
/cc @gfursin, in case you are interested in this specific topic.
@bhack
Yep, very nice paper! By the way, I agree with @naibaf7 that porting and tuning various libraries on Android is a pain. That's why my current effort is focused on providing an abstract (universal) workload auto-tuning/crowd-tuning platform where users can plug in their workload (possibly including a sub-tuner), define the exploration strategy as a plugin too, and then compile, run, and tune the workload across different devices.
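To make the "workload as a plugin, exploration strategy as a plugin" idea concrete, here is a minimal sketch. It is hypothetical: `Workload`, `exhaustive`, `random_search` and `tune` are illustrative names, not the API of any existing tuning framework, and the cost function is a synthetic stand-in for timing a kernel on a real device.

```python
"""Hypothetical pluggable auto-tuner: both the workload and the exploration
strategy are plug-ins. All names are illustrative, not a real framework API."""
import itertools
import random
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Config = Dict[str, int]
Space = Dict[str, List[int]]
Strategy = Callable[[Space], Iterable[Config]]


class Workload:
    """Hypothetical workload plug-in: a search space plus a cost function."""

    def __init__(self, space: Space, cost: Callable[[Config], float]):
        self.space = space
        self.cost = cost  # assumption: in practice, measured runtime on a device

    def run(self, config: Config) -> float:
        return self.cost(config)


def exhaustive(space: Space) -> Iterable[Config]:
    """Strategy plug-in: enumerate every point of the search space."""
    keys = sorted(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_search(n: int, seed: int = 0) -> Strategy:
    """Strategy plug-in factory: sample n random configurations."""
    def strategy(space: Space) -> Iterable[Config]:
        rng = random.Random(seed)
        keys = sorted(space)
        for _ in range(n):
            yield {k: rng.choice(space[k]) for k in keys}
    return strategy


def tune(workload: Workload, strategy: Strategy) -> Tuple[Optional[Config], float]:
    """Generic driver: evaluate each candidate and keep the cheapest one."""
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for cfg in strategy(workload.space):
        cost = workload.run(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


if __name__ == "__main__":
    # Synthetic cost: pretend work-group size 64 and vector width 4 are optimal.
    space = {"wg": [16, 32, 64, 128], "vec": [1, 2, 4, 8]}
    fake = Workload(space, lambda c: abs(c["wg"] - 64) + abs(c["vec"] - 4))
    print(tune(fake, exhaustive))  # ({'vec': 4, 'wg': 64}, 0)
    print(tune(fake, random_search(8)))
```

The driver never needs to know which strategy or workload it is given, which is what allows swapping in a different search method or target device without touching the tuning loop.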
Interesting read, thanks for notifying me! I can see that their rewrite system has some advantages over simple auto-tuning; however, if I understand it correctly, its drawback might be that not everything can be expressed in it. Can they handle, for example, the ld (leading dimension) and offset parameters of the GEMM routine? Or optimisations such as staggered indices or using barriers for locality? If those things can be expressed nicely, then I think rewrite rules might be a good alternative to auto-tuning.
Since I'm the author of CLBlast, I'll also give my opinion about the poor performance of CLBlast on Mali. I believe the real reason is different from what the authors observe in the paper: I think it is related to the ARM compiler not performing a specific optimization that CLBlast relies on. See this issue for more details. I'll contact the authors to discuss some details of their experiments 🙂
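For readers unfamiliar with the ld and offset parameters mentioned above, here is a minimal sketch of what they mean in a BLAS-style, column-major GEMM: element (i, j) of a matrix X is read from `x[x_offset + i + j * ldx]`, so a routine can operate on a sub-matrix embedded in a larger buffer. This is a naive reference loop written for illustration (`gemm_colmajor` is a hypothetical name), not CLBlast's kernel or API, and it does not cover transposes, staggered indices, or barriers.

```python
from typing import List


def gemm_colmajor(m: int, n: int, k: int, alpha: float,
                  a: List[float], a_offset: int, lda: int,
                  b: List[float], b_offset: int, ldb: int,
                  beta: float,
                  c: List[float], c_offset: int, ldc: int) -> None:
    """Naive C := alpha * A * B + beta * C on column-major buffers (in place).

    A is m x k, B is k x n, C is m x n. Each leading dimension must be at
    least the number of rows of its matrix; a larger value means the matrix
    is a sub-block of a bigger one, and the offset says where it starts.
    """
    if lda < m or ldb < k or ldc < m:
        raise ValueError("leading dimension smaller than row count")
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[a_offset + i + p * lda] * b[b_offset + p + j * ldb]
            idx = c_offset + i + j * ldc
            c[idx] = alpha * acc + beta * c[idx]


if __name__ == "__main__":
    # A 4x4 column-major buffer where element (i, j) holds i + 4*j.
    big = [float(v) for v in range(16)]
    # Multiply its bottom-right 2x2 block (starts at row 2, col 2:
    # offset = 2 + 2*4 = 10, lda = 4) by the 2x2 identity matrix.
    identity = [1.0, 0.0, 0.0, 1.0]
    c = [0.0] * 4
    gemm_colmajor(2, 2, 2, 1.0, big, 10, 4, identity, 0, 2, 0.0, c, 0, 2)
    print(c)  # [10.0, 11.0, 14.0, 15.0]
```

A code generator that only models dense, contiguous matrices would have to account for these strided, offset accesses explicitly to support the full GEMM interface.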
Interesting... /cc @michel-steuwer in case he wants to contribute to the thread.
I think this is the next step: http://www.lift-project.org
Just to collect some feedback on "Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation". /cc @CNugteren, because the mobile benchmark involved CLBlast.