
Kernel tuning on mobile GPU #18

Closed
bhack opened this issue Dec 3, 2016 · 6 comments

@bhack

bhack commented Dec 3, 2016

Just to collect some feedback on Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation. /cc @CNugteren because the mobile benchmarks involved CLBlast.

@bhack
Author

bhack commented Dec 3, 2016

/cc @gfursin in case he is interested in this specific topic

@naibaf7
Owner

naibaf7 commented Dec 3, 2016

@bhack
Thanks. Good reference paper. Yes better mobile GPU support is definitely planned, but I don't have a test platform at the moment. On Android phones it's really inconvenient...
As always, if someone can sponsor me the hardware, I'm going to try to optimize for it :)
Unfortunately, general convolution is a step harder to tune than GEMM. One issue is that the parameter search space has discontinuities in it, which makes it harder for the simulated annealing process to find the correct large-scale optima.
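To illustrate why such discontinuities hurt, here is a minimal, self-contained sketch of simulated annealing over a made-up kernel-parameter space. The parameter names, the local-memory budget, and the cost model are all illustrative (not from any real tuner); invalid configurations get infinite cost, which is exactly the kind of discontinuity described above:

```python
import math
import random

# Hypothetical discrete search space for a convolution kernel.
SPACE = {
    "tile_m": [1, 2, 4, 8, 16],
    "tile_n": [1, 2, 4, 8, 16],
    "vector_width": [1, 2, 4, 8],
}

def is_valid(cfg):
    # Discontinuity: some combinations exceed a made-up local-memory budget.
    return cfg["tile_m"] * cfg["tile_n"] * cfg["vector_width"] <= 128

def cost(cfg):
    # Stand-in for a real kernel benchmark (lower is better); invalid
    # points get infinite cost, creating cliffs in the search landscape.
    if not is_valid(cfg):
        return math.inf
    return (abs(cfg["tile_m"] - 8) + abs(cfg["tile_n"] - 4)
            + abs(cfg["vector_width"] - 4))

def neighbor(cfg):
    # Mutate a single parameter to an adjacent value in its list.
    new = dict(cfg)
    key = random.choice(list(SPACE))
    values = SPACE[key]
    i = values.index(new[key])
    new[key] = values[max(0, min(len(values) - 1, i + random.choice([-1, 1])))]
    return new

def anneal(steps=2000, t0=10.0):
    random.seed(0)
    cur = {k: random.choice(v) for k, v in SPACE.items()}
    while not is_valid(cur):  # start from a valid point
        cur = {k: random.choice(v) for k, v in SPACE.items()}
    best = cur
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        cand = neighbor(cur)
        delta = cost(cand) - cost(cur)
        # For an invalid candidate, delta is +inf and exp(-inf) == 0,
        # so the move is never accepted: the annealer cannot "tunnel"
        # through invalid regions to reach good configurations beyond them.
        if delta < 0 or random.random() < math.exp(-delta / t):
            cur = cand
            if cost(cur) < cost(best):
                best = cur
    return best

print(anneal())
```

Because a single mutation can cross a validity cliff, the acceptance probability collapses to zero at those boundaries, which is one reason convolution tuning is harder than GEMM tuning.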

@gfursin
Contributor

gfursin commented Dec 3, 2016

Yep, very nice paper! By the way, I agree with @naibaf7 that porting and tuning various libs on Android is a pain. That's why all my current effort goes into providing an abstract (universal) workload auto-tuning/crowd-tuning platform where users can plug in their workload (possibly including a sub-tuner), define the exploration strategy as a plugin as well, and then compile/run/tune it across different devices.
The progress is good - I finally managed to add support for compiling and running the Caffe CPU version with all libs for Android or any x86/ARM Linux via CK, and I am getting closer to compiling and running the OpenCL version (with the help of @psyhtest and other colleagues). I also plan to add demos of crowd-tuning CLBlast around Q1 2017 (to show users how to add other libs and tuners while reusing CK-based compile and timing routines).
But I am fighting all the time with continuous changes in all the dependencies, which are also often incompatible with mobile devices. Just a few days ago an update to the gflags lib broke compilation for Android :( . It's very frustrating. I wish frameworks were simpler and didn't have so many dependencies ...
Another good point from @naibaf7 is about discontinuities in the parameter search space for DNNs. It's true that simulated annealing or genetic algorithms become more difficult/slower there, but I hope that collaborative exploration of such sub-areas across different users via CK may speed this process up ... But that's a longer-term goal. In the meantime, I am back to hacking CK to stabilize universal workload tuning ;) ...
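The plugin idea above can be sketched roughly as follows. All class and method names here are hypothetical (this is not CK's actual API): the driver only knows two interfaces, a workload that can describe its parameter space and measure one point, and an exploration strategy that decides which points to try:

```python
import random

class Workload:
    """Plugin interface: describe a parameter space and measure one point."""
    def space(self):
        raise NotImplementedError
    def run(self, cfg):
        raise NotImplementedError  # returns a runtime (lower is better)

class RandomSearch:
    """Exploration-strategy plugin: sample points uniformly at random."""
    def __init__(self, trials):
        self.trials = trials
    def explore(self, workload):
        space = workload.space()
        best_cfg, best_time = None, float("inf")
        for _ in range(self.trials):
            cfg = {k: random.choice(v) for k, v in space.items()}
            t = workload.run(cfg)
            if t < best_time:
                best_cfg, best_time = cfg, t
        return best_cfg, best_time

class ToyGemm(Workload):
    """Stand-in workload: pretends tile size 8 is fastest."""
    def space(self):
        return {"tile": [2, 4, 8, 16]}
    def run(self, cfg):
        return abs(cfg["tile"] - 8) + 1.0  # fake runtime in ms

random.seed(0)
cfg, ms = RandomSearch(trials=20).explore(ToyGemm())
print(cfg, ms)
```

The point of this separation is that swapping `RandomSearch` for an annealing or genetic-algorithm plugin, or `ToyGemm` for a real benchmarked kernel, requires no change to the driver loop.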

@CNugteren

Interesting read, thanks for notifying me! I can see that their rewrite system has some advantages over simple auto-tuning; however, if I understand it correctly, the drawback might be that they cannot express everything in their system. Can they handle, for example, the ld (leading dimension) and offset parameters of the GEMM routine? Or optimisations such as staggered indices or using barriers for locality? If those things can be expressed nicely, then I think rewrite rules might be a good alternative to auto-tuning.
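For context on the ld/offset question: in BLAS-style storage a matrix is a view into a larger buffer, and a sub-matrix is just a different offset over the same buffer with the same leading dimension. A small illustrative sketch (column-major indexing, as in BLAS conventions; the helper name is made up, not CLBlast's API):

```python
def at(buf, offset, ld, i, j):
    # Element (i, j) of a column-major matrix view: consecutive columns
    # are `ld` elements apart in the flat buffer.
    return buf[offset + i + j * ld]

# A 4x4 column-major matrix in a flat buffer (ld = 4, offset = 0);
# element (i, j) lives at index i + 4*j, and buf[k] = k, so the value
# stored at (i, j) is simply i + 4*j.
buf = list(range(16))

# The 2x2 sub-matrix starting at row 1, column 2 is just a different
# offset (1 + 2*4 = 9) over the same buffer, with the same ld:
sub = [[at(buf, 9, 4, i, j) for j in range(2)] for i in range(2)]
print(sub)  # prints [[9, 13], [10, 14]]
```

A tuned GEMM kernel has to stay correct for any ld and offset the caller passes, which is why these parameters constrain the optimisations a code generator may apply.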

Since I'm the author of CLBlast, I'll also give my opinion about the poor performance of CLBlast on Mali. I believe the real reason is different from what the authors observe in the paper: I think it is related to the ARM compiler not performing a specific optimization which CLBlast relies on. See this issue for more details. I'll contact the authors to discuss some details of their experiments 🙂

@bhack
Author

bhack commented Dec 4, 2016

Interesting... /cc @michel-steuwer in case he wants to contribute to the thread.

@bhack
Author

bhack commented Dec 4, 2016

I think this is the next step http://www.lift-project.org

@bhack bhack closed this as completed Feb 11, 2020