
Kernel tuning on mobile GPU #18

Closed
bhack opened this issue Dec 3, 2016 · 6 comments

@bhack

bhack commented Dec 3, 2016

Just to collect some feedback on Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation. /cc @CNugteren because the mobile benchmarks involved CLBlast.

@bhack
Author

bhack commented Dec 3, 2016

/cc @gfursin in case he is interested in this specific topic

@naibaf7
Owner

naibaf7 commented Dec 3, 2016

@bhack
Thanks. Good reference paper. Yes better mobile GPU support is definitely planned, but I don't have a test platform at the moment. On Android phones it's really inconvenient...
As always, if someone can sponsor me the hardware, I'm going to try to optimize for it :)
Unfortunately, general convolution is a step harder to tune than GEMM. One issue is that the parameter search space has discontinuities in it, which makes it harder for the simulated annealing process to find the correct large-scale optima.
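To illustrate why such discontinuities hurt, here is a minimal, self-contained sketch of simulated annealing over a made-up kernel-parameter space. The parameter names, the local-memory budget, and the cost model are all illustrative (not from any real tuner); invalid configurations get infinite cost, which is exactly the kind of discontinuity described above:

```python
import math
import random

# Hypothetical discrete search space for a convolution kernel.
SPACE = {
    "tile_m": [1, 2, 4, 8, 16],
    "tile_n": [1, 2, 4, 8, 16],
    "vector_width": [1, 2, 4, 8],
}

def is_valid(cfg):
    # Discontinuity: some combinations exceed a made-up local-memory budget.
    return cfg["tile_m"] * cfg["tile_n"] * cfg["vector_width"] <= 128

def cost(cfg):
    # Stand-in for a real kernel benchmark (lower is better); invalid
    # points get infinite cost, creating cliffs in the search landscape.
    if not is_valid(cfg):
        return math.inf
    return (abs(cfg["tile_m"] - 8) + abs(cfg["tile_n"] - 4)
            + abs(cfg["vector_width"] - 4))

def neighbor(cfg):
    # Mutate a single parameter to an adjacent value in its list.
    new = dict(cfg)
    key = random.choice(list(SPACE))
    values = SPACE[key]
    i = values.index(new[key])
    new[key] = values[max(0, min(len(values) - 1, i + random.choice([-1, 1])))]
    return new

def anneal(steps=2000, t0=10.0):
    random.seed(0)
    cur = {k: random.choice(v) for k, v in SPACE.items()}
    while not is_valid(cur):  # start from a valid point
        cur = {k: random.choice(v) for k, v in SPACE.items()}
    best = cur
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        cand = neighbor(cur)
        delta = cost(cand) - cost(cur)
        # For an invalid candidate, delta is +inf and exp(-inf) == 0,
        # so the move is never accepted: the annealer cannot "tunnel"
        # through invalid regions to reach good configurations beyond them.
        if delta < 0 or random.random() < math.exp(-delta / t):
            cur = cand
            if cost(cur) < cost(best):
                best = cur
    return best

print(anneal())
```

Because a single mutation can cross a validity cliff, the acceptance probability collapses to zero at those boundaries, which is one reason convolution tuning is harder than GEMM tuning.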

@gfursin
Contributor

gfursin commented Dec 3, 2016

Yep, very nice paper! By the way, I agree with @naibaf7 that porting and tuning various libs on Android is a pain. That's why all my current effort goes into providing an abstract (universal) workload auto-tuning/crowd-tuning platform where users can plug in their workload (possibly including a sub-tuner), define the exploration strategy as a plugin as well, and then compile/run/tune it across different devices.
The progress is good - I finally managed to add support for compiling and running the Caffe CPU version with all libs for Android or any x86/ARM Linux via CK, and I am getting closer to compiling and running the OpenCL version (with the help of @psyhtest and other colleagues). I also plan to add demos of crowd-tuning CLBlast around Q1 2017 (to show users how to add other libs and tuners while reusing CK-based compile and timing routines).
But I am fighting all the time with continuous changes in all the dependencies, which are also often incompatible with mobile devices. Just a few days ago an update to the gflags lib broke compilation for Android :( . It's very frustrating. I wish frameworks were simpler and didn't have so many dependencies ...
Another good point from @naibaf7 is about discontinuities in the parameter search space for DNNs. It's true that simulated annealing or genetic algorithms become more difficult/slower there, but I hope that collaborative exploration of such sub-areas across different users via CK may speed this process up ... But that's a longer-term goal. In the meantime, I am back to hacking CK to stabilize universal workload tuning ;) ...
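The plugin idea above can be sketched roughly as follows. All class and method names here are hypothetical (this is not CK's actual API): the driver only knows two interfaces, a workload that can describe its parameter space and measure one point, and an exploration strategy that decides which points to try:

```python
import random

class Workload:
    """Plugin interface: describe a parameter space and measure one point."""
    def space(self):
        raise NotImplementedError
    def run(self, cfg):
        raise NotImplementedError  # returns a runtime (lower is better)

class RandomSearch:
    """Exploration-strategy plugin: sample points uniformly at random."""
    def __init__(self, trials):
        self.trials = trials
    def explore(self, workload):
        space = workload.space()
        best_cfg, best_time = None, float("inf")
        for _ in range(self.trials):
            cfg = {k: random.choice(v) for k, v in space.items()}
            t = workload.run(cfg)
            if t < best_time:
                best_cfg, best_time = cfg, t
        return best_cfg, best_time

class ToyGemm(Workload):
    """Stand-in workload: pretends tile size 8 is fastest."""
    def space(self):
        return {"tile": [2, 4, 8, 16]}
    def run(self, cfg):
        return abs(cfg["tile"] - 8) + 1.0  # fake runtime in ms

random.seed(0)
cfg, ms = RandomSearch(trials=20).explore(ToyGemm())
print(cfg, ms)
```

The point of this separation is that swapping `RandomSearch` for an annealing or genetic-algorithm plugin, or `ToyGemm` for a real benchmarked kernel, requires no change to the driver loop.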

@CNugteren

Interesting read, thanks for notifying me! I can see that their rewrite system has some advantages over simple auto-tuning; however, if I understand it correctly, the drawback might be that they cannot express everything in their system. Can they handle, for example, the ld (leading dimension) and offset parameters of the GEMM routine? Or optimisations such as staggered indices or using barriers for locality? If those things can be expressed nicely, then I think rewrite rules might be a good alternative to auto-tuning.
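For context on the ld/offset question: in BLAS-style storage a matrix is a view into a larger buffer, and a sub-matrix is just a different offset over the same buffer with the same leading dimension. A small illustrative sketch (column-major indexing, as in BLAS conventions; the helper name is made up, not CLBlast's API):

```python
def at(buf, offset, ld, i, j):
    # Element (i, j) of a column-major matrix view: consecutive columns
    # are `ld` elements apart in the flat buffer.
    return buf[offset + i + j * ld]

# A 4x4 column-major matrix in a flat buffer (ld = 4, offset = 0);
# element (i, j) lives at index i + 4*j, and buf[k] = k, so the value
# stored at (i, j) is simply i + 4*j.
buf = list(range(16))

# The 2x2 sub-matrix starting at row 1, column 2 is just a different
# offset (1 + 2*4 = 9) over the same buffer, with the same ld:
sub = [[at(buf, 9, 4, i, j) for j in range(2)] for i in range(2)]
print(sub)  # prints [[9, 13], [10, 14]]
```

A tuned GEMM kernel has to stay correct for any ld and offset the caller passes, which is why these parameters constrain the optimisations a code generator may apply.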

Since I'm the author of CLBlast, I'll also give my opinion about the poor performance of CLBlast on Mali. I believe the real reason is different from what the authors observe in the paper: I think it is related to the ARM compiler not performing a specific optimization which CLBlast relies on. See this issue for more details. I'll contact the authors to discuss some details of their experiments 🙂

@bhack
Author

bhack commented Dec 4, 2016

Interesting... /cc @michel-steuwer in case he wants to contribute to the thread.

@bhack
Author

bhack commented Dec 4, 2016

I think this is the next step http://www.lift-project.org

@bhack bhack closed this as completed Feb 11, 2020