OpenCL compiling issue #1571
Try using CMake instead; it is much better at finding libraries, and when it doesn't find something at first, it can also be configured manually.
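For instance, if CLBlast sits in an unusual location, a hint can be passed on the command line. A sketch (the path is purely illustrative; CLBlast installs a CMake package config file that llama.cpp's find_package can pick up):

```sh
# hypothetical: tell CMake exactly where the CLBlast package config lives
cmake .. -DLLAMA_CLBLAST=ON -DCLBlast_DIR=/path/to/prefix/lib/cmake/CLBlast
```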
How do I run it from the CLI with CMake?
Thanks for your response. It appears to have compiled, but now I can't run ./main: it says command not found. Can anyone assist me with compiling so that I can use ./main?
With CMake
Lovely, thank you for the direction. I can run ./main from the bin subfolder. It appears CLBlast does not have a system_info label like OpenBLAS does (llama.cpp shows BLAS=1 when compiled with OpenBLAS), so I'll test another way to see whether my GPU is engaged. To clarify, CLBlast is an alternative to OpenBLAS, is that right? I assume I can't run both OpenBLAS and CLBlast at the same time, but maybe I'm missing something.
It seems like it was not compiled in, then. It should show which platform and device it uses on start-up.
Thanks again for the information. I am trying to compile using cmake . -DLLAMA_CLBLAST=ON
Neither make nor cmake finds the library, so I'm still uncertain how to point llama.cpp to my libraries in /data/data/com.termux/files/usr/include/CL. Edit: to clarify, I edited the line in CMakeCache.txt
to
and then trying cmake . -DLLAMA_CLBLAST=ON gives me this:
You can try:

```sh
cd build
rm -r *  # restart configuration just in case
CMAKE_PREFIX_PATH=/data/data/com.termux/files/usr cmake .. -DLLAMA_CLBLAST=ON
```

I don't really know how Termux works, though.
I'll mess around with it tonight and let you know how it goes tomorrow. Thanks for the CMAKE_PREFIX_PATH idea.
OK, got Termux running in Docker.

First install some packages:

```sh
pkg update
pkg upgrade
apt install clang cmake cmake-curses-gui opencl-headers ocl-icd
```

Install CLBlast:

```sh
cd
git clone https://github.com/CNugteren/CLBlast.git
cd CLBlast
cmake -B build \
    -DBUILD_SHARED_LIBS=OFF \
    -DTUNERS=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/data/data/com.termux/files/usr
cd build
make -j8
make install
```

Build llama.cpp:

```sh
cd
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/
cmake -B build -DLLAMA_CLBLAST=ON
cd build
make -j8
```
This is fantastic, @SlyEcho. I genuinely appreciate it. I'm stuck during the CLBlast install. I run
and receive
I tried continuing with
and am stuck on:
Now of course llama.cpp says CLBlast was not found. I'm confused as to what exactly is causing make install to fail. It feels like we're very close, though, so thanks again for coming this far! Please let me know if there's anything I can do to force this make install.
It is trying to install into a path that isn't writable. I made a mistake in the prefix; it should be the Termux one. You can reconfigure with:

```sh
cmake .. -DCMAKE_INSTALL_PREFIX=/data/data/com.termux/files/usr
make install
```

If that path is not allowed either, you can install into some home folder and then point llama.cpp to it with the CMAKE_PREFIX_PATH variable.
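A sketch of that home-folder alternative (the $HOME/clblast prefix is just an example):

```sh
# install CLBlast into a user-writable prefix instead
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/clblast
make install
# then, when configuring llama.cpp, point CMake at that prefix
CMAKE_PREFIX_PATH=$HOME/clblast cmake .. -DLLAMA_CLBLAST=ON
```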
Thank you. This worked for me. I'm saving these posts to learn from. llama.cpp found CLBlast, and I'm able to build it. Now I'm getting an error running ./main, so I might reference it in a new issue, but here's the message:
To clarify, I am able to build and run llama.cpp using CMake, but with CLBlast enabled I get a huge error in ./main. Again, thanks for helping me get it compiled at all!
The messages are impossible to read because the CL program doesn't have line breaks, but here are the errors:
It just seems like this device doesn't support llama.cpp; maybe it only works with float32 numbers? @0cc4m, what do you think?
Thanks for cleaning up the error message. I'm confused, because I use llama.cpp every day, so it's definitely supported. OpenBLAS works as expected. Perhaps it's just CLBlast that isn't supported? Which is still odd, because running clpeak shows:
OpenBLAS runs on the CPU. OpenCL runs on the GPU. That's it.
llama.cpp uses half-precision (fp16) floats in its OpenCL kernels.
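One quick way to see whether a device advertises half-precision support is to look for the cl_khr_fp16 extension in the clinfo output (assuming clinfo is installed):

```sh
# no output here means the device does not report the fp16 extension
clinfo | grep -i cl_khr_fp16
```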
I understand now: my device's OpenCL is currently incompatible. That sucks, but I'm happy to know for sure. :) Edit: sincerely! I would've spent weeks trying to figure that out by myself, so learning it can't be done within 24 hours helps me a lot.
@SlyEcho @JackJollimore Half precision support isn't required. Otherwise no Nvidia GPU would work at all.
Thank you for clarifying.
Kinda interesting that you use clvk. Did you install that yourself, or does it come with the phone?
It's a package that's available through the Termux repository, and my device supports Vulkan, so I installed it. Should I try again without it? Edit: trying without it: I uninstalled clvk, then rebuilt using SlyEcho's instructions and CNugteren/CLBlast.git. Here's my clinfo after removing clvk:
clpeak:
And then, of course, ./main:
I'm gonna see if I can get this clvk working on my machine.
They are using something called clspv to compile the CL kernels to Vulkan SPIR-V. This is what it supports: OpenCL C 1.2 Language on Vulkan
It's very experimental; I didn't get it working with my desktop GPU and llama.cpp, some kind of LLVM error.
I wouldn't even know where to begin with such a thing, but if there's anything I can do to try, please let me know. I can still run llama.cpp without it, so for me any progress in this direction is a bonus.
I don't think it's going to work with this CL driver for a long time; it's experimental. Maybe when we get a Vulkan or WebGPU version, we can run on more devices.
They do claim CLBlast support. Maybe clvk is a way for Nvidia GPUs to run FP16 on OpenCL.
It's possible it may work with just CLBlast as it was in the earlier commits, when the CPU dequantized the weights and converted them to float32.
Here's make -j8 for 7296c96:
I also tried fb62f92, which successfully built with CLBlast, but then ./main:
It's the same errors again. Looks like q5_0, q5_1 and q8_0 are not supported for some reason. Maybe if you remove that part it could work?
Thanks for your response. I'm trying to understand, but I'm not that savvy. I'm fine with removing parts to test and see if we can get this to function, but I need more specific directions as to what I need to do. I didn't know it was possible to remove q5_0, q5_1 and q8_0 from the build. Edit: OpenCL is installed, and llama.cpp now compiles with CLBlast, though it's incompatible.
@JackJollimore Have you checked whether your phone has native OpenCL support? I know mine does; you just have to compile clinfo and the other OpenCL tools manually instead of using the Termux packages.
It never occurred to me to try it like that, so I'll try it and let you know how it goes.
Thanks again for that. My device does natively support OpenCL. I manually built clinfo, and here are the details:
I'm trying to run llama.cpp compiled with CLBlast enabled, and here's the error from ./main:
Some kind of error obtaining the platform? I dunno what it's trying to say.
Did you compile CLBlast manually as well? I remember some trouble linking it all together, but it did work in the end.
Yes, I compiled CLBlast manually. I restarted the process because I had another package from Termux installed too (ocl-icd). Now I can't compile llama.cpp.
To provide more context: when I use my file manager and view /system/vendor/lib64, libOpenCL.so is there. In Termux, I navigate to /system/vendor/lib64 and libOpenCL.so isn't there. It looks like llama.cpp is looking in some other place (/data/data/com.termux/files/usr/lib/ instead of /system/vendor/lib64) for libOpenCL.so. I tried (and failed) to link llama.cpp with export LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH, but I have no idea what I'm doing.
Just delete the build folder and reconfigure. Or maybe you can open the CMakeCache.txt and find and fix the paths there.
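A minimal sketch of the clean-reconfigure route, assuming an out-of-tree build directory as in the earlier instructions:

```sh
# remove the stale configuration (including the cached, wrong paths)
rm -rf build
# reconfigure from scratch
cmake -B build -DLLAMA_CLBLAST=ON
```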
Okay, I'll try these options. Somehow I delinked my CMake compiler (again), so I'll try to sort this out and let you know how it goes tomorrow. Edit: I realized I manually installed OpenCL-Headers instead of CLBlast, so I corrected my error, but CLBlast can't find the OpenCL library without ocl-icd installed... so I have to use apt install ocl-icd. (I tried manually building it, but there's no CMakeLists.txt or Makefile: https://github.com/OCL-dev/ocl-icd) Once ocl-icd is installed, it lets me build CLBlast, which lets me make llama.cpp with CLBlast enabled, but then I get the same error when running main:
I'm thinking Termux can't access /system/vendor/lib64 properly. I'll try editing the CMakeCache.txt file later this evening.
It is an ICD loader; that means CLBlast and llama.cpp, or any other program that uses OpenCL, is actually going through the loader. I don't know how it works on your phone, but here on GNU/Linux there are files in /etc/OpenCL/vendors/ that tell the loader where the real drivers are.
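For reference, on a typical GNU/Linux setup those vendor files can be inspected like this (paths may differ on Android):

```sh
# each .icd file contains the path (or name) of a real OpenCL driver library
ls /etc/OpenCL/vendors/
cat /etc/OpenCL/vendors/*.icd
```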
Thanks for clarifying. Double-checking ocl-icd: it requires root permission, which I don't have. So ocl-icd can't enable native OpenCL, and I have no way of pathing to /system/vendor/lib64. I dunno how clinfo is even able to access the information about OpenCL. In this way, the CMakeCache.txt is unclear, as there's no direct path to libOpenCL.so, which is required for building CLBlast and llama.cpp. Anyway, thanks for trying, but I don't see a simple way of making this work. Ultimately, I hoped to help others get it working on their devices, but the average person isn't going to do all of this.
Following up with the resolution. Thanks again @SlyEcho, @0cc4m. Beginning with a fresh install of Termux, install opencl-headers, opencl-clhpp, ocl-icd and clinfo. Then follow @SlyEcho's instructions for building CLBlast:
Build llama.cpp with CLBlast enabled through CMake:
Then Termux users can start ./main with:
In this way, Termux enables GPU acceleration for llama.cpp.
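The exact ./main command was lost above; a hypothetical invocation, with the model path, thread count, and offloaded layer count as placeholders (the -ngl flag mentioned later in this thread offloads layers to the GPU):

```sh
# illustrative only: adjust the model path, -t and -ngl to your device
./main -m models/7B/ggml-model-q4_0.bin -t 3 -ngl 32 -p "Hello"
```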
Hi there. To clarify, you ran pkg install clang, cmake, ocl-icd, opencl-headers, opencl-clhpp, yes? Ensuring OpenCL and CLBlast are properly installed and linked is key. Based on the error message, it appears that you did not cd CLBlast after cloning the repo. I'd do the following, in order (ensure you start in the $HOME directory with cd $HOME):
Then
then
then
then
Finally,
There might be a warning about CMake deprecation, but as far as I've seen, any other warning is unacceptable and probably means there's a linking/pathing error. I had to begin with a totally fresh install of Termux because old paths were messing up the CLBlast installation. I will share the build folder, but it may not be compatible with your device. My device has a Vulkan backend, which isn't officially supported yet, so CLBlast has the lower performance comparatively: OpenBLAS times around 250 ms per token, and CLBlast around 350 ms.
git is not installed correctly. You could download the tarball from GitHub... but I would make sure the dev environment is set up correctly first. As I don't use Termux, I can't help you much, but you probably need the openssl-1.1 package.
F-Droid is good for me. The Play Store version is deprecated, not maintained, and lacks features. Here's my F-Droid Termux setup (run each separately):
Before installing CLBlast and llama.cpp: test OpenCL with clinfo. My Termux setup requires clinfo to access my OpenCL library like this:
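The command itself was lost above; based on the /system/vendor/lib64 discussion earlier in the thread, it was presumably something along these lines (the vendor library path varies by device):

```sh
# point the loader at the vendor OpenCL library before running clinfo
LD_LIBRARY_PATH=/system/vendor/lib64:$LD_LIBRARY_PATH clinfo
```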
How much will OpenCL benefit inference speed, in tokens per second?
On a phone or iGPU? Probably not much. People have posted a lot of their testing in the issues here; usually the CPU is faster. It really is limited by the sheer size of the models, and even if the memory access could be improved for shared memory, it is still a lot of computation.

On dedicated GPUs? It can get pretty close to CUDA/ROCm speed when generating tokens. Prompt evaluation is still slower because CLBlast is not as fast as the vendor BLAS routines. But it also depends a lot on the GPU vendor (AMD, Nvidia), GPU age (GTX, RTX), video RAM size (you need an 8 GB card for 7B Q4_0), VRAM type (GDDR, HBM), OS, and which OpenCL driver (vendor, Mesa Clover, Mesa rusticl, clvk, etc.) you have and can use.
Certainly a phone GPU is limited, but it comes down to effective synchronizing of the CPU/GPU, is that right? I checked into the LLVM issue you had and found two similarly named but different things:
For my device, some apps allow LLVM software rendering, or turnip+zink. For example, the Alexvorxx drivers increase performance over LLVMpipe, and the freedreno open-source Gallium3D driver advertises OpenGL 4.6 for the A6xx series graphics. Is OpenGL relevant to the way llama.cpp functions now? I haven't seen any mention of it.
Right now there is no way to use OpenGL or Vulkan in llama.cpp.
Understood. Thank you!
What is the theoretical performance achievable on a state-of-the-art mobile SoC like the Exynos 2200 or Snapdragon 8 Gen, utilizing all resources, i.e. CPU, GPU and DSP (assuming sufficient DDR5 memory is available)? ~1.5 t/s is currently reported on the Poco F3 or S22; is a 4x speedup possible for a 7B model?
With 7B models, OpenBLAS prompt evals around 250 ms per token, and eval timings around 330 ms per token, are typical for my device (1000/330 ≈ 3 t/s), so I figure the devices you mentioned are faster if properly configured. It's difficult to guess what's possible with a fully supported GPU since it's theoretical; maybe 5 t/s. It could be more, like 10 t/s, but I'm just guessing. Edit: the new t/s print is nice:
This is impossible to answer. I guess you could estimate something from the known FLOPS performance characteristics, but llama.cpp cannot fully use the GPU and CPU at the same time, and it works best when using one single type of performance core. For example, on my Pinebook Pro (RK3399) I tested a 3B model today, and it gets almost the same speed whether I use the 4 A53 cores or the 2 A72 cores, but if I try to use all of them it is much slower. So by only using the performance cores, most of the CPU cores are not even used. I have an SBC with the new RK3588S as well, and this one can generate 7B on its four A76 cores at around 3.3 t/s. Using the four Cortex-A55 cores it is 0.8 t/s; using all cores, 1.3 t/s. These newer SoCs like the Exynos 2200 have three types of cores, so I'm not sure which ones should be used for best performance. We won't know until someone tests it.
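One way to test this is to pin llama.cpp to a chosen core cluster with taskset, where available (the core numbers and model path below are illustrative; check /proc/cpuinfo for your SoC's layout):

```sh
# run on the big cores only, with one thread per pinned core
taskset -c 4-7 ./main -m models/7B/ggml-model-q4_0.bin -t 4 -p "Hello"
```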
It appears llama.cpp sets no limit, and makes no estimate of the hardware of the system it's installed on. I'm not complaining; it is what it is. In this way it's powerful, so long as one narrows the parameters for the specific device/system.
--threads 8 is essentially a full device lock for me: Termux/llama.cpp fights the operating system for resources. It's cool that it can do that, but it's inefficient. --threads 5 keeps my CPU around 80-90%, but with lower performance vs. the sweet spot for my device, which is --threads 4 with OpenBLAS. When CLBlast is built in, --threads 3 is better together with the -ngl parameter. It's interesting to watch the resource monitor during inference: the CPU throttles around 50-70%, and the GPU starts below 1%, spikes to 80-100% for a few seconds, then hovers around 20-30% while writing the response.
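A quick way to find that sweet spot is to sweep thread counts and compare the timing lines llama.cpp prints on exit (model path and counts here are illustrative):

```sh
# llama.cpp writes its timing summary to stderr, hence the 2>&1
for t in 3 4 5 8; do
  echo "threads=$t"
  ./main -m models/7B/ggml-model-q4_0.bin -t "$t" -n 32 -p "test" 2>&1 | grep "eval time"
done
```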
Hi, I'm trying to compile llama.cpp using my OpenCL drivers. My device is a Samsung S10+ with Termux.
On downloading and attempting make with LLAMA_CLBLAST=1, I receive an error:
I edited the ggml-opencl source file, trying to point it to my OpenCL libraries by replacing <clblast.h> with ocl_icd.h (as my library path is /data/data/com.termux/files/usr/include).
Then with make LLAMA_CLBLAST=1 I received this:
Current Behavior
It appears my OpenCL libraries are not included, and I don't know how to make llama.cpp recognize them during compilation.
clinfo:
lscpu:
clpeak:
Thanks for any direction on this matter.