Intel Arc / XPU support #631

Closed
itlackey opened this issue Oct 25, 2023 · 21 comments
Labels: enhancement (New feature or request)

Comments

@itlackey

It would be great to be able to run Tabby locally on my Intel Arc GPU.

Additional context

This is currently possible in tools like llama.cpp by compiling with OpenCL support. I have no idea how that would (or could) translate to Rust.


Please reply with a 👍 if you want this feature.

@itlackey itlackey added the enhancement New feature or request label Oct 25, 2023
@itlackey
Author

If I am understanding this code correctly (crates/llama-cpp-bindings/build.rs), I believe we just need a switch for OpenCL to enable Intel GPU support.

If OpenCL is selected, then add the build args as described in this section of the llama.cpp docs:
https://github.com/ggerganov/llama.cpp#clblast
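
A minimal sketch of what such a switch might look like, not Tabby's actual build.rs. The `Backend` enum and both helper functions are hypothetical names made up for illustration. `LLAMA_CUBLAS` and `LLAMA_CLBLAST` are the CMake option names llama.cpp used around the time of this thread; later llama.cpp versions may name them differently. In a real build.rs the pairs would presumably be handed to the `cmake` crate's `Config::define` rather than rendered as strings.

```rust
/// Hypothetical backend selector. In a real build.rs this would
/// most likely be derived from Cargo features instead.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Cuda,
    OpenCl,
}

/// Returns the CMake defines to set for the llama.cpp build.
/// Option names are assumed from llama.cpp's docs of that period.
pub fn cmake_defines(backend: Backend) -> Vec<(&'static str, &'static str)> {
    match backend {
        // Plain CPU build: no extra defines.
        Backend::Cpu => vec![],
        Backend::Cuda => vec![("LLAMA_CUBLAS", "ON")],
        // OpenCL via CLBlast, which is what would cover Intel GPUs here.
        Backend::OpenCl => vec![("LLAMA_CLBLAST", "ON")],
    }
}

/// Renders the defines as `-DKEY=VALUE` arguments, the form the
/// llama.cpp README uses on the cmake command line.
pub fn cmake_args(backend: Backend) -> Vec<String> {
    cmake_defines(backend)
        .into_iter()
        .map(|(key, value)| format!("-D{key}={value}"))
        .collect()
}
```

Calling `cmake_args(Backend::OpenCl)` yields the single argument `-DLLAMA_CLBLAST=ON`. CLBlast and the OpenCL headers still have to be installed on the build machine, which this sketch does not check.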

@wsxiaoys
Member

Hi @itlackey, unfortunately, I don't have an Intel Arc card to try out. If anyone has a card and is interested in giving it a try, please feel free to do so! Happy to help if any problems arise.

@itlackey
Author

I have a card but no Rust experience and only a basic understanding of the underlying C++ libraries. I could try adjusting the code, but I'm not sure what the full list of changes would be. Do you know whether additional changes would be needed beyond altering the build args in build.rs?

@wsxiaoys
Member

I think the first step would be following the instructions here: https://github.com/TabbyML/tabby#-contributing to make it build in your local dev environment. Then you could tune the build flags in llama-cpp-bindings's build.rs a bit to make it compile with OpenCL support.

@itlackey
Author

Sounds reasonable, I will give it a try as soon as I get a chance.

@cromefire
Contributor

cromefire commented Nov 25, 2023

Trying to get this working (more specifically Intel iGPU support and also ROCm support), but after compiling it I just get a "501 Error: Not Implemented" (in Docker). No errors during the build, though. Any idea what went wrong?

I won't be using OpenCL support, but using Intel MKL and hipBLAS, as they seem the better fit. Pretty sure there are still issues but if I can't even hit it, I can't test it.

@cromefire
Contributor

Put it all in a pull request here: #895

@itlackey
Author

Nice work!! I put this on the back burner due to not being able to get decent performance using OpenCL with llama.cpp but it looks like you found a better approach. Thanks for pushing this forward!

@cromefire
Contributor

Well, I still have to get it to work. Right now every sort of configuration for Tabby just returns HTTP 501.

@itlackey
Author

Does llama.cpp work on the GPU with these compiler options? If not, get llama.cpp working as expected and then port that to the Tabby build settings. Skimming through the changes to Tabby, it seems like you're on the right track.

@hungle-i3

Hello everyone, I am working on supporting the Intel CPU architecture by integrating the Intel oneAPI platform into Tabby. I just wanted to know whether there is any update on this. I would love to contribute to getting it done. Thanks.

@cromefire
Contributor

cromefire commented Feb 8, 2024

Upstream (llama.cpp) has to support it. As soon as it has that I have stuff already prepared. Alternatively if it'll be integrated faster, Vulkan compute can also be used, but the same deal, it has to be merged first.

Haven't followed those PRs though, so if one of them has merged, tell me and I'll get it done. As soon as Tabby's fork of llama.cpp has been updated of course.

@itlackey
Author

itlackey commented Feb 8, 2024

I have not spent time on it in a while. I did see SYCL is now supported in llama.cpp and works well on Intel.

@cromefire
Contributor

> I have not spent time on it in a while. I did see SYCL is now supported in llama.cpp and works well on Intel.

Then I should probably get to it once I find a sliver of time. I have SYCL pretty much prepared already.

@wsxiaoys has the llama.cpp fork already been updated?

@wsxiaoys
Member

wsxiaoys commented Feb 8, 2024

It's updated in the recent release (0.8): https://github.com/TabbyML/llama.cpp

@hungle-i3

hungle-i3 commented Feb 17, 2024

Hi @wsxiaoys, the llama.cpp fork bundled with the Tabby releases (0.8/0.9) hasn't been updated to include SYCL support.
At the moment, for Intel architecture, I am planning to support the following two features, as recommended by llama.cpp: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md

Is it good to go?

@cromefire
Contributor

> onemkl: For intel CPU https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html#gs.4lpxoo.

Not sure whether that's even worth it, as I think the default CPU code path already uses AVX and so on.

@hungle-i3

@cromefire, I will enable Intel BLAS (Intel10_64lp) under the onemkl feature, following Intel's guideline: https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html
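
As a rough sketch of what enabling Intel BLAS could amount to on the CMake side: the helper below is hypothetical and is not taken from the referenced pull request. The option names (`LLAMA_BLAS`, `LLAMA_BLAS_VENDOR`) and the `icx`/`icpx` compilers are assumed from the llama.cpp README's Intel oneMKL instructions of that period, and may differ in later llama.cpp versions.

```rust
/// Hypothetical helper: CMake arguments for building llama.cpp
/// against Intel oneMKL BLAS with the Intel LLVM compilers.
/// All option names are assumptions based on llama.cpp's README
/// at the time of this thread.
pub fn onemkl_cmake_args() -> Vec<String> {
    let defines = [
        // Turn on the generic BLAS path in llama.cpp.
        ("LLAMA_BLAS", "ON"),
        // Select the oneMKL variant via CMake's BLAS vendor name.
        ("LLAMA_BLAS_VENDOR", "Intel10_64lp"),
        // Use the Intel LLVM-based C and C++ compilers.
        ("CMAKE_C_COMPILER", "icx"),
        ("CMAKE_CXX_COMPILER", "icpx"),
    ];
    defines
        .iter()
        .map(|(key, value)| format!("-D{key}={value}"))
        .collect()
}
```

The oneAPI environment has to be set up before the build so that `icx` and `icpx` can be found; this sketch does not handle that.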

@cromefire
Contributor

> @cromefire , will enable Intel BLAS (Intel10_64lp) at onemkl feature as Intel guideline https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html

I would definitely still suggest trying it first to see whether it helps with anything and, if it does, maybe checking for regressions, because otherwise it might be easier to just use it by default rather than adding it as a "backend". Two CPU backends would be kind of confusing...

hungle-i3 added a commit to i3automation/tabby that referenced this issue Feb 17, 2024
    * Support new feature: openapi
    * Change compiler to Intel llvm when compiling llama.cpp
    * Support Intel BLAS (Intel10_64lp)
@hungle-i3

Thanks @cromefire for your suggestion.
Pull request at #1474

@wsxiaoys
Member

Closing, as Vulkan support is preferred for such use cases.
