-
Hi, thanks for bringing up the discussion. I'll try to outline some of the practices that we have followed so far to accommodate different backends into ggml.

Background: the project started out with plain C dot-product kernels in ggml.c (lines 1102 to 1137 at 2833a6f). Note that these are naive dot product implementations, without any advanced GEMM optimizations. Later, support for OpenBLAS and other BLAS CPU libraries was added directly in ggml.c (lines 9498 to 9504 at 2833a6f). At some point, an idea for adding GPU support to ggml was proposed, which led to the existing custom backends.
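To make that background concrete, here is a minimal sketch of the two stages described above: a naive scalar dot product, and a matrix multiply that hands the work to BLAS when a library such as OpenBLAS is enabled at build time. This is illustrative code, not the actual ggml source; GGML_USE_OPENBLAS is the real compile flag, but the function names here are made up.

```c
// Sketch only -- not the actual ggml source.
#include <stddef.h>

#ifdef GGML_USE_OPENBLAS
#include <cblas.h>
#endif

// naive kernel: one scalar multiply-accumulate per element,
// no blocking, no SIMD, no GEMM tricks
static void vec_dot_f32(size_t n, float * s, const float * x, const float * y) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i]*y[i];
    }
    *s = sum;
}

// C = A * B^T with A: m x k and B: n x k, both row-major, so each
// row of B is a contiguous vector of length k
static void mul_mat_f32(int m, int n, int k,
                        const float * a, const float * b, float * c) {
#ifdef GGML_USE_OPENBLAS
    // hand the whole GEMM to the BLAS library when available
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                m, n, k, 1.0f, a, k, b, k, 0.0f, c, n);
#else
    // fallback: one naive dot product per output element
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            vec_dot_f32((size_t) k, &c[i*n + j], &a[i*k], &b[j*k]);
        }
    }
#endif
}
```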
The existing backend implementations are mostly decoupled from the core ggml code. This is a short background and overview of how we support various backends in ggml. Back to the specific questions:
In case XeTLA is something similar to BLAS, then it could be integrated straight into ggml.c the same way the BLAS libraries are. Note that if we decide to integrate it as a custom backend, I would like to have all the implementation contained in 1 or 2 files, similar to the existing backends. It can of course include 3rd party libs (as we do with CUDA, Metal, etc.), but the interface to the rest of the code should stay contained in those files.
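For illustration, here is a rough sketch of what such a self-contained backend interface could look like, loosely modeled on the existing ggml-cuda.h / ggml-metal.h pattern. Every ggml_xetla_* name here is hypothetical; nothing like this exists in the tree.

```c
// ggml-xetla.h -- hypothetical sketch of a self-contained backend header;
// all ggml_xetla_* names are made up for illustration
#pragma once

#include "ggml.h"

#include <stdbool.h>

#ifdef __cplusplus
extern "C" {
#endif

// one-time device/runtime initialization
void ggml_xetla_init(void);

// report whether this backend can handle the given op, so the core
// code can fall back to the CPU path when it cannot
bool ggml_xetla_can_mul_mat(const struct ggml_tensor * src0,
                            const struct ggml_tensor * src1,
                            const struct ggml_tensor * dst);

// execute the op; all XeTLA-specific code stays behind this interface,
// in the matching ggml-xetla.cpp
void ggml_xetla_mul_mat(const struct ggml_tensor * src0,
                        const struct ggml_tensor * src1,
                        struct ggml_tensor * dst);

// release device resources
void ggml_xetla_free(void);

#ifdef __cplusplus
}
#endif
```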
I understand the basic principle of JIT-ing, but I don't have experience with implementing this technique. If it is something that we can write in pure C to help optimize the existing SIMD routines in ggml.c, it would be worth exploring.
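For context, the existing SIMD routines follow a pattern like the sketch below: a vectorized path chosen at compile time, with a portable scalar fallback (illustrative code, not the actual ggml.c). A JIT approach would presumably generate comparable kernels at runtime instead of fixing the instruction set at compile time.

```c
// Sketch of a compile-time-dispatched SIMD dot product, in the style
// of the hand-written routines in ggml.c (illustrative, not the real code)
#include <stddef.h>

#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

static float vec_dot_f32_simd(size_t n, const float * x, const float * y) {
#if defined(__AVX2__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // fused multiply-add over 8 floats per iteration
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(x + i),
                              _mm256_loadu_ps(y + i), acc);
    }
    // horizontal reduction of the 8 partial sums
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    float sum = _mm_cvtss_f32(s);
    for (; i < n; ++i) {
        sum += x[i]*y[i]; // scalar tail
    }
    return sum;
#else
    // portable fallback for targets without AVX2/FMA
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i]*y[i];
    }
    return sum;
#endif
}
```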
-
Adding some comments might help. The whole case would then work like an end-to-end example.
-
Hi, this is Mingfei from the Intel PyTorch team, and we want to help optimize the performance of llama.cpp on Intel hardware. I need some guidelines on how to make contributions to this project.
Any opinion is welcome :) Feel free to comment so that we can find the best way to contribute.