llama-cpp-python cuBLAS wheels

Wheels for llama-cpp-python compiled with cuBLAS support.

Requirements:

Windows x64, Linux x64, or MacOS 11.0+
CUDA 11.6 - 12.2
CPython 3.8 - 3.11

Warning

MacOS 11 and Windows ROCm wheels are unavailable for 0.2.21+.
This is due to build issues with llama.cpp that are not yet resolved.

ROCm builds for AMD GPUs: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/tag/rocm
Metal builds for MacOS 11.0+: https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/tag/metal

Installation instructions:

To install, you can use this command:

python -m pip install llama-cpp-python --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117

This will install the latest llama-cpp-python version available from here for CUDA 11.7. You can change cu117 to change the CUDA version.
You can also change AVX2 to AVX, AVX512 or basic based on what your CPU supports.
basic is a build without AVX, FMA and F16C instructions for old or basic CPUs.
CPU-only builds are also available by changing cu117 to cpu.

You can install a specific version with:

python -m pip install llama-cpp-python==<version> --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117

An example for installing 0.1.62 for CUDA 12.1 on a CPU without AVX2 support:

python -m pip install llama-cpp-python==0.1.62 --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX/cu121

List of available versions:

python -m pip index versions llama-cpp-python --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117

If you are replacing an already existing installation, you may need to uninstall that version before running the command above.
You can also replace the existing version in one command like so:

python -m pip install llama-cpp-python --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
-OR-
python -m pip install llama-cpp-python==0.1.66 --force-reinstall --no-deps --index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117
-OR-
python -m pip install llama-cpp-python --prefer-binary --upgrade --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117

Wheels can be manually downloaded from: https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels

I have renamed llama-cpp-python packages available to ease the transition to GGUF.
This is accomplished by installing the renamed package alongside the main llama-cpp-python package.
This should allow applications to maintain GGML support while still supporting GGUF.

python -m pip install llama-cpp-python-ggml --prefer-binary --extra-index-url=https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels/AVX2/cu117

Name		Name	Last commit message	Last commit date
Latest commit History 173 Commits
.github		.github
index		index
old_workflows		old_workflows
LICENSE		LICENSE
README.md		README.md
generate-html.ps1		generate-html.ps1
generate-textgen-html.ps1		generate-textgen-html.ps1
host-index.bat		host-index.bat
host-index.sh		host-index.sh
workflows.md		workflows.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

llama-cpp-python cuBLAS wheels

Installation instructions:

All wheels are compiled using GitHub Actions

About

Releases 8

Packages

Contributors 3

Languages

License

jllllll/llama-cpp-python-cuBLAS-wheels

Folders and files

Latest commit

History

Repository files navigation

llama-cpp-python cuBLAS wheels

Installation instructions:

All wheels are compiled using GitHub Actions

About

Resources

License

Stars

Watchers

Forks

Releases 8

Packages 0

Contributors 3

Languages

Packages