[TOC]
This page describes how to run a coding assistant in your VSCode/VSCodium:
- download a Llama 2 model
- convert it to GGUF format (so llama.cpp can run it on CPU)
- run llama.cpp as a service
- set up the VSCode/VSCodium Continue plugin
To save time, you can find a ready-made GGUF model here: https://huggingface.co/TheBloke/Llama-2-7B-GGUF.
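For example, a minimal sketch to fetch one of the pre-quantized files with `wget` (the exact filename is an assumption; pick the quantization you want from the repository's file list):

```bash
# Assumption: llama-2-7b.Q4_K_M.gguf is one of the files published in the repo;
# check the "Files" tab and adjust the name to the quantization you want.
mkdir -p models/llama2/7B
wget -O models/llama2/7B/llama-2-7b.Q4_K_M.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-GGUF/resolve/main/llama-2-7b.Q4_K_M.gguf
```

If you go this route, you can skip the download and conversion steps and point the server's `-m` flag (further below) at this file instead of the converted one.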
However, I'll also describe how to do the conversion yourself if you need it. For Llama 2 models, go to https://ai.meta.com/llama/ and follow the instructions.
Example, from `/home/user/llm`:

```bash
./download.sh
```
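If you don't have `download.sh` locally yet, it ships with Meta's llama repository on GitHub; a minimal sketch, assuming that is where you get it from:

```bash
# Assumption: download.sh is the script from Meta's llama repository.
git clone https://github.com/facebookresearch/llama.git
cp llama/download.sh /home/user/llm/
cd /home/user/llm
# The script prompts for the signed download URL that Meta sends by email.
./download.sh
```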
After downloading your model, you should have a folder containing a `consolidated.00.pth` file. Its parent directory should also contain a `tokenizer.model` file.
Example:

```
/home/user/llm/models/llama2/tokenizer.model
/home/user/llm/models/llama2/7B/consolidated.00.pth
```
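A quick sanity check of the layout (paths taken from the example above):

```bash
ls -lh /home/user/llm/models/llama2/tokenizer.model \
       /home/user/llm/models/llama2/7B/consolidated.00.pth
```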
Then, let's convert it to GGUF format so llama.cpp can use it. From `/home/user/llm`:
```bash
docker run \
  -v ./models/llama2:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --convert /models/7B
```
Or in a `docker-compose.yml` file:

```yaml
llama-cpp:
  image: ghcr.io/ggerganov/llama.cpp:full
  volumes:
    - ./models/llama2:/models
  command: --convert /models/7B
```
...and run it with `docker compose up llama-cpp`.
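Optionally, you can quantize the resulting f16 GGUF to reduce RAM usage on CPU. A sketch, assuming the full image exposes a `--quantize` entry point and that the conversion produced `/models/7B/ggml-model-f16.gguf` (check the image's help if your names differ):

```bash
# Assumption: --quantize is available in the full image and takes
# <input gguf> <output gguf> <type>; q4_0 is one of the common types.
docker run \
  -v ./models/llama2:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/7B/ggml-model-f16.gguf /models/7B/ggml-model-q4_0.gguf q4_0
```

If you do this, substitute the quantized file in the `-m` flag of the server commands below.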
Next, run llama.cpp as a server:

```bash
docker run \
  -d \
  -p 64256:64256 \
  -v ./models/llama2:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --server --host 0.0.0.0 --port 64256 -m /models/7B/ggml-model-f16.gguf -c 2048
```
Or in a `docker-compose.yml` file:
```yaml
llama-cpp:
  image: ghcr.io/ggerganov/llama.cpp:full
  volumes:
    - ./models/llama2:/models
  command: --server --host 0.0.0.0 --port 64256 -m /models/7B/ggml-model-f16.gguf -c 2048
  ports:
    - 64256:64256
```
...and run it with `docker compose up -d llama-cpp`.
To check the logs:

```bash
docker compose logs -f llama-cpp
```
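To verify the service actually answers, you can hit the llama.cpp server's `/completion` endpoint (host and port are taken from the examples on this page; adjust them to your setup):

```bash
# Simple smoke test: ask the server to complete a short prompt.
curl -s http://172.16.3.63:64256/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "def fibonacci(n):", "n_predict": 64}'
```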
Here are the steps to link the VSCode/VSCodium Continue plugin to your llama.cpp service.
The plugin can be downloaded from https://marketplace.visualstudio.com/.
Direct link I used (installed through VSCodium Remote SSH, in a Linux x64 VM):
https://marketplace.visualstudio.com/_apis/public/gallery/publishers/Continue/vsextensions/continue/0.7.0/vspackage?targetPlatform=linux-x64
Then manually load the downloaded `.vsix` file.
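If you prefer the command line to the GUI, VSCodium's CLI can install the package directly; the filename below is a placeholder, use the name of the file you actually downloaded:

```bash
# Placeholder filename: replace with your downloaded .vsix.
codium --install-extension continue-0.7.0-linux-x64.vsix
```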
- Server url: http://172.16.3.63:65432
- `~/.continue/config.py` setup:
```python
from continuedev.libs.llm.llamacpp import LlamaCpp

(...)

config = ContinueConfig(
    allow_anonymous_telemetry=False,
    models=Models(
        default=LlamaCpp(
            max_context_length=2048,
            server_url="http://172.16.3.63:64256")
    ),
(...)
```
And run it with Docker. Here is a convenient extract of my `docker-compose.yml` file:
```yaml
continue:
  image: python:3.10.13-bookworm
  working_dir: /continue
  volumes:
    - ./continue/config.py:/root/.continue/config.py
  command:
    - /bin/bash
    - -c
    - |
      pip install continuedev
      python -m continuedev --host 0.0.0.0 --port 65432
  ports:
    - 65432:65432
```
Run it with `docker compose up -d continue`.
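To confirm the Continue server is up, follow its logs or check that something answers on the port (no specific health endpoint is assumed here; any HTTP status code means it is listening):

```bash
docker compose logs -f continue
# Any HTTP status code printed here means the server is listening.
curl -s -o /dev/null -w "%{http_code}\n" http://172.16.3.63:65432/
```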
This covers only a first attempt. Clearly, the chat you'll get won't be very powerful, but at least we have a full integration chain. In the next post, we'll explore rift as a more complete solution.