
GPU support in Docker, other Docker-related updates #1655

Closed
lukaboljevic wants to merge 1 commit

Conversation


lukaboljevic commented Feb 28, 2024

The main contribution of this PR is a new Dockerfile and a docker compose file for running PrivateGPT on a GPU in Docker. The command to run is simply docker compose -f docker-compose-gpu.yaml up --build. This should address issues like #1652, #1597 and #1405.
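In compose, GPU access is typically requested through a device reservation. A minimal sketch of what such a docker-compose-gpu.yaml could look like (the service name, port and profile here are illustrative, not necessarily what the PR uses):

docker-compose-gpu.yaml (sketch)

services:
  private-gpt-gpu:
    build:
      dockerfile: Dockerfile.local.gpu
    ports:
      - "8001:8001"
    environment:
      PGPT_PROFILES: docker
    deploy:
      resources:
        reservations:
          devices:
            # Hand all NVIDIA GPUs on the host to the container
            - driver: nvidia
              count: all
              capabilities: [gpu]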

The PR also proposes some changes based on #1428:

  • Added a max-workers setting for the Poetry installer
  • Added an entrypoint.sh script. Here, the user can specify the model, tokenizer, prompt style and embedding model through environment variables, and those are then downloaded automatically using the setup script (a sketch of such a script is shown after this list).
  • Chose the tokenizer and prompt style in settings-docker.yaml, and updated the default Mistral model from v0.1 to v0.2
  • Removed USER worker, as it seems to have caused a segfault on Mac (I can't test this unfortunately, as I don't have a Mac)
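
A minimal sketch of what such an entrypoint script could look like (the exact environment variables and setup invocation are assumptions for illustration, not necessarily the ones used in this PR):

entrypoint.sh (sketch)

#!/bin/sh
set -e

# Settings profile(s) to load, e.g. settings-docker.yaml
export PGPT_PROFILES="${PGPT_PROFILES:-docker}"

# Download the configured LLM, tokenizer, prompt style and embedding model
# before starting the server, reusing the existing setup script
if [ -f scripts/setup ]; then
    .venv/bin/python scripts/setup
fi

# Start PrivateGPT
exec .venv/bin/python -m private_gpt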

I feel like having the entrypoint.sh script and simply running docker compose up is much simpler and more transparent than the current (and not very well documented) approach of running docker compose run --rm --entrypoint="bash -c '[ -f scripts/setup ] && scripts/setup'" private-gpt. The script also lets the user choose the model, tokenizer, prompt style and embedding model more easily and directly. The current approach involves creating a new settings-FOO.yaml file and including FOO as a new profile in the corresponding docker compose file (sketched below). This isn't bad at all; it just takes a while to find in the documentation, and may require a few attempts and some browsing through issues like #1579 and #1573 before getting it to run.
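
For reference, the profile mechanism itself boils down to a single environment variable in the compose file (a sketch; foo is a placeholder profile name):

environment:
  # Loads settings-docker.yaml and settings-foo.yaml on top of settings.yaml
  PGPT_PROFILES: docker,foo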

Whatever your stance is on the entrypoint script, the documentation needs to be updated to tell the user as directly as possible what the correct way to run and configure PrivateGPT in Docker is. One shouldn't have to dig deep through the documentation and issues to find something that already exists and works. I would really like to hear your opinion on this, so we can discuss exactly how to update it.

In any case, I'm open to any comments and suggestions, and I hope you find this PR useful.

Edit: I ran make test and make check - 30 tests passed with 23 warnings, while make check fixed some files which I did not edit.

Edit (08.03.2024): Closed in favour of #1690.


neofob commented Mar 3, 2024

@lukaboljevic :
This is great! You beat me to it. I had planned to work on this this weekend, but you saved me some time. :)

What I would do is similar, except for a minor change: I would install virtualenv in the base (builder) stage, install the dependencies and so on, and then copy the virtualenv directory to the app stage in the Dockerfile. Anyhow, what you are doing is similar. I'll check out your PR and let you know how it goes.
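
A minimal sketch of that builder pattern (the base image and paths here are illustrative, not taken from this PR):

Dockerfile (sketch)

# Builder stage: create the virtualenv and install dependencies into it
FROM python:3.11-slim as builder
WORKDIR /home/worker/app
RUN pip install poetry
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
COPY pyproject.toml poetry.lock ./
RUN poetry install

# App stage: copy only the ready-made virtualenv, keeping build tools out of the final image
FROM python:3.11-slim as app
WORKDIR /home/worker/app
COPY --from=builder /home/worker/app/.venv .venv
COPY private_gpt/ private_gpt
ENTRYPOINT [".venv/bin/python", "-m", "private_gpt"]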


neofob commented Mar 3, 2024

@lukaboljevic :

  • Line 74 in Dockerfile.local.gpu should be
    ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"

  • docker build works

  • docker-compose up works


lukaboljevic (author) commented Mar 4, 2024

@lukaboljevic :

  • Line 74 in Dockerfile.local.gpu should be
    ENV PYTHONPATH="$PYTHONPATH:/home/worker/app/private_gpt/"
  • docker build works
  • docker-compose up works

I agree with you - it should be this way, as this is the correct path to the private_gpt folder. This line is present in both Dockerfile.local and Dockerfile.local.gpu, so I tested your suggestion in both, and everything works without issues. However, for some reason it seems to work even with just ENV PYTHONPATH="$PYTHONPATH:/private_gpt/" - this is what the current Dockerfile.local on the main branch has, which is why I didn't pay too much attention to that line until now.

I will wait for input from @imartinez before updating, but yes, I agree with you. I'm also glad docker build and docker-compose up work for you.

dpedwards commented:

To access an NVIDIA GPU from a Docker container, the following Dockerfile also works for me on both Linux and Windows after fully installing the CUDA Toolkit:

Dockerfile.cuda

# Use a specific version of the nvidia base image
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 as base

ENV DEBIAN_FRONTEND="noninteractive"
ENV TZ="Europe/Ljubljana"

# Minimize the number of RUN commands to reduce the number of layers
RUN apt-get update && apt-get install -y software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y python3.11 python3.11-venv python3-pip && \
    ln -sf /usr/bin/python3.11 /usr/bin/python3 && \
    python3 --version && \
    apt-get install -y libopenblas-dev ninja-build build-essential pkg-config wget gcc && \
    pip install pipx && \
    python3 -m pipx ensurepath && \
    pipx install poetry && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

ENV PATH="/root/.local/bin:$PATH"
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

############################################
FROM base as dependencies
############################################

WORKDIR /home/worker/app
COPY pyproject.toml poetry.lock ./

RUN poetry install --extras "ui llms-llama-cpp embeddings-huggingface vector-stores-qdrant" && \
    CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python

############################################
FROM base as app
############################################

ENV PYTHONUNBUFFERED=1
ENV PORT=8080
EXPOSE 8080

RUN useradd -m worker
USER worker
WORKDIR /home/worker/app

RUN mkdir -p local_data models
COPY --chown=worker --from=dependencies /home/worker/app/.venv/ .venv
COPY --chown=worker private_gpt/ private_gpt
COPY --chown=worker fern/ fern
COPY --chown=worker *.yaml *.md ./

ENTRYPOINT [".venv/bin/python", "-m", "private_gpt"]

  1. Build the Docker image:
    docker build -f Dockerfile.cuda -t YOUR_DOCKER_IMAGE_NAME:YOUR_DOCKER_IMAGE_TAG .
    Example:
    docker build -f Dockerfile.cuda -t rag-cuda:latest .

  2. Run the Docker container:
    docker run -it --gpus all -v "YOUR_HOST_MODEL_PATH:/home/worker/app/models" -v "YOUR_HOST_LOCAL_DATA_PATH:/home/worker/app/local_data" -p 8080:8080 YOUR_DOCKER_IMAGE_NAME:YOUR_DOCKER_IMAGE_TAG
    Example:
    docker run -it --gpus all -v "/home/ubuntu/development/private-gpt-api/models:/home/worker/app/models" -v "/home/ubuntu/development/private-gpt-api/local_data:/home/worker/app/local_data" -p 8080:8080 rag-cuda:latest

[Screenshot: private-gpt-api Docker CUDA run]


neofob commented Apr 24, 2024

@dpedwards: With a recent change in llama-cpp-python (after 0.2.58, IIRC), you need to use the flag -DLLAMA_CUDA=on instead of -DLLAMA_CUBLAS=on to get CUDA support.
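
Under that assumption, the dependencies stage above would build llama-cpp-python with something like this (a sketch of the changed line only):

# For llama-cpp-python newer than roughly 0.2.58, LLAMA_CUBLAS was replaced by LLAMA_CUDA
RUN CMAKE_ARGS='-DLLAMA_CUDA=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python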
