Installation issue #24

jayjk13 · 2024-10-28T17:38:49Z

Hello,
I'm running into an issue when installing the grouped-gemm package during a Docker image build. The installation fails with an error saying that no NVIDIA driver was found, even though the image is built from a CUDA-enabled base image.

Environment

Base Image: nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04
Python Version: 3.10
Pip Version: 23.3.1
Operating System Inside Docker: Ubuntu 22.04
grouped-gemm Version: Attempting to install grouped-gemm==0.1.6
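
The error means that, inside the build step, PyTorch could not reach an NVIDIA kernel driver. A CUDA base image such as nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04 ships the CUDA toolkit and libraries, but the driver itself lives on the host. It is normally exposed to a container only when the container is started with GPU access (for example `docker run --gpus all`), and it typically shows up as /dev/nvidia* device nodes. A quick way to check what a given step can see is a small stdlib-only script like the sketch below. The helper names `list_dev` and `nvidia_nodes` are hypothetical, and a missing /dev/nvidia* node is a strong hint, not a definitive test, that no driver is reachable.

```python
import os


def list_dev(dev_dir="/dev"):
    """Return the entries of dev_dir, or an empty list if it does not exist."""
    try:
        return os.listdir(dev_dir)
    except FileNotFoundError:
        return []


def nvidia_nodes(entries):
    """Keep only NVIDIA device nodes (nvidia0, nvidiactl, nvidia-uvm, ...)."""
    return sorted(name for name in entries if name.startswith("nvidia"))


if __name__ == "__main__":
    nodes = nvidia_nodes(list_dev())
    if nodes:
        print("NVIDIA device nodes visible:", ", ".join(nodes))
    else:
        print("No /dev/nvidia* nodes: this step cannot reach an NVIDIA driver")
```

Running the same script once from a RUN step and once inside a container started with `--gpus all` should typically show the difference between the build-time and run-time environments.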

Dockerfile Snippet

FROM nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04

ENV CUDA_VISIBLE_DEVICES="0"

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
    git \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN pip3 install --upgrade pip

# Install necessary Python packages
RUN pip3 install transformers==4.45.0 \
    accelerate==0.34.1 \
    sentencepiece==0.2.0 \
    torchvision \
    requests \
    torch \
    Pillow \
    grouped-gemm
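
One detail in this Dockerfile deserves a note: `ENV CUDA_VISIBLE_DEVICES="0"` does not give a build step access to a GPU. The variable only selects and reorders devices that the driver already exposes; per the CUDA documentation, an invalid index hides that entry and every entry after it. The `#16 83.25` prefixes in the log suggest BuildKit, which by default runs RUN steps without GPUs, so index 0 refers to a device that does not exist. The simplified model below illustrates that rule. `visible_devices` is a hypothetical helper that handles integer indices only (not GPU UUIDs or MIG identifiers).

```python
from typing import List, Optional


def visible_devices(env_value: Optional[str], physical_count: int) -> List[int]:
    """Simplified model of CUDA_VISIBLE_DEVICES (integer indices only).

    Entries are read left to right; the first entry that is not a valid index
    for the physically present devices stops the list.
    """
    if env_value is None:  # variable unset: every device is visible
        return list(range(physical_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # invalid entry: it and all later entries are ignored
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    print(visible_devices("0", 0))  # build step with no GPU → []
    print(visible_devices("0", 1))  # container started with one GPU → [0]
```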

Error Message

During the pip install step, I receive the following error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Full error log:

#16 83.25 /tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
#16 83.25 cpu = _conversion_method_template(device=torch.device("cpu"))
#16 83.25 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#16 83.25 Traceback (most recent call last):
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#16 83.25     main()
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#16 83.25     json_out['return_val'] = hook(**hook_input['kwargs'])
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
#16 83.25     return hook(config_settings)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
#16 83.25     return self._get_build_requires(config_settings, requirements=['wheel'])
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
#16 83.25     self.run_setup()
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
#16 83.25     exec(code, locals())
#16 83.25   File "<string>", line 16, in <module>
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 509, in get_device_capability
#16 83.25     prop = get_device_properties(device)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 523, in get_device_properties
#16 83.25     _lazy_init() # will define _get_device_properties
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 319, in _lazy_init
#16 83.25     torch._C._cuda_init()
#16 83.25 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
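
The traceback shows where the failure happens: pip's `get_requires_for_build_wheel` hook runs the package's setup script (shown as `"<string>"`), and line 16 of that script calls `torch.cuda.get_device_capability()`, which needs a working driver. Because the build step has no GPU, this query fails before any compilation starts. A common pattern for build scripts that must work without a GPU is to accept an explicit architecture list (PyTorch's `torch.utils.cpp_extension` reads `TORCH_CUDA_ARCH_LIST`) and only query the device as a fallback. The sketch below illustrates that pattern under stated assumptions: `resolve_archs`, `parse_arch_list`, `no_driver` and the `DEFAULT_ARCHS` values are hypothetical, and it is not confirmed here whether grouped-gemm 0.1.6's setup script honors `TORCH_CUDA_ARCH_LIST`.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

# Hypothetical fallback architectures; not taken from grouped-gemm.
DEFAULT_ARCHS = ["8.0", "9.0"]


def parse_arch_list(value: str) -> List[str]:
    """Split a TORCH_CUDA_ARCH_LIST-style value such as "8.0;9.0+PTX" or "8.0 9.0"."""
    return value.replace(";", " ").split()


def resolve_archs(
    query_device: Callable[[], Tuple[int, int]],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Prefer an explicit arch list; query the GPU only as a fallback.

    query_device stands in for torch.cuda.get_device_capability, which raises
    RuntimeError when no NVIDIA driver is reachable (as in the log above).
    """
    env = os.environ if env is None else env
    explicit = env.get("TORCH_CUDA_ARCH_LIST")
    if explicit:
        return parse_arch_list(explicit)
    try:
        major, minor = query_device()
    except RuntimeError:
        return list(DEFAULT_ARCHS)  # no driver at build time: use the fallback
    return [f"{major}.{minor}"]


def no_driver() -> Tuple[int, int]:
    """Stub that fails the same way the docker build log does."""
    raise RuntimeError("Found no NVIDIA driver on your system.")


if __name__ == "__main__":
    print(resolve_archs(no_driver, {}))  # → ['8.0', '9.0']
    print(resolve_archs(no_driver, {"TORCH_CUDA_ARCH_LIST": "8.0;9.0+PTX"}))  # → ['8.0', '9.0+PTX']
```

Falling back silently lets an image build without a GPU, at the cost of compiling for architectures the eventual runtime GPU may not need; failing with a clear message asking for an explicit list is the stricter alternative.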

Steps to Reproduce

1. Create a Dockerfile with the contents provided above.
2. Run docker build -t test-image . to build the Docker image.
3. Observe that the build fails during the pip3 install step, when pip reaches grouped-gemm, with the error about a missing NVIDIA driver.

lingjzhu · 2024-11-04T20:00:02Z

Were you able to resolve this issue? I ran into exactly the same problem and couldn't solve it after three days of trying different approaches.

jayjk13 · 2024-11-04T22:23:16Z

Nope, I wasn't able to solve it @lingjzhu
