Installation issue #24

Open
jayjk13 opened this issue Oct 28, 2024 · 2 comments

jayjk13 commented Oct 28, 2024

Hello,
I'm encountering an issue when trying to install the grouped-gemm package during the Docker image build process. The installation fails with an error indicating that no NVIDIA driver is found. This happens despite using a CUDA-enabled base image.

Environment

  • Base Image: nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04
  • Python Version: 3.10
  • Pip Version: 23.3.1
  • Operating System Inside Docker: Ubuntu 22.04
  • grouped-gemm Version: Attempting to install grouped-gemm==0.1.6

Dockerfile Snippet

FROM nvidia/cuda:12.6.2-cudnn-devel-ubuntu22.04

ENV CUDA_VISIBLE_DEVICES="0"

# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
    git \
    python3 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Upgrade pip
RUN pip3 install --upgrade pip

# Install necessary Python packages
RUN pip3 install transformers==4.45.0 \
    accelerate==0.34.1 \
    sentencepiece==0.2.0 \
    torchvision \
    requests \
    torch \
    Pillow \
    grouped-gemm

Error Message

During the pip install step, I receive the following error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Full error log:

#16 83.25 /tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/_subclasses/functional_tensor.py:295: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
#16 83.25 cpu = _conversion_method_template(device=torch.device("cpu"))
#16 83.25 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
#16 83.25 Traceback (most recent call last):
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#16 83.25     main()
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#16 83.25     json_out['return_val'] = hook(**hook_input['kwargs'])
#16 83.25   File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
#16 83.25     return hook(config_settings)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
#16 83.25     return self._get_build_requires(config_settings, requirements=['wheel'])
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 295, in _get_build_requires
#16 83.25     self.run_setup()
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
#16 83.25     exec(code, locals())
#16 83.25   File "<string>", line 16, in <module>
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 509, in get_device_capability
#16 83.25     prop = get_device_properties(device)
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 523, in get_device_properties
#16 83.25     _lazy_init() # will define _get_device_properties
#16 83.25   File "/tmp/pip-build-env-a17rx2c3/overlay/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 319, in _lazy_init
#16 83.25     torch._C._cuda_init()
#16 83.25 RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
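
Reading the traceback: line 16 of the package's setup script calls torch.cuda.get_device_capability() while pip is still collecting build requirements (get_requires_for_build_wheel), and that call initializes CUDA, which needs a driver. The "No CUDA runtime is found" line suggests the build step cannot see one. A minimal sketch of a driver-free way to choose the target architecture is below. The helper name resolve_compute_capability and the "8.0" default are hypothetical; TORCH_CUDA_ARCH_LIST is the variable read by torch.utils.cpp_extension. Whether the grouped-gemm setup script honors such an override depends on the release and is not confirmed here.

```python
import os


def resolve_compute_capability(default="8.0"):
    """Return a 'major.minor' compute capability without requiring a GPU driver.

    Hypothetical helper for illustration; not taken from grouped-gemm.
    """
    # 1. An explicit override wins. Entries may be separated by ';' or spaces
    #    and may carry a '+PTX' suffix, e.g. "8.0;9.0+PTX".
    #    Assumes numeric entries rather than architecture names.
    override = os.environ.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if override:
        first = override.replace(";", " ").split()[0]
        return first.split("+")[0]

    # 2. Query the device only when a usable driver is present.
    #    torch.cuda.is_available() returns False instead of raising
    #    when no driver is found.
    try:
        import torch

        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability()
            return f"{major}.{minor}"
    except ImportError:
        pass

    # 3. No override and no visible GPU: fall back to the assumed default.
    return default
```

With a guard like this, a build step that sets TORCH_CUDA_ARCH_LIST would never reach the CUDA initialization that raises the RuntimeError above.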

Steps to Reproduce

  1. Create a Dockerfile with the contents provided above.
  2. Run docker build -t test-image . to build the Docker image.
  3. Observe that the build fails during the pip install grouped-gemm step with the error about missing NVIDIA drivers.
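
To check whether the failure is specific to the build step, a small diagnostic like the one below could be run once from a RUN instruction and once in a container started with docker run --gpus all. It is only a sketch: the function name cuda_build_environment is hypothetical, and it assumes the toolkit lives under CUDA_HOME (default /usr/local/cuda, as in the log) and that nvidia-smi is present only when the NVIDIA container runtime has mounted the driver utilities.

```python
import os
import shutil


def cuda_build_environment():
    """Report which CUDA pieces are visible to the current process."""
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")

    # nvcc ships with the CUDA toolkit in the devel base image.
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        nvcc = candidate if os.path.isfile(candidate) else None

    # nvidia-smi belongs to the driver, not the toolkit, so its absence
    # hints that no driver is exposed to this process.
    return {"nvcc": nvcc, "nvidia_smi": shutil.which("nvidia-smi")}


if __name__ == "__main__":
    for name, path in cuda_build_environment().items():
        print(f"{name}: {path or 'not found'}")
```

If nvcc is found but nvidia-smi is not during the build, the toolkit is installed and only the driver is missing at that stage, which would match the error reported here.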

lingjzhu commented Nov 4, 2024

Were you able to resolve this issue? I ran into exactly the same problem and could not solve it after three days of trying different approaches.

jayjk13 commented Nov 4, 2024

Nope, I wasn't able to solve it @lingjzhu
