Missing nvcc Compiler #41

Open
seanlaw opened this issue Jul 9, 2021 · 8 comments

Comments

@seanlaw

seanlaw commented Jul 9, 2021

@AgrawalAmey @xmnlab Do either of you know if nvcc has been (accidentally) removed in this repo for cuda 11? When I install cudatoolkit-dev=10.1 it contains nvcc and I understand that nvcc doesn't come pre-packaged with cudatoolkit. Did something change for cuda 11? Is there an alternative way to get nvcc via conda-forge?

@AgrawalAmey
Contributor

@seanlaw this is not expected. Perhaps something changed in how cuda packages these files. Will investigate this. Thanks!

@seanlaw
Author

seanlaw commented Jul 9, 2021

> Perhaps something changed in how cuda packages these files. Will investigate this. Thanks!

Awesome! Please let me know if you need me to be more specific about the exact cudatoolkit-dev version(s), or if I can provide any further information.

@AgrawalAmey
Contributor

Yes, we recently published a bunch of new versions. It would be a great help if you could specify the exact version @seanlaw. Thank you!

@seanlaw
Author

seanlaw commented Jul 10, 2021

@AgrawalAmey I ran a bunch of tests on multiple versions of cudatoolkit-dev, from 10.1.243 to 11.4.0, and it looks like 11.4.0 (the latest version) did not have nvcc (it could not be found in the typical miniconda3/bin directory). All of the others seemed fine. When I rolled back to 11.3.1, nvcc was found.
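
The check described above (looking for nvcc in the environment's bin directory) could be scripted so it can be repeated after each install. This is a minimal sketch and not part of the original report; the function name `find_nvcc` and the example prefix `~/miniconda3` are assumptions for illustration.

```python
import os
import shutil
from typing import Optional


def find_nvcc(env_prefix: str) -> Optional[str]:
    """Return the path to nvcc inside a conda environment, or None if absent.

    env_prefix is the environment root (for example ~/miniconda3). The
    function name and argument are illustrative, not part of any conda API.
    """
    bin_dir = os.path.join(os.path.expanduser(env_prefix), "bin")
    # With an explicit path argument, shutil.which searches only that directory.
    return shutil.which("nvcc", path=bin_dir)


if __name__ == "__main__":
    location = find_nvcc("~/miniconda3")
    print(location if location else "nvcc not found in the environment bin directory")
```

Restricting the search to the environment's own bin directory avoids a false positive from a system-wide nvcc that happens to be on PATH.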

@AgrawalAmey
Contributor

Thanks @seanlaw, this helps a lot! Will try to fix it today.

@AgrawalAmey
Contributor

@seanlaw sorry for the delay, I got occupied last week. I tried to test it out today but could not reproduce the issue. Check the following trace:

> conda install -c conda-forge cudatoolkit-dev
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.3

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /mnt/tmp/cuda114/env

  added / updated specs:
    - cudatoolkit-dev


The following NEW packages will be INSTALLED:

  cudatoolkit-dev    conda-forge/linux-64::cudatoolkit-dev-11.4.0-py39h3811e60_1
  python_abi         conda-forge/linux-64::python_abi-3.9-2_cp39

The following packages will be UPDATED:

  ca-certificates    pkgs/main::ca-certificates-2020.12.8-~ --> conda-forge::ca-certificates-2021.5.30-ha878542_0
  certifi            pkgs/main::certifi-2020.12.5-py39h06a~ --> conda-forge::certifi-2021.5.30-py39hf3d152e_0
  openssl                                 1.1.1i-h27cfd23_0 --> 1.1.1k-h27cfd23_0


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

> which nvcc
/mnt/tmp/cuda114/env/bin/nvcc

> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:15_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0

Can you please share your system information: OS, conda version, and the precise build version of the cudatoolkit-dev package?

Thanks!
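
The details requested above could be gathered in one pass with a short script. This is a rough sketch, assuming `conda` is on PATH; the helper names `run_or_note` and `collect_report` are hypothetical, and the only conda commands used are `conda --version` and `conda list cudatoolkit-dev`.

```python
import platform
import shutil
import subprocess


def run_or_note(cmd):
    """Run a command and return its trimmed output, or a note if it is unavailable."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]} not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # Some tools print to stderr instead of stdout, so fall back to it.
    return (result.stdout or result.stderr).strip()


def collect_report():
    """Collect OS, Python, conda, and cudatoolkit-dev details for a bug report."""
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "conda": run_or_note(["conda", "--version"]),
        # conda list with a package name shows the installed version and build string.
        "cudatoolkit-dev": run_or_note(["conda", "list", "cudatoolkit-dev"]),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}:\n{value}\n")
```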

@seanlaw
Author

seanlaw commented Jul 19, 2021

I am starting with a 100% clean/fresh miniconda installation on Ubuntu "18.04.5 LTS (Bionic Beaver)" running Python 3.7.10 and Conda 4.10.3.

In case it matters, nvidia-smi also displays this:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 465.27       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   30C    P8    11W / 280W |     19MiB / 11176MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:84:00.0 Off |                  N/A |
|  0%   32C    P8    11W / 280W |      6MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1558      G   /usr/lib/xorg/Xorg                  9MiB |
|    0   N/A  N/A      1648      G   /usr/bin/gnome-shell                6MiB |
|    1   N/A  N/A      1558      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

I am installing the cudatoolkit-dev v11.4.0 using mamba and I am seeing this:

ERROR conda.core.link:_execute(701): An error occurred while installing package 'conda-forge::cudatoolkit-dev-11.4.0-py39h3811e60_1'.
Rolling back transaction: done

LinkError: post-link script failed for package conda-forge::cudatoolkit-dev-11.4.0-py39h3811e60_1
location of failed script: /home/sean/miniconda3/bin/.cudatoolkit-dev-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout: Running Post installation
downloading https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run to /home/sean/miniconda3/pkgs/cudatoolkit-dev/cuda_11.4.0_470.42.01_linux.run
Extracting on Linux

stderr:
return code: 1

()

I also tried starting with a clean/fresh miniconda installation again and, instead of using mamba, I tried installing cudatoolkit-dev using conda directly:

$ conda install -c conda-forge cudatoolkit-dev
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/sean/miniconda3

  added / updated specs:
    - cudatoolkit-dev


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
    certifi-2021.5.30          |   py37h89c1867_0         141 KB  conda-forge
    conda-4.10.3               |   py37h89c1867_0         3.1 MB  conda-forge
    cudatoolkit-dev-11.4.0     |   py37h5e8e339_1           9 KB  conda-forge
    openssl-1.1.1k             |       h7f98852_0         2.1 MB  conda-forge
    python_abi-3.7             |          2_cp37m           4 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         5.5 MB

The following NEW packages will be INSTALLED:

  cudatoolkit-dev    conda-forge/linux-64::cudatoolkit-dev-11.4.0-py37h5e8e339_1
  python_abi         conda-forge/linux-64::python_abi-3.7-2_cp37m

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2021.7.5-h~ --> conda-forge::ca-certificates-2021.5.30-ha878542_0
  certifi            pkgs/main::certifi-2021.5.30-py37h06a~ --> conda-forge::certifi-2021.5.30-py37h89c1867_0
  conda              pkgs/main::conda-4.10.3-py37h06a4308_0 --> conda-forge::conda-4.10.3-py37h89c1867_0
  openssl              pkgs/main::openssl-1.1.1k-h27cfd23_0 --> conda-forge::openssl-1.1.1k-h7f98852_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
conda-4.10.3         | 3.1 MB    | ########################################################################################## | 100%
cudatoolkit-dev-11.4 | 9 KB      | ########################################################################################## | 100%
openssl-1.1.1k       | 2.1 MB    | ########################################################################################## | 100%
ca-certificates-2021 | 136 KB    | ########################################################################################## | 100%
certifi-2021.5.30    | 141 KB    | ########################################################################################## | 100%
python_abi-3.7       | 4 KB      | ########################################################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

$ which nvcc
/home/sean/miniconda3/bin/nvcc

And, as you can see, it installs successfully and nvcc is indeed present. However, this didn't work, and I realized that the CUDA version reported by the driver is 11.3 while the cudatoolkit is for version 11.4! According to the CUDA release notes, my driver version (465.27) is only compatible with CUDA 11.3 and not with 11.4. Only driver versions greater than or equal to 470.42.01 are compatible with CUDA 11.4.
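
The driver check described above comes down to comparing dotted version strings. Here is a minimal sketch, assuming the 470.42.01 minimum quoted above for CUDA 11.4 (taken from this comment, not independently verified); the helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '470.42.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed, minimum):
    """Return True if the installed driver version is at least the minimum."""
    # Tuples compare element by element, so (465, 27) sorts below (470, 42, 1).
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Values from the comment above: installed driver 465.27, minimum 470.42.01.
    print(driver_meets_minimum("465.27", "470.42.01"))     # False
    print(driver_meets_minimum("470.42.01", "470.42.01"))  # True
```

Comparing integer tuples rather than raw strings avoids mistakes such as "9.0" sorting above "10.1".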

I guess the problem now is figuring out why mamba is unable to link nvcc properly, but that may be beyond the scope of this repo.

@oublalkhalid

Hello,

I'm seeing the same issue here. Are there any updates on nvcc?

Processing open-source/mamba
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      open-source/mamba/setup.py:78: UserWarning: mamba_ssm was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "open-source/mamba/setup.py", line 112, in <module>
          if bare_metal_version >= Version("11.8"):
      NameError: name 'bare_metal_version' is not defined
      
      
      torch.__version__  = 2.2.0+cu121
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
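
The traceback above shows a version comparison running after nvcc was reported missing, so the variable holding the CUDA version was never set. A build script can avoid that by treating "nvcc not found" as an explicit case. The following is only a sketch of that idea, not the code from the package in the traceback; the function names are hypothetical, and the regular expression assumes the `release X.Y` wording that `nvcc --version` prints earlier in this thread.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the "release 11.4" part of the nvcc --version output.
RELEASE_PATTERN = re.compile(r"release (\d+)\.(\d+)")


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from nvcc --version output, or None if absent."""
    match = RELEASE_PATTERN.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_nvcc_release() -> Optional[Tuple[int, int]]:
    """Return the CUDA release reported by nvcc, or None when nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = detect_nvcc_release()
    if release is None:
        print("nvcc not found; skipping CUDA-version-specific build options")
    elif release >= (11, 8):
        print("CUDA 11.8 or newer detected")
    else:
        print(f"CUDA {release[0]}.{release[1]} detected")
```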