/tmp/cuda-installer.log causes subsequent installs of other users to fail #76

Open
1 task done
skwde opened this issue Oct 30, 2023 · 1 comment
Labels
bug Something isn't working

Comments

skwde commented Oct 30, 2023

Solution to issue cannot be found in the documentation.

  • I checked the documentation.

Issue

This is already described in #44.
The fix is to remove /tmp/cuda-installer.log, which was created by the user who first installed CUDA on the system.
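For anyone who wants to check whether they are affected before running the install, here is a minimal Python sketch. It assumes the failure comes from the current user being unable to write to a log file owned by someone else, which the "Log file not open." message below suggests but which I have not confirmed in the installer itself. The function name `installer_log_status` is hypothetical and not part of cudatoolkit-dev.

```python
import os
from pathlib import Path


def installer_log_status(log_path: Path) -> str:
    """Classify the log file as 'absent', 'writable', or 'blocked'.

    'blocked' means the file exists but the current user cannot write to it,
    which is the situation assumed (not confirmed) to break the installer.
    """
    if not log_path.exists():
        return "absent"
    if os.access(log_path, os.W_OK):
        return "writable"
    return "blocked"


if __name__ == "__main__":
    log = Path("/tmp/cuda-installer.log")
    status = installer_log_status(log)
    if status == "blocked":
        # Report the owner so the right person (or an admin) can remove the file.
        print(f"{log} is owned by UID {log.stat().st_uid} and is not writable")
    else:
        print(f"{log}: {status}")
```

If the result is "blocked", the file usually has to be removed by its owner or an administrator, because /tmp typically has the sticky bit set.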

However, a backport to older cudatoolkit-dev versions is missing.
To my understanding, @jnooree tried but couldn't find a suitable commit.
He suggested opening another issue to notify the maintainers, which is what I am doing now.

I use the following environment.yml:

---
name: tf2
channels:
  - conda-forge
dependencies:
  - python>=3
  - tensorflow==2.11.1=cuda112*
  - cudatoolkit-dev=11.2
  # not strictly necessary (all required tools are in cudatoolkit-dev), but without this pin cudatoolkit 11.8 gets installed, which is confusing
  - cudatoolkit=11.2

which fails with

ERROR conda.core.link:_execute(730): An error occurred while installing package 'conda-forge::cudatoolkit-dev-11.2.2-py39h3811e60_0'.

LinkError: post-link script failed for package conda-forge::cudatoolkit-dev-11.2.2-py39h3811e60_0
location of failed script: <some path>/.conda/envs/tf2/bin/.cudatoolkit-dev-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout: Log file not open.
Running Post installation
downloading https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run to <some path>/.conda/envs/tf2/pkgs/cudatoolkit-dev/cuda_11.2.2_460.32.03_linux.run
Extracting on Linux

stderr: <some path>/.conda/envs/tf2/pkgs/cudatoolkit-dev/cuda_11.2.2_460.32.03_linux.run: line 513: 680866 Segmentation fault (core dumped) ./cuda-installer --silent --toolkit --toolkitpath=/data/scratch/62320/tmpj5oiro_4 --override
Traceback (most recent call last):
  File "<some path>/.conda/envs/tf2/bin/cudatoolkit-dev-post-install.py", line 216, in <module>
    _main()
  File "<some path>/.conda/envs/tf2/bin/cudatoolkit-dev-post-install.py", line 212, in _main
    extractor.extract()
  File "<some path>/.conda/envs/tf2/bin/cudatoolkit-dev-post-install.py", line 124, in extract
    subprocess.run(cmd, env=os.environ.copy(), check=True)
  File "<some path>/.conda/envs/tf2/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['<some path>/.conda/envs/tf2/pkgs/cudatoolkit-dev/cuda_11.2.2_460.32.03_linux.run', '--silent', '--toolkit', '--toolkitpath=/data/scratch/62320/tmpj5oiro_4', '--override']' returned non-zero exit status 139.

return code: 1

()
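
A note on reading the log above: the exit status 139 is consistent with the segmentation fault reported on stderr, since shells conventionally report 128 plus the signal number when a child process is killed by a signal (SIGSEGV is 11 on Linux). The sketch below decodes such a status; `describe_exit_status` is a hypothetical helper for illustration and is not part of the post-install script.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell-style exit status into a readable description.

    Statuses above 128 are interpreted as 128 + signal number, following
    the usual shell convention.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            # The number does not map to a known signal on this platform.
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


print(describe_exit_status(139))
```

So the `CalledProcessError` only reports that the `.run` script failed; the actual crash happened inside `./cuda-installer`.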

Installed packages

$ conda list
# packages in environment at /cfs/earth/scratch/slurmtest1/.conda/envs/tf2:
#
# Name                    Version                   Build  Channel

Environment info

$ conda info

     active environment : None
            shell level : 0
       user config file : ~/.condarc
 populated config files : <path to global config>/condarc
          conda version : 4.12.0
    conda-build version : not installed
         python version : 3.9.12.final.0
       virtual packages : __linux=4.18.0=0
                          __glibc=2.28=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : <path to spack installed>/miniconda3-4.12.0  (read only)
      conda av data dir : <path to spack installed>/miniconda3-4.12.0/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : <some loc>/.conda/pkgs
       envs directories : <some loc>/.conda/envs
                          ~/.conda/envs
                          <path to spack installed>/miniconda3-4.12.0/envs
               platform : linux-64
             user-agent : conda/4.12.0 requests/2.27.1 CPython/3.9.12 Linux/4.18.0-372.16.1.el8_6.x86_64 rocky/8.6 glibc/2.28
                UID:GID : 529000190:529000190
             netrc file : None
           offline mode : False
skwde added the bug label Oct 30, 2023
@ThePassedWind

I am facing the same problem.
