Error: File /var/log/cuda-installer.log does not exist when installing an older CUDA #207

Open
fangq opened this issue Mar 13, 2023 · 5 comments · Fixed by #215

Comments

@fangq

fangq commented Mar 13, 2023

Thanks for developing this action script!

I spent the past few hours trying to use it for one of my CUDA projects. I ran into two issues and would like to know whether there are workarounds.

  1. I kept getting an error (Error: Error: File /var/log/cuda-installer.log does not exist) at the end of the CUDA installation; see https://github.com/fangq/mcx/actions/runs/4401047764/jobs/7706904559. I even forked this repo and removed the artifact-uploading code inside src/installer.ts, yet it still gave me this error. I was trying to install CUDA 9.2 for its smaller size. Is there a workaround?

  2. The default CUDA 12.x is over 4 GB in size. When I used the default settings, I got a "No space left on device" error; see https://github.com/fangq/mcx/actions/runs/4400016129/jobs/7704933495. The same happened in the CI tests: https://github.com/fangq/cuda-toolkit/actions/runs/4401127506/jobs/7707048051. What is the most compact installation option? I only need nvcc and libcudart/libcudart_static for my project.

@Jimver
Owner

Jimver commented Mar 18, 2023

Hey, thanks for the report. Your first point is a good one; I'll change the artifact-uploading code in a new PR.

For your second point, have you tried using the network method? Like this:

steps:
- uses: Jimver/[email protected]
  id: cuda-toolkit
  with:
    cuda: '12.1.0'
    method: 'network'
    sub-packages: '["nvcc"]'

You can find more details in the README.md file.

Jimver added a commit that referenced this issue Mar 18, 2023
- This fixes #207, older CUDA versions don't have an installer log at
  the hardcoded location of /var/log/cuda-installer.log so the
  artifact uploader would fail.
- To fix this let's not upload any artifacts in the first place if there
  is no log file to upload.
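
The guard described in this commit message can be sketched roughly as follows. This is only an illustration, not the code from the actual fix: the helper names `findInstallerLog` and `uploadLogIfPresent` and the `uploadFn` callback are hypothetical stand-ins for whatever the action really uses.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: returns the log path if it exists as a regular file,
// otherwise undefined. The default path is the one quoted in the error message.
export function findInstallerLog(
  logPath: string = "/var/log/cuda-installer.log"
): string | undefined {
  try {
    return fs.statSync(logPath).isFile() ? logPath : undefined;
  } catch {
    // statSync throws when the path does not exist or cannot be inspected.
    return undefined;
  }
}

// Hypothetical wrapper: uploadFn stands in for the real artifact uploader.
// Returns true if an upload was attempted, false if it was skipped.
export async function uploadLogIfPresent(
  logPath: string,
  uploadFn: (path: string) => Promise<void>
): Promise<boolean> {
  const found = findInstallerLog(logPath);
  if (found === undefined) {
    return false; // Older CUDA installers may not write this log; skip quietly.
  }
  await uploadFn(found);
  return true;
}
```

With a guard like this, a missing log becomes a skipped upload rather than a failed step.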
@Jimver
Owner

Jimver commented Mar 18, 2023

Not sure if point 2 is fixed yet, so reopening.

@Jimver Jimver reopened this Mar 18, 2023
okazunori2013 referenced this issue in okazunori2013/cuda-toolkit Mar 19, 2023
- This fixes #207, older CUDA versions don't have an installer log at
  the hardcoded location of /var/log/cuda-installer.log so the
  artifact uploader would fail.
- To fix this let's not upload any artifacts in the first place if there
  is no log file to upload.
@robandpdx

robandpdx commented May 7, 2024

I am seeing the install log artifact upload fail due to file permissions...

Beginning upload of artifact content to blob storage
node:events:492
      throw er; // Unhandled 'error' event
      ^
Error: EACCES: permission denied, open '/var/log/cuda-installer.log'
Emitted 'error' event on ReadStream instance at:
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  errno: -13,
  code: 'EACCES',
  syscall: 'open',
  path: '/var/log/cuda-installer.log'
}

In my case, the installer log has the following permissions...

-rw-------  1 runner    root              9022 May  7 04:05 nvidia-installer.log

@robandpdx

More info on my situation encountering the error above...

I am using GitHub's GPU runners, which use the official NVIDIA image from Azure Marketplace. On the NVIDIA image, the umask in /etc/login.defs is changed to 077. This seems to be an intentional change that NVIDIA makes during image generation, because the default umask on the base Ubuntu image is 022. A 077 umask is recommended by some online resources to make the VM more secure.
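
To make the effect of the umask concrete, here is a small worked example of the usual rule (resulting mode = requested mode with the umask bits cleared). It assumes the file is created with a requested mode of 0666, which is a common default but not something confirmed for the CUDA installer; the function names are made up for illustration.

```typescript
// Clears the umask bits from the requested mode, keeping only permission bits.
export function modeAfterUmask(requested: number, umask: number): number {
  return requested & ~umask & 0o777;
}

// Formats a mode as a three-digit octal string, e.g. 0o600 -> "600".
export function toOctal(mode: number): string {
  return ("000" + mode.toString(8)).slice(-3);
}

// Assumed requested mode of 0666 (rw-rw-rw-) for a newly created file.
const withNvidiaUmask = modeAfterUmask(0o666, 0o077); // 0o600, i.e. rw-------
const withUbuntuUmask = modeAfterUmask(0o666, 0o022); // 0o644, i.e. rw-r--r--

console.log(toOctal(withNvidiaUmask), toOctal(withUbuntuUmask));
```

Under that assumption, a 077 umask yields rw-------, which matches the permissions in the listing above, whereas 022 would leave the file readable by group and others.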

If a task uses sudo to install something, it should make sure that the tool is available without sudo later. For example, setup-miniconda fixes permissions on macOS using chown. So I guess this action should do something similar on Ubuntu.
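
As a rough sketch of what such a check could look like on the action side: test whether the log is readable by the current user, and only if it is not, run a fix step before retrying. The names `canReadFile` and `ensureReadable` are hypothetical, and the `fix` callback is a placeholder; this is not how the action is actually implemented.

```typescript
import * as fs from "node:fs";

// Returns true if the current process can open the file for reading.
export function canReadFile(filePath: string): boolean {
  try {
    fs.accessSync(filePath, fs.constants.R_OK);
    return true;
  } catch {
    // Covers EACCES (permission denied) as well as ENOENT (missing file).
    return false;
  }
}

// Hypothetical wrapper: if the file is not readable, call the supplied fix
// step once and check again. Returns whether the file is readable afterwards.
export function ensureReadable(
  filePath: string,
  fix: (path: string) => void
): boolean {
  if (canReadFile(filePath)) {
    return true;
  }
  fix(filePath);
  return canReadFile(filePath);
}
```

On a runner, the `fix` step might shell out to something like `sudo chown` or `sudo chmod` on the log file, similar in spirit to the setup-miniconda approach mentioned above; which command is appropriate is an assumption that would need checking against the image.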

@qthequartermasterman
Contributor

@robandpdx did you find a way to solve this?
