CDI spec devices are not valid input in all device input locations #16232

Closed
Clockwork-Muse opened this issue Oct 19, 2022 · 4 comments · Fixed by #16580

@Clockwork-Muse

/kind bug

Description

Although it is now possible to pass a CDI spec device during podman run, there are a few other places where devices can be specified that do not appear to accept CDI device names.

Steps to reproduce the issue:

  1. Install podman and the nvidia-container-toolkit, and generate the relevant CDI spec.

  2. Modify the containers config (the resulting devices entry is shown after this list): sudo sed -i 's|^#devices = \[\]|devices = ["nvidia.com/gpu=gpu0"]|;' /usr/share/containers/containers.conf

  3. Run a gpu-enabled container: podman run --rm docker.io/nvidia/cudagl:11.4.2-runtime-ubuntu20.04 nvidia-smi
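
For reference, after step 2 the relevant containers.conf entry looks like the following. Only the key in question is shown, and placing it under the [containers] section is an assumption based on the default config layout:

# containers.conf after the sed in step 2 (relevant key only)
[containers]
devices = ["nvidia.com/gpu=gpu0"]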

Describe the results you received:

ERRO[0000] validating containers config: invalid device mode: nvidia.com/gpu=gpu0 

Describe the results you expected:
The container runs successfully and produces output from nvidia-smi.

Additional information you deem important (e.g. issue happens only occasionally):

I'm also attempting to use devices in a gitlab-runner via the podman socket, and something in the chain appears to be trying to stat the device string as though it were a path. That is, I get this error during a run:

ERROR: Job failed (system failure): prepare environment: Error response from daemon: container create: stat nvidia.com/gpu=gpu0: no such file or directory (docker.go:659:0s). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

... this is admittedly a less supported setup, however...
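
For context, the runner is configured roughly along these lines. This is a hypothetical sketch using the documented gitlab-runner docker-executor options (host, devices); the values shown are assumptions rather than a copy of the failing setup, with the socket path taken from the podman info output below:

# config.toml (sketch) for the gitlab-runner in question
[[runners]]
  executor = "docker"
  [runners.docker]
    # assumed: the runner talks to the rootless podman socket
    host = "unix:///run/user/1001/podman/podman.sock"
    # the CDI device name that the daemon appears to stat as a path
    devices = ["nvidia.com/gpu=gpu0"]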

Output of podman version:

Client:       Podman Engine
Version:      4.3.0-rc1
API Version:  4.3.0-rc1
Go Version:   go1.18.1
Built:        Wed Dec 31 16:00:00 1969
OS/Arch:      linux/amd64

Output of podman info:

host:
  arch: amd64
  buildahVersion: 1.28.0-dev
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2:2.1.4-0ubuntu22.04+obs13.5_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: '
  cpuUtilization:
    idlePercent: 59.42
    systemPercent: 6.4
    userPercent: 34.19
  cpus: 12
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: journald
  hostname: bumblebee
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1001
      size: 1
    - container_id: 1
      host_id: 165536
      size: 65536
  kernel: 5.15.0-48-generic
  linkmode: dynamic
  logDriver: journald
  memFree: 2298265600
  memTotal: 33545072640
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_1.6-0ubuntu22.04+obs46.2_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.6
      commit: 18cf2efbb8feb2b2f20e316520e0fd0b6c41ef4d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/1001/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-0ubuntu22.04+obs10.4_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 753664
  swapTotal: 1023406080
  uptime: 289h 16m 2.00s (Approximately 12.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/simhoff/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/simhoff/.local/share/containers/storage
  graphRootAllocated: 981810135040
  graphRootUsed: 486783193088
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 291
  runRoot: /run/user/1001/containers
  volumePath: /home/simhoff/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.0-rc1
  Built: 0
  BuiltTime: Wed Dec 31 16:00:00 1969
  GitCommit: ""
  GoVersion: go1.18.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.0-rc1

Package info (e.g. output of rpm -q podman or apt list podman or brew info podman):

Listing... Done
podman/now 4:4.3.0-rc1-0ubuntu22.04+obs60.1 amd64 [installed,local]
podman/unknown 4:4.3.0-0ubuntu22.04+obs63.1 arm64
podman/unknown 4:4.3.0-rc1-0ubuntu22.04+obs60.1 armhf
podman/unknown 4:4.3.0-rc1-0ubuntu22.04+obs60.1 s390x

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes - package is from Kubic repo

Additional environment details (AWS, VirtualBox, physical, etc.):
Host OS: Ubuntu 22.04

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 19, 2022
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Nov 21, 2022

@giuseppe PTAL

@giuseppe
Member

first half of the fix: containers/common#1239

@giuseppe
Member

and second half: #16580

giuseppe added a commit to giuseppe/libpod that referenced this issue Nov 25, 2022
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 9, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2023