This issue was moved to a discussion.

mkdirat(., ., 0) on virtiofs mount fails spuriously in containers running inside vfkit machine on macOS #24725

Closed
p0llard opened this issue Dec 1, 2024 · 3 comments
Labels
machine macos MacOS (OSX) related

Comments

p0llard commented Dec 1, 2024

Issue Description

On macOS, running mkdirat(., ., 0) (the first two arguments are irrelevant, except that the parent directory must be on a virtiofs mount) inside a container fails with EACCES when a vfkit-based machine is used. I have not tested other types of machine (e.g. libkrun).

This issue arises when running useradd --create-home in debian:bookworm-20240513-slim, which is used as part of the Redox build process here. useradd uses mkdir (rather than mkdirat) here and suffers the same failure.

I believe the underlying issue is with Apple's virtiofs implementation; I have FB16008360 open with Apple to track this. Please see "Additional information" below.

Steps to reproduce the issue

  1. [Optional:] podman machine reset && podman machine init
  2. podman machine start
  3. podman machine ssh
  4. [Inside machine:] cd /Users/${MACOS_USERNAME}
  5. [Inside machine:] mkdir -m0 foo
  6. [Inside machine:] ls

Describe the results you received

mkdir fails with a permissions error, but the directory is successfully created. strace (not present on the FCOS image) shows the mkdirat syscall failing with -EACCES.

core@localhost:/Users/jpollard$ mkdir -m0 foo
mkdir: cannot create directory ‘foo’: Permission denied
core@localhost:/Users/jpollard$ ls
[...]
foo 
[...]

Describe the results you expected

Either:

  1. There is no error; or
  2. The directory is not created due to the error.
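
The two acceptable outcomes above can be checked mechanically. The following is a minimal sketch and not part of the original report: `struct mkdir_probe`, `probe_mkdirat`, and `probe_is_consistent` are hypothetical names introduced for illustration. The check assumes that the target name does not exist before the call (otherwise an ordinary EEXIST would also be flagged).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Result of one mkdirat() attempt plus a follow-up existence check.
 * Hypothetical helper type, used only for this illustration. */
struct mkdir_probe {
    int mkdir_errno;   /* 0 on success, otherwise errno from mkdirat() */
    int exists_after;  /* 1 if a directory with that name exists afterwards */
};

/* Call mkdirat() once, then ask the filesystem whether the directory exists. */
struct mkdir_probe probe_mkdirat(int dirfd, const char *name, mode_t mode)
{
    struct mkdir_probe result = {0, 0};
    struct stat st;

    if (mkdirat(dirfd, name, mode) != 0)
        result.mkdir_errno = errno;

    /* Looking up the entry needs search permission on the parent only,
     * so on a regular filesystem this works even for a mode-0 directory. */
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) == 0 && S_ISDIR(st.st_mode))
        result.exists_after = 1;

    return result;
}

/* Consistent means: success and the directory exists,
 * or failure and the directory does not exist. */
int probe_is_consistent(struct mkdir_probe p)
{
    return p.mkdir_errno == 0 ? p.exists_after : !p.exists_after;
}
```

On the affected mount, `probe_mkdirat(AT_FDCWD, "foo", 0)` would be expected to report `mkdir_errno == EACCES` together with `exists_after == 1`, assuming the follow-up `fstatat` can see the new entry; `probe_is_consistent` rejects that combination.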

podman info output

host:
  arch: arm64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-2.fc40.aarch64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 94.52
    systemPercent: 1.63
    userPercent: 3.84
  cpus: 7
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: coreos
    version: "40"
  eventLogger: journald
  freeLocks: 2045
  hostname: localhost.localdomain
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
    uidmap:
    - container_id: 0
      host_id: 501
      size: 1
    - container_id: 1
      host_id: 100000
      size: 1000000
  kernel: 6.11.3-200.fc40.aarch64
  linkmode: dynamic
  logDriver: journald
  memFree: 4222513152
  memTotal: 15561142272
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-2.fc40.aarch64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.fc40.aarch64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1.fc40.aarch64
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/501/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240906.g6b38f07-1.fc40.aarch64
    version: |
      pasta 0^20240906.g6b38f07-1.fc40.aarch64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/user/501/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.aarch64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 2h 41m 16.00s (Approximately 0.08 days)
  variant: v8
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 106769133568
  graphRootUsed: 16266203136
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 21
  runRoot: /run/user/501/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/arm64
  Version: 5.3.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Apple Silicon M3 Max (aarch64), Sequoia 15.1.1.

Additional information

To isolate exactly where the EACCES comes from, I ran Alpine Linux Virt 3.20 in vfkit directly. I built a custom kernel to add some additional pr_debug instrumentation around the relevant parts of the FUSE and virtiofs implementations.

This patch was applied to Linux v6.6.63.

A vfkit VM running Alpine Linux with the patched kernel was run with --device virtio-fs,sharedDir="${SCRATCH_DIR}"/,mountTag=vfkit-share, and the following experiment was performed:

vm:~# echo 'func fuse_simple_request +pf' > /sys/kernel/debug/dynamic_debug/control
vm:~# echo 'func virtio_fs_wake_pending_and_unlock +p' > /sys/kernel/debug/dynamic_debug/control
vm:~# cat ~/mkdirat_test.c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

int main() {
    int ret = mkdirat(AT_FDCWD, "foo", 0);
    return (ret ? errno : 0);
}
vm:~# gcc -Werror -Wpedantic mkdirat_test.c -o mkdirat_test
vm:~# mount vfkit-share -t virtiofs /mnt
vm:~# cd /mnt
vm:/mnt# ~/mkdirat_test ; echo $?
13
vm:/mnt# dmesg | grep -A1 'opcode 9'
[  1131.542345] virtio_fs_wake_pending_and_unlock: opcode 9 unique 0x44 nodeid 0x1 in.len 52 out.len 128
[  1131.542632] fuse_simple_request: fuse: unique 0x44 error -13
vm:/mnt# grep EACCES /usr/include/bits/errno.h
#define EACCES          13

We see that the source of the EACCES error is the FUSE server backing the virtiofs mount; in this context, the server is the macOS Virtualization Framework via a VZVirtioFileSystemDeviceConfiguration, so I have raised an issue with Apple.

Note that mkdirat_test is necessary, rather than just running mkdir -m0 foo, because Alpine's busybox mkdir does a two-stage mkdir/chmod rather than using mkdirat(., ., 0).
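
The difference between the two approaches can be sketched as follows. This is an illustration of the general pattern only, not busybox's actual code: `make_dir_single_call` and `make_dir_two_stage` are hypothetical names, and the intermediate mode 0700 in the two-stage variant is an arbitrary choice made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Single call: the directory is created with its final mode directly.
 * With mode 0 this is the variant that triggers the reported EACCES.
 * Returns 0 on success, otherwise the errno value. */
int make_dir_single_call(int dirfd, const char *name, mode_t mode)
{
    return mkdirat(dirfd, name, mode) == 0 ? 0 : errno;
}

/* Two stage: create the directory with a permissive mode first,
 * then change it to the requested mode in a second call.
 * The 0700 intermediate mode is an assumption made for illustration.
 * Returns 0 on success, otherwise the errno value of the failing step. */
int make_dir_two_stage(int dirfd, const char *name, mode_t mode)
{
    if (mkdirat(dirfd, name, 0700) != 0)
        return errno;
    if (fchmodat(dirfd, name, mode, 0) != 0)
        return errno;
    return 0;
}
```

Only the single-call variant ever sends a create request with mode 0 to the FUSE server, which is why a dedicated test program was needed to reproduce the failure on Alpine.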

@p0llard p0llard added the kind/bug Categorizes issue or PR as related to a bug. label Dec 1, 2024
@github-actions github-actions bot added the remote Problem is in podman-remote label Dec 1, 2024
@Luap99 Luap99 added macos MacOS (OSX) related machine and removed remote Problem is in podman-remote labels Dec 2, 2024

Luap99 (Member) commented Dec 2, 2024

I would assume this is the same issue as I already described here: #23018

If anything runs chmod 0 on any file/dir, it will no longer be accessible, as we launch the VM (and thus virtiofs) as a user, so there is no way to read such a file on the host.

p0llard (Author) commented Dec 3, 2024

I would assume this is the same issue as I already described here: #23018

If anything runs chmod 0 on any file/dir, it will no longer be accessible, as we launch the VM (and thus virtiofs) as a user, so there is no way to read such a file on the host.

They may share a common cause somewhere inside Apple's virtiofs implementation, but I believe the manifestation in the VM is different: in your case, you find that once a file has a mode of 0, it's impossible to access it; in my case, I find that creating a file with a mode of 0 claims to fail, even though it succeeds.

I could imagine that Apple's FUSE implementation of mkdir involves several APFS operations, including (for example) a final stat after the file has been created, and this stat fails for the same reason as your issue, leading the overall mkdir operation to appear to fail. If this is the case, then I think there is still a separate bug on Apple's side, namely that the mkdir operation isn't sufficiently atomic.
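
The hypothesis above can be modelled in user space. The sketch below is purely illustrative and says nothing about how Apple's implementation is actually written: all function names are hypothetical, and an `open` of the new directory stands in for the speculated "final stat" because it is a follow-up step that an unprivileged process cannot perform on a mode-0 directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical follow-up step performed after the directory is created.
 * Returns 0 on success or a negative errno value. */
static int follow_up_check(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -errno;
    close(fd);
    return 0;
}

/* Non-atomic variant: if the follow-up step fails, the error is reported
 * even though the directory has already been created and stays behind. */
int mkdir_then_check(const char *path, mode_t mode)
{
    if (mkdir(path, mode) != 0)
        return -errno;
    return follow_up_check(path);
}

/* Variant with rollback: if the follow-up step fails, the directory is
 * removed again, so an error always means "nothing was created". */
int mkdir_with_rollback(const char *path, mode_t mode)
{
    if (mkdir(path, mode) != 0)
        return -errno;
    int err = follow_up_check(path);
    if (err != 0)
        rmdir(path);
    return err;
}
```

Run as an unprivileged user with mode 0, `mkdir_then_check` returns `-EACCES` and leaves the directory in place, which matches the symptom seen in the VM, whereas `mkdir_with_rollback` returns the same error without leaving anything behind. A privileged process can open the directory regardless of its mode, so both functions succeed in that case.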

Luap99 (Member) commented Dec 3, 2024

I mean, sure, but fundamentally this is up to Apple's implementation. I don't see what podman machine can do about that, so I am moving this to a discussion.

@Luap99 Luap99 removed the kind/bug Categorizes issue or PR as related to a bug. label Dec 3, 2024
@containers containers locked and limited conversation to collaborators Dec 3, 2024
@Luap99 Luap99 converted this issue into discussion #24751 Dec 3, 2024
