
Mount point ownership not consistent with Docker's behaviour #19652

Open
matejvasek opened this issue Aug 16, 2023 · 57 comments
Labels
kind/bug, stale-issue

Comments

@matejvasek
Contributor

matejvasek commented Aug 16, 2023

Issue Description

When mounting a volume into a container, the mountpoint directory should preserve its ownership. This seems to work only for the very first run/mount. Subsequent mounts have altered ownership of the mountpoint directory (set to the ownership established by the first mounter).

This happens at least with Podman v4.3 -- v4.6.

Steps to reproduce the issue

Run the following script against the Podman Docker-compat socket:

#!/bin/sh

set -e

cat <<EOF > Dockerfile.usera
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1001:1002 /workspace
USER 1001:1002
EOF

cat <<EOF > Dockerfile.userb
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1003:1004 /workspace
USER 1003:1004
EOF

docker build -q . -f Dockerfile.usera -t alpine-usera
docker build -q . -f Dockerfile.userb -t alpine-userb
docker volume rm test-volume || true
docker run --rm -v test-volume:/workspace alpine-usera sh -c 'echo done'
# command below fails on podman because of permissions
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'touch /workspace/b'
docker volume rm test-volume || true

Describe the results you received

The script exits with non-zero exit code and error message.

touch: /workspace/b: Permission denied

Describe the results you expected

The script exits with 0 exit code.
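
A quick way to see the difference directly (a sketch, assuming the images and volume name from the script above) is to stat the mountpoint from each container:

docker volume rm test-volume || true
docker run --rm -v test-volume:/workspace alpine-usera stat -c '%u:%g' /workspace
docker run --rm -v test-volume:/workspace alpine-userb stat -c '%u:%g' /workspace

Docker prints 1001:1002 and then 1003:1004 (the mountpoint is re-chowned on every mount), while Podman prints 1001:1002 both times.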

podman info output

host:
  arch: amd64
  buildahVersion: 1.31.2
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc37.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 96.82
    systemPercent: 0.74
    userPercent: 2.44
  cpus: 16
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: workstation
    version: "37"
  eventLogger: journald
  freeLocks: 2011
  hostname: rigel
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.4.9-100.fc37.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 942694400
  memTotal: 67101196288
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.7.0-1.fc37.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.7.0
    package: netavark-1.7.0-1.fc37.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.7.0
  ociRuntime:
    name: crun
    package: crun-1.8.6-1.fc37.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.6
      commit: 73f759f4a39769f60990e7d225f561b4f4f06bcf
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20230625.g32660ce-1.fc37.x86_64
    version: |
      pasta 0^20230625.g32660ce-1.fc37.x86_64
      Copyright Red Hat
      GNU Affero GPL version 3 or later <https://www.gnu.org/licenses/agpl-3.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-8.fc37.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 1392640
  swapTotal: 8589930496
  uptime: 89h 60m 31.00s (Approximately 3.71 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: quay.io/mvasek
      PullFromMirror: ""
    Prefix: docker.io
    PullFromMirror: ""
  localhost:50000:
    Blocked: false
    Insecure: true
    Location: localhost:50000
    MirrorByDigestOnly: false
    Mirrors: null
    Prefix: localhost:50000
    PullFromMirror: ""
  search:
  - example.com
store:
  configFile: /home/mvasek/.config/containers/storage.conf
  containerStore:
    number: 10
    paused: 0
    running: 1
    stopped: 9
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/mvasek/.local/share/containers/storage
  graphRootAllocated: 1022488477696
  graphRootUsed: 409338372096
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 65
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/mvasek/.local/share/containers/storage/volumes
version:
  APIVersion: 4.6.2-dev
  Built: 1692205486
  BuiltTime: Wed Aug 16 19:04:46 2023
  GitCommit: 8183ba8b256442910154d4d264deac9d12242eae
  GoVersion: go1.20.2
  Os: linux
  OsArch: linux/amd64
  Version: 4.6.2-dev

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

I tested this on rootless but I believe the same thing happens for privileged too.

Additional information

Happens always.

matejvasek added the kind/bug label on Aug 16, 2023
@giuseppe
Member

I see the ownership is kept if you delete the volume before running the second command.

Does docker automatically delete the volume when the first container exits?

@matejvasek
Contributor Author

matejvasek commented Aug 17, 2023

@giuseppe no the volume persists.

@matejvasek
Contributor Author

Another way to reproduce: try building an app using the pack CLI with Podman and an untrusted builder.

@matejvasek
Contributor Author

@giuseppe but you might be onto something: the ownership behaves differently the moment I try to write something into the volume.

@matejvasek
Contributor Author

Maybe I isolated the bug in the wrong way, but there are definitely some issues with volume mounting.
The pack CLI builds an application in multiple containers that share some data via volumes. With Docker it works; with Podman it fails because of ownership issues.

@matejvasek
Contributor Author

matejvasek commented Aug 17, 2023

  1. Create a simple Go app (e.g. hello world).
  2. Run pack build my-go-app -Bghcr.io/knative/builder-jammy-full:latest --docker-host=inherit --trust-builder=0.
  3. The build fails because of permissions on the shared volume.

@matejvasek
Contributor Author

@giuseppe try running:

#!/bin/sh

set -e

cat <<EOF > Dockerfile.usera
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1001:1002 /workspace
USER 1001:1002
EOF

cat <<EOF > Dockerfile.userb
FROM alpine
USER root
RUN mkdir -p /workspace
RUN chown 1003:1004 /workspace
USER 1003:1004
EOF

docker build -q . -f Dockerfile.usera -t alpine-usera
docker build -q . -f Dockerfile.userb -t alpine-userb
docker volume rm test-volume || true
docker run --rm -v test-volume:/workspace alpine-usera sh -c 'echo done'
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'touch /workspace/b'
docker volume rm test-volume || true

It works with Docker but fails with Podman.

@matejvasek
Contributor Author

matejvasek commented Aug 17, 2023

@giuseppe note that if the first container actually tried to write to /workspace/ it would fail with Moby too. But in our use case the first container uses the volume as read-only, although it may not declare it via :ro.

@giuseppe
Member

@mheon do we need to change ownership every time we use the volume in a container?

@mheon
Member

mheon commented Aug 17, 2023

I have to assume we added that code for a reason, but I can't recall exactly why. Almost certainly a bugfix, but exactly what was being fixed is unclear. The exact on-mount behavior for volumes versus Docker has been a persistent problem.

@matejvasek
Contributor Author

FYI, in the past even the very first mounting container had bad ownership; see #10905

@mheon
Member

mheon commented Aug 17, 2023

I actually cannot find an explicit chown of the volume mountpoint anywhere in the mount code. So I'm actually not 100% sure where this is being done; it may be an unintentional side effect of another chown doing something else?

@matejvasek
Contributor Author

Looks like the chown is called only when the volume is brand new -- created together with a new container.

@matejvasek
Contributor Author

// Make sure the new volume matches the permissions of the target directory.
// https://github.com/containers/podman/issues/10188
st, err := os.Lstat(filepath.Join(c.state.Mountpoint, v.Dest))
if err == nil {
	if stat, ok := st.Sys().(*syscall.Stat_t); ok {
		if err := os.Lchown(mountPoint, int(stat.Uid), int(stat.Gid)); err != nil {
			return err
		}
	}
}

@matejvasek
Contributor Author

wrt:

docker run --rm -v test-volume:/workspace alpine-usera sh -c 'echo done'
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'touch /workspace/b'

It appears that the chown is called only for the first container.

@matejvasek
Contributor Author

There is some state, vol.state.NeedsChown.
I assume this ensures that the chown is done only once?

@matejvasek
Contributor Author

The vol.state.NeedsChown flag seems to be cleared on the first chown done by the first container, so subsequent containers won't chown it.

@matejvasek
Contributor Author

@giuseppe how important is vol.state.NeedsChown?

@matejvasek
Contributor Author

if vol.state.NeedsChown && (!vol.UsesVolumeDriver() && vol.config.Driver != "image") {
	vol.state.NeedsChown = false

@mheon
Member

mheon commented Aug 17, 2023

Ah, ok, per @matejvasek it's fixVolumePermissions()

Looking further, it's tied to a bool, NeedsChown, in the volume config. Set to true at volume create, false once permissions have been fixed during first mount into a container. Dropping the bool entirely and making the chown unconditional ought to fix this?

@matejvasek
Contributor Author

@mheon I believe it will fix the issue, but I don't know if it could have any adverse effects.

@rhatdan
Member

rhatdan commented Aug 17, 2023

It is not doing a recursive chown, correct? I think the goal there was to make sure the volume is owned by the primary user of the container. I think I had a PR on this code at one point to attempt to change it, but I gave up.
#16782

@matejvasek
Contributor Author

matejvasek commented Aug 17, 2023

make sure the volume is owned by the primary user of the container.

Small correction: the primary user's uid/gid is used only if the mount point does not already exist in the container. If the mount point exists (as a directory), then the uid/gid of that directory is used.
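
A sketch that separates the two cases under the logic described above (the Dockerfiles and image names here are made up for illustration; run against the Podman socket like the reproducer):

cat <<EOF > Dockerfile.nodir
FROM alpine
USER 1003:1004
EOF

cat <<EOF > Dockerfile.mixed
FROM alpine
USER root
RUN mkdir -p /workspace && chown 1001:1002 /workspace
USER 1003:1004
EOF

docker build -q . -f Dockerfile.nodir -t alpine-nodir
docker build -q . -f Dockerfile.mixed -t alpine-mixed
docker volume rm vol-a vol-b || true
# /workspace does not exist in the image: a fresh volume is expected to get the primary user, 1003:1004
docker run --rm -v vol-a:/workspace alpine-nodir stat -c '%u:%g' /workspace
# /workspace exists in the image and is owned by 1001:1002: a fresh volume is expected to follow the directory, 1001:1002
docker run --rm -v vol-b:/workspace alpine-mixed stat -c '%u:%g' /workspace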

@matejvasek
Contributor Author

Setting ownership just once makes sense if you assume that the volume will always be used by a single container.
However, that's not my case: the pack CLI runs multiple containers in sequence on one volume.

@mheon
Member

mheon commented Aug 17, 2023

If there's a reason we added this originally, it's Docker compat. If Docker doesn't do the same thing, then that reason is not valid.

@matejvasek
Contributor Author

If there's a reason we added this originally, it's Docker compat. If Docker doesn't do the same thing, then that reason is not valid.

What do you mean by this here? The fact that we chown, or the fact that we do it only once?

@mheon
Member

mheon commented Aug 17, 2023

Only once. There's no reason we'd add such complexity other than to match Docker

@matejvasek
Contributor Author

matejvasek commented Aug 18, 2023

Looks like NeedsChown was introduced in #6747 which was fixing #5698.

@matejvasek
Contributor Author

The issue does not directly mention Docker.

@matejvasek
Contributor Author

@giuseppe do you recall why NeedsChown was needed?

@giuseppe
Member

I don't remember more than what is in the original issue.

Also, how does it work on Docker if multiple containers use the same volume? Is it chowned to the last user?

@matejvasek
Contributor Author

Also, how does it work on Docker if multiple containers use the same volume? Is it chowned to the last user?

Yes, it appears so. It is chowned to the uid/gid of the mountpoint for each container.

@matejvasek
Contributor Author

See the test in the bug description. The second container should be able to create a file in the mountpoint directory; it works on Docker, but not on Podman.

@giuseppe
Member

but what happens if you keep the first container running?

Does the second container override the first chown?

@matejvasek
Contributor Author

It runs in series. Not in parallel.

@matejvasek
Contributor Author

My use case is sequential.

@matejvasek
Contributor Author

Does the second container override the first chown?

But yes, that's what appears to happen.

@giuseppe
Member

giuseppe commented Aug 21, 2023

Sure, but if we are going to change the current behavior to match Docker for compatibility, we need to address all the cases, not just one.

What I am arguing is that the Docker mechanism of always chowning is subject to race conditions:

# (sleep 0.5; docker run --rm -v test-volume:/workspace alpine-userb true) & docker run  --rm -v test-volume:/workspace alpine-usera sh -c 'stat -c %u:%g /workspace ; sleep 1; stat -c %u:%g /workspace' 
1001:1002
1003:1004

The ownership of a volume can change while a container is using it.

The Podman behavior makes more sense; we can probably just document it and not aim for compatibility.

@matejvasek
Contributor Author

is subject to race conditions:

Well, if somebody runs two containers at the same time on the same volume, then they had it coming.
Also, even in the current form this is a problem: only one of the containers "wins" and forces its ownership.

@matejvasek
Contributor Author

@giuseppe I'll try to modify the pack CLI to work even with Podman's behaviour. We'll see if it's possible.

@matejvasek
Contributor Author

The thing is that the pack CLI first runs a root-owned container on the volume. As a consequence, all subsequent containers cannot create files in the volume.
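
A minimal sketch of that sequence (plain alpine runs as root and has no /workspace; alpine-userb is the image from the reproducer; the volume name is made up):

docker volume rm pack-volume || true
# the first container runs as root, so the fresh volume ends up root-owned
docker run --rm -v pack-volume:/workspace alpine sh -c 'touch /workspace/from-root'
# with Podman's chown-once behaviour the volume stays root-owned, so this is expected to fail
docker run --rm -v pack-volume:/workspace alpine-userb sh -c 'touch /workspace/b'
docker volume rm pack-volume || true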

@mheon
Member

mheon commented Aug 21, 2023

I think that we should probably change to match Docker here -- i.e., unconditional chown on mount into a container, even if the volume is in use by another container. It's racy, but that's sort of expected -- using containers with different UIDs/GIDs on the same volume is always going to be questionable.

@Luap99
Member

Luap99 commented Aug 22, 2023

If we do it, then we should add some option to preserve the current behaviour. I can easily see this being a big performance hit if we start to chown on each container start; consider volumes with many files.

@matejvasek
Contributor Author

If we do it, then we should add some option to preserve the current behaviour. I can easily see this being a big performance hit if we start to chown on each container start; consider volumes with many files.

Isn't the chown non-recursive?

@matejvasek
Contributor Author

I believe in the past it was recursive but now it is not, right?

@Luap99
Member

Luap99 commented Aug 22, 2023

Ah, you are right, the normal volume chown is not recursive. Then this should not be a problem, although it seems very weird not to chown recursively. If the first container created files in the volume and we just chown the parent directory, how is the second container supposed to read the files?
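
Concretely, even with Docker's per-mount chown of just the mountpoint, the second container can create new entries but cannot modify files the first one created (a sketch with the reproducer images, run against Docker; the last command is expected to fail):

docker volume rm test-volume || true
docker run --rm -v test-volume:/workspace alpine-usera sh -c 'touch /workspace/a'
# the mountpoint itself is re-chowned to 1003:1004, but /workspace/a stays 1001:1002
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'ls -ln /workspace && touch /workspace/b'
# appending to the first container's file still fails with Permission denied
docker run --rm -v test-volume:/workspace alpine-userb sh -c 'echo x >> /workspace/a'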

@mheon
Member

mheon commented Aug 22, 2023

I don't think that's expected to work. We want the new container to at least be able to write to the mounted volume. If the goal is for multiple containers that use different UIDs and GIDs to be able to read from the same volume, I don't know if that's possible right now, nor do I think it's necessarily expected.

@rhatdan
Member

rhatdan commented Aug 22, 2023

It should not recursively chown. Although we do have :U, which I think does chown recursively.
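
For reference, with the reproducer images the :U option would be used like this (a sketch; per the podman-run docs it recursively chowns the volume contents to the container's user on each run, so it trades some performance for access):

podman volume rm test-volume || true
podman run --rm -v test-volume:/workspace alpine-usera sh -c 'touch /workspace/a'
# :U recursively chowns the volume, including /workspace/a, to 1003:1004
podman run --rm -v test-volume:/workspace:U alpine-userb sh -c 'ls -ln /workspace && touch /workspace/b'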

@github-actions

A friendly reminder that this issue had no activity for 30 days.

@392781

392781 commented Feb 8, 2024

This is still an issue: when I mount a local directory to a container volume, the container does not have access to make edits to the local directory...

The solution that I unfortunately have to resort to is using --user root + -e GRANT_SUDO=yes, which creates a separate problem: the local user is not able to edit files that get created while inside the container.

@rhatdan
Member

rhatdan commented Feb 8, 2024

Please specify the Podman command you are using. If this is an SELinux machine, you would need to use the :Z or :z option on the volume mount.
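
With a plain podman run, the relabeling option mentioned above would be applied like this (a sketch; the image and host path here are only examples):

podman run --rm -v "$PWD":/home/jovyan/work:Z docker.io/jupyter/r-notebook:r-4.3.1 ls -la /home/jovyan/work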

@392781

392781 commented Feb 8, 2024

Please specify the Podman command you are using. If this is an SELinux machine, you would need to use the :Z or :z option on the volume mount.

Here is the barebones configuration I am using along with VS Code's devcontainers extension:

Dockerfile:

FROM docker.io/jupyter/r-notebook:r-4.3.1

devcontainer.json

{
    "name": "test",
    "build": {
        "dockerfile": "Dockerfile"
    },
    "remoteUser": "jovyan",
    "containerUser": "jovyan",

    "workspaceMount": "source=${localWorkspaceFolder},target=/home/jovyan/work,type=bind,z",
    "workspaceFolder": "/home/jovyan/work",

        "overrideCommand": true,
    // Uncomment the next line if you want to keep your containers running after VS Code shuts down.
        "shutdownAction": "none"
}

Running ls -la while inside the container at the mounted workspace folder /home/jovyan/work results in all the files being listed as owned by root. The work/ dir itself is also owned by root.

@rhatdan
Member

rhatdan commented Feb 12, 2024

Are you running in rootless mode? If yes, and you look at the /home/jovyan/work directory outside of the container, are the files owned by jovyan?

@mreilaender

mreilaender commented Jul 19, 2024

Not sure if this is related, but I have a similar issue when trying to build a Maven-based project using pack + Podman (5.1.2) while bind-mounting the .m2 directory to cache dependencies:

pack build --builder paketobuildpacks/builder:base \
--volume "$HOME"/.m2:/home/cnb/.m2:rw \
-e BP_MAVEN_BUILD_ARGUMENTS="-Dmaven.test.skip=true package" \
-e BP_MAVEN_BUILT_ARTIFACT="target/*-exec.jar" \
-e BP_JVM_VERSION="8" \
"localhost/test" -v \
--docker-host=inherit

The build fails in the mvn package phase during dependency resolution with:

java.nio.file.AccessDeniedException: /home/cnb/.m2/repository/...

Edit: $HOME/.m2 is owned by the user I'm executing pack with:

drwxr-xr-x    4 user user      4096 Apr 23 15:04 .m2

@giuseppe
Member

--volume "$HOME"/.m2:/home/cnb/.m2:rw \

do you need to relabel the volume? :z?

@mreilaender

Already tried; it doesn't help :/ Same exception.
