error from newuidmap: newuidmap: write to uid_map failed: Operation not permitted #3834

Closed
brianjmurrell opened this issue Mar 22, 2022 · 21 comments

Comments

@brianjmurrell

Description

Trying to build a container results in an error:

$ id
uid=11412345(brian) gid=11412345(brian) groups=11412345(brian),1536800000(admins),1536800003(internal_users)

$ podman build --log-level debug --build-arg UID=$(id -u) -t chrootbuild -f packaging/Dockerfile.mockbuild .
INFO[0000] podman filtering at log level debug          
DEBU[0000] Called build.PersistentPreRunE(podman build --log-level debug --build-arg UID=11412345 -t chrootbuild -f packaging/Dockerfile.mockbuild .) 
DEBU[0000] Merged system config "/usr/share/containers/containers.conf" 
DEBU[0000] Using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /home/brian/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] systemd-logind: Unknown object '/'.          
DEBU[0000] Using graph driver overlay                   
DEBU[0000] Using graph root /home/brian/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/11412345/containers 
DEBU[0000] Using static dir /home/brian/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/11412345/libpod/tmp  
DEBU[0000] Using volume path /home/brian/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] Not configuring container store              
DEBU[0000] Initializing event backend journald          
DEBU[0000] configured OCI runtime runc initialization failed: no valid executable found for OCI runtime runc: invalid argument 
DEBU[0000] configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument 
DEBU[0000] configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument 
DEBU[0000] Using OCI runtime "/usr/bin/crun"            
INFO[0000] Found CNI network podman (type=bridge) at /home/brian/.config/cni/net.d/87-podman.conflist 
DEBU[0000] Default CNI network name podman is unchangeable 
INFO[0000] Setting parallel job count to 169            
DEBU[0000] error from newuidmap: newuidmap: write to uid_map failed: Operation not permitted 
ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": cannot setup namespace using newuidmap: exit status 1 

Steps to reproduce the issue:
Not entirely sure. Simply tried to build a container as above.

Describe the results you received:
As above.

Describe the results you expected:
Container build works just as it does successfully on a different Fedora 35 system where my uid is 1001

Output of rpm -q buildah or apt list buildah:

$ rpm -q buildah
package buildah is not installed

(but I came here because https://github.com/containers/podman/issues/new says:

If you are filing a bug against podman build, please instead file a bug
against Buildah (https://github.com/containers/buildah/issues). Podman build
executes Buildah to perform container builds, and as such the Buildah
maintainers are best equipped to handle these bugs.)

Output of buildah version:

$ buildah version
-bash: buildah: command not found

Output of podman version if reporting a podman build issue:

$ podman version
ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": cannot setup namespace using newuidmap: exit status 1 

I have of course tried to run podman system migrate as suggested.

Output of cat /etc/*release:

Fedora release 35 (Thirty Five)
NAME="Fedora Linux"
VERSION="35 (Thirty Five)"
ID=fedora
VERSION_ID=35
VERSION_CODENAME=""
PLATFORM_ID="platform:f35"
PRETTY_NAME="Fedora Linux 35 (Thirty Five)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:35"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f35/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=35
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=35
PRIVACY_POLICY_URL="https://fedoraproject.org/wiki/Legal:PrivacyPolicy"

Output of uname -a:

Linux node.example.com 5.14.10-300.fc35.x86_64 #1 SMP Thu Oct 7 20:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

# This file is is the configuration file for all tools
# that use the containers/storage library. The storage.conf file
# overrides all other storage.conf files. Container engines using the
# container/storage library do not inherit fields from other storage.conf
# files.
#
#  Note: The storage.conf file overrides other storage.conf files based on this precedence:
#      /usr/containers/storage.conf
#      /etc/containers/storage.conf
#      $HOME/.config/containers/storage.conf
#      $XDG_CONFIG_HOME/containers/storage.conf (If XDG_CONFIG_HOME is set)
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver, Must be set for proper operation.
driver = "overlay"

# Temporary storage location
runroot = "/run/containers/storage"

# Primary Read/Write location of container storage
# When changing the graphroot location on an SELINUX system, you must
# ensure  the labeling matches the default locations labels with the
# following commands:
# semanage fcontext -a -e /var/lib/containers/storage /NEWSTORAGEPATH
# restorecon -R -v /NEWSTORAGEPATH
graphroot = "/var/lib/containers/storage"


# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partitioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = "false"

# Inodes is used to set a maximum inodes of the container image.
# inodes = ""

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Set to skip a PRIVATE bind mount on the storage home directory.
# skip_mount_home = "false"

# Size is used to set a maximum size of the container image.
# size = ""

# ForceMask specifies the permissions mask that is used for new files and
# directories.
#
# The values "shared" and "private" are accepted.
# Octal permission masks are also accepted.
#
#  "": No value specified.
#     All files/directories, get set with the permissions identified within the
#     image.
#  "private": it is equivalent to 0700.
#     All files/directories get set with 0700 permissions.  The owner has rwx
#     access to the files. No other users on the system can access the files.
#     This setting could be used with networked based homedirs.
#  "shared": it is equivalent to 0755.
#     The owner has rwx access to the files and everyone else can read, access
#     and execute them. This setting is useful for sharing containers storage
#     with other users.  For instance have a storage owned by root but shared
#     to rootless users as an additional store.
#     NOTE:  All files within the image are made readable and executable by any
#     user on the system. Even /etc/shadow within your image is now readable by
#     any user.
#
#   OCTAL: Users can experiment with other OCTAL Permissions.
#
#  Note: The force_mask Flag is an experimental feature, it could change in the
#  future.  When "force_mask" is set the original permission mask is stored in
#  the "user.containers.override_stat" xattr and the "mount_program" option must
#  be specified. Mount programs like "/usr/bin/fuse-overlayfs" present the
#  extended attribute permissions to processes within containers rather then the
#  "force_mask"  permissions.
#
# force_mask = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# metadata_size is used to set the `pvcreate --metadatasize` options when
# creating thin devices. Default is 128k
# metadata_size = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"
$ cat /etc/sub[ug]id
vagrant:100000:65536
autotest:165536:65536
brian:100000:65536
vagrant:100000:65536
autotest:165536:65536
brian:100000:65536
@giuseppe
Member

Could you try these commands to confirm that newuidmap and newgidmap work?

$ unshare -U sleep 100 &
$ newuidmap $! 0 $(id -u) 1 1 100000 65536
$ newgidmap $! 0 $(id -g) 1 1 100000 65536
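
For readers unfamiliar with the argument format: after the target PID, newuidmap and newgidmap take one or more triples of (first ID inside the namespace, first ID outside, count). The sketch below only prints the arguments used in the commands above. `idmap_args` is a hypothetical helper name, and the 100000:65536 range is an assumption taken from the /etc/subuid output in this report.

```shell
#!/bin/sh
# Hypothetical helper: build the ID-map triples for newuidmap/newgidmap.
# Each triple is: <first ID inside namespace> <first ID outside> <count>.
idmap_args() {
    own_id=$1      # the caller's own UID or GID
    sub_start=$2   # first subordinate ID from /etc/subuid or /etc/subgid
    sub_count=$3   # length of that subordinate range
    # Map ID 0 inside to the caller, then IDs 1.. to the subordinate range.
    echo "0 $own_id 1 1 $sub_start $sub_count"
}

# Print the command rather than running it; <pid> is a placeholder.
echo "newuidmap <pid> $(idmap_args "$(id -u)" 100000 65536)"
```

With this mapping, ID 0 inside the namespace is the calling user, and IDs 1 through 65536 inside come from the subordinate range.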

@brianjmurrell
Author

$ unshare -U sleep 100 &
[1] 263758
$ newuidmap $! 0 $(id -u) 1 1 100000 65536
newuidmap: write to uid_map failed: Operation not permitted
$ newgidmap $! 0 $(id -g) 1 1 100000 65536
newgidmap: write to gid_map failed: Operation not permitted

Just for good measure:

$ sestatus
SELinux status:                 disabled

so not an SELinux problem at least.

@rhatdan
Member

rhatdan commented Mar 23, 2022

Prior versions of dnf required reinstalling the shadow-utils package to make sure newuidmap and newgidmap got their file capabilities set.

In quay.io/buildah/stable we run rpm --restore shadow-utils to fix the permissions on these files.

$  grep shadow contrib/buildahimage/stable/Dockerfile 
RUN useradd build; yum -y update; rpm --restore shadow-utils 2>/dev/null; yum -y install buildah fuse-overlayfs xz --exclude container-selinux; rm -rf /var/cache /var/log/dnf* /var/log/yum.*;
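
A quick way to check whether the helpers are affected is to look for either the setuid bit or a file capability on them. This is a minimal sketch, not something from the Buildah image: `idmap_helper_status` is a hypothetical function name, and it assumes `getcap` (from libcap) is installed for the capability check.

```shell
#!/bin/sh
# Hypothetical helper: report how an ID-mapping helper gets its privilege.
# Prints "setuid", "filecap", or "unprivileged".
idmap_helper_status() {
    helper=$1
    if [ -u "$helper" ]; then
        # The setuid bit is set on the file.
        echo "setuid"
    elif command -v getcap >/dev/null 2>&1 &&
         getcap "$helper" 2>/dev/null | grep -Eq 'cap_set[ug]id'; then
        # A cap_setuid or cap_setgid file capability is present.
        echo "filecap"
    else
        echo "unprivileged"
    fi
}

for name in newuidmap newgidmap; do
    helper_path=$(command -v "$name") || { echo "$name: not found"; continue; }
    echo "$helper_path: $(idmap_helper_status "$helper_path")"
done
```

If both helpers report "unprivileged", restoring the package attributes as described above is the first thing to try.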

@brianjmurrell
Author

Interesting.

$ sudo getcap $(which newuidmap)
$ sudo rpm --restore shadow-utils
$ sudo getcap $(which newuidmap)
/usr/bin/newuidmap cap_setuid=ep

Is this some kind of bug which will be resolved at some point to not require this hackery?

Unfortunately resolving this has just led to another error:

DEBU[0021] Error pulling candidate registry.fedoraproject.org/fedora:35: writing blob: adding layer with blob "sha256:9c6cc34637169910926efbf525620fc39873beb0b2b3ba9fdf30d8662a38e407": Error processing tar file(exit status 1): lchown /var/spool/mail: operation not permitted 

@rhatdan
Member

rhatdan commented Mar 23, 2022

Yes, there is something wrong with the imagebuilder that is building the base images. I am surprised it has not been fixed yet.

The lchown issue means that the UID is not allowed to be changed: either the UID is not covered by the UID range of the user namespace, or the underlying file system is not allowing the chown.
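
To tell the first cause apart from the second, one option is to check whether the target ID falls inside any range of the namespace's uid_map. The sketch below assumes Linux and reads /proc/self/uid_map by default; `id_is_mapped` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ID $1 falls inside any range of a
# uid_map-style file ($2, defaulting to the current process's map).
# Each line of the file is: <first ID inside> <first ID outside> <count>.
id_is_mapped() {
    awk -v id="$1" '
        (id + 0) >= $1 && (id + 0) < $1 + $3 { found = 1 }
        END { exit !found }
    ' "${2:-/proc/self/uid_map}"
}

for candidate in 0 1000 11412345; do
    if id_is_mapped "$candidate"; then
        echo "$candidate: mapped"
    else
        echo "$candidate: not mapped, a chown to it is expected to fail"
    fi
done
```

Run on the host, this normally reports every ID as mapped. To see the rootless view, the map can be captured with `podman unshare cat /proc/self/uid_map` and passed as the second argument.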

@brianjmurrell
Author

The lchown issue means that the UID is not allowed to be changed: either the UID is not covered by the UID range of the user namespace, or the underlying file system is not allowing the chown.

Any thoughts on how to determine which of those problems is the cause, and what the mitigation is?

@rhatdan
Member

rhatdan commented Mar 23, 2022

--build-arg UID=$(id -u) is probably causing the issue, if this UID is not available in the user namespace.

@rhatdan
Member

rhatdan commented Mar 23, 2022

Just a guess, but UID 11412345 does not exist within the user namespace, and if something attempts to put this UID on disk, the chown will fail.

@brianjmurrell
Author

So if your guess is correct, how do I make that UID available in the user namespace?

I'm afraid I have not really had time to wrap my head around all of this namespace stuff yet. :-(

@rhatdan
Member

rhatdan commented Mar 23, 2022

Could you try to build in rootful mode? What are you doing with the UID? BTW, your UID is in the container; it is just mapped to root.

@brianjmurrell
Author

The UID in the container is being used to add a user. I.e.: RUN useradd -u $UID -ms /bin/bash $USER.

Running the podman command as above with sudo (rootful, as I understand it), the build does continue but gets an error at its first RUN command:

STEP 4/14: RUN set -x; ls -l
error running container: error from /usr/bin/crun creating container for [/bin/sh -c set -x; ls -l]: sd-bus call: Transport endpoint is not connected: Transport endpoint is not connected
: exit status 1
Error: error building at STEP "RUN set -x; ls -l": error while running runtime: exit status 1

Although frankly if that's just a side-effect of running rootful, then we could disregard it as hopefully running rootful has provided the information you need and we can move on to getting it to run rootless again?

@rhatdan
Member

rhatdan commented Mar 23, 2022

If you are running the container with Podman, there is no need to add that user.

Podman will do it automatically with a command like:

$ podman run --userns=keep-id -v $HOME:$HOME --workdir $HOME fedora grep dwalsh /etc/passwd
dwalsh:*:3267:3267:Daniel J Walsh:/home/dwalsh:/bin/sh

@giuseppe
Member

yes please drop --build-arg UID=$(id -u) because that ID is not available inside the user namespace.

I am closing the issue since this failure is expected, but please keep the discussion going and add more comments.

@brianjmurrell
Author

If you are running the container with Podman, there is no need to add that user.

Following along here: https://rpm-software-management.github.io/mock/#mock-inside-podman-fedora-toolbox-or-docker-container

it instructs you to add a mockbuilder user and to add that user to the mock group.

yes please drop --build-arg UID=$(id -u) because that ID is not available inside the user namespace.

AFAIU, that is needed when creating the user above so that files in a directory mapped into the container (with -v) have the same ownership inside the container as outside. At least on Docker, and this Dockerfile needs to work with Docker too, as not all of the platforms we are using support podman.

And this works elsewhere, where $(id -u) is, say, 1001. Why is it not working with a bigger UID?

@rhatdan
Member

rhatdan commented Mar 24, 2022

You should just create a user within your container and give it a UID like 1000; it has no relationship to the UID assigned to the user doing the build.

In rootless containers, you only get 65k UIDs, so specifying a UID > 65k will not work.
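
The 65k figure comes from the subordinate ID range. As a rough sketch, assuming the namespace holds one ID for the user plus every range listed for them in /etc/subuid, the limit can be estimated before building. `userns_size` is a hypothetical helper name, and the sample data is copied from the /etc/subuid output in this issue.

```shell
#!/bin/sh
# Hypothetical helper: number of IDs a rootless user namespace would contain,
# assuming one ID for the user plus every range listed for them in a
# subuid-style file ($2, defaulting to /etc/subuid).
userns_size() {
    awk -F: -v user="$1" '$1 == user { total += $3 } END { print total + 1 }' \
        "${2:-/etc/subuid}"
}

# Sample data copied from the /etc/subuid output in this issue.
sample=$(mktemp)
printf 'vagrant:100000:65536\nautotest:165536:65536\nbrian:100000:65536\n' > "$sample"

size=$(userns_size brian "$sample")
for uid in 1000 11412345; do
    if [ "$uid" -lt "$size" ]; then
        echo "UID $uid fits (namespace has IDs 0..$((size - 1)))"
    else
        echo "UID $uid does not fit (namespace has IDs 0..$((size - 1)))"
    fi
done
rm -f "$sample"
```

Under that assumption, a useradd -u with any UID above the reported upper bound cannot be stored on disk inside the rootless build.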

@brianjmurrell
Author

You should just create a user within your container and give it a UID like 1000; it has no relationship to the UID assigned to the user doing the build.

But then (with Docker at least -- recall the requirement to be compatible with Docker as podman is not ubiquitous yet) running the container with, say, -v $HOME:$HOME does not allow the user in the container to access $HOME with the same permissions as on the host system. Reading some things could be allowed, but almost assuredly, writing will not be.

@rhatdan
Member

rhatdan commented Mar 24, 2022

We change the UIDs within the podman machine to match those of the host, so that processes run by the user in Podman containers can write to volumes from the home directory.

@brianjmurrell
Author

I'm not seeing that here. Creating the container with the default UID value of 1000 and then trying to use it:

$ id; podman run -ti -v $HOME:$HOME -w $HOME chrootbuild bash -c "id; touch foobarbat;ls -l foobarbat; pwd; ls -ld ."
uid=1001(brian) gid=1001(brian) groups=1001(brian),10(wheel),135(mock),962(power),968(vagrant),976(wireshark),986(libvirt),1002(docker) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
uid=1000(build) gid=1000(build) groups=1000(build),135(mock)
touch: cannot touch 'foobarbat': Permission denied
ls: cannot access 'foobarbat': No such file or directory
/home/brian
drwxr-xr-x. 276 root root 36864 Mar 24 18:24 .

Or maybe I am. I don't really know any more, as I am so confused now. I really just want to be able to start a container, map my $HOME into it, and operate in it as if I were not in a container, just like I can with Docker given this Dockerfile and an image built from it with --build-arg UID=$(id -u), where my UID is 11412345:

FROM fedora:35

RUN dnf -y install mock make \
                   rpm-build createrepo rpmlint redhat-lsb-core git \
                   python-srpm-macros rpmdevtools mock-core-configs\ \<\ 37.1

ARG UID=1000

ENV USER build
ENV PASSWD build
RUN useradd -u $UID -ms /bin/bash $USER
RUN echo "$USER:$PASSWD" | chpasswd
RUN usermod -a -G mock $USER

RUN dnf -y upgrade --exclude mock-core-configs && \
    dnf clean all

USER build

i.e.:

$ id; docker run -it -v $PWD:$PWD -w $PWD test bash -c "id; touch foobarbat; ls -l foobarbat"
uid=11412345(brian) gid=11412345(brian) groups=11412345(brian),496(docker),1536800000(admins),1536800003(internal_users)
uid=11412345(build) gid=1000(build) groups=1000(build),135(mock)
-rw-r--r-- 1 build build 0 Mar 24 19:19 foobarbat
/home/brian
drwxr-xr-x 24 build 11412345 8192 Mar 24 19:35 .

podman's limitation of not being able to use my large UID in the container seems to be the road-block here, so what is the solution?

@rhatdan
Member

rhatdan commented Mar 24, 2022

@giuseppe thoughts?

BTW, Podman can do exactly what Docker does, which is run as root. The issue you are seeing is caused by the user namespace. If you ran Docker in rootless mode, you would have the exact same issue.

@giuseppe I think we should have
$ podman unshare --userns=keep-id cat /proc/self/uid_map

To allow users to see what that user namespace looks like.

@brianjmurrell
Author

I seem to have a solution that works in both environments with both high and low UIDs:

FROM fedora:35
RUN dnf -y install mock make \
                   rpm-build createrepo rpmlint redhat-lsb-core git \
                   python-srpm-macros rpmdevtools mock-core-configs\ \<\ 37.1
ARG UID=1000
ENV USER build
ENV PASSWD build
RUN useradd -u $UID -ms /bin/bash $USER
RUN echo "$USER:$PASSWD" | chpasswd
RUN usermod -a -G mock $USER
RUN dnf -y upgrade --exclude mock-core-configs && \
    dnf clean all

Podman

$ podman build -t chrootbuild -f Dockerfile .
$ podman run --rm --privileged -w $HOME -v=$HOME:$HOME -it chrootbuild ...

Docker

$ docker build --build-arg UID=$(id -u) -t chrootbuild -f Dockerfile .
$ docker run --user=$(id -u) --privileged=true -w $HOME -v=$HOME:$HOME -it chrootbuild ...

where with Podman, my UID is 1001 and on the Docker machine, my UID is 11412345.

But boy this was a long haul to get here.

Any thoughts about all of this?

@giuseppe
Member

Could you try podman run --userns=keep-id instead of specifying --user manually? That sets up the user namespace so that your user maps to the same ID inside the container. It is a shortcut for doing it manually with --user and --uidmap.
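
For a wrapper that has to work with both engines, one possible shape is to pick the user-mapping flag per engine. This is only a sketch: `user_flags` and the `ENGINE` variable are hypothetical names, the image name is the one used earlier in this thread, and whether keep-id works for a UID larger than the subordinate range has not been confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: pick the user-mapping flag for the container engine in use.
user_flags() {
    case "$1" in
        podman) echo "--userns=keep-id" ;;   # map the caller to the same ID inside
        docker) echo "--user=$(id -u)" ;;    # run as the caller's UID directly
        *) echo "unsupported engine: $1" >&2; return 1 ;;
    esac
}

# ENGINE is an assumed environment variable; default to podman.
engine=${ENGINE:-podman}

# Dry run: print the command instead of executing it.
echo "$engine run --rm --privileged $(user_flags "$engine") -w $HOME -v $HOME:$HOME -it chrootbuild"
```

Printing the command first makes it easy to compare what each engine would be asked to do before running anything.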
