UID and GID 999 are system IDs and are not to be used by non-default-distro processes such as container services #33

Open
bbruun opened this issue Sep 4, 2024 · 8 comments


bbruun commented Sep 4, 2024

I was trying out the Docker image and setting up docker-compose, and kept getting access errors. The OS distro is RHEL 8.

It appears that "you" use UID 999 (and 1000 for some reason) and GID 999 for the service inside the Docker image.
That overlaps with the default convention that IDs below 1000 are for system users and groups only.

You set the UID/GID's here: https://github.com/search?q=repo%3Avalkey-io%2Fvalkey-container%20999&type=code

On RHEL 8, for example:

  • UID 999 is the systemd-coredump user managed and installed by systemd
  • GID 999 is the input group (I don't know which one uses it on a server).

Could you fix this mixup of valkey being a system service maintained by the distros and make the UID and GID larger than 1000 which is also the advised method for non system accounts? E.g. 65534 for both would be ideal, as that is the "nobody" ID on most systems.

@polarathene

It is fine to use whatever UID/GID in the container. <1000 is for system services, anything above that can be assigned for something else on the host as well. You cannot always have a container with a single UID/GID when a distinction needs to be made for operation.

In a rootless container, the container itself can run as UID 0 (root) and be mapped to a UID on the host that the host deems appropriate. Then within the container any volume mount has the UID mapped accordingly, which is probably the correct approach. Podman does this well with its --uidmap + --gidmap options.

I haven't looked into what this image is doing, but it's not uncommon for containers that only need to persist data with a specific UID/GID to make that configurable at runtime. You could also use named volumes instead of bind mount volumes if you don't need direct file access on the host (should be fine for a DB).
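To make the mapping concrete: a user namespace translates IDs using triples of the form `inside outside length`, which is the format of /proc/<pid>/uid_map, and Podman's --uidmap/--gidmap options take similar container_uid:from_uid:amount triples. The POSIX sh sketch below performs that translation for a single triple; the function name and the 100000/65536 range are illustrative assumptions, not values from this thread.

```shell
#!/bin/sh
# map_id CONTAINER_ID INSIDE OUTSIDE LENGTH
# Translates a container-side ID to the host-side ID for one
# "inside outside length" triple (the /proc/<pid>/uid_map format).
# Fails when the ID is not covered by the triple.
map_id() {
  cid=$1 inside=$2 outside=$3 length=$4
  if [ "$cid" -ge "$inside" ] && [ "$cid" -lt $((inside + length)) ]; then
    echo $((outside + cid - inside))
  else
    return 1
  fi
}

# Illustrative mapping: container IDs 0..65535 -> host IDs 100000..165535
map_id 0 0 100000 65536     # prints 100000
map_id 999 0 100000 65536   # prints 100999
```

With a mapping like this, files the container writes as UID 999 land on the host as 100999, well clear of the host's own UID 999.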

bbruun (Author) commented Feb 17, 2025

The problem is (or was, as I've since fixed it) that the container uses UID 999 and you don't document it on https://hub.docker.com/r/valkey/valkey. In my case it conflicted with the systemd-bus-proxy user, which on RHEL gets UID/GID 999/997 by default, causing a conflict between systemd and your container when run as currently documented.

Normal (I would say sane) defaults for a container are to make it as non-intrusive/secure as possible by default, e.g. use the nobody user with UID/GID 65534 (or a similarly high UID/GID) to avoid conflicting with OS UIDs/GIDs, like most other containers do.

If it somehow is a requirement for the container to run with UID 999 then please add it to the documentation of the container so that --uidmap/--gidmap or Docker's --user is part of the documentation.

I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Changing the UID from 999 to 65534 (or thereabouts) is the easiest way to avoid a CVE caused by UIDs overlapping with the host OS, since an overlap could grant more access if someone managed to break out of the container with a valid host UID to work with.

@roshkhatri (Member)

Hey, thank you so much for raising this issue and for the information in it. To be frank, it would be really helpful if you could raise a PR with a fix; that would also help other users avoid getting a CVE.

Is this the fix we are looking for? Changing

addgroup -S -g 1000 valkey; \

to

# alpine already has a gid 999, so we'll use a nobody id
	addgroup -S -g 65534 valkey; \
	adduser -S -G valkey -u 65534 valkey

Also, the Redis container was previously set up the same way; were we facing the same issue then?

polarathene commented Feb 17, 2025

Might want to give this consideration:

> Hello. Using a high uid/gid for files in the image requires reserving a lot of uids/gids per operating system user when running docker rootless or podman rootless.
>
> It would be more practical to keep nonroot to be 1000 or 1001. If no files are owned by nobody, then maybe it doesn't matter so much which uid it has assigned.

NOTE: I've not investigated that myself, nor am I endorsing their suggestion to default to 1000/1001 instead. Just mentioning it as a datapoint 😅 (my personal preference is for images to default to 0:0 and support + document advice for --user or rootless containers, but I do understand why non-root is broadly adopted)
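A rough illustration of that cost: in the common rootless layout, container UID 0 maps to the invoking user and container UIDs 1..count map onto that user's /etc/subuid range. The sketch assumes exactly that layout (the usual default for rootless Podman); the function name, the user UID 1000 and the 100000/65536 range are example values, not taken from this thread.

```shell
#!/bin/sh
# rootless_host_uid CONTAINER_UID USER_UID SUB_START SUB_COUNT
# Assumes the common rootless layout:
#   container UID 0         -> the invoking user's own UID
#   container UID 1..COUNT  -> SUB_START .. SUB_START+COUNT-1 (from /etc/subuid)
rootless_host_uid() {
  c=$1 user=$2 start=$3 count=$4
  if [ "$c" -eq 0 ]; then
    echo "$user"
  elif [ "$c" -ge 1 ] && [ "$c" -le "$count" ]; then
    echo $((start + c - 1))
  else
    echo "container UID $c needs a subordinate range of at least $c IDs" >&2
    return 1
  fi
}

rootless_host_uid 999 1000 100000 65536     # 999 fits easily
rootless_host_uid 65534 1000 100000 65536   # fits only because count >= 65534
rootless_host_uid 65534 1000 100000 10000 || echo "65534 is not mappable with 10000 IDs"
```

Under this assumption an in-image UID of 65534 only works if the user's subordinate range holds at least 65534 IDs, whereas 999 needs only 999.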


@bbruun @roshkhatri

Quick overview of 999:999 with popular database containers:


@bbruun

> Could you fix this mixup of valkey being a system service maintained by the distros

I don't see how this is a mixup. It's perfectly normal with containers, I think you're just not familiar with that? If you insist that you know better, please cite images or resources that state this should/is the way it should be within containers (note that I cited links to various official Docker images that say otherwise).

I understand your concern from a sysadmin perspective when you don't take into consideration how it works with containers; that threw me off when I first got into working with containers myself. The thing is, a container can vary in base image, so the UID/GID assignment by distro is not always in alignment, even with common system users/groups like bin, daemon, mail, uucp.

This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

If your base image or your own image were to change packages in a future build in a way that shifts the install order, it can affect the UID/GID of the next image release, such that existing users' storage persisted outside of the image is no longer aligned with the UID/GID values of your container's users/groups, requiring a manual fix by each affected user (or sometimes containers apply this at runtime with an entrypoint script).
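To see why install order matters, here is a sketch that models an allocator handing out the highest free ID in the system range, which is what the Fedora transcript later in this thread suggests (systemd-oom got 999, systemd-coredump 998). The allocation policy, the function name and the 100..999 bounds are assumptions for illustration; real tools take their ranges from login.defs or their own configuration.

```shell
#!/bin/sh
# next_system_uid PASSWD_FILE [MIN] [MAX]
# Models a "highest free ID first" allocator for system accounts:
# prints the largest UID in MIN..MAX not used in PASSWD_FILE.
next_system_uid() {
  file=$1 min=${2:-100} max=${3:-999}
  uid=$max
  while [ "$uid" -ge "$min" ]; do
    # passwd(5): field 3 is the UID; awk exits 0 only if some entry uses $uid
    if ! awk -F: -v u="$uid" '$3 == u { found = 1 } END { exit !found }' "$file"; then
      echo "$uid"
      return 0
    fi
    uid=$((uid - 1))
  done
  return 1
}

# Demo: the same two packages installed in different orders
for order in "systemd-oom systemd-coredump" "systemd-coredump systemd-oom"; do
  passwd=$(mktemp)
  echo 'root:x:0:0:root:/root:/bin/sh' > "$passwd"
  for name in $order; do
    uid=$(next_system_uid "$passwd")
    echo "$name:x:$uid:$uid::/:/usr/sbin/nologin" >> "$passwd"
  done
  echo "order: $order"
  cut -d: -f1,3 "$passwd"
  rm -f "$passwd"
done
```

Swapping the order swaps which account ends up with 999, which is exactly the kind of shift that breaks ownership of previously persisted data.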

Trying not to step on UIDs/GIDs used on the host system is a bit foolish, as you will not know what these are. As the sysadmin running the containers, it's up to you to ensure any data you persist to host storage isn't mishandled, should its UID/GID conflict with an assignment on the host in a way that causes a real problem (it rarely does in practice).

Keep your volume data persisted to a common location on the host system where this boundary is a non-issue, or alternatively deploy with rootless containers or user namespace remapping (which solves your issue properly). There are also ID-mapped mounts, which rootful containers can leverage (Podman supports this; Docker doesn't officially yet, though it can be done manually IIRC).


> make the UID and GID larger than 1000 which is also the advised method for non system accounts.

It's a system service, not one dependent upon a user session. Like the references I've provided above, they either explicitly create with 999 UID/GID or they have it implicitly by requesting system user/group during creation.

> Normal (I would say sane) defaults for a container are to make it as non-intrusive/secure as possible by default, e.g. use the nobody user with UID/GID 65534 (or a similarly high UID/GID) to avoid conflicting with OS UIDs/GIDs, like most other containers do.

Citation needed for other official database images that are running as nobody please.

> If it somehow is a requirement for the container to run with UID 999 then please add it to the documentation of the container so that --uidmap/--gidmap or Docker's --user is part of the documentation.

I agree with you here, I wish images would document their UID/GID more visibly, especially when it's not configurable (without extending/customizing the image)


> I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

The switch to a non-root user already results in all caps being dropped, so the only time this becomes an issue is when the image modifies binaries with setcap to grant non-root users capabilities they'd otherwise not have. These are usually applied with the kernel enforcement check as well (the effective bit), rather than the process raising the capability itself at runtime, so if you intentionally drop the cap and don't use the feature, sadly the container fails to run because the dumb capability enforcement check prevents it.
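A quick way to check what a process actually ended up with is to read the hex capability masks from /proc/<pid>/status. This sketch tests one bit of such a mask; bit 10 is CAP_NET_BIND_SERVICE in linux/capability.h, and the function name is illustrative.

```shell
#!/bin/sh
# has_cap HEXMASK BIT: succeed if capability number BIT is set in HEXMASK,
# a mask as printed on the CapPrm/CapEff lines of /proc/<pid>/status.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

CAP_NET_BIND_SERVICE=10  # from linux/capability.h

# Inspect this shell's own effective set (Linux only)
if [ -r "/proc/$$/status" ]; then
  eff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")
  if has_cap "$eff" "$CAP_NET_BIND_SERVICE"; then
    echo "cap_net_bind_service is in the effective set"
  else
    echo "cap_net_bind_service is not in the effective set"
  fi
fi
```

Run as a non-root container user with no file capabilities involved, the effective mask is typically all zeros, which is the implicit cap drop described above.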

Other than that, the most common way to break out is via access to the Docker API (usually the socket), but that requires an explicit mount (and one that SELinux denies access to when enabled for the container, IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

@polarathene

@roshkhatri

> Is this the fix we are looking for?

You shouldn't be assigning a new user/group to the existing UID/GID already assigned to nobody/nogroup entries.

Personally I prefer containers to run as root (0:0) by default unless there's an actual need to switch to a non-root user. I know it goes against some "best practice" advice parroted around, but similar to VOLUME directive use, it often causes more problems.

The main value of the switch is for the convenience of a non-root user dropping all caps, so that the sysadmin deploying the container doesn't have to explicitly do so (drop caps, or change the container to run as a different UID/GID). Something that shouldn't be an issue if the sysadmin deployed with rootless containers instead, but I understand the precaution is taken given the broad audience that often doesn't know any better.

FWIW, breakouts happen with non-root users too, and even running in the container as nobody doesn't rule out an attacker gaining root access on the host when running rootful containers, alright? All that requires is someone misconfiguring an image or container at runtime in a way that enables the exploit, one of the obvious ones being access to the Docker socket.


If you want to go the extra mile, go get Valkey packaged as a slice for Canonical's chisel tool, so that the bare minimum for a container is installed and promote that as the default image (no shell, no package manager).

Then if Valkey can run as the nobody user by default, you can do that if you like. There's no need for a special valkey user if Valkey will be the only process running in the container; the name has no relevance outside the container, where storage is persisted only as UID/GID values (shown as whatever friendly names /etc/passwd + /etc/group on the host map them to, if any).

The sysadmin should ideally be able to use --user 0:0 (or whatever UID/GID pair they like) to run the image as that without issues. This is effectively the same as the user switch entrypoint, but is done by the container runtime prior to the entrypoint being run.


As I showed in my previous comment, Valkey is presently aligned with the same UID/GID as all other popular DB images. It would seem wise to be consistent there unless a proper discussion with the other projects can all agree on it being pragmatic to change away from that.

Personally the reasoning from @bbruun doesn't seem like good enough justification IMO, and I hope I've made that rather evident as to why with this verbose response.

bbruun (Author) commented Feb 19, 2025

> Quick overview of 999:999 with popular database containers:

I understand this and I'm fully aware of this, but the valkey container just hit a ... nerve that overlapped with RHEL's systemd setup, which the others don't (or haven't as of yet).

But just because "everyone else is doing it" doesn't make it the correct, safe, or secure way to do it.

> I don't see how this is a mixup. It's perfectly normal with containers, I think you're just not familiar with that? If you insist that you know better, please cite images or resources that state this should/is the way it should be within containers (note that I cited links to various official Docker images that say otherwise).

I am familiar with it (see my last sentence about my situation).

A few sources:
OWASP has "RULE #2 - Set a user":
"Configuring the container to use an unprivileged user is the best way to prevent privilege escalation attacks."

From https://linuxhandbook.com/uid-linux/
Do note that in most Linux distributions, UID 1-500 are usually reserved for system users. In Ubuntu and Fedora, UID for new users start from 1000.

RHEL that I'm on is in the Fedora family...

> I understand your concern from a sysadmin perspective when you don't take into consideration how it works with containers; that threw me off when I first got into working with containers myself. The thing is, a container can vary in base image, so the UID/GID assignment by distro is not always in alignment, even with common system users/groups like bin, daemon, mail, uucp.

I've been aware of this from my first FROM scratch container almost a decade ago until today, and of what it actually takes to make a container properly (and FROM <distro>:<version> is a very, very good starting point compared with scratch).
Most of the UID/GID issues can be handled by the (docker-)entrypoint.sh script, and having had a look, it is already set up for this: you'd just add a usermod and chown that run when an env variable has been set to change the runtime UID/GID.

> This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

Yes, hence the best practice of not creating users in containers within the well-known UID/GID ranges for system users or accounts, specifically to avoid that kind of scenario; nobody is a good default candidate as it is a non-privileged user on most systems.

> If your base image or your own image were to change packages in a future build in a way that shifts the install order, it can affect the UID/GID of the next image release, such that existing users' storage persisted outside of the image is no longer aligned with the UID/GID values of your container's users/groups, requiring a manual fix by each affected user (or sometimes containers apply this at runtime with an entrypoint script).

I know, but since the container uses docker-entrypoint.sh and doesn't use the USER instruction, a usermod/groupmod could be added to change the container user's UID/GID and chown the files based on environment variables, which valkey-server would then run with.
Not an elegant solution, but it works, e.g. docker run -e SET_UID=1234 -e SET_GID=1234 -p ...:... valkey or similar.

> Trying not to step on UIDs/GIDs used on the host system is a bit foolish, as you will not know what these are. As the sysadmin running the containers, it's up to you to ensure any data you persist to host storage isn't mishandled, should its UID/GID conflict with an assignment on the host in a way that causes a real problem (it rarely does in practice).

True, yet very avoidable by not using the UID/GID ranges that are well known to be for system users and accounts.

> Keep your volume data persisted to a common location on the host system where this boundary is a non-issue, or alternatively deploy with rootless containers or user namespace remapping (which solves your issue properly). There are also ID-mapped mounts, which rootful containers can leverage (Podman supports this; Docker doesn't officially yet, though it can be done manually IIRC).

>> make the UID and GID larger than 1000 which is also the advised method for non system accounts.

> It's a system service, not one dependent upon a user session. Like the references I've provided above, they either explicitly create with 999 UID/GID or they have it implicitly by requesting system user/group during creation.

A container is not a system service.
Volume mappings are always an issue due to exactly this problem, more so in Podman than in Docker: Docker volumes (if directly mapped to the filesystem) set the ownership automatically, whereas Podman running as the executing user sometimes has issues due to restrictions on non-root accounts. That is the main disadvantage of Podman; everything else is better IMHO.

>> Normal (I would say sane) defaults for a container are to make it as non-intrusive/secure as possible by default, e.g. use the nobody user with UID/GID 65534 (or a similarly high UID/GID) to avoid conflicting with OS UIDs/GIDs, like most other containers do.

> Citation needed for other official database images that are running as nobody please.

I don't have any citations. It is based on old introduction docs from Docker's site back in the day, and almost all other "get started with containers" documentation.

The main issue here is that if you install Valkey/Redis/Postgres/Mongo etc. from the package manager, it will create a user (and often a group) using adduser/addgroup that won't overlap with any existing system users or accounts, but in the container this isn't possible because the ID is hardcoded.
There are generally two solutions to this

  1. use a very high UID e.g. the nobody user which exists on most systems
  2. use variables and usermod/chown in the entrypoint.sh script to fix it. People using the variables will also know to set directory ownership for volume mounts, as they specifically chose a UID/GID to run as.
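A minimal sketch of option 2 as an entrypoint fragment, assuming the Debian-based variant (whose Dockerfile already uses useradd, so usermod/groupmod are available) and /data as the data directory. SET_UID and SET_GID are the hypothetical variable names from the docker run example earlier in this comment, not settings the image supports today; valid_id and remap_ids are illustrative names too.

```shell
#!/bin/sh
# Hypothetical entrypoint fragment: optionally remap the valkey user/group
# before dropping privileges. SET_UID / SET_GID are illustrative names.

# valid_id: accept a plain decimal ID in 1..4294967294 (0, i.e. root, is refused)
valid_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] 2>/dev/null && [ "$1" -le 4294967294 ] 2>/dev/null
}

remap_ids() {
  if [ -n "${SET_GID:-}" ]; then
    valid_id "$SET_GID" || { echo "invalid SET_GID: $SET_GID" >&2; return 1; }
    groupmod -o -g "$SET_GID" valkey   # -o: allow a GID already in use
  fi
  if [ -n "${SET_UID:-}" ]; then
    valid_id "$SET_UID" || { echo "invalid SET_UID: $SET_UID" >&2; return 1; }
    usermod -o -u "$SET_UID" valkey    # -o: allow a UID already in use
  fi
  # Re-own persisted data so the remapped account can still use it
  chown -R valkey:valkey /data
}

# Only root can modify accounts; the existing setpriv step then drops to valkey
if [ "${1:-}" = 'valkey-server' ] && [ "$(id -u)" = '0' ] &&
   [ -n "${SET_UID:-}${SET_GID:-}" ]; then
  remap_ids || exit 1
fi
```

One trade-off: the recursive chown runs on every start where a remap is requested, which can be slow for a large data directory.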

>> If it somehow is a requirement for the container to run with UID 999 then please add it to the documentation of the container so that --uidmap/--gidmap or Docker's --user is part of the documentation.

> I agree with you here, I wish images would document their UID/GID more visibly, especially when it's not configurable (without extending/customizing the image)

Agreement is a nice thing :-)

>> I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

> Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

No, which is why I haven't made a PR to "fix" the issue "for me", but only asked about it to make the maintainers aware of the problems with using low UIDs/GIDs, which I hope is OK?

> If you're properly locking down the containers for security reasons, that really shouldn't be happening, you'll find escapes are reliant upon non-default capabilities being granted to the container root user (it's not equivalent to root on host which has far more capabilities granted by default).

Agreed. But when testing out a container whose documentation or Docker Hub "example" doesn't mention this, it isn't something you'd take into account, hence this issue.

> The switch to a non-root user already results in all caps being dropped, so the only time this becomes an issue is when the image modifies binaries with setcap to grant non-root users capabilities they'd otherwise not have. These are usually applied with the kernel enforcement check as well (the effective bit), rather than the process raising the capability itself at runtime, so if you intentionally drop the cap and don't use the feature, sadly the container fails to run because the dumb capability enforcement check prevents it.

Agreed. But that is outside the scope of the low hardcoded UID/GID in the container.

> Other than that, the most common way to break out is via access to the Docker API (usually the socket), but that requires an explicit mount (and one that SELinux denies access to when enabled for the container, IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

I know, but it is often better to be safe than sorry... I would guesstimate that 99% of all security thinking is about what could happen vs. how small the chance of it actually happening is. Better safe than sorry.

> If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

I agree, but this was found while testing the container on a test RHEL server, so it hasn't yet reached the point of deciding whether the container is a valid choice vs. a manual installation (I'm mostly thinking ahead with regard to upgrades).

Podman is not something my colleagues like; they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs. the normal package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things they don't understand from day 1; I have to take it slowly, even if it means using Docker and its root daemon.

bbruun (Author) commented Feb 19, 2025

> Hey, thank you so much for raising this issue and for the information in it. To be frank, it would be really helpful if you could raise a PR with a fix; that would also help other users avoid getting a CVE.

I didn't, because it is (was) only an inconvenience for me when I had the user overlap, not critical for running the container.

> alpine already has a gid 999, so we'll use a nobody id
>
> addgroup -S -g 65534 valkey;
> adduser -S -G valkey -u 65534 valkey

I know how to fix it; that is not the problem.

I would add 2 variables and have the docker-entrypoint.sh script do a usermod and chown before running valkey-server, so it is possible to run it with a custom UID/GID.

> Also, the Redis container was previously set up the same way; were we facing the same issue then?

I didn't use the Redis container. I'm migrating away from the AWS version to on-prem, was looking at alternatives, and Valkey seems to be the best choice. Using the container is mostly to test out upgrades later on vs. using a package manager or a manual install.

@polarathene

TL;DR: Apologies for verbosity, I'm short on time.

  • Error/access cited problem needs more context.
    • It should not only happen with Valkey when other DB images have the same UID/GID (Potential issue identified if using Docker Compose).
    • Before any "fix" is considered by maintainers, verifying via reproduction should be achieved first.
  • Rootless containers would avoid the host overlap concern.
    • Rootful containers can use ID mapped volumes (Docker might not support this properly until v28).
    • Rootful containers could also use UserNS remapping (I've not personally used this feature).

>> Quick overview of 999:999 with popular database containers:

> I understand this and I'm fully aware of this, but the valkey container just hit a ... nerve that overlapped with RHEL's systemd setup, which the others don't (or haven't as of yet).

Could you please clarify with a reproduction? Are you certain this is what you think it is and not an XY problem?

For example, these DB images all use VOLUME: Redis and Valkey images (among others) declare the same VOLUME /data instruction. If you were using Docker Compose with the same service name and no explicit volumes for persistence, Docker Compose will gladly carry that volume over when you change the image but not the name of the service (services.<service name>.image).

That can cause a variety of mishaps if you're not careful.

You shouldn't be proposing a fix with a UID/GID change, when that's not reproducible for you on other similar images using that same UID/GID. Instead it's better to understand the difference between the images (or what you did) to cause the underlying failure scenario itself. Adjusting the UID/GID may have "fixed" it, but that's not necessarily the correct solution... you've tried connecting some dots, but from what you've described the same problem could occur if both images changed to the same UID/GID values and you repeated the steps, it may have nothing to do with the existing host assignment.


Where I do agree with a change is for consistency. This is not ok:

FROM alpine:3.21
# add our user and group first to make sure their IDs get assigned consistently, regardless of whatever dependencies get added
RUN set -eux; \
# alpine already has a gid 999, so we'll use the next id
	addgroup -S -g 1000 valkey; \
	adduser -S -G valkey -u 999 valkey

FROM debian:bookworm-slim
# add our user and group first to make sure their IDs get assigned consistently, regardless of whatever dependencies get added
RUN set -eux; \
	groupadd -r -g 999 valkey; \
	useradd -r -g valkey -u 999 valkey

Image variants should be consistent in the UID/GID they use for the containerized service. However, changing them will also impact all existing users of the images; that is a breaking change. I haven't reviewed the image in full, so the valkey group itself may not have much relevance, but changing the UID would.
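A sketch of a check that would catch this kind of drift between variants (or between releases): compare what the valkey account resolves to in each image's /etc/passwd and /etc/group. It assumes those files have been copied out of each image beforehand (for example with docker cp); the function names are illustrative, and the sample entries are trimmed, made-up lines whose IDs come from the two Dockerfile excerpts above.

```shell
#!/bin/sh
# ids_of NAME PASSWD GROUP: print "UID:GID" for user NAME and the group of
# the same name (as both Dockerfiles create), "?" where an entry is missing.
ids_of() {
  uid=$(awk -F: -v n="$1" '$1 == n { print $3 }' "$2")
  gid=$(awk -F: -v n="$1" '$1 == n { print $3 }' "$3")
  echo "${uid:-?}:${gid:-?}"
}

# same_ids NAME PASSWD_A GROUP_A PASSWD_B GROUP_B: succeed if both agree
same_ids() {
  a=$(ids_of "$1" "$2" "$3")
  b=$(ids_of "$1" "$4" "$5")
  echo "A=$a B=$b"
  [ "$a" = "$b" ]
}

dir=$(mktemp -d)
echo 'valkey:x:999:1000::/home/valkey:/sbin/nologin' > "$dir/alpine.passwd"
echo 'valkey:x:1000:' > "$dir/alpine.group"
echo 'valkey:x:999:999::/home/valkey:/sbin/nologin' > "$dir/debian.passwd"
echo 'valkey:x:999:' > "$dir/debian.group"

same_ids valkey "$dir/alpine.passwd" "$dir/alpine.group" \
  "$dir/debian.passwd" "$dir/debian.group" || echo "variants disagree"
rm -rf "$dir"
```

Run as part of an image build pipeline, a failing comparison would flag the inconsistency before it ships rather than after users' persisted data depends on it.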


>> I don't see how this is a mixup. It's perfectly normal with containers, I think you're just not familiar with that? If you insist that you know better, please cite images or resources that state this should/is the way it should be within containers (note that I cited links to various official Docker images that say otherwise).

> I am familiar with it (see my last sentence about my situation).

> A few sources: OWASP has "RULE #2 - Set a user": "Configuring the container to use an unprivileged user is the best way to prevent privilege escalation attacks."

> From https://linuxhandbook.com/uid-linux/ "Do note that in most Linux distributions, UID 1-500 are usually reserved for system users. In Ubuntu and Fedora, UID for new users start from 1000."

> RHEL that I'm on is in the Fedora family...

Regarding the OWASP advice: privilege escalation attacks can happen as non-root users too, but switching away from the root user implicitly drops all capabilities for the container user, so the sysadmin doesn't have to, and it minimizes some damage in the event someone escapes as that user (there are privilege escalation attacks in which they can become root regardless).

I have seen some image authors blindly follow such advice, but do so poorly by implementing workarounds that reduce security via setcap, defeating the purpose entirely when they grant non-default privileges that assist in carrying out attacks.

Yes, <1000 UID is typically reserved for system users, the range is configurable but please understand what a system user is... Valkey qualifies as a system user as it would when installed on the host.

Non-system users are those with login shells and intended for actual user sessions. The kernel has some features like:

# Default is 1024, Docker runs this with it set to 0 implicitly,
# as it's reasonably safe in the container context:
sysctl net.ipv4.ip_unprivileged_port_start=80

# It wasn't the case with other container engines for some time though,
# So some images would instead grant their non-root program this capability
# to ignore the security restriction (as it would apply for root)
#
# NOTE: the `+e` enforces this by the kernel preventing the program from running
# if the capability were denied by the sysadmin, even when the program is configured
# to bind to a port above 1024 which would have otherwise been valid..
setcap 'cap_net_bind_service=+ep' /path/to/program

An unprivileged user can be a system user btw, and a UID of 1000+ can also be privileged in the sense of being granted ambient capabilities (Docker does not support this AFAIK, systemd does however), rather than a program/process being granted capabilities in its permitted set (setcap with p; while e is the effective set, which a process could itself raise at runtime if the capability is permitted).
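To see those sets for a real process, the kernel reports them as hex masks on the CapPrm, CapEff and CapAmb lines of /proc/<pid>/status. The sketch below lists which capability numbers are set in each; mapping numbers to names (e.g. 10 = CAP_NET_BIND_SERVICE) can be done with libcap's capsh --decode=MASK. The function names are illustrative.

```shell
#!/bin/sh
# cap_bits HEXMASK: print the capability numbers set in a hex mask
cap_bits() {
  mask=$((0x$1)) bit=0 out=''
  while [ "$mask" -ne 0 ]; do
    if [ $((mask & 1)) -eq 1 ]; then
      out="$out $bit"
    fi
    mask=$((mask >> 1))
    bit=$((bit + 1))
  done
  echo "${out# }"
}

# show_caps STATUS_FILE: print the permitted, effective and ambient sets
show_caps() {
  awk '/^Cap(Prm|Eff|Amb):/ { print $1, $2 }' "$1" |
    while read -r name mask; do
      echo "$name $(cap_bits "$mask")"
    done
}

# This shell's own sets (Linux only)
if [ -r "/proc/$$/status" ]; then
  show_caps "/proc/$$/status"
fi
```

For a process running as a non-root user without file capabilities, CapPrm and CapEff typically come out empty, while a binary marked with setcap 'cap_net_bind_service=+ep' starts with bit 10 in both.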


> Most of the UID/GID issues can be handled by the (docker-)entrypoint.sh script, and having had a look, it is already set up for this: you'd just add a usermod and chown that run when an env variable has been set to change the runtime UID/GID.

# allow the container to be started with `--user`
if [ "$1" = 'valkey-server' -a "$(id -u)" = '0' ]; then
	find . \! -user valkey -exec chown valkey '{}' +
	exec setpriv --reuid=valkey --regid=valkey --clear-groups -- "$0" "$@"
fi

Ah alright, yeah that resolves the ownership concern provided no other separate tooling/images are expecting the 999 UID/GID, otherwise still breaking.

However, as a common ENV option I see across images (PUID / PGID), I suppose that works and is useful when the container runs with a non-root user instead of leveraging rootless containers 😅

Off-topic: That conditional would look better like this (works with both bash and BusyBox ash, which /bin/sh symlinks to on Alpine; dash, the /bin/sh on Debian, does not support [[ though):

if [[ "$1" == 'valkey-server' && "$(id -u)" == '0' ]]; then

>> This UID/GID concern can vary across new releases of the same distro as well, but is more of an evident issue when an image installs packages that create users/groups, as the order can matter.

> Yes, hence the best practice of not creating users in containers within the well-known UID/GID ranges for system users or accounts, specifically to avoid that kind of scenario; nobody is a good default candidate as it is a non-privileged user on most systems.

Sorry, doesn't make sense to me here. For clarity so that we're on the same page, when you've explicitly mounted a volume to the host, anything written to that location is where your concern is?

Otherwise it's a bit bizarre to expect no UID/GID conflicts between containers and the host: base images differ here by distro, and for each image built on those base images, any further system packages installed will shuffle new UID/GID assignments accordingly. You can't do much about that beyond choosing fixed UIDs/GIDs in advance (I've had to do this for ClamAV with its DB, for example, so that it has stable ownership across image upgrades).

999 on your host is a system UID but not necessarily privileged... systemd-coredump itself will initially be invoked privileged to create a socket, but the related service runs unprivileged. Your UID/GID grants permissions (rwx) for ownership/access, but privileges are more to do with capabilities the process has (which utilities like setcap can grant at a file level aka "capability-dumb", or systemd can augment processes).

I think you have a misunderstanding with the relation of privilege to capabilities, not UID/GID ownership?


>> Trying not to step on UIDs/GIDs used on the host system is a bit foolish, as you will not know what these are. As the sysadmin running the containers, it's up to you to ensure any data you persist to host storage isn't mishandled, should its UID/GID conflict with an assignment on the host in a way that causes a real problem (it rarely does in practice).

> True, yet very avoidable by not using the UID/GID ranges that are well known to be for system users and accounts.

Again... you can't justify it that way when it's not consistent across distros 🤷‍♂

This UID/GID doesn't exist in Fedora by default unless systemd is installed (for desktop/server ISOs that's usually a given).

$ docker run --rm -it fedora:42

# Nothing:
$ cat /etc/passwd | grep coredump

# Install systemd:
$ dnf install -y systemd

$ cat /etc/passwd | grep coredump
systemd-coredump:x:998:998:systemd Core Dumper:/:/usr/sbin/nologin

$ cat /etc/group | grep coredump
systemd-coredump:x:998:

# This is what got assigned 999 instead:
$ cat /etc/passwd | grep oom
systemd-oom:x:999:999:systemd Userspace OOM Killer:/:/usr/sbin/nologin

$ cat /etc/group | grep oom
systemd-oom:x:999:

Now the coredump UID is 998, while for you it's 999... Does Fedora need to "fix" this now? No. It's no different from using a VM or migrating data from another OS install (even from an install of the same distro, where UID/GIDs end up mismatched because they are assigned implicitly at package install time).
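The `grep` by name above shows the drift in which account gets which ID. To ask the reverse question for a given number (what does UID 999 resolve to on this host, or in this image?), a small helper can match on the numeric field instead. This is a minimal sketch: `owner_of_id` is a hypothetical name, and it only reads the flat file, whereas `getent passwd 999` on the host would also consult NSS sources such as LDAP/SSSD.

```shell
# Hypothetical helper: print the name(s) owning a numeric ID in a
# passwd- or group-format file. Both formats keep the ID in field 3
# (name:x:UID:... and name:x:GID:...), so one function covers both.
owner_of_id() {
  # $1 = numeric UID/GID; $2 = file to search (defaults to /etc/passwd)
  awk -F: -v id="$1" '$3 == id { print $1 }' "${2:-/etc/passwd}"
}

# Usage (not run here):
#   owner_of_id 999                  # UID 999 on the host
#   owner_of_id 999 /etc/group       # GID 999 on the host
#   docker run --rm fedora:42 cat /etc/passwd > fedora.passwd
#   owner_of_id 999 fedora.passwd    # UID 999 inside a given image
```

Matching on the number rather than the name is the point here: the same name can carry different IDs on different systems, and vice versa.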


A container is not a system service.
Volume mappings are always an issue due to exactly this problem, more so in Podman than Docker: Docker volumes (if directly mapped to the filesystem) set the ownership automatically, whereas Podman, running as the executing user, sometimes has issues due to restrictions on non-root accounts. That is the main disadvantage of Podman; everything else is better IMHO.

The container is a sandbox (namespace) that runs one or more processes; those can be services just as they would be outside of a container, so I don't know why you want to make a distinction here. If you need isolation from host UID/GID values, go with rootless, or the related rootful features for remapping.

A volume can only write to disk what it's permitted to on the host. That's a non-issue when the container is rootful, or when you use rootless containers with user namespace remapping (an /etc/subuid + /etc/subgid range for a user to leverage). You can also use ID-mapped volumes (better supported in Podman, but they require rootful).

So no, it's not really the volumes that are the problem... it's the way a user chooses to run the image, and how the image author approaches it. If you just use root in the container, it's not really a problem, is it? The exception was the time before rootless containers were available, which is why we have all this "best practice" advice to run containers as non-root users internally (similar to the VOLUME "best practice", despite it being effectively legacy). Getting the equivalent security benefits of a non-root user in a rootful container mostly comes down to dropping capabilities for the root user, but that can be inconvenient friction for users (plus they'd need to explicitly grant back any capabilities that the non-root approach needed setcap workarounds for).

Push for the PGID/PUID feature if you like, or just use rootless containers. It's true that rootless, volume concerns aside, does have other limitations, but those shouldn't affect most containers.

I'm not sure what you are on about with the Podman disadvantage... it's daemonless, unlike Docker. If you want a rootful container, then run Podman commands as root?


Normal (I would say sane) defaults for a container are to make it as non-intrusive/secure as possible, e.g. using the nobody user with UID/GID 65534 (or a similar high UID/GID) to avoid conflicting with OS UID/GIDs, like most other containers do.

Citation needed, please, for other official database images that run as nobody.

I don't have any citations. It is based on old introduction docs from Docker's site back in the day, and almost all other "get started with containers" documentation.

Perhaps it was removed from the docs for a reason, then, if it was previously suggested? I don't see how all images using nobody is any better, when their data would then share ownership with that of other containers writing to the host, allowing anything running as nobody to access it 🤷‍♂

The main issue here is that if you install Valkey/Redis/Postgres/Mongo etc. from the package manager, it will create a user (and often a group) using adduser/addgroup, which will not overlap with any system users or accounts

The system packages avoid the conflict for a good reason, but I really don't see it being an issue with containers. By that logic, containers shouldn't have root 0:0 either 🤦‍♂ Your problem is entirely resolved with ID mapping; just use rootless? Depending on the storage driver, IIRC the container's own internal filesystem layout can be on the host with all its files accessible, which would be even worse for your concern, except that no unrelated host service should be interacting with that area of the filesystem.

but in the container this is not possible as it is hardcoded.

The hard-coding has no relevance...? Without that, it's not going to read the host's users and groups; that'd be very bad.

FWIW, the official Docker docs encourage pinning an explicit UID/GID, and they warn about large UID/GID values (for nobody this isn't particularly bad; it adds only 20MB to the image).

For rootless containers, this also affects the range assignment (you allocate 2^16 sub-UID/GID for a host user).
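To make the range point concrete, here is a sketch of the default rootless mapping as Podman (and, as I understand it, Docker's rootless mode) sets it up, assuming a single `/etc/subuid` entry such as `alice:100000:65536` and alice's own UID being 1000 (the name and numbers are illustrative). Container UID 0 maps to your own host UID, and container UIDs 1 through 65536 map onto the subordinate range. Under that assumption, the image's UID 999 lands on host UID 100998 rather than on whatever the host has at 999, nobody (65534) still fits inside a 2^16 range, and anything larger is simply unmapped. `rootless_host_uid` is a hypothetical helper; on a real system, `podman unshare cat /proc/self/uid_map` shows the mapping actually in effect.

```shell
# Hypothetical helper: translate a container UID to the host UID under the
# default rootless layout (same idea for GIDs with /etc/subgid):
#   container 0          -> invoking user's host UID
#   container 1..count   -> sub_start .. sub_start + count - 1
rootless_host_uid() {
  # $1 = container UID, $2 = invoking user's host UID,
  # $3 = subordinate range start, $4 = subordinate range count
  if [ "$1" -eq 0 ]; then
    echo "$2"
  elif [ "$1" -ge 1 ] && [ "$1" -le "$4" ]; then
    echo $(( $3 + $1 - 1 ))
  else
    return 1  # outside the mapped range: no host UID exists for it
  fi
}

# Example with alice:100000:65536 and alice's own UID being 1000:
#   rootless_host_uid 999   1000 100000 65536   -> 100998
#   rootless_host_uid 65534 1000 100000 65536   -> 165533
```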


I'm most likely not the only one who tries out the container and gets a UID overlap because of this. Not every test needs to be set up in a full-blown lab that thinks of nothing but security.

Besides the UID overlap, does it cause any actual problem in practice? I assume your concern is related to a container escape, similar to escape as a root user?

No - which is why I've not made a PR to "fix" the issue "for me", but only asked about it to make the maintainers aware of the problems with using low UID/GIDs, which I hope is OK?

Yes questions are great, just a reminder that I'm not a maintainer of this image (I recall your first reply to me might have mistaken me for one).

BTW, I mean no disrespect in my responses where I'm potentially over-explaining things you already know as a sysadmin, but I have enough experience as a sysadmin and with containers that I feel I can weigh in that this sounds like an XY problem with potential knowledge gaps on your end for the more niche aspects.

You do seem rather knowledgeable but something seems off with the reported problem (errors and access) only affecting Valkey.


If you're properly locking down the containers for security reasons, that really shouldn't be happening; you'll find that escapes rely upon non-default capabilities being granted to the container root user (which is not equivalent to root on the host, as that has far more capabilities granted by default).
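One way to see which capabilities a process in the container actually holds is the `CapEff` line of `/proc/self/status`, a hex bitmask indexed by the capability numbers in capabilities(7) (e.g. `CAP_NET_BIND_SERVICE` is 10, `CAP_SYS_ADMIN` is 21). The sketch below tests a single bit; `has_cap` is a hypothetical name, and `capsh --decode=<mask>` from libcap does the full decode if it's installed. By my count, the default capability list in Docker's documentation works out to the mask `00000000a80425fb`, which has bit 10 set but not bit 21.

```shell
# Hypothetical helper: succeed if capability number $2 is set in the
# hex mask $1 (as printed after "CapEff:" in /proc/<pid>/status).
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Usage, e.g. inside a container (not run here):
#   mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
#   has_cap "$mask" 21 && echo "CAP_SYS_ADMIN granted"
```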

Agree. But when testing out a container, if the documentation or the Docker Hub "example" does not document this, then it is not something you take into account - hence this issue.

Locking down capabilities would be a more advanced security practice. Using non-root users as the default in images, or adding support for them, usually benefits the majority audience that doesn't have a good grasp on such things and thus relies upon "best practice" advice from resources they trust.

As such, you won't see that sort of guidance in individual images' READMEs. I disagree with the practice of rootful containers using non-root users due to the various caveats that brings (the default capabilities for container root are fine as-is), but I understand the precaution, especially for projects that just want to offer Docker / containers as a deployment option but otherwise don't have much of a handle on this sort of security either (I've seen well-funded, enterprise-grade open-source projects that don't implement things properly on their end, so you can imagine why such practices are prevalent).

Docs wise, the image should at least communicate the UID/GID it uses when deviating from root as default.
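Until the docs state it, a quick check is the numeric owner a container actually left on a bind-mounted directory, which can then be looked up on the host with `getent passwd <uid>`. A minimal sketch, assuming GNU coreutils or BusyBox `stat` (BSD/macOS `stat` uses `-f` instead); `numeric_owner` and the `./data` path are hypothetical.

```shell
# Hypothetical helper: print "UID:GID" of a path as raw numbers, so the
# result doesn't depend on which names this machine happens to assign.
numeric_owner() {
  stat -c '%u:%g' "$1"
}

# Usage (not run here), after a container has written to ./data:
#   numeric_owner ./data
#   getent passwd "$(numeric_owner ./data | cut -d: -f1)"
```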


Agree. But that is outside the scope of the low hardcoded UID/GID in the container.

Apologies, I'm more focused on the UID/GID concern in general, rather than specific to this image.

In that sense I consider it relevant context and in scope, should anyone arrive at this issue to go over the pro/cons of what is being discussed.


Other than that, the most common way to break out is via access to the Docker API (usually the socket), but that requires an explicit mount (and SELinux, when enabled for the container, forbids access to it, IIRC). I have seen some images run as non-root but give that non-root user permission to use the socket within the container 🤦‍♂

I know, but it is often better to be safe than sorry... hence I would guesstimate that 99% of all security thinking is about what could happen vs. how little chance there is of it actually happening. Better safe than sorry.

Fair point.


If you're serious about security though, unless you need rootful containers, go for rootless (these have some limitations, but are usually fine). The whole UID/GID concern will be handled for you implicitly (at least it is with Podman IIRC).

I agree, but this was found while testing the container on a test RHEL server, so assessing whether the container is a valid choice vs. a manual installation was not in scope (I'm mostly thinking ahead in regards to upgrades).

Sorry, I don't follow the concern here?

If you're evaluating a container vs host install, and your concern is the container writes volume data to the host with a UID/GID already assigned to something else, why wouldn't ID mapped volumes or rootless containers make sense?

For context, this specific image may only have the one UID/GID mapping concern so your nobody user/group solution works for you. Some containers though will have more than a single user/group for write/read access, it's uncommon but it does happen (I happen to maintain one).


Podman is not something my colleagues like - they have trouble enough understanding and accepting the use of containers to begin with, as they only see issues with them vs. the package-managed applications they have been using for decades. I'll get them there, but I can't make them do 1000 new things that they don't understand from day 1; I have to do it and document it, even if it means using Docker and its root daemon.

FWIW, Docker does have rootless too, and if this is mostly a concern on the developers' desktop systems, Docker Desktop, while rootful, is effectively rootless in the sense that it uses a VM to manage Docker. You can use that on Linux too, and it still provides the docker CLI. I mention this because Docker Desktop also has ECI (Enhanced Container Isolation), should you need even stricter security requirements (this will add friction should a container require access to the docker socket).

I understand the choice to go with Docker for users and, to an extent, for developers. If your colleagues are sysadmins and already comfortable with systemd, however, Podman Quadlets are the Podman equivalent of Docker Compose, but as systemd units (they use a generator service to extend the config with some Podman/container-specific settings, but the generator outputs a standard systemd service). The difference between rootful and rootless then becomes system- or user-scope systemd services, based on standard systemd config locations 👍 (compose.yaml is otherwise friendlier and more widely supported, so troubleshooting can be easier, except for certain caveats specific to Docker Compose).
