ZFS is confused by user namespaces (uid/gid mapping) when used with acltype=posixacl #4177

Closed
stgraber opened this issue Jan 8, 2016 · 7 comments

@stgraber

stgraber commented Jan 8, 2016

Hello,

First a quick introduction to the world of containers

I'm the project leader for LXC and LXD, working on containers on Linux. We now extensively use the user namespaces to provide an extra layer of security in Linux containers.

The user namespace allows one to map a range of uid and gid from the host or parent namespace into another range of uid and gid of a new namespace.

Typically, 65536 uids and gids are set aside for each non-system user on the host. Through a couple of setuid helpers (newuidmap and newgidmap), those users can then set up a uid and gid map for their processes. Their allocation of 65536 ids is therefore mapped to uid/gid 0 through 65535 of the new namespace, providing a POSIX-compatible environment.

That means that, given a user on the host with uid and gid range 100000 through 165535, uid 100 in their container will be mapped to uid 100100 outside of it.
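
To make the arithmetic concrete, here is a minimal Python sketch of such a map. It is only a userspace model of the idea, not kernel code; the names `IdMapEntry`, `to_host` and `to_container` are made up for illustration, and the single map entry mirrors the `u:0:100000:65536` mapping used in the examples further down.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    """One line of a uid/gid map (hypothetical helper type for this sketch)."""
    inside_start: int   # first id as seen inside the namespace
    outside_start: int  # first id as seen on the host / parent namespace
    count: int          # number of consecutive ids covered


def to_host(entries: List[IdMapEntry], inside_id: int) -> Optional[int]:
    """Translate an id seen inside the namespace to the host id, or None if unmapped."""
    for e in entries:
        if e.inside_start <= inside_id < e.inside_start + e.count:
            return e.outside_start + (inside_id - e.inside_start)
    return None


def to_container(entries: List[IdMapEntry], host_id: int) -> Optional[int]:
    """Translate a host id to the id seen inside the namespace, or None if unmapped."""
    for e in entries:
        if e.outside_start <= host_id < e.outside_start + e.count:
            return e.inside_start + (host_id - e.outside_start)
    return None


# 65536 ids starting at host id 100000, mapped to 0..65535 in the container.
CONTAINER_MAP = [IdMapEntry(inside_start=0, outside_start=100000, count=65536)]

if __name__ == "__main__":
    print(to_host(CONTAINER_MAP, 100))          # container uid 100 on the host
    print(to_container(CONTAINER_MAP, 100100))  # host uid 100100 in the container
    print(to_container(CONTAINER_MAP, 0))       # host root has no mapping here
```

Host ids outside the allocated range (such as host uid 0) have no counterpart inside the namespace, which is why files owned by host root show up as nobody/nogroup in the transcripts below.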

The problem with ZFS

When using ZFS with acltype=posixacl and an ACL entry on the host set for a uid (or gid) that's then mapped into the container, the container doesn't see the right mapped value when querying the acl from inside the namespace.

Example with zfs (broken)

root@dakara:~# zfs create lxd/test -o mountpoint=/tmp/test
root@dakara:~# zfs set acltype=posixacl lxd/test
root@dakara:~# cd /tmp/test/
root@dakara:/tmp/test# mkdir a
root@dakara:/tmp/test# setfacl -m default:user:100100:rwX a
root@dakara:/tmp/test# setfacl -m user:100100:rwX a
root@dakara:/tmp/test# getfacl a
# file: a
# owner: root
# group: root
user::rwx
user:100100:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:100100:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

root@dakara:/tmp/test# lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /bin/bash
root@dakara:/tmp/test (in userns)# ls -lh
total 512
drwxrwxr-x+ 2 nobody nogroup 2 Jan  7 22:19 a

root@dakara:/tmp/test (in userns)# getfacl -n a
# file: a
# owner: nobody
# group: nogroup
user::rwx
user:4294967295:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:4294967295:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

Example with ext4 (working)

root@dakara:/tmp/test.ext4# mkdir a

root@dakara:/tmp/test.ext4# setfacl -m default:user:100100:rwX a

root@dakara:/tmp/test.ext4# setfacl -m user:100100:rwX a

root@dakara:/tmp/test.ext4# getfacl a
# file: a
# owner: root
# group: root
user::rwx
user:100100:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:100100:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

root@dakara:/tmp/test.ext4# lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /bin/bash
root@dakara:/tmp/test.ext4 (in userns)# ls -lh
total 4.0K
drwxrwxr-x+ 2 nobody nogroup 4.0K Jan  7 22:22 a

root@dakara:/tmp/test.ext4 (in userns)# getfacl -n a
# file: a
# owner: 65534
# group: 65534
user::rwx
user:100:rwx
group::r-x
mask::rwx
other::r-x
default:user::rwx
default:user:100:rwx
default:group::r-x
default:mask::rwx
default:other::r-x

Environment

This was noticed on Ubuntu 14.04 using the zfs stable PPA. I first found it in production environments, with file servers misbehaving due to the problem, then reproduced it on my development systems.

The zfs version here is 0.6.5.3-1~trusty and I've seen this on 3.13, 3.16, 3.19 and 4.2 kernels (not that it should matter, the dkms code was the same). zfs-dkms is at 2.53-zfs1.

@stgraber
Author

stgraber commented Jan 8, 2016

The lxc-usernsexec helper tool I'm using there comes from the LXC package in Ubuntu. It essentially causes a call to fork() followed by a call to unshare(CLONE_NEWUSER), then calls the newuidmap and newgidmap setuid helpers with the provided map so that the namespace can be configured properly.

You could reproduce something similar using the simple unshare tool and manual writes to /proc/PID/{u,g}id_map.
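
As a rough sketch of what those manual writes look like, the Python below formats and writes the map files. Each line of a map file has the form "inside-id outside-id count". The helper names `format_id_map` and `write_id_maps`, and the `proc_root` parameter, are hypothetical and only exist for this illustration. It assumes the target process has already called unshare(CLONE_NEWUSER) and that the writer has enough privilege in the parent namespace (for example, root on the host).

```python
from pathlib import Path
from typing import Iterable, Tuple

# (inside_start, outside_start, count), same order as in the map files.
MapEntry = Tuple[int, int, int]


def format_id_map(entries: Iterable[MapEntry]) -> str:
    """Render map entries in the 'inside outside count' line format."""
    return "".join(f"{inside} {outside} {count}\n" for inside, outside, count in entries)


def write_id_maps(pid: int,
                  uid_entries: Iterable[MapEntry],
                  gid_entries: Iterable[MapEntry],
                  proc_root: str = "/proc") -> None:
    """Write uid_map and gid_map for `pid`.

    Assumption: `pid` is already in a new user namespace and the caller is
    privileged enough to write these files. Each map is written in one go,
    since a map can only be set once.
    """
    base = Path(proc_root) / str(pid)
    (base / "uid_map").write_text(format_id_map(uid_entries))
    (base / "gid_map").write_text(format_id_map(gid_entries))


if __name__ == "__main__":
    # Same mapping as "-m u:0:100000:65536 -m g:0:100000:65536".
    print(format_id_map([(0, 100000, 65536)]), end="")
```

Calling `write_id_maps(pid, [(0, 100000, 65536)], [(0, 100000, 65536)])` from the host would then set up the same mapping as the lxc-usernsexec invocation in the report.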

@behlendorf
Contributor

@stgraber nice to meet you, thanks for the clear bug report and sorry about the slow reply.

It looks to me like for some reason ZFS wasn't able to find a valid mapping in the namespace for the uid. We'll need to spend some time digging to determine exactly why, but with a test case that should be pretty straightforward once a developer has time to look into it.

My suspicion is that it will be something fairly simple once identified. The kernel doesn't expect much from the filesystem to support namespaces, largely just the hooks to save and restore xattrs. Most of the complexity here comes from the fact that the kernel interfaces change fairly frequently and we need to support several of them concurrently.

@behlendorf behlendorf added this to the 0.7.0 milestone Mar 24, 2016
@kjp949

kjp949 commented Apr 27, 2016

I ran into this problem on Ubuntu 16.04 today. I'm using POSIX ACLs on ZFS with my LXC containers. Now I have a bunch of non-important files and directories that I can't delete or access.

root@hostname:/mnt/storage/media/Movies/Movie)# rm -rf extra*
rm: cannot remove 'extrathumbs': Directory not empty

root@hostname:/mnt/storage/media/Movies/Movie# cd extrathumbs/
bash: cd: extrathumbs/: Invalid argument

Is there any way to clean this up?

@antifuchs

@kpeterson11 - I have run into this exact issue (EINVAL) myself on Ubuntu 16.04. I believe this particular case might be related to #2718, but I haven't yet managed to fully verify that.

@kjp949

kjp949 commented Apr 27, 2016

I have managed to clean up the bad files. I had to log in to the container, become the user that owned the files, try to delete the file and get a "permission denied" error, then switch to my other console with the root user on the host and successfully delete the file. I think the trick was to access the affected files as the user in the container. Using cat on the file as the owner in the container worked too.

For now, I've changed the container to privileged and set ACLs using the actual uid of the user in the container. Hopefully things behave now. If not, I'll have to go back to FreeNAS. It seems I'm not having much luck with these ACLs.

@maxximino
Contributor

My two cents on this:
The VFS already translates the UIDs in the "blob" that we're asked to store, via the functions at http://lxr.free-electrons.com/source/fs/posix_acl.c?v=4.4#L596, which are called from
http://lxr.free-electrons.com/source/fs/xattr.c?v=4.4#L355 and http://lxr.free-electrons.com/source/fs/xattr.c?v=4.4#L456 respectively, before and after calling into the filesystem to store or read the data.
Our calls to posix_acl_from_xattr (which translates the blob into the in-memory ACL) ask for that translation with respect to the namespace of the current user (https://github.com/zfsonlinux/zfs/blob/887d1e60ef1f20a1b13e7c7b0d208f10b13b9cbe/include/linux/xattr_compat.h#L200), but when we stored the blob, it had already been "translated" to init_user_ns.
posix_acl_from_xattr uses make_kuid. If I'm following the code correctly, that would result in a double translation of the namespace.

torvalds/linux@5f3a4a2 introduced the user_ns parameter, and the commit is quite clear - that CRED()->user_ns linked before should refer to init_user_ns (http://lxr.free-electrons.com/source/kernel/user.c?v=4.4#L25) ....which is... EXPORT_SYMBOL_GPL . :(

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 28, 2016
As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
not the current user_ns, should be passed to posix_acl_from_xattr()
and posix_acl_to_xattr().  Conveniently the init_user_ns is
available through the init credential (kcred).

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4177
@behlendorf
Contributor

@maxximino nice job running down the root cause. I've proposed #4576 which provides &init_user_ns via the init credential ->user_ns (kcred). With the patch applied the test case provided by @stgraber works as expected. @maxximino @stgraber I'd appreciate it if you could review / test the proposed fix.
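
The double translation and the effect of the fix can be modelled in a few lines of Python. This is a simplified userspace sketch, not the kernel or ZFS code: `make_kuid` and `from_kuid` only borrow the names of the kernel helpers, while `vfs_fix_to_user` and `getfacl_id` are hypothetical stand-ins for the VFS fixup and the filesystem's read path. It assumes the ACL entry is already in memory as kernel uid 100100 and uses the same map as the test case above.

```python
INVALID_ID = 2**32 - 1  # (uid_t)-1, printed by getfacl -n as 4294967295

# Namespaces as lists of (inside_start, outside_start, count).
INIT_USER_NS = [(0, 0, 2**32 - 1)]   # identity map of the initial namespace
CONTAINER_NS = [(0, 100000, 65536)]  # u:0:100000:65536


def make_kuid(ns, uid):
    """Namespace-local id -> kernel-global id (INVALID_ID if unmapped)."""
    for inside, outside, count in ns:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return INVALID_ID


def from_kuid(ns, kuid):
    """Kernel-global id -> namespace-local id (INVALID_ID if unmapped)."""
    for inside, outside, count in ns:
        if outside <= kuid < outside + count:
            return inside + (kuid - outside)
    return INVALID_ID


def vfs_fix_to_user(current_ns, blob_id):
    """Model of the VFS step on read: it assumes the filesystem handed back
    an id expressed in init_user_ns and converts it for the caller."""
    if current_ns is INIT_USER_NS:
        return blob_id
    return from_kuid(current_ns, make_kuid(INIT_USER_NS, blob_id))


def getfacl_id(acl_kuid, fs_ns, current_ns):
    """Id a caller in current_ns sees when the filesystem serializes the
    in-memory ACL relative to fs_ns."""
    blob_id = from_kuid(fs_ns, acl_kuid)          # filesystem's translation
    return vfs_fix_to_user(current_ns, blob_id)   # VFS's own translation


if __name__ == "__main__":
    # Before the fix: the filesystem translates with the caller's namespace.
    print(getfacl_id(100100, fs_ns=CONTAINER_NS, current_ns=CONTAINER_NS))
    # After the fix: the filesystem always uses init_user_ns.
    print(getfacl_id(100100, fs_ns=INIT_USER_NS, current_ns=CONTAINER_NS))
```

In this model, translating twice turns 100100 into 100 and then into an unmapped id, which matches the 4294967295 in the original report. Using init_user_ns on the filesystem side leaves a single translation and yields 100, as on ext4.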

nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
not the current user_ns, should be passed to posix_acl_from_xattr()
and posix_acl_to_xattr().  Conveniently the init_user_ns is
available through the init credential (kcred).

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#4177
ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016
As described in torvalds/linux@5f3a4a2 the &init_user_ns, and
not the current user_ns, should be passed to posix_acl_from_xattr()
and posix_acl_to_xattr().  Conveniently the init_user_ns is
available through the init credential (kcred).


Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Massimo Maggi <[email protected]>
Closes openzfs#4177