idmapped mounts - kernel 5.12 #10374

ykuksenko · 2021-05-18T03:31:02Z

/kind feature

Description

Kernel 5.12 merged an IDMAPPED mounts feature. Is the podman project planning to make use of this?

Forgive me if this should be asked in a different location (runc/crun?). Mostly asking in case this went under the radar; I haven't seen any info about this around podman, though containerd/containerd#4734 has had some movement.

Thank you.

rhatdan · 2021-05-18T16:49:54Z

We are familiar with it, and are planning on using it. We have not discussed how this will be used in RHEL yet.

rhatdan · 2021-05-18T16:50:14Z

@giuseppe WDYT?

giuseppe · 2021-05-18T16:53:54Z

AFAIK, it doesn't support overlay yet, so that is not really useful at the moment.

How is containerd using the feature? Does it work only with vfs?

brauner · 2021-05-19T12:45:08Z

> AFAIK, it doesn't support overlay yet, so that is not really useful at the moment.

Fwiw, systemd is already using idmapped mounts for ext4 and xfs. I have a branch for overlayfs which I just need to rebase onto v5.13. I dropped this port from my initial patchset.
So how do you want to use this with overlayfs? Since overlayfs can be mounted inside of user namespaces as of 5.11, what would idmapped mounts be used for with it?

> How is containerd using the feature? Does it work only with vfs?

Idmapped mounts currently work with ext4 and xfs so you can idmap any such mount. Take a look at e.g. systemd/systemd#19438
Is that what you were asking?

rhatdan · 2021-05-19T13:53:19Z

We want to take an underlying image stored with the system UIDs (i.e. UID 0==0, 1==1 ...) and map it into different containers with these UIDs remapped to match the user namespace we are running them in. We only want to do this read-only; if we want to write to any of these files in the image, we want overlay to copy up the files and write them with the user-namespaced UIDs.

We need to be able to do this in both rootful (CAP_SYS_ADMIN) and rootless with a user namespace.

giuseppe · 2021-05-19T13:54:27Z

first of all let me rephrase "that is not really useful at the moment" -> "that is not really useful at the moment for us".

We could use it for -v /src:/dest:O where we currently use chown, but it won't help with mounting the container rootfs.

In our case, I think it will be easier to just use MOUNT_ATTR_IDMAP on the final overlay mount, which will be the container rootfs.

An alternative that could possibly help us is to create an idmapped bind mount for the entire storage and use it for accessing the overlay lower layers (the OCI image). I've not investigated this alternative yet as it requires non-trivial changes in the storage library, so I was just waiting for overlay support, since I saw it was WIP, to plug it more easily into what we have now.

brauner · 2021-05-19T14:44:59Z

> We want to take an underlying image stored with the system UIDs (i.e. UID 0==0, 1==1 ...) and map it into different containers with these UIDs remapped to match the user namespace we are running them in. We only want to do this read-only; if we

So yes, that is exactly what this is for. For the most common case where, as you said, we're talking about system or init uids, it's as simple as attaching the container's userns to the detached mount.

Fwiw, idmapped mounts work outside of user namespaces as well. That was an explicit use-case for systemd and others. I.e. you could remap a filesystem in the init userns with this to e.g. have real root in a privileged container write as an unprivileged uid (say 1000) onto the disk, i.e. you could go and create an idmapped bind-mount of your host's /opt at /mnt with a mapping of:

0 1000 1
1 1 1000
1001 1001 <range>

and exposing this to a privileged container would mean that uid 0 aka real root creates files as uid 1000 on disk.
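
To make the mapping above concrete, here is a minimal Python sketch of how such a table translates IDs. It assumes the three columns mean (ID used by the caller, ID on disk, count), in the same order as `/proc/<pid>/uid_map`. The names `IdRange`, `map_to_disk` and `map_from_disk` are made up for illustration. Two further assumptions: the second row is shortened to 999 entries so that the on-disk ranges of the first two rows do not overlap, and the third row is left out because its length is not given.

```python
from typing import List, NamedTuple, Optional


class IdRange(NamedTuple):
    inside: int   # first ID as used by the caller going through the mount
    outside: int  # first ID as stored on disk
    count: int    # number of consecutive IDs covered by this row


def map_to_disk(idmap: List[IdRange], uid: int) -> Optional[int]:
    """Translate a caller uid into the uid that ends up on disk."""
    for r in idmap:
        if r.inside <= uid < r.inside + r.count:
            return r.outside + (uid - r.inside)
    return None  # unmapped in this sketch


def map_from_disk(idmap: List[IdRange], uid: int) -> Optional[int]:
    """Translate an on-disk uid into the uid seen through the mount."""
    for r in idmap:
        if r.outside <= uid < r.outside + r.count:
            return r.inside + (uid - r.outside)
    return None  # unmapped in this sketch


# First two rows of the example; the second row's length is an assumption.
OPT_IDMAP = [
    IdRange(inside=0, outside=1000, count=1),
    IdRange(inside=1, outside=1, count=999),
]

if __name__ == "__main__":
    print(map_to_disk(OPT_IDMAP, 0))       # root writes as uid 1000
    print(map_from_disk(OPT_IDMAP, 1000))  # files owned by 1000 show up as root
```

With this table, files that are owned by uid 0 on disk have no mapping at all, which is the point of the example: real root in the container never touches root-owned files on the host.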

> want to write to any of these files in the image, we want overlay to copy up the files and write them with the user-namespaced UIDs.

Right, so this sounds like overlayfs would be mounted on top of idmapped mounts, i.e. the lower layers would be idmapped and overlayfs would be mounted from inside the container.

> We need to be able to do this in both rootful (CAP_SYS_ADMIN) and rootless with a user namespace.

Right, currently MOUNT_ATTR_IDMAP requires CAP_SYS_ADMIN in init_user_ns, but that will change in the future. I just want to wait until later in the year before we consider this. I don't want to jump the unprivileged gun too soon.
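
For readers who have not used the API, the sequence discussed here (create a detached mount, attach a user namespace to it with MOUNT_ATTR_IDMAP, then attach the mount somewhere) could be sketched in Python via raw syscalls as below. This is only a sketch under assumptions: the syscall numbers are the ones used on x86_64 and arm64, the helper name `idmapped_bind_mount` and its parameters are hypothetical, and it is not how Podman or any runtime implements this. At the time of this comment it needs CAP_SYS_ADMIN in init_user_ns and a kernel >= 5.12.

```python
import ctypes
import os

# Syscall numbers as used on x86_64 and arm64 (assumption: one of those).
SYS_OPEN_TREE = 428
SYS_MOVE_MOUNT = 429
SYS_MOUNT_SETATTR = 442

AT_FDCWD = -100
AT_EMPTY_PATH = 0x1000
OPEN_TREE_CLONE = 1
OPEN_TREE_CLOEXEC = os.O_CLOEXEC
MOVE_MOUNT_F_EMPTY_PATH = 0x4
MOUNT_ATTR_IDMAP = 0x00100000


class MountAttr(ctypes.Structure):
    """Mirror of the kernel's struct mount_attr (four __u64 fields)."""
    _fields_ = [
        ("attr_set", ctypes.c_uint64),
        ("attr_clr", ctypes.c_uint64),
        ("propagation", ctypes.c_uint64),
        ("userns_fd", ctypes.c_uint64),
    ]


def _syscall(nr, *args):
    """Call syscall(2) through libc and raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    ret = libc.syscall(ctypes.c_long(nr), *args)
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def idmapped_bind_mount(source: str, target: str, userns_pid: int) -> None:
    """Bind-mount source at target, idmapped with userns_pid's user namespace."""
    userns_fd = os.open(f"/proc/{userns_pid}/ns/user", os.O_RDONLY | os.O_CLOEXEC)
    try:
        # 1. Create a detached copy of the source mount.
        tree_fd = _syscall(SYS_OPEN_TREE, AT_FDCWD, source.encode(),
                           OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC)
        try:
            # 2. Attach the user namespace to the detached mount.
            attr = MountAttr(attr_set=MOUNT_ATTR_IDMAP, userns_fd=userns_fd)
            _syscall(SYS_MOUNT_SETATTR, tree_fd, b"", AT_EMPTY_PATH,
                     ctypes.byref(attr), ctypes.c_size_t(ctypes.sizeof(attr)))
            # 3. Make the mount visible at the target path.
            _syscall(SYS_MOVE_MOUNT, tree_fd, b"", AT_FDCWD, target.encode(),
                     MOVE_MOUNT_F_EMPTY_PATH)
        finally:
            os.close(tree_fd)
    finally:
        os.close(userns_fd)
```

A call such as `idmapped_bind_mount("/opt", "/mnt", pid)` would then use the ID mapping of the user namespace that process `pid` lives in.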

brauner · 2021-05-19T14:47:59Z

> first of all let me rephrase "that is not really useful at the moment" -> "that is not really useful at the moment for us".

Thanks for that qualification. :) I appreciate that!

> We could use it for -v /src:/dest:O where we currently use chown, but it won't help with mounting the container rootfs.

Because your rootfs mount is an overlayfs mount. Is the mount created from inside the container or is it created outside of the container?

> In our case, I think it will be easier to just use MOUNT_ATTR_IDMAP on the final overlay mount, which will be the container rootfs.

Right, that's another case. That sounds like an idmapped mount of an overlayfs mount.
I have a branch that supports both, i.e. idmapped upper/lower directories and creating idmapped mounts of an overlayfs mount.

> An alternative that could possibly help us is to create an idmapped bind mount for the entire storage and use it for accessing the overlay lower layers (the OCI image). I've not investigated this alternative yet as it requires non-trivial changes in the storage library, so I was just waiting for overlay support, since I saw it was WIP, to plug it more easily into what we have now.

giuseppe · 2021-05-19T14:55:10Z

> > We could use it for -v /src:/dest:O where we currently use chown, but it won't help with mounting the container rootfs.

> Because your rootfs mount is an overlayfs mount. Is the mount created from inside the container or is it created outside of the container?

We create the rootfs mount outside of the container. That is kind of forced since the OCI runtime gets just the path to the container rootfs (so it must be already mounted by Podman) and creates the mount namespace later.

brauner · 2021-05-19T14:55:52Z

> > > We could use it for -v /src:/dest:O where we currently use chown, but it won't help with mounting the container rootfs.

> > Because your rootfs mount is an overlayfs mount. Is the mount created from inside the container or is it created outside of the container?

> We create the rootfs mount outside of the container. That is kind of forced since the OCI runtime gets just the path to the container rootfs (so it must be already mounted by Podman) and creates the mount namespace later.

Right, so the s_user_ns, i.e. the user namespace of the superblock is init_user_ns for overlayfs, right?

brauner · 2021-05-19T14:57:49Z

My point is that in principle it is possible to create idmapped mounts of filesystems that are mounted inside of a user namespace. It's just a matter of adapting the translation functions but if we can avoid it that'd be good too.

giuseppe · 2021-05-19T15:05:43Z

Yes, correct, at least when running as root.

For rootless we have an intermediate user+mount namespace.

giuseppe · 2021-05-19T15:11:54Z

if you have some patches for the kernel, I can give them a try

brauner · 2021-05-19T15:19:00Z

> if you have some patches for the kernel, I can give them a try

I need to rebase them but I can give you something in about a week or so. I can also share a tree with you and you can poke at it as well if you want to.

github-actions · 2021-06-19T00:04:42Z

A friendly reminder that this issue had no activity for 30 days.

rhatdan · 2021-07-09T20:28:49Z

@brauner did you ever make any progress on this?

brauner · 2021-07-09T22:51:56Z

Yes, I have indeed. My plan is to get it into shape for either next merge window or the one after that.

github-actions · 2021-08-09T00:03:39Z

A friendly reminder that this issue had no activity for 30 days.

giuseppe · 2021-08-30T08:47:40Z

@brauner I've played with idmapped mounts and tried to enable them only for overlay lower layers, but it seems even this configuration is not supported.

> Right, so the s_user_ns, i.e. the user namespace of the superblock is init_user_ns for overlayfs, right?

Looking back at this question: in the rootless case, no; the idmapped mount should be managed inside of a user namespace.

For rootless, we create an outer user namespace into which all the available IDs (as defined in /etc/sub?id) are mapped.

Once in the outer user namespace, we can create inner user namespaces for each container/pod. For these inner namespaces we'd like to reuse the container images that are pulled in the outer namespace. The overlay mount used by the containers is created in the outer namespace.

Another question: is there any plan for supporting BTRFS or is that more complicated than XFS/ext4?

brauner · 2021-08-30T09:28:21Z

> @brauner I've played with idmapped mounts and tried to enable them only for overlay lower layers, but it seems even this configuration is not supported.

Yes, overlayfs is currently not supported. I'm working on this. I ported another filesystem for this cycle but I've already started talking to Amir about overlayfs.

> > Right, so the s_user_ns, i.e. the user namespace of the superblock is init_user_ns for overlayfs, right?

> Looking back at this question: in the rootless case, no; the idmapped mount should be managed inside of a user namespace.

> For rootless, we create an outer user namespace into which all the available IDs (as defined in /etc/sub?id) are mapped.

> Once in the outer user namespace, we can create inner user namespaces for each container/pod. For these inner namespaces we'd like to reuse the container images that are pulled in the outer namespace. The overlay mount used by the containers is created in the outer namespace.

So essentially you're creating nested user namespaces, right? The first level is for mounting overlayfs and the second level user namespaces are for the individual containers.

Ok, so you need the ability to create idmapped mounts of an overlayfs filesystem mounted inside of a user namespace. That's a bit trickier to do but not impossible. I expected this to happen at some point anyway.
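
The nesting described here can be illustrated with a small Python sketch that resolves a container uid through both levels to a host uid. All concrete numbers are hypothetical: a user with host uid 1000 and a subuid range starting at 100000 for the outer namespace, and an inner namespace that maps container IDs 0-999 onto outer IDs 1-1000. The names `to_parent` and `to_host` are made up for this example.

```python
from typing import List, Optional, Tuple

# One row: (first id inside the namespace, first id in the parent, count)
IdMap = List[Tuple[int, int, int]]


def to_parent(idmap: IdMap, uid: int) -> Optional[int]:
    """Translate a uid from a namespace into its parent namespace."""
    for inside, parent, count in idmap:
        if inside <= uid < inside + count:
            return parent + (uid - inside)
    return None


def to_host(chain: List[IdMap], uid: int) -> Optional[int]:
    """Walk a chain of mappings, innermost first, up to the host."""
    current: Optional[int] = uid
    for idmap in chain:
        current = to_parent(idmap, current)
        if current is None:
            return None
    return current


# Hypothetical outer (rootless) namespace: root is the user, rest is subuids.
OUTER: IdMap = [(0, 1000, 1), (1, 100000, 65536)]
# Hypothetical inner (per-container) namespace nested in the outer one.
INNER: IdMap = [(0, 1, 1000)]

if __name__ == "__main__":
    print(to_host([OUTER], 0))         # image files pulled as outer root
    print(to_host([INNER, OUTER], 0))  # container root
```

In this made-up setup, image files pulled by root in the outer namespace are owned by host uid 1000, while root in the container is host uid 100000. That mismatch is what chown, fuse-overlayfs, or an idmapped mount has to bridge.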

> Another question: is there any plan for supporting BTRFS or is that more complicated than XFS/ext4?

That relates to why I didn't do overlayfs this cycle. :) I ported BTRFS, i.e. with 5.15 you can create idmapped mounts of BTRFS. See David Sterba's tree:

https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/log/?h=for-next&qt=grep&q=brauner

and the associated test-suite specific to btrfs ioctls:

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/src/idmapped-mounts/idmapped-mounts.c#n13423

rhvgoyal · 2021-08-30T17:42:55Z

@brauner So there are two pieces to it from the overlayfs point of view. The first one is that one should be able to use an idmapped mount as a lower directory. And the second piece is that one can create idmapped mounts of an overlay instance. I think @giuseppe is looking for the first piece to begin with. That is, create idmapped mounts of ext4/xfs/btrfs/nfs and then use these mounts as the lower directory of overlayfs.

Creating idmapped mounts of overlayfs might be useful in case of nested overlayfs. So if the outer namespace's rootfs is on overlay and it is creating a nested container, then it might have to create an idmapped mount on overlayfs, shifted as per the nested container mappings.

I am wondering why overlayfs does not support an idmapped mount as lower dir. What are the issues?

rhvgoyal · 2021-08-30T17:59:05Z

And probably we will need to create idmapped mounts from inside the outer user namespace. So we can download the image, create an idmapped mount inside the outer NS and then use that as lower dir for the overlayfs mount of the nested container. This does not allow image sharing between different users, but at least this will avoid chown() (or doing uid shifting using fuse-overlayfs), IIUC.

brauner · 2021-08-31T07:10:06Z

> @brauner So there are two pieces to it from the overlayfs point of view. The first one is that one should be able to use an idmapped mount as a lower directory. And the second piece is that one can create idmapped mounts of an overlay

Yes, exactly and I did outline that a couple of times and implemented both.

> instance. I think @giuseppe is looking for the first piece to begin with. That is, create idmapped mounts of ext4/xfs/btrfs/nfs and then use these mounts as the lower directory of overlayfs.

Yes, I understand.

> Creating idmapped mounts of overlayfs might be useful in case of nested overlayfs. So if the outer namespace's rootfs is on overlay and it is creating a nested container, then it might have to create an idmapped mount on overlayfs, shifted as per the nested container mappings.

Yes, that would be a second step.

> I am wondering why overlayfs does not support an idmapped mount as lower dir. What are the issues?

There are no technical reasons or problems why overlayfs doesn't support idmapped mounts.
First, I had idmapped mounts sit a bit in 5.13 and 5.14 so we could have some time fixing any immediate issues. So far there have been none which I attribute to the extensive test-suite that was added.

I announced that I would prefer to do that a while ago. And now we ported btrfs.
While overlayfs will require more changes than ext4, xfs, or btrfs, it isn't more difficult. In essence, overlayfs needs to be taught to take the idmapping of the underlying filesystem into account. I have been working on that but btrfs was higher up on the list because of systemd and others. I will have patches ready soon. With Plumbers and patch review and other stuff it just takes time and I do tend to spend a lot of time on tests too.

commit 1ac2a41 upstream.

Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystems mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]).

Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] but I'll give an overview here again.

Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped.

In order to support idmapped mounts of filesystems mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3].

This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping.

If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used.

If the low-level helpers detected that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shortcut and can continue to skip the translation step from or into the filesystem's idmapping.

These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all.

This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will start with the "mapped_" prefix making everything nice and consistent.

The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file.

This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystem mounted with an idmapping.

[1]: commit 2ca4dcc ("fs/mount_setattr: tighten permission checks")
[2]: containers/podman#10374
[3]: Documentation/filesystems/idmappings.rst
[4]: commit a65e58e ("fs: document and rename fsid helpers")

Link: https://lore.kernel.org/r/[email protected] (v1)
Link: https://lore.kernel.org/r/[email protected] (v2)
Link: https://lore.kernel.org/r/[email protected]
Cc: Seth Forshee <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
CC: [email protected]
Reviewed-by: Seth Forshee <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
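
The translation steps described in the commit message above can be modelled in a few lines of Python. This is a simplified userspace model, not the kernel code: the function names mirror the helpers named in the message, but the bodies are one reading of the described behaviour, and the `UserNS` class, the `None` return for unmapped IDs, and all example ranges are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


class UserNS:
    """Toy user namespace: rows of (id inside ns, kernel id, count)."""

    def __init__(self, ranges: List[Tuple[int, int, int]], initial: bool = False):
        self.ranges = ranges
        self.initial = initial  # True only for the initial (identity) namespace

    def make_kuid(self, uid: int) -> Optional[int]:
        """Map an id inside this namespace to a kernel id."""
        for inside, kernel, count in self.ranges:
            if inside <= uid < inside + count:
                return kernel + (uid - inside)
        return None

    def from_kuid(self, kuid: int) -> Optional[int]:
        """Map a kernel id back to an id inside this namespace."""
        for inside, kernel, count in self.ranges:
            if kernel <= kuid < kernel + count:
                return inside + (kuid - kernel)
        return None


INIT_USER_NS = UserNS([(0, 0, 2**32 - 1)], initial=True)


def no_idmapping(mnt_userns: UserNS, fs_userns: UserNS) -> bool:
    """Shortcut check: the mount does not remap anything."""
    return mnt_userns.initial or mnt_userns is fs_userns


def mapped_kuid_fs(mnt_userns: UserNS, fs_userns: UserNS, kuid: int) -> Optional[int]:
    """Map an inode's kuid according to the mount's idmapping (read side)."""
    if no_idmapping(mnt_userns, fs_userns):
        return kuid
    # Skip the filesystem step when the filesystem has no idmapping.
    uid = kuid if fs_userns.initial else fs_userns.from_kuid(kuid)
    if uid is None:
        return None
    return mnt_userns.make_kuid(uid)


def mapped_kuid_user(mnt_userns: UserNS, fs_userns: UserNS, kuid: int) -> Optional[int]:
    """Map a caller's kuid to the kuid to be stored (write side)."""
    if no_idmapping(mnt_userns, fs_userns):
        return kuid
    uid = mnt_userns.from_kuid(kuid)
    if uid is None:
        return None
    return uid if fs_userns.initial else fs_userns.make_kuid(uid)


# Hypothetical namespaces used only for this example.
MNT_NS = UserNS([(0, 10000, 1000)])
FS_NS = UserNS([(0, 100000, 65536)])
MNT_NS2 = UserNS([(0, 200000, 65536)])

if __name__ == "__main__":
    print(mapped_kuid_fs(MNT_NS, INIT_USER_NS, 0))    # fs without idmapping
    print(mapped_kuid_fs(MNT_NS2, FS_NS, 100005))     # fs with idmapping
```

The second call is the new case: a filesystem that itself has an idmapping (such as overlayfs mounted inside a user namespace) needs the extra step through `FS_NS` before the mount's mapping is applied.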

srd424 · 2022-07-01T18:00:42Z

Apologies for a mild thread hijack, but @brauner - is there a reason why an idmapped mount of a merged overlay directory is still not allowed? I found a patch posted to linux-fsdevel back in 2020 where you'd enabled it..

brauner · 2022-07-01T19:25:55Z

> Apologies for a mild thread hijack, but @brauner - is there a reason why an idmapped mount of a merged overlay directory is still not allowed? I found a patch posted to linux-fsdevel back in 2020 where you'd enabled it..

There was no use-case for it. What do you want to use this for?

srd424 · 2022-07-01T19:38:42Z

Hmm, I was thinking it would provide an easier set-up for some system container stuff until LXC/nspawn support idmapped overlays directly, but possibly I am wrong - I will try to think it through properly when I'm not full of medication. It looks like I might be able to do something with LXC pre-mount hooks instead..


srd424 · 2022-07-03T10:02:02Z

> There was no use-case for it. What do you want to use this for?

Ah, it looks like I can simply mount-idmapped the layers in advance now, and then have LXC set up the overlay in the container, and it all just works. So no, I guess no real use case for it, other than completeness.

commit 1ac2a41 upstream. Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystem's mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystem's mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping. 
If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detected that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shorcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will started with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystem mounted with an idmapping. 
[1]: commit 2ca4dcc ("fs/mount_setattr: tighten permission checks") [2]: containers/podman#10374 [3]: Documentations/filesystems/idmappings.rst [4]: commit a65e58e ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/[email protected] (v1) Link: https://lore.kernel.org/r/[email protected] (v2) Link: https://lore.kernel.org/r/[email protected] Cc: Seth Forshee <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> CC: [email protected] Reviewed-by: Seth Forshee <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 1ac2a41 upstream. Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystem's mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystem's mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping. 
If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detected that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shorcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will started with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystem mounted with an idmapping. 
[1]: commit 2ca4dcc ("fs/mount_setattr: tighten permission checks") [2]: containers/podman#10374 [3]: Documentations/filesystems/idmappings.rst [4]: commit a65e58e ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/[email protected] (v1) Link: https://lore.kernel.org/r/[email protected] (v2) Link: https://lore.kernel.org/r/[email protected] Cc: Seth Forshee <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> CC: [email protected] Reviewed-by: Seth Forshee <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit b20dcf603b8d0bb24a45c8e6cdd345e3fb3aa3d4) Signed-off-by: Jack Vogel <[email protected]>

BugLink: https://bugs.launchpad.net/bugs/1986724 commit 1ac2a41 upstream. Currently we only support idmapped mounts for filesystems mounted without an idmapping. This was a conscious decision mentioned in multiple places (cf. e.g. [1]). As explained at length in [3] it is perfectly fine to extend support for idmapped mounts to filesystems mounted with an idmapping should the need arise. The need has been there for some time now. Various container projects in userspace need this to run unprivileged and nested unprivileged containers (cf. [2]). Before we can port any filesystem that is mountable with an idmapping to support idmapped mounts we need to first extend the mapping helpers to account for the filesystem's idmapping. This, again, is explained at length in our documentation at [3] but I'll give an overview here again. Currently, the low-level mapping helpers implement the remapping algorithms described in [3] in a simplified manner. Because we could rely on the fact that all filesystems supporting idmapped mounts are mounted without an idmapping the translation step from or into the filesystem idmapping could be skipped. In order to support idmapped mounts of filesystems mountable with an idmapping the translation step we were able to skip before cannot be skipped anymore. A filesystem mounted with an idmapping is very likely to not use an identity mapping and will instead use a non-identity mapping. So the translation step from or into the filesystem's idmapping in the remapping algorithm cannot be skipped for such filesystems. More details with examples can be found in [3]. This patch adds a few new and prepares some already existing low-level mapping helpers to perform the full translation algorithm explained in [3]. The low-level helpers can be written in a way that they only perform the additional translation step when the filesystem is indeed mounted with an idmapping.
If the low-level helpers detect that they are not dealing with an idmapped mount they can simply return the relevant k{g,u}id unchanged; no remapping needs to be performed at all. The no_idmapping() helper detects whether the shortcut can be used. If the low-level helpers detect that they are dealing with an idmapped mount but the underlying filesystem is mounted without an idmapping we can rely on the previous shortcut and can continue to skip the translation step from or into the filesystem's idmapping. These checks guarantee that only the minimal amount of work is performed. As before, if idmapped mounts aren't used the low-level helpers are idempotent and no work is performed at all. This patch adds the helpers mapped_k{g,u}id_fs() and mapped_k{g,u}id_user(). Following patches will port all places to replace the old k{g,u}id_into_mnt() and k{g,u}id_from_mnt() with these two new helpers. After the conversion is done k{g,u}id_into_mnt() and k{g,u}id_from_mnt() will be removed. This also concludes the renaming of the mapping helpers we started in [4]. Now, all mapping helpers will start with the "mapped_" prefix making everything nice and consistent. The mapped_k{g,u}id_fs() helpers replace the k{g,u}id_into_mnt() helpers. They are to be used when k{g,u}ids are to be mapped from the vfs, e.g. from struct inode's i_{g,u}id. Conversely, the mapped_k{g,u}id_user() helpers replace the k{g,u}id_from_mnt() helpers. They are to be used when k{g,u}ids are to be written to disk, e.g. when entering from a system call to change ownership of a file. This patch only introduces the helpers. It doesn't yet convert the relevant places to account for filesystems mounted with an idmapping.
[1]: commit 2ca4dcc ("fs/mount_setattr: tighten permission checks") [2]: containers/podman#10374 [3]: Documentations/filesystems/idmappings.rst [4]: commit a65e58e ("fs: document and rename fsid helpers") Link: https://lore.kernel.org/r/[email protected] (v1) Link: https://lore.kernel.org/r/[email protected] (v2) Link: https://lore.kernel.org/r/[email protected] Cc: Seth Forshee <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> CC: [email protected] Reviewed-by: Seth Forshee <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]> Signed-off-by: Stefan Bader <[email protected]>
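
The two-step translation described in the commit message can be illustrated with a toy model. This is only a sketch in Python, not the kernel code: the function names `mapped_id_fs` and `mapped_id_user`, the `map_up`/`map_down` helpers, and the representation of an idmapping as `(userspace_start, kernel_start, count)` ranges are all assumptions made for illustration. `None` stands for "mount is not idmapped" or "filesystem mounted without an idmapping".

```python
from typing import List, Optional, Tuple

INVALID = -1  # stand-in for an unmappable ID (assumption of this model)

# Hypothetical representation: a list of (userspace_start, kernel_start, count).
IdMap = List[Tuple[int, int, int]]


def map_up(idmap: IdMap, uid: int) -> int:
    """Translate a userspace ID into a kernel ID using idmap."""
    for u, k, r in idmap:
        if u <= uid < u + r:
            return k + (uid - u)
    return INVALID


def map_down(idmap: IdMap, kid: int) -> int:
    """Translate a kernel ID back into a userspace ID using idmap."""
    for u, k, r in idmap:
        if k <= kid < k + r:
            return u + (kid - k)
    return INVALID


def mapped_id_fs(mnt: Optional[IdMap], fs: Optional[IdMap], kid: int) -> int:
    """Map an ID taken from the inode to what is seen through the mount."""
    if mnt is None:          # not an idmapped mount: nothing to do
        return kid
    # Skip the filesystem step when the filesystem has no idmapping.
    uid = kid if fs is None else map_down(fs, kid)
    if uid == INVALID:
        return INVALID
    return map_up(mnt, uid)


def mapped_id_user(mnt: Optional[IdMap], fs: Optional[IdMap], kid: int) -> int:
    """Map a caller-supplied ID to what should be stored in the filesystem."""
    if mnt is None:          # not an idmapped mount: nothing to do
        return kid
    uid = map_down(mnt, kid)
    if uid == INVALID:
        return INVALID
    # Skip the filesystem step when the filesystem has no idmapping.
    return uid if fs is None else map_up(fs, uid)


if __name__ == "__main__":
    mnt = [(0, 10000, 10000)]  # example mount idmapping (made up)
    fs = [(0, 20000, 10000)]   # example filesystem idmapping (made up)
    print(mapped_id_fs(mnt, fs, 20000))    # 10000
    print(mapped_id_user(mnt, fs, 10000))  # 20000
```

In this model the two helpers are inverses of each other for any ID covered by both mappings, and the early return mirrors the shortcut the commit message attributes to `no_idmapping()`.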

phemmer · 2023-05-08T19:15:50Z

#10374 (comment)

We need to be able to do this in both rootfull (CAP_SYS_ADMIN) and rootless with User Namespace.

Right, currently MOUNT_ATTR_IDMAP requires CAP_SYS_ADMIN in init_user_ns, but that will change in the future. I just want to wait until later in the year before we consider this. I don't want to jump the unprivileged gun too soon.

@brauner It's been a couple years now. Has there been any update or further thoughts on this?

livingsilver94 · 2023-05-23T14:45:21Z

See this comment: https://lwn.net/Articles/896663

Since no filesystems that support being mounted unprivileged support them

Since Linux 5.19, OverlayFS supports idmapping, and an OverlayFS can be mounted unprivileged by passing the mount option userxattr. Thus, I think that Brauner's quote isn't valid anymore (at least for OverlayFS).

I'll try to idmap an OverlayFS mount soon.

brauner · 2023-05-23T15:05:53Z

Nothing's really changed. While overlayfs is mountable by unprivileged users, it cannot be mounted with an idmapping as that would be really weird. Instead, you can idmap the layers and then mount overlayfs on top of the idmapped layers as an unprivileged user.

livingsilver94 · 2023-05-23T15:15:12Z

I'm confused. See this manpage:

overlayfs (ID-mapped lower and upper layers supported since Linux 5.19)

It looks like OverlayFS can be idmapped, but you're saying the opposite. Please consider that I'm no expert on the matter, so what am I missing?

Instead, you can idmap the layers and then mount overlayfs on top of the idmapped layers as an unprivileged user.

That unfortunately requires CAP_SYS_ADMIN to idmap the layers, doesn't it? :(

brauner · 2023-05-23T15:25:53Z

mount --bind -X.mount.idmap=0:10000:10000 /my-lower /my-lower
mount --bind -X.mount.idmap=0:10000:10000 /my-upper /my-upper
unshare --mount --user --map-root
mount -t overlay overlayfs -o lowerdir=/my-lower,upperdir=/my-upper,[...] /somewhere

that works but

mount -t overlay overlayfs -o lowerdir=/my-lower,upperdir=/my-upper,[...] /somewhere
mount --bind -X.mount.idmap=0:10000:10000 /somewhere /somewhere

doesn't because overlayfs itself cannot be idmapped.

openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 18, 2021

ykuksenko changed the title ~~IDMAPPED Mounts - Kernel 5.12~~ idmapped mounts - kernel 5.12 May 18, 2021

github-actions bot added the stale-issue label Jun 19, 2021

rhatdan removed the stale-issue label Jul 9, 2021

github-actions bot added the stale-issue label Aug 9, 2021

giuseppe mentioned this issue Aug 16, 2021

slow (minutes delay) start when rinning with --userns=keep-id #11220

Closed

github-actions bot removed the stale-issue label Aug 31, 2021

zeehio mentioned this issue Apr 22, 2023

Group mapping in rootless #13090

Open

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 23, 2023

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
