Use squashfuse in native mode when 'allow kernel squashfs = no' #2216

dtrudg · 2023-09-21T10:49:32Z

Is your feature request related to a problem? Please describe.

Recent versions of SingularityCE have a singularity.conf directive, `allow kernel squashfs`, that permits disabling the kernel mounts of squashfs images that are performed in the setuid flow. When it is set to `no`, there is no graceful fallback:

$ singularity run docker://alpine
INFO:    Using cached SIF image
FATAL:   container creation failed: squashfs image mounts are not authorized

Describe the solution you'd like

squashfuse is widely available, and a recent version is even bundled with SingularityCE.

It should be possible for the kernel mount to fall back to a squashfuse mount in the setuid flow.


DrDaveD · 2023-09-22T16:21:13Z

Note that for security reasons SingularityCE should avoid invoking an external program in setuid flow while having the ability to increase its privileges to root, and since it's not at that point in an unprivileged root-mapped user+mount namespace, I am planning to implement similar functionality in Apptainer based on the code for the --fusemount feature, where the starter-suid does the mount and then passes an open file descriptor to the FUSE program.

dtrudg · 2023-09-22T16:45:12Z

> Note that for security reasons SingularityCE should avoid invoking an external program in setuid flow while having the ability to increase its privileges to root,

I'm not entirely clear what you mean. There are various places and ways we can do things in the setuid flow that don't involve the possibility of privilege escalation of an external binary, whether before the starter is invoked or in the portions of the code that run with dropped privileges.

> and since it's not at that point in an unprivileged root-mapped user+mount namespace

Yes, it's important to us that this is not dependent on being in an unprivileged root-mapped user namespace. The SLES 12 kernel, which we need to support, does not include support for FUSE mounts in an unprivileged user namespace.

> I am planning to implement the similar functionality in Apptainer based on the code for the --fusemount feature, where the starter-suid does the mount and then passes an open file descriptor to the FUSE program.

Thanks for the link to your issue... that was along the lines of my initial thoughts, also.

DrDaveD · 2023-09-22T16:55:48Z

If it isn't in an unprivileged root-mapped user+mount namespace, then the FUSE program won't be able to do its own mounting without privileges ... unless you want to depend on it invoking the setuid-root fusermount/fusermount3, which as you know is problematic. So the compromise is to do a generic /dev/fuse mount first in setuid mode, then pass the file descriptor to the unprivileged FUSE program, as the --fusemount option does. That does require the fuse3 library to work, which is likely to cause some pain as I noted in the Apptainer issue. The fuse3 library accepts a /dev/fd referencing an open file descriptor in place of the mount point parameter to avoid having to do the mount itself.

dtrudg · 2023-09-25T08:26:53Z

We haven't decided on the exact flow for this yet. I don't yet subscribe to the view that using an fd mount and fuse3 is definitively the best or only suitable way, but it is the first thing being considered. It's quite possible that SingularityCE will take a different approach to Apptainer, depending on the trade-offs that are most appropriate for our respective users.

> If it isn't in an unprivileged root-mapped user+mount namespace, then the FUSE program won't be able to do its own mounting without privileges ... unless you want to depend on it invoking the setuid-root fusermount/fusermount3, which as you know is problematic.

At some point, we have to accept that setuid is required in certain places, in order to get particular behaviours that userns doesn't provide, or to support older systems that lack certain kernel features/backports. There are always trade-offs; there is no solution that has zero problems. Whether we have more privileged code in Singularity or call out to a distro-provided tool is an open question.

> So the compromise is to do a generic /dev/fuse mount first in setuid mode, then pass the file descriptor to the unprivileged FUSE program, as the --fusemount option does. That does require the fuse3 library to work, which is likely to cause some pain as I noted in the Apptainer issue. The fuse3 library accepts a /dev/fd referencing an open file descriptor in place of the mount point parameter to avoid having to do the mount itself.

Right. Given that older distributions are shipping fuse2 versions, I'm not yet sure about bundling more FUSE binaries unless we have to.

fuse2fs does seem to be the biggest blocker for this approach. As far as I'm aware, it cannot currently be built directly for fuse3, although it will work if a v3 fusermount is available?

Anyway, these are some of the things being considered at this time. We're sure that each project will address the need in the appropriate way for itself, while considering whether the approach on the other side of the fork is applicable.

dtrudg · 2023-11-06T14:07:49Z

Having picked this up again and started to poke around the code and think about it, I see three basic approaches:

A - Rely on the availability of unprivileged user namespace creation to support FUSE in native mode, and wire up in a similar manner to that used in OCI mode. This is not ideal as we know of several sites / users who choose to disable unprivileged namespace creation due to their security posture. These sites are also some of those most likely to be interested in avoiding kernel mounts if possible. Also unsupported on SLES12.

B - Use the --fusemount mechanism or a similar approach, with pre-provisioning of an fd-based mount inside the mount namespace. This is dependent on FUSE3, so is somewhat problematic for distributions that generally use FUSE2. We don't really want to force use of our own bundled FUSE3 squashfuse etc. Also, fuse2fs doesn't support FUSE3, which would prevent use of (much) older extfs-format Singularity images. I'd guess that sites that can't move to newer distros or rootless runtimes are also some of the most likely to need to run rather old extfs container images.

C - Follow the pattern of --sif-fuse, in which the container rootfs is mounted in the host mount namespace prior to invoking the starter, and cleaned up at container exit. This does pollute the host tmp dir with a mounted rootfs, and also relies on the distribution's setuid FUSE helpers to perform the FUSE mount. I would assume that sites that are happy with FUSE generally do have the setuid helpers in place, to allow e.g. sshfs mounts by users. There may be cleanup issues when a job is killed, but those may be avoided by the tendency of schedulers to now use cgroup-based process monitoring, making it possible to kill all processes in the job's cgroup.

Leaning towards C at present due to the wish to support old images (extfs) and SLES12.


Deprecate the explicit `--sif-fuse` flag and `sif fuse` directive for `singularity.conf`. These were previously used to enable experimental FUSE mount of SIF/SquashFS containers.

Modify image handling so that we now try squashfuse mounts automatically, with fallback to temporary sandbox extraction, when:

* squashfs kernel mounts have been disabled in `singularity.conf`
* we are running in a non-setuid / user namespace flow.

Add a `--tmp-sandbox` flag to allow forcing extraction to a temporary sandbox when a kernel mount or FUSE mount would otherwise be used.

Fixes sylabs#2216

dtrudg added the enhancement New feature or request label Sep 21, 2023

dtrudg added this to the SingularityCE 4.1.0 milestone Sep 21, 2023

dtrudg added the roadmap Features / changes that are scheduled to be implemented label Sep 21, 2023

dtrudg mentioned this issue Dec 18, 2023: fuse: permit SIF/Squashfs mount via FUSE in all native mode flows #2450 (Merged)

dtrudg mentioned this issue Dec 18, 2023: fuse: automatically use squashfuse for images, deprecate --sif-fuse #2451 (Merged)

dtrudg closed this as completed in #2451 Jan 2, 2024
