fuse: permit SIF/Squashfs mount via FUSE in all native mode flows #2450

dtrudg · 2023-12-18T11:33:43Z

Description of the Pull Request (PR):

Broaden the `--sif-fuse` / `sif fuse = yes` functionality so that it can be used to mount SIF/SquashFS container images with FUSE in all native mode flows. Previously this was supported only in non-setuid flows that depend on user namespaces.

Mounts and cleanup are performed as below:

* The SquashFS image is mounted, using `squashfuse[_ll]`, into a nested temporary directory by the launcher routines in the initial unprivileged CLI process.
* The container is started via the setuid or unprivileged userns starter.
* The starter spawns two unprivileged processes in the initial host namespaces:
  - POST_START_HOST
  - CLEANUP_HOST
* After successful startup of a container, the POST_START_HOST process is signalled by the MASTER process. It then performs a lazy unmount (via fusermount) of the FUSE filesystem in the host namespaces, and removes the temporary directory. The FUSE filesystem remains mounted and accessible in the container namespaces until the container exits.
* When the container exits, or fails to start, the CLEANUP_HOST process is signalled by the MASTER process. It checks whether the temporary directory still exists and, if so, unmounts the FUSE filesystem and removes the directory. This should only happen when container startup fails before the POST_START_HOST process has been able to do the same.
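The division of work between the two host processes can be sketched as below. This is a minimal illustration in Python, not the code from this PR: the function names, the `run` parameter (used so the unmount command can be stubbed), and the choice to make the CLEANUP_HOST unmount lazy as well are all assumptions. Only the `fusermount` flags `-u` (unmount) and `-z` (lazy) are taken from the real tool.

```python
import os
import shutil
import subprocess


def lazy_unmount(mountpoint, run=subprocess.run):
    # fusermount: -u unmounts, -z makes the unmount lazy.
    run(["fusermount", "-u", "-z", mountpoint], check=True)


def post_start_host(tmpdir, mountpoint, run=subprocess.run):
    """After a successful start: detach the host-side mount, drop the temp dir."""
    lazy_unmount(mountpoint, run)
    shutil.rmtree(tmpdir)


def cleanup_host(tmpdir, mountpoint, run=subprocess.run):
    """On exit or failed start: act only if the temp dir still exists.

    Returns True if cleanup was needed, False if it had already been done.
    Using a lazy unmount here is an assumption of this sketch.
    """
    if not os.path.isdir(tmpdir):
        return False
    lazy_unmount(mountpoint, run)
    shutil.rmtree(tmpdir)
    return True
```

Using the temporary directory as the only shared state means the second step needs no extra coordination with the first: if the directory is gone, there is nothing left to do.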

This approach is fairly robust against SIGKILL. As long as the lazy unmount in the host namespace has fired, a SIGKILL of the container process / runtime parent process will not leave orphan mounts, FUSE processes, or namespaces.

If processes are SIGKILL-ed during container startup, then orphan mounts / FUSE processes may be left over. There is no easy way around this without moving FUSE mounts deeper into the runtime engine, which would require FUSE3 and so preclude later support for extfs FUSE mounts via fuse2fs.
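To check a host for leftovers after an interrupted startup, the mount table can be scanned for squashfuse entries. The helper below is a hypothetical sketch and not part of this PR; it assumes that squashfuse mounts appear in `/proc/mounts` with a filesystem type beginning with `fuse.squashfuse`, which may differ between systems.

```python
import re


def _unescape(field):
    # /proc/mounts encodes spaces and similar characters as octal, e.g. \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def find_squashfuse_mounts(mounts_text):
    """Return mount points whose filesystem type starts with 'fuse.squashfuse'.

    The type prefix is an assumption about how these mounts are reported.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field order: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 3 and fields[2].startswith("fuse.squashfuse"):
            found.append(_unescape(fields[1]))
    return found
```

Passing the contents of `/proc/mounts` to `find_squashfuse_mounts` on a host with no running containers should return an empty list; any remaining entries are candidates for a manual `fusermount -u`.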

This fixes or addresses the following GitHub issues:

Towards #2216, "Use squashfuse in native mode when 'allow kernel squashfs = no'" (needs additional thought and changes to config file behaviour / compatible defaults).

Before submitting a PR, make sure you have done the following:

Read the Guidelines for Contributing, and this PR conforms to the stated requirements.
Added changes to the CHANGELOG if necessary according to the Contribution Guidelines
Added tests to validate this PR, linted with make check and tested this PR locally with a make test, and make testall if possible (see CONTRIBUTING.md).
Based this PR against the appropriate branch according to the Contribution Guidelines
Added myself as a contributor to the Contributors File

dtrudg self-assigned this Dec 18, 2023

dtrudg force-pushed the setuid-fuse branch 5 times, most recently from aa6d37c to 0cfef36 Compare December 18, 2023 14:43

dtrudg changed the title ~~fuse: use FUSE mount for images when setuid kernel mount disabled~~ fuse: permit SIF/Squashfs mount via FUSE in all native mode flows Dec 18, 2023

dtrudg force-pushed the setuid-fuse branch from 0cfef36 to b476fcb Compare December 18, 2023 14:50

dtrudg force-pushed the setuid-fuse branch from b476fcb to ed41304 Compare December 18, 2023 15:08

dtrudg marked this pull request as ready for review December 18, 2023 15:28

wobito approved these changes Dec 18, 2023

dtrudg merged commit aa363fe into sylabs:main Dec 18, 2023
1 check passed

dtrudg deleted the setuid-fuse branch December 18, 2023 15:47
