fuse: permit SIF/Squashfs mount via FUSE in all native mode flows #2450

Merged (1 commit) on Dec 18, 2023

@dtrudg dtrudg (Member) commented Dec 18, 2023

Description of the Pull Request (PR):

Broaden the `--sif-fuse` / `sif fuse = yes` functionality so that it can be used to mount SIF/SquashFS container images with FUSE in all native mode flows. Previously this was only supported in non-setuid flows that depend on user namespaces.

Mounts and cleanup are performed as below:

  • The SquashFS image is mounted, using squashfuse[_ll], into a nested temporary directory by the launcher routines in the initial unprivileged CLI process.
  • The container is started via the setuid or unpriv userns starter.
  • The starter spawns two unprivileged processes in the initial host namespaces:
    • POST_START_HOST
    • CLEANUP_HOST
  • After successful startup of a container, the POST_START_HOST process is signalled by the MASTER process. It then performs a lazy unmount (via fusermount) of the FUSE filesystem in the host namespaces, and removes the temporary directory. The FUSE filesystem remains mounted and accessible in the container namespaces until the container exits.
  • When the container exits, or fails to start, the CLEANUP_HOST process is signalled by the MASTER process. It checks whether the temporary directory still exists and, if so, unmounts the FUSE filesystem and removes the directory. This should only happen when container startup fails before the POST_START_HOST process has been able to do so.
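
The host-side cleanup steps above can be sketched as follows. This is a minimal illustration in Python, not the code in this PR: the function names, the `rootfs` directory name, and the `sif-fuse-` prefix are hypothetical. The only external command assumed is `fusermount -u -z`, where `-z` requests a lazy unmount. The unmount function is injectable so the logic can be exercised without a real FUSE mount.

```python
import os
import subprocess
import tempfile


def make_nested_mountpoint():
    """Create tmp_root/rootfs; the image would be FUSE-mounted on rootfs.

    Both the prefix and the 'rootfs' name are hypothetical.
    """
    tmp_root = tempfile.mkdtemp(prefix="sif-fuse-")
    mountpoint = os.path.join(tmp_root, "rootfs")
    os.mkdir(mountpoint)
    return tmp_root, mountpoint


def lazy_unmount(mountpoint):
    """Lazy FUSE unmount: -u unmounts, -z detaches even if still in use."""
    subprocess.run(["fusermount", "-u", "-z", mountpoint], check=True)


def _teardown(tmp_root, mountpoint, unmount):
    # Unmount first, then remove the now-empty nested directories.
    unmount(mountpoint)
    os.rmdir(mountpoint)
    os.rmdir(tmp_root)


def post_start_host(tmp_root, mountpoint, unmount=lazy_unmount):
    """Stand-in for POST_START_HOST: runs after a successful start."""
    _teardown(tmp_root, mountpoint, unmount)


def cleanup_host(tmp_root, mountpoint, unmount=lazy_unmount):
    """Stand-in for CLEANUP_HOST: runs on container exit or failed start.

    Returns True if it had to clean up, False if there was nothing left.
    """
    if not os.path.isdir(tmp_root):
        return False  # POST_START_HOST already removed the directory
    _teardown(tmp_root, mountpoint, unmount)
    return True
```

In the normal path `post_start_host` removes the directory, so the later `cleanup_host` call finds nothing and returns `False`. It only does work when startup failed before `post_start_host` ran.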

This approach is fairly robust against SIGKILL. As long as the lazy unmount in the host namespace has fired, a SIGKILL of the container process / runtime parent process will not leave orphan mounts, FUSE processes, or namespaces.

If processes are SIGKILL-ed during container startup, then orphan mounts / FUSE processes may be left over. There is no easy way around this without moving FUSE mounts deeper into the runtime engine. That would require FUSE3, which would in turn preclude later support for extfs FUSE mounts via fuse2fs.
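
If orphan mounts are suspected after such a SIGKILL, one way to look for them is to scan the host mount table. The helper below is a hedged sketch and not part of this PR. It assumes squashfuse mounts are reported with a filesystem type beginning with `fuse.squashfuse`, which is worth verifying on the target system. It takes the mount table as text in the `/proc/mounts` layout (device, mount point, filesystem type, and so on) and does not decode octal escapes such as `\040` in paths.

```python
def find_squashfuse_mounts(mounts_text, fstype_prefix="fuse.squashfuse"):
    """Return mount points whose filesystem type starts with fstype_prefix.

    The default prefix is an assumption about how squashfuse and
    squashfuse_ll mounts appear in the mount table.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 3 and fields[2].startswith(fstype_prefix):
            found.append(fields[1])
    return found
```

Passing it the contents of `/proc/mounts` on the host lists candidate mount points. Whether a given entry is actually orphaned still has to be judged against the containers that are running.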

@dtrudg dtrudg self-assigned this Dec 18, 2023
@dtrudg dtrudg force-pushed the setuid-fuse branch 5 times, most recently from aa6d37c to 0cfef36 Compare December 18, 2023 14:43
@dtrudg dtrudg changed the title from "fuse: use FUSE mount for images when setuid kernel mount disabled" to "fuse: permit SIF/Squashfs mount via FUSE in all native mode flows" Dec 18, 2023
@dtrudg dtrudg marked this pull request as ready for review December 18, 2023 15:28
@dtrudg dtrudg merged commit aa363fe into sylabs:main Dec 18, 2023
1 check passed
@dtrudg dtrudg deleted the setuid-fuse branch December 18, 2023 15:47