fxe

Binary execution across Linux mount-namespaces

fxe is a small, pure-Rust Linux program which demonstrates how to execute binaries across mount-namespaces.

This technique suits several use cases: it allows shipping minimal containers with specialized binaries and then running those binaries in namespaces where they are not available. For example, a bare-minimal ContainerLinux OS can be augmented with a mount-foo container to mount foo volumes directly on the host.

This program is provided for illustrative purposes only; it is not meant to be run as-is in production.

How this works

As the name suggests, fxe's core functionality is built around fexecve(3). The short description in its manpage says:

fexecve() performs the same task as execve(), with the difference that the file to be executed is specified
via a file descriptor rather than via a pathname.

This allows fxe to get a handle to a binary available inside its own container (i.e. mount-namespace), move to a different target mount-namespace, and execute the binary there.
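The three steps above (open, switch namespace, execute) can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names exec_in_mount_ns and to_cstrings are hypothetical, setns and fexecve are declared by hand to keep the sketch free of external crates, and the child is started with an empty environment for simplicity.

```rust
use std::convert::Infallible;
use std::ffi::{c_char, c_int, CString};
use std::fs::File;
use std::io;
use std::os::unix::io::AsRawFd;
use std::ptr;

// Hand-written declarations of the two libc functions used by this sketch.
extern "C" {
    fn setns(fd: c_int, nstype: c_int) -> c_int;
    fn fexecve(fd: c_int, argv: *const *const c_char, envp: *const *const c_char) -> c_int;
}

/// Value of CLONE_NEWNS from <sched.h>: restrict setns to mount namespaces.
const CLONE_NEWNS: c_int = 0x0002_0000;

/// Convert arguments to C strings; fails if any argument contains a NUL byte.
fn to_cstrings(args: &[&str]) -> io::Result<Vec<CString>> {
    args.iter()
        .map(|a| CString::new(*a).map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e)))
        .collect()
}

/// Open `binary` in the current mount namespace, join the namespace behind
/// `target_ns`, then execute the binary there. Returns only on failure.
fn exec_in_mount_ns(binary: &str, target_ns: &str, args: &[&str]) -> io::Result<Infallible> {
    // 1. Grab both handles while the source namespace is still visible.
    //    File::open sets close-on-exec, which is fine for ELF binaries but
    //    makes fexecve fail for scripts (see the fexecve(3) notes).
    let bin = File::open(binary)?;
    let ns = File::open(target_ns)?;

    // Build the NULL-terminated argv and an empty envp before switching.
    let c_args = to_cstrings(args)?;
    let mut argv: Vec<*const c_char> = c_args.iter().map(|a| a.as_ptr()).collect();
    argv.push(ptr::null());
    let envp: [*const c_char; 1] = [ptr::null()];

    // 2. Move the (single-threaded) process into the target mount namespace.
    if unsafe { setns(ns.as_raw_fd(), CLONE_NEWNS) } != 0 {
        return Err(io::Error::last_os_error());
    }

    // 3. Execute the binary through its file descriptor; no path lookup
    //    happens in the target namespace. On success this never returns.
    unsafe { fexecve(bin.as_raw_fd(), argv.as_ptr(), envp.as_ptr()) };
    Err(io::Error::last_os_error())
}

fn main() {
    // Hypothetical usage: <program> <binary> <namespace-file> [args...]
    let args: Vec<String> = std::env::args().skip(1).collect();
    if args.len() < 2 {
        eprintln!("usage: <binary> <namespace-file> [args...]");
        std::process::exit(2);
    }
    // argv[0] of the child is set to the binary path itself.
    let mut child_args: Vec<&str> = vec![args[0].as_str()];
    child_args.extend(args[2..].iter().map(|s| s.as_str()));
    let err = exec_in_mount_ns(&args[0], &args[1], &child_args).unwrap_err();
    eprintln!("exec across mount-namespace failed: {err}");
    std::process::exit(1);
}
```

The binary is opened before the namespace switch because its path only resolves in the source namespace; after setns, only the already-open file descriptor still refers to it.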

Demo

This repository contains a demo program which runs modinfo crc16 using the modinfo binary from a busybox container. The directory containing kernel modules is not available inside the container, so the process changes its mount-namespace to the target one (e.g. the host's) and runs the modinfo binary there.

A pre-built binary is available as a Docker image at quay.io/lucab/fxe. To try it, use make run:

$ make run

docker run --privileged --pid=host quay.io/lucab/fxe:latest /fxe /proc/1/ns/mnt

filename:       /lib/modules/4.11.0-1-amd64/kernel/lib/crc16.ko
description:    CRC16 calculations
license:        GPL
depends:        
intree:         Y
vermagic:       4.11.0-1-amd64 SMP mod_unload modversions

This will use /proc/1/ns/mnt as the host mount-namespace target. Other targets can be used, as long as they are bind-mounted inside the container.

The --privileged flag is a shortcut to add CAP_SYS_ADMIN and CAP_SYS_CHROOT (required by setns(2)) and to prevent the default seccomp filter from blocking the call. Both can be allowed with finer-grained settings (this is left as an exercise).

The --pid=host flag is required for proper fexecve() execution. It can be changed to any arbitrary target; here it is set to host only for demonstration purposes.

Caveats

Due to how setns(2) and fexecve(3) are implemented on Linux, there are some conditions imposed on the running environment:

setns: CAP_SYS_ADMIN and CAP_SYS_CHROOT are required
setns: the target mount-namespace must be available as a file descriptor
setns: to be allowed to change mount-namespace, the process must be single-threaded
fexecve: /proc must be available
fexecve: source and target processes must be running in the same PID-namespace
fexecve: resources needed by scripts and dynamically linked binaries (e.g. interpreters and shared libraries) must be available in the target

See notes in both manpages for further details and explanations.
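Some of these conditions can be checked up front by reading /proc/self/status. The following is a rough sketch under stated assumptions: the helper names (status_field, parse_thread_count, parse_cap_eff, has_cap) are hypothetical, and the capability bit numbers (18 for CAP_SYS_CHROOT, 21 for CAP_SYS_ADMIN) are taken from linux/capability.h. It does not cover the PID-namespace or target-resources conditions.

```rust
use std::fs;
use std::io;

/// Capability bit numbers from <linux/capability.h>.
const CAP_SYS_CHROOT: u32 = 18;
const CAP_SYS_ADMIN: u32 = 21;

/// Return the trimmed value of a `Key:` line from /proc/<pid>/status content.
fn status_field<'a>(status: &'a str, key: &str) -> Option<&'a str> {
    status
        .lines()
        .find_map(|l| l.strip_prefix(key))
        .map(|v| v.trim())
}

/// Number of threads in the process, from the `Threads:` field.
fn parse_thread_count(status: &str) -> Option<u32> {
    status_field(status, "Threads:")?.parse().ok()
}

/// Effective capability set, from the hexadecimal `CapEff:` field.
fn parse_cap_eff(status: &str) -> Option<u64> {
    u64::from_str_radix(status_field(status, "CapEff:")?, 16).ok()
}

/// Whether capability number `cap` is present in the bitmask `set`.
fn has_cap(set: u64, cap: u32) -> bool {
    set & (1u64 << cap) != 0
}

fn main() -> io::Result<()> {
    // Reading this file also confirms that /proc is available at all.
    let status = fs::read_to_string("/proc/self/status")?;

    match parse_thread_count(&status) {
        Some(1) => println!("threads: 1 (ok for setns on a mount-namespace)"),
        Some(n) => println!("threads: {n} (setns on a mount-namespace would be refused)"),
        None => println!("threads: unknown"),
    }

    match parse_cap_eff(&status) {
        Some(caps) => {
            println!("CAP_SYS_ADMIN:  {}", has_cap(caps, CAP_SYS_ADMIN));
            println!("CAP_SYS_CHROOT: {}", has_cap(caps, CAP_SYS_CHROOT));
        }
        None => println!("capabilities: unknown"),
    }
    Ok(())
}
```

A check like this only reports the state at the time it runs; the setns and fexecve calls themselves remain the authoritative test.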

Compilation

The demo in this repository can be quickly built via make.

Prerequisites are:

make
a stable rustc/cargo toolchain for the x86_64-unknown-linux-musl target (available via rustup)
docker run available to the current user

This currently depends on a pending PR to nix.
