Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Use runc-dmz to defeat CVE-2019-5736 #3983

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,12 @@ endif

.DEFAULT: runc

runc:
runc: runc-dmz runc-bin

runc-dmz:
$(CC) -o runc-dmz -static contrib/cmd/runc-dmz/runc-dmz.c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And CFLAGS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to make this a hard dependency, it should be placed out of “contrib”


runc-bin:
$(GO_BUILD) -o runc .

all: runc recvtty sd-helper seccompagent fs-idmap
Expand Down
11 changes: 11 additions & 0 deletions contrib/cmd/runc-dmz/runc-dmz.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#include <unistd.h>

extern char **environ;

int main(int argv, char **args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int main(int argv, char **args)
int main(int argc, char **argv)

{
if (argv > 0) {
return execve(args[0], args, environ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering about ensuring the first argument, i.e. the executed binary is runc.
What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument is not runc -- it's the container pid1 program.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh OK! Thank you for the precision! I still need to polish my runc knowledge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument is not runc -- it's the container pid1 program.

If you try to run the C binary directly, I think the first argument of args would be runc-dmz?

}
return 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this should be non-zero

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we can optimize the binary footprint by eliminating glibc and executing “syscall” instruction directly, but that may happen in follow-up PRs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we build this binary with musl, you get a 13KB binary. With glibc it's 1.1MB. We will need to adjust our build system for this -- but there is a separate issue that distributions might find building a single binary with musl quite difficult. Speaking from the SUSE side, I don't think we even package musl for SLES so we would need to build with glibc (meaning the binary would be 1.1MB in size -- almost 100 times larger).

As you say, this can be done in a future PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewriting this in asm will be much easier than complicating libc dependency

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if there is a silver bullet here:

  • with glibc, you get a big binary.
  • with musl, you get a small binary but this can be painful to build against musl from some distribution.
  • with pure assembly, you will need as many files as supported architecture and choosing which file to build depending on the target architecture.

Maybe using syscall directly can be a middle ground, but you will need to defined the execve syscall number depending on the target architecture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syscall doesn't reduce the binary size unfortunately (also, libcs provide the syscall numbers for all syscalls in <sys/syscall.h>).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sad :(.
Let me know regarding the assembly files, I can give a hand for amd64 and arm64!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://gist.github.com/Zheaoli/f37bc1fb04917fdfac36d644ee69f7e9
I made a demo for amd64. The binary file size is 4.9k.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my contribution for arm64:

From 95defc03b7ec526c9c966735ebeb5bac5a9a34a0 Mon Sep 17 00:00:00 2001
From: Francis Laniel <[email protected]>
Date: Wed, 30 Aug 2023 13:06:27 +0200
Subject: [PATCH] runc-dmz: Add arm64 assembly code.

Signed-off-by: Francis Laniel <[email protected]>
---
 contrib/cmd/runc-dmz/runc-dmz-arm64.S | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 contrib/cmd/runc-dmz/runc-dmz-arm64.S

diff --git a/contrib/cmd/runc-dmz/runc-dmz-arm64.S b/contrib/cmd/runc-dmz/runc-dmz-arm64.S
new file mode 100644
index 00000000..25387e61
--- /dev/null
+++ b/contrib/cmd/runc-dmz/runc-dmz-arm64.S
@@ -0,0 +1,25 @@
+.equ EXECVE_SYSCALL_NUMBER, 221
+.equ EXIT_SYSCALL_NUMBER, 93
+
+.text
+.global _start
+_start:
+	ldr x3, [sp] // *sp contains argc, so x3 = argc
+	cmp x3, #0
+	bgt execve
+	mov x0, #0
+
+exit:
+	mov x8, EXIT_SYSCALL_NUMBER
+	svc #0
+
+execve:
+	add x1, sp, #16 // x1 = &argv[1]
+	ldr x0, [x1] // x0 = argv[1]
+	# argv[argc] is NULL and environment starts at argv[argc + 1].
+	mov x4, #8 // x4 = 8
+	mul x3, x3, x4 // x3 = (argc + 1) * 8
+	add x2, x1, x3 // x3 = &argv[argc + 1], i.e. environment.
+	mov x8, EXECVE_SYSCALL_NUMBER
+	svc #0
+	b exit
-- 
2.34.1

I tested it both through emulation and virtualization:

francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ aarch64-linux-gnu-as -g runc-dmz-arm64.S -o runc-dmz-arm64.o                                  (95defc03...) %
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ aarch64-linux-gnu-ld -static runc-dmz-arm64.o                                                 (95defc03...) %
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ du -sh a.out                                                                                  (95defc03...) %
4,0K    a.out
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ file a.out                                                                                    (95defc03...) %
a.out: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), statically linked, with debug_info, not stripped
francis@pwmachine:~/Codes/kinvolk/runc/contrib/cmd/runc-dmz$ qemu-aarch64-static ./a.out /bin/ls -al                                                       (95defc03...) %
total 32
drwxrwxr-x 2 francis francis 4096 août  30 13:09 .
drwxrwxr-x 7 francis francis 4096 août  29 17:23 ..
-rw-rw-r-- 1 francis francis 1213 août  30 13:06 0001-runc-dmz-Add-arm64-assembly-code.patch
-rwxrwxr-x 1 francis francis 1832 août  30 13:09 a.out
-rw-rw-r-- 1 francis francis 2168 août  30 13:09 runc-dmz-arm64.o
-rw-rw-r-- 1 francis francis  524 août  30 13:06 runc-dmz-arm64.S
-rw-rw-r-- 1 francis francis  168 août  29 17:23 runc-dmz.c
-rw-rw-r-- 1 francis francis  700 août  29 18:59 runc-dmz.s
# Inside arm64 VM:
root@vm-arm64:~/share/kinvolk/runc/contrib/cmd/runc-dmz# lscpu | head -1
Architecture:                       aarch64
root@vm-arm64:~/share/kinvolk/runc/contrib/cmd/runc-dmz# ./a.out /bin/ls -al
total 32
drwxrwxr-x 2 1000 1000 4096 Aug 30 11:09 .
drwxrwxr-x 7 1000 1000 4096 Aug 29 15:23 ..
-rw-rw-r-- 1 1000 1000 1213 Aug 30 11:06 0001-runc-dmz-Add-arm64-assembly-code.patch
-rwxrwxr-x 1 1000 1000 1832 Aug 30 11:09 a.out
-rw-rw-r-- 1 1000 1000  524 Aug 30 11:06 runc-dmz-arm64.S
-rw-rw-r-- 1 1000 1000 2168 Aug 30 11:09 runc-dmz-arm64.o
-rw-rw-r-- 1 1000 1000  168 Aug 29 15:23 runc-dmz.c
-rw-rw-r-- 1 1000 1000  700 Aug 29 16:59 runc-dmz.s

Feel free to apply the patch! In any case, it was funny to write arm64 assembly from scratch as it was a long time I did not do that.
If you see any problem with the code, share your feedback and I will address your comments!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe there still have some design problems for runc-dmz proposal.
Please see: #3987 (comment)
But I can't confirm, if you can check there is no problem for runc-dmz propsal, it will be appreciated. Because I don't want to waste contributor's time for a wrong proposal.

1 change: 1 addition & 0 deletions libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -492,6 +492,7 @@ func (c *Container) commandTemplate(p *Process, childInitPipe *os.File, childLog
cmd.Env = append(cmd.Env,
"_LIBCONTAINER_INITPIPE="+strconv.Itoa(stdioFdCount+len(cmd.ExtraFiles)-1),
"_LIBCONTAINER_STATEDIR="+c.root,
"_LIBCONTAINER_DMZFD=-1",
)

cmd.ExtraFiles = append(cmd.ExtraFiles, childLogPipe)
Expand Down
13 changes: 11 additions & 2 deletions libcontainer/init_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,12 @@ func startInitialization() (retErr error) {
return err
}

// Get runc-dmz fds.
dmzFd, err := strconv.Atoi(os.Getenv("_LIBCONTAINER_DMZFD"))
if err != nil {
return fmt.Errorf("dmzFd error: %w", err)
}

// clear the current process's environment to clean any libcontainer
// specific env vars.
os.Clearenv()
Expand All @@ -201,17 +207,18 @@ func startInitialization() (retErr error) {
}()

// If init succeeds, it will not return, hence none of the defers will be called.
return containerInit(it, pipe, consoleSocket, fifofd, logFD, mountFds{sourceFds: mountSrcFds, idmapFds: idmapFds})
return containerInit(it, pipe, consoleSocket, fifofd, logFD, dmzFd, mountFds{sourceFds: mountSrcFds, idmapFds: idmapFds})
}

func containerInit(t initType, pipe *os.File, consoleSocket *os.File, fifoFd, logFd int, mountFds mountFds) error {
func containerInit(t initType, pipe *os.File, consoleSocket *os.File, fifoFd, logFd, dmzFd int, mountFds mountFds) error {
var config *initConfig
if err := json.NewDecoder(pipe).Decode(&config); err != nil {
return err
}
if err := populateProcessEnvironment(config.Env); err != nil {
return err
}

switch t {
case initSetns:
// mount and idmap fds must be nil in this case. We don't mount while doing runc exec.
Expand All @@ -224,6 +231,7 @@ func containerInit(t initType, pipe *os.File, consoleSocket *os.File, fifoFd, lo
consoleSocket: consoleSocket,
config: config,
logFd: logFd,
dmzFd: dmzFd,
}
return i.Init()
case initStandard:
Expand All @@ -233,6 +241,7 @@ func containerInit(t initType, pipe *os.File, consoleSocket *os.File, fifoFd, lo
parentPid: unix.Getppid(),
config: config,
fifoFd: fifoFd,
dmzFd: dmzFd,
logFd: logFd,
mountFds: mountFds,
}
Expand Down
19 changes: 19 additions & 0 deletions libcontainer/integration/utils_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,16 @@ func init() {
// Figure out path to get-images.sh. Note it won't work
// in case the compiled test binary is moved elsewhere.
_, ex, _, _ := runtime.Caller(0)
// make and copy runc-dmg
rootDir, err := filepath.Abs(filepath.Join(filepath.Dir(ex), "..", ".."))
if err != nil {
panic(err)
}
nowPath := getExeDir()
dmzMake, err := exec.Command("gcc", "-o", filepath.Join(nowPath, "integration.test-dmz"), "-static", filepath.Join(rootDir, "contrib/cmd/runc-dmz/runc-dmz.c")).CombinedOutput()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn’t be here, and the dmz binary should be just in the PATH?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just embed the dmz binary into the main binary

if err != nil {
panic(fmt.Errorf("make runc-dmz error %w (output: %s)", err, dmzMake))
}
getImages, err := filepath.Abs(filepath.Join(filepath.Dir(ex), "..", "..", "tests", "integration", "get-images.sh"))
if err != nil {
panic(err)
Expand All @@ -48,6 +58,15 @@ func init() {
}
}

func getExeDir() string {
exePath, err := os.Executable()
if err != nil {
panic(err)
}
res, _ := filepath.EvalSymlinks(filepath.Dir(exePath))
return res
}

func ptrInt(v int) *int {
return &v
}
Expand Down
185 changes: 16 additions & 169 deletions libcontainer/nsenter/cloned_binary.c
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ int memfd_create(const char *name, unsigned int flags)
#endif

#define CLONED_BINARY_ENV "_LIBCONTAINER_CLONED_BINARY"
#define RUNC_MEMFD_COMMENT "runc_cloned:/proc/self/exe"
#define RUNC_MEMFD_COMMENT "runc_cloned:runc-dmz"
/*
* There are newer memfd seals (such as F_SEAL_FUTURE_WRITE and F_SEAL_EXEC),
* which we use opportunistically. However, this set is the original set of
Expand All @@ -142,162 +142,6 @@ int memfd_create(const char *name, unsigned int flags)
#define RUNC_MEMFD_MIN_SEALS \
(F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

static void *must_realloc(void *ptr, size_t size)
{
void *old = ptr;
do {
ptr = realloc(old, size);
} while (!ptr);
return ptr;
}

/*
* Verify whether we are currently in a self-cloned program (namely, is
* /proc/self/exe a memfd). F_GET_SEALS will only succeed for memfds (or rather
* for shmem files), and we want to be sure it's actually sealed.
*/
static int is_self_cloned(void)
{
int fd, seals = 0, is_cloned = false;
struct stat statbuf = { };
struct statfs fsbuf = { };

fd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
if (fd < 0) {
write_log(ERROR, "cannot open runc binary for reading: open /proc/self/exe: %m");
return -ENOTRECOVERABLE;
}

/*
* Is the binary a fully-sealed memfd? We don't need CLONED_BINARY_ENV for
* this, because you cannot write to a sealed memfd no matter what.
*/
seals = fcntl(fd, F_GET_SEALS);
if (seals >= 0) {
write_log(DEBUG, "checking /proc/self/exe memfd seals: 0x%x", seals);
is_cloned = (seals & RUNC_MEMFD_MIN_SEALS) == RUNC_MEMFD_MIN_SEALS;
if (is_cloned)
goto out;
}

/*
* All other forms require CLONED_BINARY_ENV, since they are potentially
* writeable (or we can't tell if they're fully safe) and thus we must
* check the environment as an extra layer of defence.
*/
if (!getenv(CLONED_BINARY_ENV)) {
is_cloned = false;
goto out;
}

/*
* Is the binary on a read-only filesystem? We can't detect bind-mounts in
* particular (in-kernel they are identical to regular mounts) but we can
* at least be sure that it's read-only. In addition, to make sure that
* it's *our* bind-mount we check CLONED_BINARY_ENV.
*/
if (fstatfs(fd, &fsbuf) >= 0)
is_cloned |= (fsbuf.f_flags & MS_RDONLY);

/*
* Okay, we're a tmpfile -- or we're currently running on RHEL <=7.6
* which appears to have a borked backport of F_GET_SEALS. Either way,
* having a file which has no hardlinks indicates that we aren't using
* a host-side "runc" binary and this is something that a container
* cannot fake (because unlinking requires being able to resolve the
* path that you want to unlink).
*/
if (fstat(fd, &statbuf) >= 0)
is_cloned |= (statbuf.st_nlink == 0);

out:
close(fd);
return is_cloned;
}

/* Read a given file into a new buffer, and providing the length. */
static char *read_file(char *path, size_t *length)
{
int fd;
char buf[4096], *copy = NULL;

if (!length)
return NULL;

fd = open(path, O_RDONLY | O_CLOEXEC);
if (fd < 0)
return NULL;

*length = 0;
for (;;) {
ssize_t n;

n = read(fd, buf, sizeof(buf));
if (n < 0)
goto error;
if (!n)
break;

copy = must_realloc(copy, (*length + n) * sizeof(*copy));
memcpy(copy + *length, buf, n);
*length += n;
}
close(fd);
return copy;

error:
close(fd);
free(copy);
return NULL;
}

/*
* A poor-man's version of "xargs -0". Basically parses a given block of
* NUL-delimited data, within the given length and adds a pointer to each entry
* to the array of pointers.
*/
static int parse_xargs(char *data, int data_length, char ***output)
{
int num = 0;
char *cur = data;

if (!data || *output != NULL)
return -1;

while (cur < data + data_length) {
num++;
*output = must_realloc(*output, (num + 1) * sizeof(**output));
(*output)[num - 1] = cur;
cur += strlen(cur) + 1;
}
(*output)[num] = NULL;
return num;
}

/*
* "Parse" out argv from /proc/self/cmdline.
* This is necessary because we are running in a context where we don't have a
* main() that we can just get the arguments from.
*/
static int fetchve(char ***argv)
{
char *cmdline = NULL;
size_t cmdline_size;

cmdline = read_file("/proc/self/cmdline", &cmdline_size);
if (!cmdline)
goto error;

if (parse_xargs(cmdline, cmdline_size, argv) <= 0)
goto error;

return 0;

error:
free(cmdline);
return -EINVAL;
}

enum {
EFD_NONE = 0,
EFD_MEMFD,
Expand Down Expand Up @@ -499,12 +343,20 @@ static int clone_binary(void)
struct stat statbuf = { };
size_t sent = 0;
int fdtype = EFD_NONE;
char runcpath[PATH_MAX] = { 0 };
char dmzpath[PATH_MAX] = { 0 };

execfd = make_execfd(&fdtype);
if (execfd < 0 || fdtype == EFD_NONE)
return -ENOTRECOVERABLE;

binfd = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
if (readlink("/proc/self/exe", runcpath, PATH_MAX) < 1)
goto error;

if (snprintf(dmzpath, PATH_MAX, "%s%s", runcpath, "-dmz") < 0)
goto error;

binfd = open(dmzpath, O_RDONLY | O_CLOEXEC);
if (binfd < 0)
goto error;

Expand Down Expand Up @@ -543,24 +395,19 @@ extern char **environ;
int ensure_cloned_binary(void)
{
int execfd;
char **argv = NULL;

/* Check that we're not self-cloned, and if we are then bail. */
int cloned = is_self_cloned();
if (cloned > 0 || cloned == -ENOTRECOVERABLE)
return cloned;

if (fetchve(&argv) < 0)
return -EINVAL;

execfd = clone_binary();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary text should be embedded into the runc binary (using //go:embed) and all of this code should be moved to the Go code (like I did in #3953). But if you like, I can do that in a follow-up PR (or I can take this code and the memfd code from #3953 and make a single PR to clean this all up in one go).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I can take this code and the memfd code from #3953 and make a single PR to clean this all up in one go

Yes, I think you can take this if you think this proposal worth to try.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll open a separate PR with this code and combine it with the cloned-binary patch. Thanks for working on this, I like the idea a lot!

if (execfd < 0)
return -EIO;

if (putenv(CLONED_BINARY_ENV "=1"))
char envString[PATH_MAX] = { 0 };
if (sprintf(envString, "%d", execfd) < 0)
goto error;

if (setenv("_LIBCONTAINER_DMZFD", envString, 1))
goto error;

fexecve(execfd, argv, environ);
return 0;
error:
close(execfd);
return -ENOEXEC;
Expand Down
26 changes: 26 additions & 0 deletions libcontainer/nsenter/nsenter_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ import (
"io"
"os"
"os/exec"
"path/filepath"
"runtime"
"strings"
"testing"

Expand All @@ -16,6 +18,29 @@ import (
"golang.org/x/sys/unix"
)

func init() {
_, ex, _, _ := runtime.Caller(0)
// make and copy runc-dmg
rootDir, err := filepath.Abs(filepath.Join(filepath.Dir(ex), "..", ".."))
if err != nil {
panic(err)
}
nowPath := getExeDir()
dmzMake, err := exec.Command("gcc", "-o", filepath.Join(nowPath, "nsenter.test-dmz"), "-static", filepath.Join(rootDir, "contrib/cmd/runc-dmz/runc-dmz.c")).CombinedOutput()
if err != nil {
panic(fmt.Errorf("make runc-dmz error %w (output: %s)", err, dmzMake))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dmz binary should be just preinstalled in the PATH

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just embed the dmz binary into the main binary

}
}

func getExeDir() string {
exePath, err := os.Executable()
if err != nil {
panic(err)
}
res, _ := filepath.EvalSymlinks(filepath.Dir(exePath))
return res
}

func TestNsenterValidPaths(t *testing.T) {
args := []string{"nsenter-exec"}
parent, child := newPipe(t)
Expand All @@ -24,6 +49,7 @@ func TestNsenterValidPaths(t *testing.T) {
// join pid ns of the current process
fmt.Sprintf("pid:/proc/%d/ns/pid", os.Getpid()),
}
fmt.Printf("=========os.Args[0] %s\n", os.Args[0])
cmd := &exec.Cmd{
Path: os.Args[0],
Args: args,
Expand Down
Loading