Handle seccomp policies that don't include ptrace(2) #846

apyrgio · 2024-06-26T10:29:48Z

Problem

Our recent gVisor integration (#590) requires allowlisting the ptrace(2) system call in the outer container, in order to spawn the inner container with runsc. Nowadays, this is the default [1], but we have encountered systems that don't allow this system call, and thus Dangerzone cannot run in them, at least out of the box.

Affected systems are:

Ubuntu Focal (via OpenSUSE's repo), with Podman version 3.4.2
Ubuntu Jammy, with Podman version 3.4.4
Debian Bullseye, with Podman version 3.0.1
Older Docker Desktop releases, e.g., with runc version 1.1.5

Background

Before explaining how we plan to fix this issue, we'll give some background on the ptrace(2) system call.

First of all, why is this syscall dangerous? The main reason is that a malicious process can use it to escalate its privileges or to thwart some system protections. A real-life example is CVE-2019-2054. This CVE is the reason why container seccomp policies do not allow ptrace(2) on Linux kernels < 4.8, but it's not the only ptrace-related CVE that has been reported.

In order to control the scope of the ptrace(2) system call, the Linux kernel offers the following mechanisms:

The CAP_SYS_PTRACE Linux capability. If this capability is enabled, then the process can have full tracing capabilities, such as tracing other processes that it has not started. If this capability is not granted, then the usage of ptrace(2) is still allowed, but restricted through the mechanisms listed below.
Disabling the system call (or arguments to it) via a seccomp policy. For instance:
- Docker originally had disabled ptrace(2) in their seccomp policy, and then re-enabled it for kernels >= 4.8.
- Podman similarly lifted this restriction a few years later.
- Containerd did so around the same time.
The YAMA Linux Security Module ptrace_scope setting. This setting controls the behavior of ptrace(2) system-wide. On the Linux platforms we support, the default seems to be 1, i.e., a process may use ptrace(2) only on processes it has a direct relationship with (e.g., its child processes).
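To check which ptrace_scope a given system uses, something like the following could help. This is a minimal sketch, not part of Dangerzone: the helper names are hypothetical, and it assumes the standard Yama sysctl path under /proc.

```python
from pathlib import Path

# Meanings of the Yama ptrace_scope values, as described in the kernel's
# Yama documentation.
PTRACE_SCOPES = {
    0: "classic: a process may trace any other process running under the same uid",
    1: "restricted: a process may trace only its descendants (or opted-in processes)",
    2: "admin-only: tracing requires CAP_SYS_PTRACE",
    3: "no attach: tracing is disabled entirely",
}


def describe_ptrace_scope(raw: str) -> str:
    """Map the raw sysctl contents (e.g. "1\\n") to a human-readable description."""
    value = int(raw.strip())
    return PTRACE_SCOPES.get(value, f"unknown ptrace_scope value: {value}")


def read_ptrace_scope(path: str = "/proc/sys/kernel/yama/ptrace_scope") -> str:
    """Read and describe the current setting; the file is absent without Yama."""
    try:
        return describe_ptrace_scope(Path(path).read_text())
    except FileNotFoundError:
        return "Yama is not enabled on this kernel"
```

Calling `read_ptrace_scope()` on a Linux host reports the system-wide setting. It says nothing about the seccomp policy or the capabilities of a container, which are separate mechanisms.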

[1] See Podman's seccomp policy, Docker's seccomp policy, and containerd's seccomp policy.


apyrgio · 2024-06-26T11:22:25Z

Requirements

Our solution must take into account the following:

1. It must work on kernels >= 4.8.
2. It must work with the default ptrace_scope on Linux systems.
3. It must work on older Podman and Docker Desktop releases.
   - Yes, these releases may be insecure by now, but if we don't support them and our users cannot update to newer ones, they will just open the suspicious file.
4. The user must not interact with the system in order to make Dangerzone work.

On (1), we have verified that none of the systems we support runs a Linux kernel < 4.8. This also applies to Windows (WSL2) and macOS (HyperKit). On (2), we have seen that the default ptrace_scope is 1 on the platforms we support. This scope is supported by gVisor.
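The kernel requirement can be checked mechanically. The following is a rough sketch with hypothetical helper names; it only compares the major and minor components of a kernel release string such as the `KernelVersion` field that `docker version` reports.

```python
import re
from typing import Tuple

# Minimum kernel version from the requirements above.
MIN_KERNEL = (4, 8)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like "6.6.12-linuxkit"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"Unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """Return True if the kernel release is at least MIN_KERNEL."""
    return parse_kernel_release(release) >= MIN_KERNEL
```

On a Linux host, `kernel_is_supported(platform.release())` would do. On Windows and macOS, the relevant kernel is the one inside the container VM, so the release string has to come from the container engine rather than from the host.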

Solution

For Podman versions < 4, we already have a workaround in our code that starts the process with Podman's default seccomp policy as of June 6th, 2024 (see seccomp.json):

dangerzone/dangerzone/isolation_provider/container.py

Lines 117 to 119 in c2a47ec

    
    if Container.get_runtime_version() < (4, 0):
        seccomp_json_path = get_resource_path("seccomp.gvisor.json")
        security_args += ["--security-opt", f"seccomp={seccomp_json_path}"]

For Docker Desktop, we do not have a similar workaround, because we don't know exactly when this restriction was lifted. We do know that Containerd 1.6.7 first allowed the ptrace() syscall, and that Docker Desktop 4.12.0 included this Containerd version. However, we have tested with Docker Desktop release 4.19.0 on macOS, and the ptrace() syscall was disabled, so we're not sure.

So, our suggestion is to:

Check if the Docker Desktop release is recent. We have had good results with Docker Desktop 4.27.0, for example.
If the release is older, spawn a container using the stored seccomp.json file we have for Podman as well.

This way, older releases will use our Podman seccomp policy, which guarantees that ptrace(2) is allowed. If an older Docker Desktop release already allows the ptrace(2) system call, our seccomp policy will override its default one, but the differences should be negligible.

Newer releases will use their default seccomp policy, and thus we will not mask any security-related fixes that happen in the future.
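The decision described above could look roughly like the sketch below. The function names are hypothetical, and the Docker Desktop threshold is an assumption: 4.27 is only the release we had good results with, not a verified cutoff.

```python
from typing import List, Tuple

# Podman threshold taken from the existing workaround.
PODMAN_MIN = (4, 0)
# ASSUMPTION: 4.27 is the release that worked for us, not a confirmed cutoff.
DOCKER_DESKTOP_MIN = (4, 27)


def needs_custom_seccomp(runtime: str, version: Tuple[int, ...]) -> bool:
    """Return True if the release is old enough to need our own seccomp policy."""
    if runtime == "podman":
        return version < PODMAN_MIN
    if runtime == "docker":
        return version < DOCKER_DESKTOP_MIN
    raise ValueError(f"Unknown container runtime: {runtime!r}")


def seccomp_args(
    runtime: str, version: Tuple[int, ...], seccomp_json_path: str
) -> List[str]:
    """Build the extra security arguments, if any, for the container invocation."""
    if needs_custom_seccomp(runtime, version):
        return ["--security-opt", f"seccomp={seccomp_json_path}"]
    # Newer releases keep their default policy, so future fixes are not masked.
    return []
```

For example, `seccomp_args("podman", (3, 4, 4), path)` returns the `--security-opt` pair, while `seccomp_args("docker", (4, 27, 0), path)` returns an empty list.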

Alternatives

Docker also allows the ptrace(2) system call if CAP_SYS_PTRACE is specified in the container invocation. Note that we don't add this Linux capability in the current implementation:

dangerzone/dangerzone/isolation_provider/container.py

Lines 123 to 124 in c2a47ec

    
    security_args += ["--cap-drop", "all"]
    security_args += ["--cap-add", "SYS_CHROOT"]

Why is that? Because with the CAP_SYS_PTRACE capability, the outer container would be able to trace any process, which significantly increases our attack surface.

For this reason, we choose not to go down that path, and simply pass our own seccomp policy.

apyrgio · 2024-06-26T12:16:34Z

It seems that docker version gives an output that is not friendly to parsing, if we just want the Docker Desktop release (i.e., the 4.27.2 part):

$ docker version -f {{.Server.Platform.Name}}
Docker Desktop 4.27.2 (137060)
$ docker version -f json
{
    "Client": {
        "CloudIntegration": "v1.0.35+desktop.10",
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "DefaultAPIVersion": "1.44",
        "GitCommit": "4debf41",
        "GoVersion": "go1.21.6",
        "Os": "darwin",
        "Arch": "arm64",
        "BuildTime": "Tue Feb  6 21:13:26 2024",
        "Context": "default"
    },
    "Server": {
        "Platform": {
            "Name": "Docker Desktop 4.27.2 (137060)"
        },
        "Components": [
            {
                "Name": "Engine",
                "Version": "25.0.3",
                "Details": {
                    "ApiVersion": "1.44",
                    "Arch": "arm64",
                    "BuildTime": "Tue Feb  6 21:14:22 2024",
                    "Experimental": "false",
                    "GitCommit": "f417435",
                    "GoVersion": "go1.21.6",
                    "KernelVersion": "6.6.12-linuxkit",
                    "MinAPIVersion": "1.24",
                    "Os": "linux"
                }
            },
            {
                "Name": "containerd",
                "Version": "1.6.28",
                "Details": {
                    "GitCommit": "ae07eda36dd25f8a1b98dfbf587313b99c0190bb"
                }
            },
            {
                "Name": "runc",
                "Version": "1.1.12",
                "Details": {
                    "GitCommit": "v1.1.12-0-g51d5e94"
                }
            },
            {
                "Name": "docker-init",
                "Version": "0.19.0",
                "Details": {
                    "GitCommit": "de40ad0"
                }
            }
        ],
        "Version": "25.0.3",
        "ApiVersion": "1.44",
        "MinAPIVersion": "1.24",
        "GitCommit": "f417435",
        "GoVersion": "go1.21.6",
        "Os": "linux",
        "Arch": "arm64",
        "KernelVersion": "6.6.12-linuxkit",
        "BuildTime": "2024-02-06T21:14:22.000000000+00:00"
    }
}
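If we did want the Docker Desktop release, a regular expression over the Server.Platform.Name field would be one option. This is a minimal sketch with a hypothetical helper name, and it assumes the field keeps the "Docker Desktop X.Y.Z (build)" shape shown above:

```python
import re
from typing import Optional, Tuple


def parse_docker_desktop_release(platform_name: str) -> Optional[Tuple[int, ...]]:
    """Extract the release tuple from a Server.Platform.Name string.

    Returns None if the string does not look like a Docker Desktop platform.
    """
    match = re.search(r"Docker Desktop (\d+(?:\.\d+)*)", platform_name)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))
```

This depends on a free-form string whose format is not guaranteed, so it is brittle.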

We can use the Docker Engine version instead:

$ docker version -f {{.Server.Version}}
25.0.3

Most likely, we can consider any Docker Engine version from 25.0 onwards as safe.
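A rough sketch of this check follows, assuming the JSON layout printed by `docker version -f json` above. The helper names are hypothetical, and the 25.0 cutoff is our working assumption rather than a confirmed boundary.

```python
import json
import re
from typing import Tuple

# ASSUMPTION: engines from 25.0 onwards allow ptrace(2) by default.
MIN_ENGINE_VERSION = (25, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "25.0.3" (optionally with a suffix like "-rc1") into (25, 0, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def engine_version_from_json(output: str) -> Tuple[int, ...]:
    """Read Server.Version from the output of `docker version -f json`."""
    data = json.loads(output)
    return parse_version(data["Server"]["Version"])


def engine_needs_custom_seccomp(version: Tuple[int, ...]) -> bool:
    """Return True for engines older than the assumed-safe cutoff."""
    return version < MIN_ENGINE_VERSION
```

With the output above, `engine_version_from_json` yields `(25, 0, 3)`, so the engine's default seccomp policy would be kept.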

We are aware that some Docker Desktop releases before 25.0.0 ship with a seccomp policy which disables the `ptrace(2)` system call. In such cases, we opt to use our own seccomp policy which allows this system call. This seccomp policy is the default one in the latest releases of Podman, and we use it in Linux distributions where Podman version is < 4.0.

Fixes #846

apyrgio · 2024-09-24T12:19:22Z

Based on discussions in #865, we will actually enforce this seccomp policy across all container engines. There are two reasons for doing so:

The seccomp policies that are shipped by default in various container engines tend to get more lax over time, but our application needs a very specific set of syscalls, now that we have integrated gVisor. So, we can freeze the list of allowed syscalls, and thus not broaden the attack surface of the outer container.
There seem to be container engines (like Orbstack, see #908: All conversions fail with "Unspecified error", using Orbstack) which use stricter seccomp policies. The detection method that works for Docker Desktop, and enables our custom seccomp policy, does not work for them. By uniformly setting our own seccomp policy, we can interoperate with these container engines as well, even though they are not officially supported.
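Since we freeze the policy, it may be worth sanity-checking that the shipped profile still allows ptrace(2). This is a minimal sketch with a hypothetical helper, assuming the JSON layout used by Docker and Podman seccomp profiles (a `syscalls` list of rules with `names`, `action`, and optional `args`, `includes`, `excludes`). It is deliberately conservative and looks only at explicit, unconditional allow rules.

```python
from typing import Any, Dict


def allows_syscall(profile: Dict[str, Any], name: str) -> bool:
    """Return True if an unconditional SCMP_ACT_ALLOW rule lists `name`."""
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        if name not in rule.get("names", []):
            continue
        # Skip rules gated on capabilities, kernel version, or arguments.
        if rule.get("includes") or rule.get("excludes") or rule.get("args"):
            continue
        return True
    return False
```

A test could load the bundled profile with `json.load()` and assert `allows_syscall(profile, "ptrace")`. A rule that allows ptrace only when a capability such as CAP_SYS_PTRACE is present is treated as not allowing it, since we drop all capabilities.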

apyrgio added bug container security labels Jun 26, 2024

apyrgio added this to the 0.7.0 milestone Jun 26, 2024

apyrgio mentioned this issue Jun 26, 2024

Use a custom seccomp policy for older Docker Desktop releases #847

Merged

apyrgio closed this as completed in e7e3430 Jun 26, 2024

apyrgio changed the title ~~Handle seccomp policies that don't include ptrace()~~ Handle seccomp policies that don't include ptrace(2) Jun 26, 2024

almet mentioned this issue Jul 11, 2024

Dangerzone not compatible with colima? #865

Open

apyrgio mentioned this issue Sep 24, 2024

Always use our own seccomp policy as a default. #926

Merged
