
gVisor inside Docker #2

Closed
shg8 opened this issue Sep 3, 2024 · 16 comments
@shg8

shg8 commented Sep 3, 2024

Thanks for the great project! Running gVisor inside a Docker container seems to require a privileged container. Additionally, I had to use this script to enable nested cgroup. Otherwise, I would get the following error:

Sandbox runtime failed: Sandbox failed to start: Command '['/tmp/gvisor/runsc', '--rootless=true', '--directfs=false', '--network=host', '--ignore-cgroups=false', '--root=/tmp/sandbox_cjm736ki/runtime', '--debug=false', '--debug-log=/tmp/sandbox_cjm736ki/logs/', 'run', '--bundle=/tmp/sandbox_cjm736ki/bundle', 'sandbox']' returned non-zero exit status 128.; stderr: running container: creating container: cannot set up cgroup for root: configuring cgroup: write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy; logs: defaultdict(<class 'list'>, {'runsc.log.20240903-224047.843558.run.txt': ['W0903 22:40:47.845254 120 util.go:64] FATAL ERROR: running container: creating container: cannot set up cgroup for root: configuring cgroup: write /sys/fs/cgroup/cgroup.subtree_control: device or resource busy', 'W0903 22:40:47.845367 120 main.go:231] Failure to execute command, err: 1']})
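
For reference, the "enable nested cgroup" workaround linked above boils down to something like the following sketch (cgroup v2 assumed; this is an approximation of the Docker-in-Docker entrypoint hack, not the exact script):

```
# Move every process out of the cgroup namespace's root cgroup into a
# child cgroup, then enable all controllers for children by writing
# to cgroup.subtree_control (which rejects writes with EBUSY while
# processes still live in the root cgroup).
mkdir -p /sys/fs/cgroup/init
xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || true
sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
  > /sys/fs/cgroup/cgroup.subtree_control
```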

@EtiennePerot
Owner

EtiennePerot commented Sep 4, 2024

Hm, from my testing this was only necessary when enforcing RAM limits (which uses the cgroup controller to create a memory-constrained cgroup), and for that case there is code that checks that /sys/fs/cgroup/cgroup.subtree_control is writable and returns a more readable error message when it is not:

https://github.com/EtiennePerot/open-webui-code-execution/blob/c71ff8fdc4b2d4d4b6f958e9e35c63a792ae45e1/open-webui/functions/run_code.py#L457-L487

Does this error occur even when you have RAM limiting disabled (as is the default setting)?

Running gVisor inside a Docker container seems to require a privileged container

It's not necessary to add --privileged, but it does need some extra capabilities. I will add more detailed Docker setup instructions soon.

@EtiennePerot
Owner

I have added setup instructions for Docker. Please see if they work around this problem without requiring --privileged.

@EtiennePerot
Owner

Can you check if adding --cgroupns=host to the docker run invocation works around this issue?

@shg8
Author

shg8 commented Sep 5, 2024

Thanks for the update. I checked out the setup instructions but wasn't able to get it to work without privileged. Here's a list of the things I've tried to no avail:

  • Creating a custom seccomp profile that allows the unshare syscall
  • Setting seccomp to unconfined (as instructed in the docs)
  • Adding the SYS_ADMIN capability

Here's the output of the self test:

➜  ~ sudo docker run --rm \
    --security-opt=seccomp=unconfined \
    --security-opt=label=type:container_engine_t \
    --mount=type=bind,source="$(pwd)/open-webui-code-execution",target=/selftest \
    ghcr.io/open-webui/open-webui:main \
    python3 /selftest/open-webui/tools/run_code.py --use-sample-code --debug
Emitting status event: {'status': 'in_progress', 'description': 'Checking if environment supports sandboxing...', 'done': False}
Event: {'type': 'status', 'data': {'status': 'in_progress', 'description': 'Checking if environment supports sandboxing...', 'done': False}}
Emitting status event: {'status': 'in_progress', 'description': 'Auto-installing gVisor...', 'done': False}
Event: {'type': 'status', 'data': {'status': 'in_progress', 'description': 'Auto-installing gVisor...', 'done': False}}
Emitting status event: {'status': 'in_progress', 'description': 'Initializing sandbox configuration...', 'done': False}
Event: {'type': 'status', 'data': {'status': 'in_progress', 'description': 'Initializing sandbox configuration...', 'done': False}}
Emitting status event: {'status': 'in_progress', 'description': 'Setting up sandbox environment...', 'done': False}
Event: {'type': 'status', 'data': {'status': 'in_progress', 'description': 'Setting up sandbox environment...', 'done': False}}
Emitting status event: {'status': 'in_progress', 'description': 'Running Python code in gVisor sandbox...', 'done': False}
Event: {'type': 'status', 'data': {'status': 'in_progress', 'description': 'Running Python code in gVisor sandbox...', 'done': False}}
Emitting status event: {'status': 'error', 'description': "Sandbox runtime failed: Sandbox failed to start: Command '['/tmp/gvisor/runsc', '--rootless=true', '--directfs=false', '--network=host', '--ignore-cgroups=true', '--root=/tmp/sandbox_e6ngllrj/runtime', '--debug=true', '--debug-log=/tmp/sandbox_e6ngllrj/logs/', 'run', '--bundle=/tmp/sandbox_e6ngllrj/bundle', 'sandbox']' returned non-zero exit status 128.; stderr: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF; logs: defaultdict(<class 'list'>, {'runsc.log.20240905-021314.955720.run.txt': ['W0905 02:13:14.990443      13 util.go:64] FATAL ERROR: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF', 'W0905 02:13:14.990503      13 main.go:231] Failure to execute command, err: 1'], 'runsc.log.20240905-021314.955720.gofer.txt': ['W0905 02:13:14.981137       1 util.go:64] FATAL ERROR: error converting mounts: permission denied'], 'runsc.log.20240905-021314.955720.boot.txt': ['W0905 02:13:14.989358      28 util.go:64] FATAL ERROR: error setting up chroot: error converting mounts: permission denied']})", 'done': True}
Event: {'type': 'status', 'data': {'status': 'error', 'description': "Sandbox runtime failed: Sandbox failed to start: Command '['/tmp/gvisor/runsc', '--rootless=true', '--directfs=false', '--network=host', '--ignore-cgroups=true', '--root=/tmp/sandbox_e6ngllrj/runtime', '--debug=true', '--debug-log=/tmp/sandbox_e6ngllrj/logs/', 'run', '--bundle=/tmp/sandbox_e6ngllrj/bundle', 'sandbox']' returned non-zero exit status 128.; stderr: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF; logs: defaultdict(<class 'list'>, {'runsc.log.20240905-021314.955720.run.txt': ['W0905 02:13:14.990443      13 util.go:64] FATAL ERROR: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF', 'W0905 02:13:14.990503      13 main.go:231] Failure to execute command, err: 1'], 'runsc.log.20240905-021314.955720.gofer.txt': ['W0905 02:13:14.981137       1 util.go:64] FATAL ERROR: error converting mounts: permission denied'], 'runsc.log.20240905-021314.955720.boot.txt': ['W0905 02:13:14.989358      28 util.go:64] FATAL ERROR: error setting up chroot: error converting mounts: permission denied']})", 'done': True}}
{"status": "ERROR", "output": "Sandbox runtime failed: Sandbox failed to start: Command '['/tmp/gvisor/runsc', '--rootless=true', '--directfs=false', '--network=host', '--ignore-cgroups=true', '--root=/tmp/sandbox_e6ngllrj/runtime', '--debug=true', '--debug-log=/tmp/sandbox_e6ngllrj/logs/', 'run', '--bundle=/tmp/sandbox_e6ngllrj/bundle', 'sandbox']' returned non-zero exit status 128.; stderr: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF; logs: defaultdict(<class 'list'>, {'runsc.log.20240905-021314.955720.run.txt': ['W0905 02:13:14.990443      13 util.go:64] FATAL ERROR: running container: creating container: cannot create sandbox: cannot read client sync file: waiting for sandbox to start: EOF', 'W0905 02:13:14.990503      13 main.go:231] Failure to execute command, err: 1'], 'runsc.log.20240905-021314.955720.gofer.txt': ['W0905 02:13:14.981137       1 util.go:64] FATAL ERROR: error converting mounts: permission denied'], 'runsc.log.20240905-021314.955720.boot.txt': ['W0905 02:13:14.989358      28 util.go:64] FATAL ERROR: error setting up chroot: error converting mounts: permission denied']})"}

Also, setting cgroupns to host resolved the device or resource busy problem. The container still had to be privileged, though. I'm not sure about the security implications of this.

@EtiennePerot
Owner

EtiennePerot commented Sep 5, 2024

Thanks for testing this out. After some research, my current understanding of the cgroupfs issue is that there are two distinct things standing in the way of creating child cgroups:

  • By default, Docker mounts /sys/fs/cgroup as read-only from the container, preventing any child cgroup from being created. Adding --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false overrides Docker's /sys/fs/cgroup mount to be writable. Setting --privileged=true has many effects, and changing the /sys/fs/cgroup writability bit appears to be one of them.
$ docker run --rm busybox grep cgroup /proc/mounts
cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime 0 0

$ docker run --rm --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false busybox grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

$ docker run --rm --privileged=true busybox grep cgroup /proc/mounts
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
  • By default, when a container is started, it is put in its own cgroup in the main cgroup namespace, but is then unshare'd to a private cgroup namespace where it becomes the "root cgroup" of that cgroup namespace.
$ docker run --rm busybox sh -c 'readlink /proc/self/ns/cgroup; cat /proc/self/cgroup'
/cgroup:[4026534726]
0::/
  • cgroupfs's cgroup.subtree_control file appears to reject writes with EBUSY when coming from the root cgroup in the current cgroup namespace. I am not sure why this is (haven't looked deeply). What the docker-in-docker hack does is to move all processes from the cgroup namespace's root cgroup (which is the container's dedicated cgroup, but not the host's actual root cgroup as seen from the host's main cgroup namespace) into a new child cgroup (possible after the mount is writable, per above). Once all processes are moved, the cgroup namespace's root cgroup's cgroup.subtree_control file no longer rejects writes.
  • Setting --cgroupns=host still makes the container run in a child cgroup of the main cgroup namespace, but it is no longer unshare'd into a different cgroup namespace. So the container can see the precise cgroup hierarchy under which it is running on the host.
$ docker run --rm --cgroupns=host busybox sh -c 'readlink /proc/self/ns/cgroup; cat /proc/self/cgroup'
cgroup:[4026531835]
0::/system.slice/docker-da3f71d8360c44e0b1dc8629707bbac35cdbadc8f7d4f825cd781b099b746d55.scope

... So if my understanding is correct, things should work if you add both --cgroupns=host and --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false, without needing --privileged=true. I will also make the tool automatically do the Docker-in-Docker cgroup-switcheroo trick to avoid needing --cgroupns=host.
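
Concretely, that suggestion amounts to a docker run invocation along these lines (a sketch; the image name is taken from the self-test invocation earlier in this thread, so adjust it to your setup):

```
docker run --rm \
  --cgroupns=host \
  --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false \
  --security-opt=seccomp=unconfined \
  ghcr.io/open-webui/open-webui:main
```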

In the meantime, I have updated the container runtime setup doc to reflect the need for the above.
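
A small way to sanity-check the first point above (the read-only /sys/fs/cgroup mount) is to parse /proc/mounts and look at the mount options, the same information the grep outputs above show. This is a hypothetical helper for illustration, not part of the tool:

```python
def cgroup2_mount_is_readonly(mounts_text):
    """Given the contents of /proc/mounts, return True if the cgroup2
    mount at /sys/fs/cgroup carries the 'ro' option, False if it is
    writable, or None if no such mount is present."""
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mountpoint, fstype, options, ...
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/sys/fs/cgroup" and fields[2] == "cgroup2":
            return "ro" in fields[3].split(",")
    return None

# The two mount lines observed above:
default_mount = "cgroup /sys/fs/cgroup cgroup2 ro,nosuid,nodev,noexec,relatime 0 0"
rebound_mount = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0"
assert cgroup2_mount_is_readonly(default_mount) is True
assert cgroup2_mount_is_readonly(rebound_mount) is False
```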

@shg8
Author

shg8 commented Sep 6, 2024

Thanks for the detailed explanation. Unfortunately, the combination didn't work, and the error was the same. It seems to be the same error as this.

@EtiennePerot
Owner

EtiennePerot commented Sep 6, 2024

@shg8 If you get the error setting up chroot: error converting mounts: permission denied error when running with --cgroupns=host --mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false but not --privileged=true, then that means the above was actually a correct diagnosis of the cgroup issue (the lack of --privileged=true indicates that we made it past it without extra privileges). The error setting up chroot: error converting mounts: permission denied is just the next error in line :)

Thanks for pointing out the similarity with the Dangerzone issue you linked to as well. I didn't suspect it because you're running with regular Docker (which, as far as I know, mounts /tmp as writable), so it shouldn't run into the same problem. But I will nonetheless add fallback code that tries other writable temporary directories, to work around issues for users with non-Docker container runtimes. Will post on this bug once that's available.

@shg8
Author

shg8 commented Sep 6, 2024

Oh, I'm afraid this comes before the cgroup issue. The cgroup device or resource busy error only appears after setting privileged to true. The error remains the same before and after adding the bind mount.

@shg8
Author

shg8 commented Sep 6, 2024

I tried your solution in the Dangerzone comment of setting a different tmpdir, but it didn't seem to make a difference.

@EtiennePerot
Owner

EtiennePerot commented Sep 9, 2024

Hello again,

I uploaded a new version of the code runner tool and function that includes a lot more debug logging, and it should remove the need to do the whole cgroup dance because it now does that automatically. It also has a more elaborate self-test mode.

The tool is available here and the function is available here.

In order to debug your issue, I'd recommend creating a new container identical to the Open WebUI one (without --privileged=true or --cgroupns=host; please post the docker run command-line you're using), running the tool's self-test mode in that container, and posting the output in this issue.

python3 path/to/tools/run_code.py --self_test

If it fails, also add --debug to the above command-line, and it should generate more debug info.

@shg8
Author

shg8 commented Sep 17, 2024

Thanks for the follow-up. Please see the output in the attached file. The same problem seems to persist.

The container was started with the following compose configuration.

    volumes:
      - type: bind
        source: /sys/fs/cgroup
        target: /sys/fs/cgroup
        read_only: false
    cgroup: host
    security_opt:
      - seccomp:unconfined

@EtiennePerot
Owner

EtiennePerot commented Sep 22, 2024

Thank you for the debug log. The good news is that I've been able to reproduce this. The bad news is I still don't know why this is happening. It might be AppArmor (added a bullet point about it in the docs) but I don't think that's the last hurdle. Still looking.

EtiennePerot added a commit that referenced this issue Sep 23, 2024
This fixes self-re-execution thanks to Open WebUI having merged
open-webui/open-webui#5511.

It also works around more permission issues due to procfs mounts.
Docs updated.

Fixes #11
Fixes #12
Updates #2
Updates #3
@EtiennePerot
Owner

I think I figured it out. Please try 0.6.0 after following the new instructions in the setup docs. Specifically, this means adding apparmor:unconfined to security_opt, and adding another non-recursive bind mount with source: /proc and target: /proc2.
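
Applied to the compose snippet posted earlier in this thread, that would look roughly like the following (a sketch; the exact syntax for making the /proc bind mount non-recursive depends on your Docker/Compose version, so treat the setup docs as authoritative):

```yaml
    volumes:
      - type: bind
        source: /sys/fs/cgroup
        target: /sys/fs/cgroup
        read_only: false
      - type: bind          # must be non-recursive; see the setup docs
        source: /proc
        target: /proc2
    cgroup: host
    security_opt:
      - seccomp:unconfined
      - apparmor:unconfined
```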

@EtiennePerot
Owner

I believe this is fixed. Please reopen if you still experience this issue after updating to v0.6.0 and following the setup docs.

There is also a separate issue when running in Docker with systems using old cgroups v1, but this is tracked in issue #14.

@shg8
Author

shg8 commented Sep 30, 2024

Yup, everything works as expected now. Thanks a lot for debugging this issue. One question out of curiosity - does this approach provide more security over --privileged if the container running open-webui itself is compromised?

@EtiennePerot
Owner

EtiennePerot commented Sep 30, 2024

@shg8 Yes, it does. I just submitted cc3f52f which adds a section to the docs explaining the difference.
