Commit
Release 0.6.0.
This fixes self-re-execution thanks to Open WebUI having merged
open-webui/open-webui#5511.

It also works around more permission issues due to procfs mounts.
Docs updated.

Fixes #11
Fixes #12
Updates #2
Updates #3
EtiennePerot committed Sep 23, 2024
1 parent 65c2faf commit 0ff908d
Showing 4 changed files with 274 additions and 60 deletions.
2 changes: 2 additions & 0 deletions .vscode/settings.json
@@ -22,6 +22,7 @@
"maxsplit",
"memfd",
"mountinfo",
"mtab",
"newcgroup",
"NEWNS",
"preexec",
@@ -39,6 +40,7 @@
"subcontainers",
"subfile",
"subfiles",
"submounts",
"syscall",
"UNSTARTED",
"urandom",
19 changes: 14 additions & 5 deletions docs/setup.md
@@ -52,6 +52,10 @@ The below is the minimal subset of changes that `--privileged=true` does that is
* On **Docker**: Add `--mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false` to `docker run`.
* On **Kubernetes**: Add a [`hostPath` volume](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) with `path` set to `/sys/fs/cgroup`, then mount it in your container's `volumeMounts` with options `mountPath` set to `/sys/fs/cgroup` and `readOnly` set to `false`.
* **Why**: This is needed so that gVisor can create child [cgroups](https://en.wikipedia.org/wiki/Cgroups), necessary to enforce per-sandbox resource usage limits.
* **Mount `procfs` at `/proc2`**:
* On **Docker**: Add `--mount=type=bind,source=/proc,target=/proc2,readonly=false,bind-recursive=disabled` to `docker run`.
* On **Kubernetes**: Add a [`hostPath` volume](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) with `path` set to `/proc`, then mount it in your container's `volumeMounts` with options `mountPath` set to `/proc2` and `readOnly` set to `false`.
* **Why**: By default, in non-privileged mode, the container runtime masks certain sub-paths of `/proc` inside the container by creating submounts of `/proc` (e.g. `/proc/bus`, `/proc/sys`, etc.). gVisor does not need anything under these submounts, but it *does* need to be able to mount `procfs` in the chroot environment it isolates itself in. However, mounting `procfs` requires an existing unobstructed view of `procfs` (i.e. a mount of `procfs` with no submounts). Otherwise, such mount attempts are denied by the kernel (see the explanation of "locked" mounts in [`mount_namespaces(7)`](https://www.man7.org/linux/man-pages/man7/mount_namespaces.7.html)). Therefore, exposing an unobstructed (non-recursive) view of `/proc` elsewhere in the container filesystem (such as `/proc2`) tells the kernel that this container is allowed to mount `procfs`.
* Remove the container's default **AppArmor profile**:
* On **Docker**: Add `--security-opt=apparmor=unconfined` to `docker run`.
* On **Kubernetes**: Set [`spec.securityContext.appArmorProfile.type`](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-apparmor-profile-for-a-container) to `Unconfined`.
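The "unobstructed view of `procfs`" condition described above can be checked from `/proc/self/mountinfo`. The following is a minimal sketch of that idea, not the code from this commit: the function name `unobstructed_procfs_mounts` and the sample mount table are made up for illustration, and the parsing assumes the field layout documented in `proc(5)`.

```python
def unobstructed_procfs_mounts(mountinfo_text: str) -> list[str]:
    """Return procfs mount points that have no other mount underneath them."""
    mounts = []  # Tuples of (mount_path, root_within_fs, fs_type).
    for line in mountinfo_text.splitlines():
        fields = line.split(" ")
        if "-" not in fields:
            continue  # Not a well-formed mountinfo line.
        sep = fields.index("-")
        if sep < 6 or sep + 1 >= len(fields):
            continue
        # Field 3 is the root within the filesystem, field 4 the mount point,
        # and the filesystem type follows the "-" separator.
        mounts.append((fields[4], fields[3], fields[sep + 1]))
    procfs = {path for path, root, fs in mounts if fs == "proc" and root == "/"}
    return sorted(
        p for p in procfs
        if not any(path.startswith(p + "/") for path, _, _ in mounts)
    )


# Hypothetical mount table: /proc is masked by submounts, /proc2 is not.
SAMPLE = """\
100 99 0:50 / /proc rw,nosuid - proc proc rw
101 100 0:51 /bus /proc/bus ro,nosuid - proc proc rw
102 100 0:52 / /proc/acpi ro - tmpfs tmpfs ro
103 99 0:50 / /proc2 rw,nosuid - proc proc rw
"""

print(unobstructed_procfs_mounts(SAMPLE))
```

With the sample table, only `/proc2` qualifies, because `/proc/bus` and `/proc/acpi` sit underneath `/proc`.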
@@ -66,20 +70,22 @@ The below is the minimal subset of changes that `--privileged=true` does that is

## Self-test mode

To verify that your setup works, you can run the tool in self-test mode using `run_code.py`'s `--use-sample-code` flag.
To verify that your setup works, you can run the function and the tool in self-test mode using the `--self_test` flag.

For example, here is a Docker invocation running the `run_code.py` script inside the Open WebUI container image with the above flags:

```shell
$ git clone https://github.com/EtiennePerot/open-webui-code-execution && \
cd open-webui-code-execution && \
docker run --rm \
--security-opt=seccomp=unconfined \
--security-opt=apparmor=unconfined \
--security-opt=label=type:container_engine_t \
--mount=type=bind,source=/sys/fs/cgroup,target=/sys/fs/cgroup,readonly=false \
--mount=type=bind,source="$(pwd)/open-webui-code-execution",target=/selftest \
--mount=type=bind,source=/proc,target=/proc2,readonly=false,bind-recursive=disabled \
--mount=type=bind,source="$(pwd)",target=/test \
ghcr.io/open-webui/open-webui:main \
python3 /selftest/open-webui/tools/run_code.py --self_test
sh -c 'python3 /test/open-webui/tools/run_code.py --self_test && python3 /test/open-webui/functions/run_code.py --self_test'
```

If all goes well, you should see:
@@ -97,10 +103,12 @@ If all goes well, you should see:
✔ Self-test long_running_code passed.
⏳ Running self-test: ram_hog
✔ Self-test ram_hog passed.
✅ All self-tests passed, good go to!
✅ All tool self-tests passed, good go to!
...
✅ All function self-tests passed, good go to!
```

If you get an error, try to add the `--debug` flag at the very end of this command (i.e. as a `run_code.py` flag) for extra information, then file a bug.
If you get an error, try adding the `--debug` flag to each `run_code.py` invocation for extra information, then file a bug.

## Set valves

@@ -114,6 +122,7 @@ The code execution tool and function have the following valves available:
* Useful for multi-user setups to avoid denial-of-service.
* **Auto Install**: Whether to automatically download and install gVisor if not present in the container.
* If not installed, gVisor will be automatically installed in `/tmp`.
* You can set the HTTPS proxy used for this download using the `HTTPS_PROXY` environment variable.
* Useful for convenience, but should be disabled for production setups.
* **Debug**: Whether to produce debug logs.
* This should never be enabled in production setups as it produces a lot of information that isn't necessary for regular use.
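As a rough illustration of the `HTTPS_PROXY` note above: Python's standard library reads proxy settings from `*_PROXY` environment variables. The sketch below assumes the download goes through a client that honors these variables, as `urllib` does; the helper name `https_proxy_from_environment` is hypothetical and is not part of `run_code.py`.

```python
import urllib.request


def https_proxy_from_environment():
    """Return the proxy URL that urllib would use for HTTPS requests, if any."""
    # getproxies() scans the environment for variables such as HTTPS_PROXY
    # (case-insensitively) and maps each scheme to its proxy URL.
    return urllib.request.getproxies().get("https")


if __name__ == "__main__":
    print(https_proxy_from_environment())
```

Running this inside the container with `HTTPS_PROXY` set is a quick way to confirm that the variable is visible to Python before relying on auto-installation.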
163 changes: 130 additions & 33 deletions open-webui/functions/run_code.py
@@ -5,7 +5,7 @@
author: EtiennePerot
author_url: https://github.com/EtiennePerot/open-webui-code-execution
funding_url: https://github.com/EtiennePerot/open-webui-code-execution
version: 0.5.0
version: 0.6.0
license: Apache-2.0
"""

@@ -35,6 +35,8 @@
import asyncio
import argparse
import base64
import ctypes
import ctypes.util
import contextlib
import copy
import fcntl
@@ -78,7 +80,7 @@ class Valves(pydantic.BaseModel):
)
AUTO_INSTALL: bool = pydantic.Field(
default=True,
description=f"Whether to automatically install gVisor if not installed on the system; may be overridden by environment variable {_VALVE_OVERRIDE_ENVIRONMENT_VARIABLE_NAME_PREFIX}AUTO_INSTALL.",
description=f"Whether to automatically install gVisor if not installed on the system; may be overridden by environment variable {_VALVE_OVERRIDE_ENVIRONMENT_VARIABLE_NAME_PREFIX}AUTO_INSTALL. Use the 'HTTPS_PROXY' environment variable to control the proxy used for download.",
)
DEBUG: bool = pydantic.Field(
default=False,
@@ -105,7 +107,7 @@ class Valves(pydantic.BaseModel):
)
WEB_ACCESSIBLE_DIRECTORY_URL: str = pydantic.Field(
default="/cache/functions/run_code",
description=f"URL corresponding to WEB_ACCESSIBLE_DIRECTORY_PATH. May start with '/' to make it relative to the Open WebUI serving domain. may be overridden by environment variable {_VALVE_OVERRIDE_ENVIRONMENT_VARIABLE_NAME_PREFIX}WEB_ACCESSIBLE_DIRECTORY_URL.",
description=f"URL corresponding to WEB_ACCESSIBLE_DIRECTORY_PATH. May start with '/' to make it relative to the Open WebUI serving domain. May be overridden by environment variable {_VALVE_OVERRIDE_ENVIRONMENT_VARIABLE_NAME_PREFIX}WEB_ACCESSIBLE_DIRECTORY_URL.",
)

def __init__(self, valves):
@@ -161,7 +163,6 @@ async def action(

async def _fail(error_message, status="SANDBOX_ERROR"):
await emitter.fail(error_message)
await emitter.code_execution_result(f"{status}: {error_message}")
return json.dumps({"status": status, "output": error_message})

await emitter.status("Checking messages for code blocks...")
@@ -327,7 +328,6 @@ def _log(filename: str, log_line: str):
print(f"[{filename}] {log_line}", file=sys.stderr)

sandbox.debug_logs(_log)
await emitter.code_execution_result(output)
if status == "OK":
generated_files_output = ""
if len(generated_files) > 0:
@@ -443,14 +443,6 @@ async def status(
async def fail(self, description="Unknown error"):
await self.status(description=description, status="error", done=True)

async def code_execution_result(self, output):
await self._emit(
"code_execution_result",
{
"output": output,
},
)

async def message(self, content):
await self._emit(
"message",
@@ -1097,12 +1089,57 @@ class Sandbox:
("id",),
("uname", "-a"),
("ls", "-l", "/proc/self/ns"),
("findmnt",),
(sys.executable, "--version"),
)

# Environment variable used to detect interpreter re-execution.
_MARKER_ENVIRONMENT_VARIABLE = "__CODE_EXECUTION_STAGE"

# Copy of this file's own contents, for re-execution.
# Must be populated at import time using `main`.
_SELF_FILE = None

# libc bindings.
# Populated using `_libc`.
_LIBC = None

class _Libc:
"""
Wrapper over libc functions.
"""

def __init__(self):
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mount.argtypes = (ctypes.c_char_p,)
self._libc = libc

def mount(self, source, target, fs, options):
if (
self._libc.mount(
source.encode("ascii"),
target.encode("ascii"),
fs.encode("ascii"),
0,
options.encode("ascii"),
)
< 0
):
errno = ctypes.get_errno()
raise OSError(
errno,
f"mount({source}, {target}, {fs}, {options}): {os.strerror(errno)}",
)

def umount(self, path):
if self._libc.umount(path.encode("ascii")) < 0:
errno = ctypes.get_errno()
raise OSError(errno, f"umount({path}): {os.strerror(errno)}")

def unshare(self, flags):
if self._libc.unshare(flags) < 0:
raise OSError(f"unshare({flags}) failed")

class _Switcheroo:
"""
Management of the switcheroo procedure for running in a usable cgroup namespace and node.
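The `_Libc` wrapper in the hunk above follows a common `ctypes` pattern: load libc with `use_errno=True`, call the function, and turn a negative return value into an `OSError` built from `ctypes.get_errno()`. Here is a minimal standalone sketch of that pattern, assuming a Linux-like libc. It uses `rmdir` instead of `mount` so that it can run without privileges, and `checked_rmdir` is a made-up name.

```python
import ctypes
import ctypes.util
import os

# use_errno=True makes ctypes keep a private copy of errno for these calls.
_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
_libc.rmdir.argtypes = (ctypes.c_char_p,)
_libc.rmdir.restype = ctypes.c_int


def checked_rmdir(path: str) -> None:
    """Call libc's rmdir(2) and raise OSError with the real errno on failure."""
    if _libc.rmdir(os.fsencode(path)) < 0:
        err = ctypes.get_errno()
        raise OSError(err, f"rmdir({path}): {os.strerror(err)}")
```

Passing the errno as the first `OSError` argument lets Python pick the matching subclass (for example `FileNotFoundError` for `ENOENT`), which callers can catch selectively.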
@@ -1115,7 +1152,8 @@ class _Switcheroo:
_CGROUP_SUPERVISOR_NAME = "supervisor"
_CGROUP_LEAF = "leaf"

def __init__(self, log_path, max_sandbox_ram_bytes):
def __init__(self, libc, log_path, max_sandbox_ram_bytes):
self._libc = libc
self._log_path = log_path
self._max_sandbox_ram_bytes = max_sandbox_ram_bytes
self._my_euid = None
@@ -1457,7 +1495,9 @@ def _find_self_in_cgroup_hierarchy(self):
for dirpath, _, subfiles in os.walk(
self._CGROUP_ROOT, onerror=None, followlinks=False
):
if not dirpath.startswith(cgroup_root_slash):
if dirpath != self._CGROUP_ROOT and not dirpath.startswith(
cgroup_root_slash
):
continue
if "cgroup.procs" not in subfiles:
continue
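The changed condition in this hunk accepts the cgroup root itself as well as anything strictly below it, while still rejecting sibling paths that merely share a prefix. A small sketch of that check in isolation, with the hypothetical helper name `is_at_or_under`:

```python
import os


def is_at_or_under(root: str, path: str) -> bool:
    """True if `path` is `root` itself or lies somewhere below it."""
    root = root.rstrip(os.sep)
    # Appending the separator prevents "/sys/fs/cgroup2" from matching
    # the root "/sys/fs/cgroup" by string prefix alone.
    return path == root or path.startswith(root + os.sep)


print(is_at_or_under("/sys/fs/cgroup", "/sys/fs/cgroup"))
print(is_at_or_under("/sys/fs/cgroup", "/sys/fs/cgroup/leaf"))
print(is_at_or_under("/sys/fs/cgroup", "/sys/fs/cgroup2"))
```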
@@ -1983,8 +2023,56 @@ def cgroups_available(cls) -> bool:
else:
return True

@staticmethod
def unshare(flags):
@classmethod
def check_procfs(cls):
"""
Verifies that we have an unobstructed view of procfs.
:return: Nothing.
:raises EnvironmentNeedsSetupException: If procfs is obstructed.
"""
mount_infos = []
with open("/proc/self/mountinfo", "rb") as mountinfo_f:
for line in mountinfo_f:
line = line.decode("utf-8").strip()
if not line:
continue
mount_components = line.split(" ")
if len(mount_components) != 10:
continue
hyphen_index = mount_components.index("-")
if hyphen_index < 6:
continue
mount_info = {
"mount_path": mount_components[4],
"path_within_mount": mount_components[3],
"fs_type": mount_components[hyphen_index + 1],
}
mount_infos.append(mount_info)
procfs_mounts = frozenset(
m["mount_path"]
for m in mount_infos
if m["fs_type"] == "proc" and m["path_within_mount"] == "/"
)
if len(procfs_mounts) == 0:
raise cls.EnvironmentNeedsSetupException(
"procfs is not mounted; please mount it"
)
obstructed_procfs_mounts = set()
for mount_info in mount_infos:
for procfs_mount in procfs_mounts:
if mount_info["mount_path"].startswith(procfs_mount + os.sep):
obstructed_procfs_mounts.add(procfs_mount)
for procfs_mount in procfs_mounts:
if procfs_mount not in obstructed_procfs_mounts:
return # We have at least one unobstructed procfs view.
assert len(obstructed_procfs_mounts) > 0, "Logic error"
raise cls.EnvironmentNeedsSetupException(
"procfs is obstructed; please mount a new procfs mount somewhere in the container, e.g. /proc2 (`--mount=type=bind,source=/proc,target=/proc2,readonly=false`)"
)

@classmethod
def unshare(cls, flags):
"""
Implementation of `os.unshare` that works on Python < 3.12.
@@ -1995,13 +2083,7 @@ def unshare(flags):
return os.unshare(flags)

# Python <= 3.11:
import ctypes

libc = ctypes.CDLL(None)
libc.unshare.argtypes = [ctypes.c_int]
rc = libc.unshare(flags)
if rc == -1:
raise OSError(f"unshare({flags}) failed")
return cls._libc().unshare(flags)

@classmethod
def check_unshare(cls):
@@ -2109,18 +2191,29 @@ def check_setup(cls, language: str, auto_install_allowed: bool):
cls.check_platform()
cls.check_unshare()
cls.check_cgroups()
cls.check_procfs()
if not auto_install_allowed and cls.get_runsc_path() is None:
raise cls.GVisorNotInstalledException(
"gVisor is not installed (runsc binary not found in $PATH); please install it or enable AUTO_INSTALL valve for auto installation"
)

@classmethod
def maybe_main(cls):
def _libc(cls):
if cls._LIBC is None:
cls._LIBC = cls._Libc()
return cls._LIBC

@classmethod
def main(cls):
"""
Entry-point for re-execution.
Entry-point for (re-)execution.
Populates `cls._SELF_FILE`, so must be called during import.
May call `sys.exit` if this is intended to be a code evaluation re-execution.
"""
if os.environ.get(cls._MARKER_ENVIRONMENT_VARIABLE) is None:
if cls._SELF_FILE is None:
with open(__file__, "r") as self_f:
cls._SELF_FILE = self_f.read()
if cls._MARKER_ENVIRONMENT_VARIABLE not in os.environ:
return
directives = json.load(sys.stdin)
try:
@@ -2214,6 +2307,7 @@ def _init(self, settings):
self._persistent_home_dir = self._settings["persistent_home_dir"]
self._sandboxed_command = None
self._switcheroo = self._Switcheroo(
libc=self._libc(),
log_path=os.path.join(self._logs_path, "switcheroo.txt"),
max_sandbox_ram_bytes=self._max_ram_bytes,
)
Expand Down Expand Up @@ -2242,6 +2336,7 @@ def _setup_sandbox(self):
raise self.SandboxException(
f"Persistent home directory {self._persistent_home_dir} does not exist"
)
oci_config["root"]["path"] = rootfs_path

try:
self._switcheroo.do()
@@ -2253,8 +2348,6 @@ def _setup_sandbox(self):
else:
raise e.__class__(f"{e}; {switcheroo_status}")

oci_config["root"]["path"] = rootfs_path

# Locate the interpreter to use.
interpreter_path = sys.executable
if self._language == self.LANGUAGE_BASH:
@@ -2512,12 +2605,15 @@ def run(self) -> subprocess.CompletedProcess:
:raises Sandbox.InterruptedExecutionError: If the code interpreter died without providing a return code; usually due to running over resource limits.
:raises sandbox.CodeExecutionError: If the code interpreter failed to execute the given code. This does not represent a sandbox failure.
"""
reexec_path = os.path.join(self._tmp_dir, "self.py")
with open(reexec_path, "w") as reexec_f:
reexec_f.write(self._SELF_FILE)
new_env = os.environ.copy()
new_env[self._MARKER_ENVIRONMENT_VARIABLE] = "1"
data = json.dumps({"settings": self._settings})
try:
result = subprocess.run(
(sys.executable, os.path.abspath(__file__)),
(sys.executable, reexec_path),
env=new_env,
input=data,
text=True,
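The re-execution flow in this hunk writes a saved copy of the source to a temporary file, sets a marker environment variable, and passes settings as JSON on stdin. The following simplified sketch shows that flow under stated assumptions: the marker name `__DEMO_REEXEC_STAGE`, the fixed `CHILD_SOURCE` string (standing in for the module's own source read at import time), and `run_reexec` are all illustrative and not taken from `run_code.py`.

```python
import json
import os
import subprocess
import sys
import tempfile

MARKER = "__DEMO_REEXEC_STAGE"  # Hypothetical marker variable name.

# Stand-in for the module's own source, which the real code captures at import.
CHILD_SOURCE = """
import json, os, sys
if "__DEMO_REEXEC_STAGE" in os.environ:
    directives = json.load(sys.stdin)
    json.dump({"echo": directives["settings"]}, sys.stdout)
"""


def run_reexec(settings: dict) -> dict:
    """Write the saved source to a temp file and re-execute it with a marker."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        reexec_path = os.path.join(tmp_dir, "self.py")
        with open(reexec_path, "w") as reexec_f:
            reexec_f.write(CHILD_SOURCE)
        new_env = os.environ.copy()
        new_env[MARKER] = "1"  # Tells the child it is a re-execution.
        result = subprocess.run(
            (sys.executable, reexec_path),
            env=new_env,
            input=json.dumps({"settings": settings}),
            text=True,
            capture_output=True,
            check=True,
        )
    return json.loads(result.stdout)
```

Executing a copy of the source rather than `__file__` avoids depending on the original file still being present on disk when the sandbox starts, which appears to be the motivation for this change.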
@@ -2862,6 +2958,7 @@ def _verify():
"code": (f"head -c{64 * 1024 * 1024} /dev/urandom > random_data.bin",),
"valves": {
"MAX_MEGABYTES_PER_USER": 32,
"MAX_RAM_MEGABYTES": 2048,
},
"status": "STORAGE_ERROR",
"post": _want_user_storage_num_files(16),
@@ -3052,17 +3149,17 @@ def _print_output(obj):
else:
print(f"✔️ Self-test {name} passed.", file=sys.stderr)
if success:
print("✅ All self-tests passed, good go to!", file=sys.stderr)
print("✅ All function self-tests passed, good go to!", file=sys.stderr)
sys.exit(0)
else:
print("☠️ One or more self-tests failed.", file=sys.stderr)
print("☠️ One or more function self-tests failed.", file=sys.stderr)
sys.exit(1)
assert False, "Unreachable"


Sandbox.main()
# Debug utility: Run code from stdin if running as a normal Python script.
if __name__ == "__main__":
Sandbox.maybe_main()
parser = argparse.ArgumentParser(
description="Run arbitrary code in a gVisor sandbox."
)