Skip to content

Commit

Permalink
wasi: renames WASIConfig to SysConfig and makes stdio defaults safer (#…
Browse files Browse the repository at this point in the history
…396)

This introduces `SysConfig` to replace `WASIConfig` and formalize documentation around system calls.

The only incompatible change planned after this is to switch from wasi.FS to fs.FS

Implementation Notes:

Defaulting to os.Stdin os.Stdout and os.Stderr doesn't make sense for
the same reasons as why we don't propagate ENV or ARGV: it violates
sand-boxing. Moreover, these are worse as they prevent concurrency and
can also lead to console overload if accidentally not overridden.

This also changes default stdin to read EOF as that is safer than reading
from os.DevNull, which can run the host out of file descriptors.

Finally, this removes "WithPreopens" for "WithFS" and "WithWorkDirFS",
to focus on the intended result. Similar Docker, if the WorkDir isn't set, it
defaults to the same as root.

Signed-off-by: Adrian Cole <[email protected]>
  • Loading branch information
codefromthecrypt authored Mar 23, 2022
1 parent 2cb56be commit 59617a2
Show file tree
Hide file tree
Showing 26 changed files with 1,218 additions and 721 deletions.
58 changes: 57 additions & 1 deletion RATIONALE.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ wazero shares constants and interfaces with internal code by a sharing pattern d
* shared interfaces and constants go in a package under root.
* Ex. package `wasi` -> `/wasi/*.go`
* user code that refer to that package go into the flat root package `wazero`.
* Ex. `WASIConfig` -> `/wasi.go`
* Ex. `StartWASICommand` -> `/wasi.go`
* implementation code begin in a corresponding package under `/internal`.
* Ex package `internalwasi` -> `/internal/wasi/*.go`

Expand Down Expand Up @@ -87,6 +87,62 @@ runtime vs interpreting Wasm directly (the `naivevm` interpreter).
Note: `microwasm` was never specified formally, and only exists in a historical codebase of wasmtime:
https://github.com/bytecodealliance/wasmtime/blob/v0.29.0/crates/lightbeam/src/microwasm.rs

## Why is `SysConfig` decoupled from WASI?

WebAssembly System Interfaces (WASI) is a formalization of a practice that can be done anyway: Define a host function to
access a system interface, such as writing to STDOUT. WASI stalled at snapshot-01 and as of early 2022, is being
rewritten entirely.

This instability implies a need to transition between WASI specs, which places wazero in a position that requires
decoupling. For example, if code uses two different functions to call `fd_write`, the underlying configuration must be
centralized and decoupled. Otherwise, calls using the same file descriptor number will end up writing to different
places.

In short, wazero defined system configuration in `SysConfig`, not a WASI type. This allows end-users to switch from
one spec to another with minimal impact. This has other helpful benefits, as centralized resources are simpler to close
coherently (ex via `Module.Close`).

### Background on `SysConfig` design
WebAssembly 1.0 (20191205) specifies some aspects to control isolation between modules ([sandboxing](https://en.wikipedia.org/wiki/Sandbox_(computer_security))).
For example, `wasm.Memory` has size constraints and each instance of it is isolated from each other. While `wasm.Memory`
can be shared, by exporting it, it is not exported by default. In fact a WebAssembly Module (Wasm) has no memory by
default.

While memory is defined in WebAssembly 1.0 (20191205), many aspects are not. Let's use an example of `exec.Cmd` as for
example, a WebAssembly System Interfaces (WASI) command is implemented as a module with a `_start` function, and in many
ways acts similar to a process with a `main` function.

To capture "hello world" written to the console (stdout a.k.a. file descriptor 1) in `exec.Cmd`, you would set the
`Stdout` field accordingly, perhaps to a buffer. In WebAssembly 1.0 (20191205), the only way to perform something like
this is via a host function (ex `ModuleBuilder.ExportFunction`) and internally copy memory corresponding to that string
to a buffer.

WASI implements system interfaces with host functions. Concretely, to write to console, a WASI command `Module` imports
"fd_write" from "wasi_snapshot_preview1" and calls it with the `fd` parameter set to 1 (STDOUT).

The [snapshot-01](https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md) version of WASI has no
means to declare configuration, although its function definitions imply configuration for example if fd 1 should exist,
and if so where should it write. Moreover, snapshot-01 was last updated in late 2020 and the specification is being
completely rewritten as of early 2022. This means WASI as defined by "snapshot-01" will not clarify aspects like which
file descriptors are required. While it is possible a subsequent version may, it is too early to tell as no version of
WASI has reached a stage near W3C recommendation. Even if it did, module authors are not required to only use WASI to
write to console, as they can define their own host functions, such as they did before WASI existed.

wazero aims to serve Go developers as a primary function, and help them transition between WASI specifications. In
order to do this, we have to allow top-level configuration. To ensure isolation by default, `SysConfig` has WithXXX
that override defaults to no-op or empty. One `SysConfig` instance is used regardless of how many times the same WASI
functions are imported. The nil defaults allow safe concurrency in these situations, as well lower the cost when they
are never used. Finally, a one-to-one mapping with `Module` allows the module to close the `SysConfig` instead of
confusing users with another API to close.

Naming, defaults and validation rules of aspects like `STDIN` and `Environ` are intentionally similar to other Go
libraries such as `exec.Cmd` or `syscall.SetEnv`, and differences called out where helpful. For example, there's no goal
to emulate any operating system primitive specific to Windows (such as a 'c:\' drive). Moreover, certain defaults
working with real system calls are neither relevant nor safe to inherit: For example, `exec.Cmd` defaults to read STDIN
from a real file descriptor ("/dev/null"). Defaulting to this, vs reading `io.EOF`, would be unsafe as it can exhaust
file descriptors if resources aren't managed properly. In other words, blind copying of defaults isn't wise as it can
violate isolation or endanger the embedding process. In summary, we try to be similar to normal Go code, but often need
act differently and document `SysConfig` is more about emulating, not necessarily performing real system calls.

## Implementation limitations

Expand Down
2 changes: 2 additions & 0 deletions builder.go
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,8 @@ type ModuleBuilder interface {
Build() (*Module, error)

// Instantiate is a convenience that calls Build, then Runtime.InstantiateModule
//
// Note: Fields in the builder are copied during instantiation: Later changes do not affect the instantiated result.
Instantiate() (wasm.Module, error)
}

Expand Down
208 changes: 205 additions & 3 deletions config.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,15 @@ package wazero

import (
"context"
"errors"
"fmt"
"io"
"math"

internalwasm "github.com/tetratelabs/wazero/internal/wasm"
"github.com/tetratelabs/wazero/internal/wasm/interpreter"
"github.com/tetratelabs/wazero/internal/wasm/jit"
"github.com/tetratelabs/wazero/wasi"
)

// NewRuntimeConfigJIT compiles WebAssembly modules into runtime.GOARCH-specific assembly for optimal performance.
Expand Down Expand Up @@ -77,7 +82,204 @@ type Module struct {
module *internalwasm.Module
}

// WithName returns a new instance which overrides the name.
func (m *Module) WithName(moduleName string) *Module {
return &Module{name: moduleName, module: m.module}
// WithName configures the module name. Defaults to what was decoded from the module source.
//
// If the source was in WebAssembly 1.0 (20191205) Binary Format, this defaults to what was decoded from the custom name
// section. Otherwise, if it was decoded from Text Format, this defaults to the module ID stripped of leading '$'.
//
// For example, if the Module was decoded from the text format `(module $math)`, the default name is "math".
//
// See https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#name-section%E2%91%A0
// See https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#custom-section%E2%91%A0
// See https://www.w3.org/TR/2019/REC-wasm-core-1-20191205/#modules%E2%91%A0%E2%91%A2
func (m *Module) WithName(name string) *Module {
m.name = name
return m
}

// SysConfig configures resources needed by functions that have low-level interactions with the host operating system.
// Using this, resources such as STDIN can be isolated (ex via StartWASICommandWithConfig), so that the same module can
// be safely instantiated multiple times.
//
// Note: While wazero supports Windows as a platform, host functions using SysConfig follow a UNIX dialect.
// See RATIONALE.md for design background and relationship to WebAssembly System Interfaces (WASI).
type SysConfig struct {
stdin io.Reader
stdout io.Writer
stderr io.Writer
args []string
// environ is pair-indexed to retain order similar to os.Environ.
environ []string
// environKeys allow overwriting of existing values.
environKeys map[string]int

// preopenFD has the next FD number to use
preopenFD uint32
// preopens are keyed on file descriptor and only include the Path and FS fields.
preopens map[uint32]*internalwasm.FileEntry
// preopenPaths allow overwriting of existing paths.
preopenPaths map[string]uint32
}

func NewSysConfig() *SysConfig {
return &SysConfig{
environKeys: map[string]int{},
preopenFD: uint32(3), // after stdin/stdout/stderr
preopens: map[uint32]*internalwasm.FileEntry{},
preopenPaths: map[string]uint32{},
}
}

// WithStdin configures where standard input (file descriptor 0) is read. Defaults to return io.EOF.
//
// This reader is most commonly used by the functions like "fd_read" in wasi.ModuleSnapshotPreview1 although it could be
// used by functions imported from other modules.
//
// Note: The caller is responsible to close any io.Reader they supply: It is not closed on wasm.Module Close.
// Note: This does not default to os.Stdin as that both violates sandboxing and prevents concurrent modules.
// See https://linux.die.net/man/3/stdin
func (c *SysConfig) WithStdin(stdin io.Reader) *SysConfig {
c.stdin = stdin
return c
}

// WithStdout configures where standard output (file descriptor 1) is written. Defaults to io.Discard.
//
// This writer is most commonly used by the functions like "fd_write" in wasi.ModuleSnapshotPreview1 although it could
// be used by functions imported from other modules.
//
// Note: The caller is responsible to close any io.Writer they supply: It is not closed on wasm.Module Close.
// Note: This does not default to os.Stdout as that both violates sandboxing and prevents concurrent modules.
// See https://linux.die.net/man/3/stdout
func (c *SysConfig) WithStdout(stdout io.Writer) *SysConfig {
c.stdout = stdout
return c
}

// WithStderr configures where standard error (file descriptor 2) is written. Defaults to io.Discard.
//
// This writer is most commonly used by the functions like "fd_write" in wasi.ModuleSnapshotPreview1 although it could
// be used by functions imported from other modules.
//
// Note: The caller is responsible to close any io.Writer they supply: It is not closed on wasm.Module Close.
// Note: This does not default to os.Stderr as that both violates sandboxing and prevents concurrent modules.
// See https://linux.die.net/man/3/stderr
func (c *SysConfig) WithStderr(stderr io.Writer) *SysConfig {
c.stderr = stderr
return c
}

// WithArgs assigns command-line arguments visible to an imported function that reads an arg vector (argv). Defaults to
// none.
//
// These values are commonly read by the functions like "args_get" in wasi.ModuleSnapshotPreview1 although they could be
// read by functions imported from other modules.
//
// Similar to os.Args and exec.Cmd Env, many implementations would expect a program name to be argv[0]. However, neither
// WebAssembly nor WebAssembly System Interfaces (WASI) define this. Regardless, you may choose to set the first
// argument to the same value set via WithName.
//
// Note: This does not default to os.Args as that violates sandboxing.
// Note: Runtime.InstantiateModule errs if any value is empty.
// See https://linux.die.net/man/3/argv
// See https://en.wikipedia.org/wiki/Null-terminated_string
func (c *SysConfig) WithArgs(args ...string) *SysConfig {
c.args = args
return c
}

// WithEnv sets an environment variable visible to a Module that imports functions. Defaults to none.
//
// Validation is the same as os.Setenv on Linux and replaces any existing value. Unlike exec.Cmd Env, this does not
// default to the current process environment as that would violate sandboxing. This also does not preserve order.
//
// Environment variables are commonly read by the functions like "environ_get" in wasi.ModuleSnapshotPreview1 although
// they could be read by functions imported from other modules.
//
// While similar to process configuration, there are no assumptions that can be made about anything OS-specific. For
// example, neither WebAssembly nor WebAssembly System Interfaces (WASI) define concerns processes have, such as
// case-sensitivity on environment keys. For portability, define entries with case-insensitively unique keys.
//
// Note: Runtime.InstantiateModule errs if the key is empty or contains a NULL(0) or equals("") character.
// See https://linux.die.net/man/3/environ
// See https://en.wikipedia.org/wiki/Null-terminated_string
func (c *SysConfig) WithEnv(key, value string) *SysConfig {
// Check to see if this key already exists and update it.
if i, ok := c.environKeys[key]; ok {
c.environ[i+1] = value // environ is pair-indexed, so the value is 1 after the key.
} else {
c.environKeys[key] = len(c.environ)
c.environ = append(c.environ, key, value)
}
return c
}

// WithFS assigns the file system to use for any paths beginning at "/". Defaults to not found.
//
// Note: This sets WithWorkDirFS to the same file-system unless already set.
func (c *SysConfig) WithFS(fs wasi.FS) *SysConfig {
c.setFS("/", fs)
return c
}

// WithWorkDirFS indicates the file system to use for any paths beginning at ".". Defaults to the same as WithFS.
func (c *SysConfig) WithWorkDirFS(fs wasi.FS) *SysConfig {
c.setFS(".", fs)
return c
}

// withFS is hidden especially until #394 as existing use cases should be possible by composing file systems.
// TODO: in #394 add examples on WithFS to accomplish this.
func (c *SysConfig) setFS(path string, fs wasi.FS) {
// Check to see if this key already exists and update it.
entry := &internalwasm.FileEntry{Path: path, FS: fs}
if fd, ok := c.preopenPaths[path]; ok {
c.preopens[fd] = entry
} else {
c.preopens[c.preopenFD] = entry
c.preopenPaths[path] = c.preopenFD
c.preopenFD++
}
}

// toSysContext creates a baseline internalwasm.SysContext configured by SysConfig.
func (c *SysConfig) toSysContext() (sys *internalwasm.SysContext, err error) {
var environ []string // Intentionally doesn't pre-allocate to reduce logic to default to nil.
// Same validation as syscall.Setenv for Linux
for i := 0; i < len(c.environ); i += 2 {
key, value := c.environ[i], c.environ[i+1]
if len(key) == 0 {
err = errors.New("environ invalid: empty key")
return
}
for j := 0; j < len(key); j++ {
if key[j] == '=' { // NUL enforced in NewSysContext
err = errors.New("environ invalid: key contains '=' character")
return
}
}
environ = append(environ, key+"="+value)
}

// Ensure no-one set a nil FD. We do this here instead of at the call site to allow chaining as nil is unexpected.
rootFD := uint32(0) // zero is invalid
setWorkDirFS := false
preopens := c.preopens
for fd, fs := range preopens {
if fs.FS == nil {
err = fmt.Errorf("FS for %s is nil", fs.Path)
return
} else if fs.Path == "/" {
rootFD = fd
} else if fs.Path == "." {
setWorkDirFS = true
}
}

// Default the working directory to the root FS if it exists.
if rootFD != 0 && !setWorkDirFS {
preopens[c.preopenFD] = &internalwasm.FileEntry{Path: ".", FS: preopens[rootFD].FS}
}

return internalwasm.NewSysContext(math.MaxUint32, c.args, environ, c.stdin, c.stdout, c.stderr, preopens)
}
Loading

0 comments on commit 59617a2

Please sign in to comment.