Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sandbox): add -Zsandbox unstable flag #66

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

weihanglo
Copy link
Owner

@weihanglo weihanglo commented Oct 19, 2024

What does this PR try to resolve?

This is an experiment for rust-lang/rust-project-goals#108.

An unstable [sandbox] config table is added behind the -Zsandbox unstable flag:

  • sandbox.runner: Target platform sandboxed build scripts built for
  • sandbox.target: Runner that executes artifacts of build scripts

Here is an example .cargo/config.toml:

[sandbox]
target = "wasm32-wasip1"
runner = [
  "wasmtime",
  "--wasi",
  "inherit-env=y",
  "--dir",
  "/path/to/my/package",
]

Also an example main.rs:

use std::path::PathBuf;
use std::time::SystemTime;
use std::time::UNIX_EPOCH;

fn main() {
    println!("cargo:warning=Hello from sandbox wasm");
    // Access to Cargo internal environment variables
    //
    // Require `--env` or `-S inherit-env=y`
    let cargo_env = std::env::var("CARGO").unwrap();
    println!("cargo:warning=CARGO={cargo_env}");

    // Access to inherited enviroment variables
    //
    // Require `-S inherit-env=y`
    let shell = std::env::var("SHELL").unwrap();
    println!("cargo:warning=SHELL={shell}");

    // Files access under a given directory
    //
    // Require `--dir $CARGO_MANIFEST_DIR`
    let ts_file = PathBuf::from(std::env::var("CARGO_MANIFEST_DIR").unwrap())
        .join("target")
        .join("wasi-timestamp");
    let timestamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis();
    std::fs::write(&ts_file, format!("{timestamp}").as_bytes()).unwrap();
    let ts = std::fs::read_to_string(ts_file).unwrap();
    println!("cargo:warning=current timestamp is {ts}");

    // Spawn process
    //
    // Unsupported 😭😭😭
    std::process::Command::new(cargo_env).status().unwrap();
}

What did we achieve in this experiment?

As you can see, we can easily swap to any sandbox runner with a custom target.

We use wasm32-wasip1 above as an example,
as it is the one with smaller footprint, very cross-platform, and pretty popular in the Rust community.
However, it turns out that `wasm32-wasip1 doesn't support POSIX process spawning.
Use cases of process spawning in build scripts are essential, such as

  • Calling pkg-config to find system libraries.
  • Invoking compilers to build non Rust code, e.g., C compiler, Protobuf compiler.
  • Invoking cargo metadata to get crate metadata.
  • Retrieving version control information via git.

According to the design axioms,

Restrcting process spawning is the top one axiom.
We made that with wasm32-wasip1,
but have no way to opt-out.
This contradicts to the "ensuring -sys crates be built" axiom.

As a result, it is unlikely to use wasm32-wasip1 as a default sandbox environment with this experiment.

Other possibilities

Have talked to some other folks, there are some potential route we could take if we chose wasi as a default sandbox environment.

The offical build-rs crate to the rescue

The Cargo team recently adopted the build-rs crate.
It is going to be the official crate providing API for writing build scripts.
We could take the advantage of it, telling everyone instead of using std::process::Command, use something like build_rs::Command. So that Cargo could have a full control over how a build script spawning processes.

While it sounds ideal, this doesn't help the current situation because

  • It doesn't help old crates.
  • It's hard to convince people to change their build scripts for trying an unstable feature.
  • Using build-rs in build scripts may increase build time; people may refuse to use.
  • I myself tend to have a big switch to turn on and off, which is easier to try a new feature.

A Cargo-flavored wasi standard library

There was a discussion in the GSoC "sandboxed proc-macro with wasm" about shipping a custom verion of the standard library for sandboxed wasm. For Cargo's build script, we could potentially ship a wasm32-wasi-cargo target. The std in this custom target could intercept any exec call or process spawning. Then it calls back to the host process (which is Cargo in our case) to determine how to handle process execution. The host process could either reject, or run the external program and post back the result.

This idea sounds pretty hacky and need more investigations of the communication mechanism between the host process and the wasm runtime. Perhaps via sockets, The WebAssembly Component Model, other host function call mechanism. There is also WASIX project which supprots fork/exec though it is currently not a WASI standard not even a proposal.

Continue with other more mature sandbox runtime choices

Since one of major design space is the user interafce of sandbox configuration,
we could leave off sandbox runtimes and explore more on the configuration side.

We could, for example, use docker or eBPF as a temporary default runtime, and explore how the configuration should look like.
We may want to take a look at the configuration of Cackle-rs as a starting point. By doing so, we wouldn't block on waiting for wasm runtime to being more mature.

* `sandbox.runner`: Target platform sandboxed build scripts built for
* `sandbox.target`: Runner that executes artifacts of build scripts

Example config:

```toml
[sandbox]
target = "wasm32-wasip1"
runner = [
  "wasmtime",
  "--wasi",
  "inherit-env=y",
  "--dir",
  "/path/to/my/package",
]
```
Add support for `[sandbox]` to specify a target platform
and a custom build script runner.

To help the external runner finds and executes the executable produced
by a build script, it is assumed that only one executable was produced.
@weihanglo weihanglo force-pushed the custom-sandbox-runner-target branch from e0889df to 4ca4cd0 Compare October 19, 2024 20:06
@smoelius
Copy link

Hi, @weihanglo. I learned of this project from the the October project goals update.

I wanted to make you aware of a related project build-wrap that we have been working on at Trail of Bits.

build-wrap's goals are similar those of this PR, though build-wrap relies on the OS to achieve them. I hope it can serve as a source of ideas or at least inspiration. Cheers.

@weihanglo
Copy link
Owner Author

Hello @smoelius!
Thank you for sharing! I believe I have seen this awesome project but at some point I forgot. Perhaps lacking of Windows support might be the reason I didn't consider it in the place, yet it is still a thing worth further exploration. I'll look into it :)

@alexcrichton
Copy link

👋 If you and/or Cargo folks are interested I'd be happy to chat in-depth about WASI/WebAssembly/etc things. I can also dump info here if that helps too!

As a perhaps tl;dr; my overall read of the situation here (please correct me if I'm wrong) is that if WASI targets supported std::process::Command it would largely solve the concerns here. If that's true then it's correct that WASI has no support for that today, but I also personally think it would be reasonable to strive to support this in WASI. This certainly isn't the first project (nor will it be the last) to want to support spawning processes. I personally think it's viable to write a WASI interface for spawning processes in such a way that can get agreement in the upstream WASI proposal itself. To be clear though this would be a long process, not a quick fix. It'd require a "champion" of sorts to help the proposal through the phases of WASI which I could certainly be able to help out with though.

In any case though I'm also happy to attend a meeting if you or others would find that helpful too!

@RalfJung
Copy link

RalfJung commented Nov 1, 2024

If we just let a wasm script spawn processes that then run outside the sandbox, that seems pointless. So I don't see how the wasm-based approach here is supposed to work.

@weihanglo
Copy link
Owner Author

If we just let a wasm script spawn processes that then run outside the sandbox, that seems pointless. So I don't see how the wasm-based approach here is supposed to work.

Those spawn processes still run inside the sandbox I believe, just like other sockets and HTTP requests wasi proposals. The wasi runtime needs to provide an interface for granting/limiting what can be spawn. It is pretty hard to design a proper interface for that though.

@RalfJung
Copy link

RalfJung commented Nov 1, 2024

That requires a sandbox that can run actual binaries. If we have a sandbox that can run actual binaries (not just wasm code), we may as well build the build script the usual way and run it inside the sandbox. So at that point the use of wasm seems a bit pointless.

@jeffparsons
Copy link

jeffparsons commented Nov 1, 2024

If we just let a wasm script spawn processes that then run outside the sandbox, that seems pointless.

As a WASI user, the way I'd imagine this working would be that all process spawning is denied by default, and there would be embedder-defined mechanisms for allow-listing spawning, e.g., specific external processes, possibly even with specific patterns of arguments.

I don't have a lot of experience with writing interesting Rust build scripts, but I'd imagine this approach would cover the vast majority of *-sys crates etc. that just need to link to some system-provided C libraries.

Edit: to clarify, the advantage of this over, say, defining Cargo-specific Component Model interfaces, is that build scripts can still be written in Plain Old Rust, and as long as they're only calling out to a set of well-known external programs that are trusted-ish by Rust anyway, then they'll Just Work inside a Wasm sandbox. Any build scripts that try to do anything more exotic than that (e.g call Google's protoc) would fail. So then maybe we could have a mechanism for users to say "I'm okay with this crate's build script calling this external program" but deny by default in "paranoid mode".

@alexcrichton
Copy link

Personally I feel there's nuance that might be worth exploring here. Should arbitrary executables be allowed by default? Probably not. If the executables were themselves wasm, then probably so! As @jeffparsons mentions I'd also imagine a sort of allow-list style system which could also be in place. Alternatively if this isn't a security sandbox and is instead just a "hermetic build" style sandbox then it's perhaps totally ok to execute whatever so long as the build system can track the dependencies of the "whatever". Furthermore if the goal is to precompile build scripts to wasm then allowing arbitrary executables is totally fine.

Basically if there is room to explore here (which I'd also completely understand if there isn't appetite for that or energy to explore) then I think it could be worth digging in to this more on the wasm side without immediately bouncing off on initial concerns

@RalfJung
Copy link

RalfJung commented Nov 2, 2024

IMO security is one of the main motivations for this -- being able to do check-builds of a project and open it in an IDE without having to worry about it exfiltrating secrets from my machine, or mutating any part of the filesystem outside the project folder. As someone who regularly tries to understand third-party libraries, which requires exploring their sourcecode, I would quite appreciate if that wouldn't require me to do manual sandboxing. :) (Note that in many cases I do not run these libraries nor their test suites, so I don't have to trust the generated code.)

Sorry if this is going off-topic, I don't know what is the right place for the "general project discussion" here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants