preliminary wasm32 support for git-pack #735

Merged
merged 3 commits into from
Feb 15, 2023

Byron
Member

@Byron Byron commented Feb 14, 2023

The goal of this PR is to make accelerated pack resolution available in WASM. This includes the following steps:

  • an iterator of entries that were decoded from a pack (the pack decoding is excluded here)
  • resolve thin-packs on the fly
  • create an index for accelerated pack resolution
  • run pack resolution with user-definable code (it will see the decoded object in full to do whatever it needs to)

From there it should be possible to have a WASM server receive a pack assuming it has a way to decode the pack into entries.
Maybe the decoding step must be possible here too for convenience.

Tasks

  • CI validation of WASM support (or the lack of it)
  • Crates that work already and should keep working

It seems it's best to extract parts into their own crate

Tasks for Extraction

All of what follows should compile for the wasm32-unknown-unknown target

  • std - probably these work out of the box, needed by cache::delta::Tree
    • AtomicBool for interruptions
    • io::BufRead
    • io::Write
  • git-features - mostly for decoding that pack
    • crc32
    • SHA1 hashing (should work)
    • flate2 (miniz_oxide backend only) - should already work
    • progress (and with that much of prodash, with Arc and AtomicUsize for counters)
  • git-hash for Kind and ObjectId
  • git-pack
    • LookupRefDeltaObjectsIter
    • BytesToEntriesIter
      • decoding machinery
    • EntriesToBytesIter - writes a pack right back to disk for lookup, but there is no disk
    • cache::delta::Tree - used to see each packed object and its hash for connectivity checks knowing what's in the pack
    • index::File::write_data_iter_to_stream(…) - not needed as the delta::Tree is doing all the work. It does, however, contain the core algorithm on bringing everything together and in theory can be used to create a pack index file in memory.
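
As a rough illustration of why the std items listed above are expected to work, here is a minimal sketch that only touches `AtomicBool`, `io::BufRead` and `io::Write`, none of which need a filesystem. The function name `copy_interruptible` is hypothetical and not part of git-pack; it only shows the shape of an interruptible, IO-agnostic loop.

```rust
use std::io::{self, BufRead, Write};
use std::sync::atomic::{AtomicBool, Ordering};

/// Copy all bytes from `input` to `output`, checking `should_interrupt`
/// before each chunk. Hypothetical helper for illustration only.
pub fn copy_interruptible(
    mut input: impl BufRead,
    mut output: impl Write,
    should_interrupt: &AtomicBool,
) -> io::Result<u64> {
    let mut total = 0u64;
    loop {
        // The caller can flip the flag at any time to stop the operation.
        if should_interrupt.load(Ordering::Relaxed) {
            return Err(io::Error::new(
                io::ErrorKind::Interrupted,
                "interrupted by caller",
            ));
        }
        let chunk = input.fill_buf()?;
        if chunk.is_empty() {
            // End of input: report how many bytes were copied.
            return Ok(total);
        }
        let len = chunk.len();
        output.write_all(chunk)?;
        input.consume(len);
        total += len as u64;
    }
}
```

Both ends can be plain in-memory buffers, for example `copy_interruptible(&bytes[..], &mut Vec::new(), &flag)`, which is the situation on a target without a disk.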

Research

Blockers: everything with IO, namely

  • git-lock and git-tempfile
  • libc seems to support only wasm32-unknown-emscripten

To circumvent this, some crates might have to be split. The problem here is type ownership: WASM-compatible crates probably shouldn't own the type in question, so they must provide pure functions with a lot of parameters or contexts.
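
One way to picture the split is sketched below, assuming a made-up lookup over sorted object ids. All names here (`LookupContext`, `lookup_in`, `IndexFile`) are hypothetical and do not exist in git-pack; the point is only that the WASM-compatible part is a pure function over borrowed data, while the owning type lives in the crate that is allowed to do IO.

```rust
/// Would live in the WASM-compatible crate: a context that only borrows data.
pub struct LookupContext<'a> {
    /// Object ids, sorted ascending (an assumption of this sketch).
    pub sorted_ids: &'a [[u8; 20]],
}

/// Pure function without IO and without owning any state.
pub fn lookup_in(ctx: &LookupContext<'_>, id: &[u8; 20]) -> Option<usize> {
    ctx.sorted_ids.binary_search(id).ok()
}

/// Would live in the IO-capable crate: the type that owns the data.
pub struct IndexFile {
    ids: Vec<[u8; 20]>,
}

impl IndexFile {
    /// In a real crate this data would come from disk; here it is passed in.
    pub fn from_ids(mut ids: Vec<[u8; 20]>) -> Self {
        ids.sort();
        Self { ids }
    }

    /// Thin wrapper that builds the context and delegates to the pure function.
    pub fn lookup(&self, id: &[u8; 20]) -> Option<usize> {
        lookup_in(&LookupContext { sorted_ids: &self.ids }, id)
    }
}
```

The cost mentioned above is visible even in this small case: every field the pure function needs has to be threaded through the context.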

There might be duplication of documentation unless these are just referring to each other. It's still a slightly strange setup to have WASM in different crates, but inclusion seems easier to handle than exclusion.

Learnings

  • using wasm32-wasi seems to generally produce better error messages as it supports more out of the box. This can pinpoint locations where incompatible crates are being used.

Interesting Reads

@Byron Byron force-pushed the git-pack-wasm branch 7 times, most recently from 994717e to f5e89fe Compare February 14, 2023 10:19
@jeffparsons
Contributor

@Byron How closely have you been following Wasm/WASI developments?

WASI "preview 2" is coming soon, and it changes a lot; in particular, it is being rebased on top of the WebAssembly Component Model. If you are only writing programs that target wasm32-wasi then you won't need to care much about this (other than knowing that when rustc learns about preview 2, you'll be able to do a lot more out of the box, e.g. networking), but if you want to be able to build independently-deployable Wasm modules for different bits of Gitoxide, then waiting for preview 2 to land and the surrounding tooling to mature might be worthwhile.

What are the main use cases you have in mind? I may be able to sketch a concrete example of what I'm talking about.

@Byron
Member Author

Byron commented Feb 15, 2023

Thanks so much for chiming in @jeffparsons, it's much appreciated!

I have updated the PR description to be more informative. Thus far I thought the target must be wasm32-unknown-unknown because the crate has to integrate with other crates that compile to that target as well.

I am looking forward to hearing how to best do that, I am definitely very green here and only have a minimal understanding of what needs to be done.

Thank you

@Byron Byron changed the title preliminary WASM support for git-pack preliminary wasm32 support for git-pack Feb 15, 2023
@Byron Byron force-pushed the git-pack-wasm branch 3 times, most recently from 455af8b to 4f213b4 Compare February 15, 2023 12:27
@Byron
Member Author

Byron commented Feb 15, 2023

It looks like wasi (as vendor) and unknown (as vendor) don't make much of a difference with the pre-requisites of the respective git-pack types, which is very helpful. That way it should be possible to check for target_arch throughout the git-pack crate to opt-in certain parts of the code that can already work.
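
A minimal sketch of what such a `target_arch` check could look like, assuming a split between pure parsing and file access. The function names are hypothetical and not taken from git-pack; the only fact relied upon is that a pack header is 12 bytes (`PACK`, a 4-byte version, and a 4-byte big-endian object count).

```rust
/// Compiled on every target: pure parsing of an already-read pack header.
pub fn entry_count(header: &[u8; 12]) -> u32 {
    // Bytes 8..12 hold the number of objects as a big-endian u32.
    u32::from_be_bytes([header[8], header[9], header[10], header[11]])
}

/// Only compiled when NOT targeting wasm32, as it needs a real filesystem.
#[cfg(not(target_arch = "wasm32"))]
pub fn read_header(path: &std::path::Path) -> std::io::Result<[u8; 12]> {
    use std::io::Read;
    let mut buf = [0u8; 12];
    std::fs::File::open(path)?.read_exact(&mut buf)?;
    Ok(buf)
}
```

Because the check is on the architecture rather than the vendor, the same gate covers both wasm32-unknown-unknown and wasm32-wasi.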

@Byron Byron force-pushed the git-pack-wasm branch 6 times, most recently from 5f079a3 to f151dda Compare February 15, 2023 16:55
@Byron Byron mentioned this pull request Feb 15, 2023
17 tasks
For now failure is allowed as no work was done, but this should confirm the crate can at least be compiled
to that target.

We try different targets, including WASI, for good measure, and already build crates that are naturally working.
…wasm32.

It's a breaking change because we also start using the `dep:` syntax for declaring
references to optional dependencies, which will prevent them from being automatically
available as features.

Besides that, it adds the `wasm` feature toggle to allow compiling to `wasm32` targets.
@Byron Byron merged commit 4bc19d1 into main Feb 15, 2023
@Byron Byron deleted the git-pack-wasm branch February 15, 2023 19:33
@jeffparsons
Contributor

jeffparsons commented Feb 17, 2023

I still don't have a great understanding of how you're intending for this to be used (e.g. what crates it needs to integrate with, the shape of that integration, what assumptions about the world those other crates make), so to start I'll just summarize the main ways I imagine Gitoxide being used with Wasm and my best understanding of what each would look like, and maybe we can explore from there.

Hopefully there's something helpful in here... 🤞

(1) Gitoxide is consumed as a crate by other Rust code, targeting core Wasm

Actual compilation target might be wasm32-unknown-unknown, wasm32-unknown-emscripten, or wasm32-wasi, but you don't make any assumptions other than what's supported by wasm32-unknown-unknown — i.e. core Wasm, and nothing else. (The other targets are supersets of this one.)

Any IO your own code tries to perform will (IIRC) either error or panic; the parts of std that deal with IO are stubbed out just enough to allow programs to compile. This means that the IO has to be handled by something other than Gitoxide. E.g. Gitoxide might accept something that is Read, but otherwise leave it up to the program using Gitoxide to figure out how to provide that.
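
To make the `Read` idea concrete, here is a small sketch, assuming a hypothetical library function `read_pack_signature` (not an actual Gitoxide API). The library only asks for something that implements `Read`; where the bytes come from is up to the embedding program.

```rust
use std::io::{self, Read};

/// Hypothetical library function: checks whether `source` starts with the
/// 4-byte pack signature, without knowing where the bytes come from.
pub fn read_pack_signature(mut source: impl Read) -> io::Result<bool> {
    let mut signature = [0u8; 4];
    // Fails with `UnexpectedEof` if fewer than 4 bytes are available.
    source.read_exact(&mut signature)?;
    Ok(&signature == b"PACK")
}
```

On wasm32-unknown-unknown the embedder might pass a byte slice handed over by the host, e.g. `read_pack_signature(&bytes[..])`, while a native program could pass a `std::fs::File` to the very same function.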

The other aspect of IO that has to be handled outside of Gitoxide here is calling to/from the host (and other stuff run by the Wasm host, e.g. JavaScript). There are things like wasm-bindgen that allow you to effectively layer an ABI on top of wasm32-unknown-unknown by generating code on the Rust side and JavaScript side, but I see these as a stopgap that won't need to exist for much longer.

If this is something you want to support, there's no harm in doing it, because support for everything else (Emscripten, WASI, etc.) can be additive.

(2) Gitoxide is consumed by JavaScript code via Emscripten on the web

Compilation target is wasm32-unknown-emscripten. This builds upon (1).

This is the one I know the least about. I also think of this as mostly a stopgap, but I don't know enough about what it has to offer to dismiss it out of hand. I imagine that once the WebAssembly Component Model (see below) matures that all the things Emscripten does for Wasm can slowly be split out into WebAssembly Components.

(3) Gitoxide is consumed as a crate by other Rust code, targeting WASI

Compilation target is wasm32-wasi. This builds upon (1).

Now you can actually use IO features from std, albeit not all of them right now. Rustc's current wasm32-wasi target is based on wasi_snapshot_preview1, which is soon to be superseded by wasi_snapshot_preview2. preview2 rebases WASI on top of the WebAssembly Component Model, and is not compatible with preview1. (Although there is a polyfill in the works to ease the transition.)

In this scenario, however, you don't care that everything will change under the hood when rustc moves over to targeting preview2. All you care is that a few more things in std will magically start working. (Sockets? Threads? — Whatever is ready when they ship preview2.)

(4) Gitoxide is consumed as a WebAssembly component

Compilation target is wasm32-wasi. You can't actually do this ergonomically yet, but should be able to Soon™. This builds upon (3).

In this final scenario you are supporting people who want to use Gitoxide from their language of choice, and who are either embedding a Wasm runtime in their program or shipping their own program as a Wasm module (possibly a Wasm Component) to be run in some other runtime, e.g., Wasmtime or a web browser.

You would presumably rely on WASI for IO (via std), but could also choose to ship a "no IO" version too if you wanted to do that for some reason. (WASI builds on top of Wasm Components, which build on top of "core Wasm".)

You would probably define your Wasm Component's interface using the WIT language, and then use something like cargo-component to define and build your Wasm Components in Rust, and something like jco if you want to help people use it on the web.

Wasm and the Component Model are defined in a really neatly layered and modular way, which lets you polyfill/virtualize a lot of things. E.g. if someone wants to use your code in some non-WASI environment, they could use one of the tools available for linking multiple Wasm components together to produce a single Core Wasm binary that, e.g., assumes the presence of Emscripten, or some other virtual system environment altogether. So not everyone needs to jump on supporting WASI or even the Component Model immediately for it to be useful. You could start writing WebAssembly Components very soon, and run them in web browsers years before those web browsers decide they want to introduce native support for WebAssembly Components.

Conclusion

If I were making this decision and I didn't have a specific reason to support Emscripten, then I would probably skip that sort of thing entirely. I would start out supporting just (1) and (3), and then as soon as rustc starts targeting preview2 I would start defining a Wasm Component (or collection of them with dependencies much like the Gitoxide crates) that people could either use directly in a runtime that supports them, or "repackaged" to support environments like the web. By this point Wasm Component registries will exist, and you could publish your Components to those and/or have people download them from GitHub Releases or whatever.

On the other hand, if you do have some existing Emscripten based project you'd like to support, or just want to have a web based demo up and running sooner rather than later, supporting Emscripten might not actually be much work — I'm just not familiar enough with it to make any meaningful comment.

@Byron
Member Author

Byron commented Feb 17, 2023

❤️🙏

I have linked this analysis from the tracking ticket as well, to be sure to have it should I venture further down this path. This isn't planned for now, but I am sure it will eventually happen. Maybe in 2024 I will have a need for running more code in WASM as well.
