Add wasm-encoder API to directly encode instructions #1985
Thanks again so much for your work on this, I like how this turned out and you're definitely not the only one who feels this is more idiomatic when you've got a static list of instructions! My main thoughts on this are around the newtype So to me it's a bit of a tradeoff where what's earned with typed indices is type annotations of what's what, but what's lost with type annotations is ergonomics where the crate isn't currently really built to work with typed indices. For example ideally adding a type to a type section would return a There's also a bit of a downside where there's some instructions like Overall personally I'd lean towards removing the typed indices in favor of a future refactoring to possibly include them later (but that'd likely be quite a big change). What do you think about that though? Do you think that this is worth the tradeoff of "not integrating well" of sorts with the rest of the crate?
Thanks for the review Alex! Seems like we're mostly on the same page; I for sure agree that it feels weird for I went back and forth on the design for this PR a lot while I was prototyping it last week, and despite the mismatch, I still think that the right thing to do is to adopt this new newfunc.instruction(&Instruction::MemoryInit {
mem: init.memory,
data_index: init.segment,
}); vs. after: insn.memory_init(MemIdx(init.memory), DataIdx(init.segment)) So I still see this as a strict improvement because it's net-neutral ergonomically and net-positive for performance. Another angle: in constructing this PR, I constrained the design to be strictly backward-compatible. Using the newtype wrappers in other types in the public API would be a backward-incompatible change and I don't have a great intuition for how much the wasm-tools team wants to avoid those, so I just didn't do that in this PR. But if you feel that this change would make sense only after having incorporated these newtype wrappers into the rest of the API (entailing breaking changes) then I'd be fine with that. Regarding copy instructions, I agree that it's a bit unfortunate. I have a couple thoughts:
The only other instructions that have this issue are Sorry for the long response. Overall, I originally started working on this PR for the performance and you're the maintainer so I'll go with whatever you prefer on any of these ergonomics issues. Personally, though, I really want this new API to be strictly better than the old one whenever the instruction variant is static, and I don't see how to do that without these newtype
By the way, if such a refactoring were to take place to use typed indices more consistently throughout wasm-tools, in what crate would those type definitions live? (I think part of the answer to this question would also include why
You bring up all good points! I think personally though I'm not entirely convinced, but I'd also like to cc @fitzgen on this as well. Nick do you have thoughts on the use of newtypes here in wasm-encoder?
I don't doubt your conclusion here but I also personally place less weight on the transitionary nature of changes like this. I instead try to evaluate the end state in its own right, so comparing: `insn.memory_init(MemIdx(init.memory), DataIdx(init.segment))` vs `insn.memory_init(init.memory, init.segment)`, the latter is more desirable to me. Another space that I can fill in some context on:
Currently In that sense I'm not worried about changing existing APIs. For example I think it would be reasonable to change the APIs in To me the main benefit of typed indices though is that you worry about construction of the index in one location and otherwise generally don't worry about it. You get compiler errors when the wires are crossed by accident but for the most part you aren't responsible for constructing indices manually. That to me is the main drawback of this current state of affairs which is that the typed index has to be created manually every time. Ideally the indices are already stored at rest in their typed form instead.
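To make the "typed at rest" idea above concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and is not wasm-encoder's API: the `TypeIdx` and `FuncIdx` names, the `Module` builder, and the free `call` function are assumptions made purely for illustration. The only factual detail is that the WebAssembly `call` opcode is `0x10` followed by a LEB128-encoded function index.

```rust
// Hypothetical sketch: none of these types are wasm-encoder's real API.

/// Typed wrappers around raw 32-bit indices (names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TypeIdx(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FuncIdx(pub u32);

/// Toy module builder that hands out typed indices as items are added,
/// so callers never construct an index by hand.
#[derive(Default)]
pub struct Module {
    type_count: u32,
    funcs: Vec<TypeIdx>,
}

impl Module {
    /// Registers a (contentless, for brevity) type and returns its index.
    pub fn add_type(&mut self) -> TypeIdx {
        let idx = TypeIdx(self.type_count);
        self.type_count += 1;
        idx
    }

    /// Declares a function of the given type and returns its index.
    pub fn add_function(&mut self, ty: TypeIdx) -> FuncIdx {
        let idx = FuncIdx(self.funcs.len() as u32);
        self.funcs.push(ty);
        idx
    }
}

/// Unsigned LEB128 encoding of a `u32`.
fn leb128_u32(sink: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            sink.push(byte);
            return;
        }
        sink.push(byte | 0x80);
    }
}

/// Encodes a `call` instruction (opcode 0x10); only accepts a `FuncIdx`.
pub fn call(sink: &mut Vec<u8>, func: FuncIdx) {
    sink.push(0x10);
    leb128_u32(sink, func.0);
}

/// Example: the indices flow from the builder into the encoder untouched.
pub fn demo() -> Vec<u8> {
    let mut module = Module::default();
    let _other = module.add_type();
    let ty = module.add_type();
    let func = module.add_function(ty);
    let mut body = Vec::new();
    call(&mut body, func);
    // call(&mut body, ty); // would not compile: expected `FuncIdx`, found `TypeIdx`
    body
}
```

In this sketch the index is constructed in exactly one place (the builder), and a crossed wire such as passing a `TypeIdx` to `call` is rejected at compile time.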
Thinking more on this with what you've brought up as well, I think it's probably best to basically ignore the concern I'm thinking of. While it's possible to do something like
Heh this is an interesting question :) The requirements of Now that being said typed indices are far simpler than the representation of a function type as it's unconditionally a typed wrapper around a 32-bit integer. There's still questions about what exactly does the API look like but that's probably going to be the same for Basically tl;dr; I think it's reasonable to live in Ok well that's sort of an equally long reply, en garde! In any case though personally I still feel that I'd like to move in the direction of typed indices in the future but it's best done as a follow-up. The main benefit would be to avoid having end users manually construct indexes themselves and instead they're naturally created as part of the rest of the
This all makes sense, thanks Alex! If you and Nick want me to remove the newtypes from this PR and then move forward to hopefully merge it, I'm happy to do that. Would you be open to me starting on a separate PR to use typed indices in wasm-encoder? Also, as an aside: even after weaving typed indices throughout the whole API, that still won't mean users never have to construct them themselves. For instance, with literal constants, they'd still have to write
First: @samestep thanks for taking this PR on! It's great to have both an instruction enum and individual methods. As I think Alex mentioned on Zulip, you end up wanting both in the limit because which one you reach for depends on the exact context of your usage. I think typed indices in general are great, but agree with Alex that we should probably try to split that out into a separate follow up. That follow up should ideally add support for them to the whole workspace rather than just
Agreed that we don't want to have a separate That said, I do think that
Following up on a Zulip conversation with @alexcrichton, this PR adds a new `wasm_encoder::InstructionSink` type, which has an encoding method for each variant of `wasm_encoder::Instruction`. The `impl Encode for Instruction` is then refactored to just delegate to these methods.

See https://github.com/samestep/wasm-encoder-performance for more details; in particular, the primary motivation for this PR is an observed 30% difference in performance when encoding sequences of instructions that are known at compile time, such as in `crates/wit-component/src/gc.rs`. Subjectively I also consider the new API to be more ergonomic.

The one tricky part in designing this PR was the fact that 65 `Instruction` variants have named fields, so if the new API just used `u32` for everything like the old API did, it could be more error-prone than before. To address this, I introduced several new `u32` wrapper types, such as `MemIdx` and `DataIdx`.

So, the benefit from named fields in nearly all those variants is now provided more directly by the typechecker, with better ergonomics in the case where the programmer already has a value of the more specific `u32` wrapper type in hand; and in the other cases, writing out the name of the type to wrap a `u32` is no less ergonomic than writing the name of the field would be in the first place, since often the field name is essentially the same as the type name.

Before I switched to using these wrapper types, I considered instead turning each `Instruction` variant into its own type rather than its own function, and simply implementing `Encode` for each such type. I decided against this because, unlike the `InstructionSink` approach, there'd be no way to access all these new instruction encodings from a `Function`, except maybe by adding a new `trait InstructionVariant` or something like that, but that seemed like a whole can of worms that I didn't want to open. I find this `InstructionSink` approach to be far more ergonomic.

In this same PR I've also refactored as many usages as possible to prefer the new API over the old one, because of the resulting improvements in performance and (in my opinion) ergonomics. If it would be preferable for me to undo that and just introduce the new API without refactoring, please let me know and I can undo those changes. Also please let me know if I should make any changes to the new API itself: the repository I linked above contains a script to generate the bulk of the new code, as well as ast-grep rules to do most of the refactoring in this repo, so it's now relatively easy to iterate on the design.
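For readers without the diff at hand, the overall shape of the design described above can be sketched roughly as follows. This is a simplified stand-in and not the code from this PR: it covers only three instructions, the `Instruction` enum here is a toy with an inherent `encode` method rather than the real `Encode` trait impl, and the LEB128 helpers are written inline. The names `InstructionSink`, `MemIdx`, `DataIdx`, `memory_init`, and the `MemoryInit { mem, data_index }` fields are taken from the discussion; the opcode bytes are the standard WebAssembly encodings.

```rust
// Illustrative sketch only: a simplified stand-in, not the PR's actual code.

/// Newtype wrappers around raw `u32` indices (names from the discussion).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MemIdx(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DataIdx(pub u32);

/// Unsigned LEB128 encoding of a `u32`.
fn leb128_u32(sink: &mut Vec<u8>, mut value: u32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            sink.push(byte);
            return;
        }
        sink.push(byte | 0x80);
    }
}

/// Signed LEB128 encoding of an `i32`.
fn leb128_i32(sink: &mut Vec<u8>, mut value: i32) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift keeps the sign
        let sign_bit_set = (byte & 0x40) != 0;
        if (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set) {
            sink.push(byte);
            return;
        }
        sink.push(byte | 0x80);
    }
}

/// One method per instruction, writing straight into the byte buffer.
pub struct InstructionSink<'a> {
    sink: &'a mut Vec<u8>,
}

impl<'a> InstructionSink<'a> {
    pub fn new(sink: &'a mut Vec<u8>) -> Self {
        Self { sink }
    }

    pub fn i32_const(&mut self, x: i32) -> &mut Self {
        self.sink.push(0x41);
        leb128_i32(self.sink, x);
        self
    }

    pub fn memory_init(&mut self, mem: MemIdx, data: DataIdx) -> &mut Self {
        self.sink.extend_from_slice(&[0xfc, 0x08]);
        leb128_u32(self.sink, data.0); // data segment index comes first
        leb128_u32(self.sink, mem.0);
        self
    }

    pub fn end(&mut self) -> &mut Self {
        self.sink.push(0x0b);
        self
    }
}

/// Toy version of the enum-based API, with plain `u32` named fields.
pub enum Instruction {
    I32Const(i32),
    MemoryInit { mem: u32, data_index: u32 },
    End,
}

impl Instruction {
    /// The enum encoder just delegates to the matching sink method.
    pub fn encode(&self, sink: &mut Vec<u8>) {
        let mut insn = InstructionSink::new(sink);
        match *self {
            Instruction::I32Const(x) => insn.i32_const(x),
            Instruction::MemoryInit { mem, data_index } => {
                insn.memory_init(MemIdx(mem), DataIdx(data_index))
            }
            Instruction::End => insn.end(),
        };
    }
}

/// Statically known sequence, encoded without building enum values.
pub fn encode_direct() -> Vec<u8> {
    let mut bytes = Vec::new();
    InstructionSink::new(&mut bytes)
        .i32_const(0)
        .i32_const(0)
        .i32_const(64)
        .memory_init(MemIdx(0), DataIdx(3))
        .end();
    bytes
}

/// The same sequence through the enum, as the older style would do it.
pub fn encode_via_enum() -> Vec<u8> {
    let mut bytes = Vec::new();
    for insn in [
        Instruction::I32Const(0),
        Instruction::I32Const(0),
        Instruction::I32Const(64),
        Instruction::MemoryInit { mem: 0, data_index: 3 },
        Instruction::End,
    ] {
        insn.encode(&mut bytes);
    }
    bytes
}
```

Both helper functions produce identical bytes; the direct path simply skips constructing and matching on an `Instruction` value when the sequence is known at compile time, which is the case the PR description identifies as the performance motivation.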