[spec] Ambiguous if opcodes after the 0xFC and 0xFD prefix have multiple valid encodings #1526

Closed
MarkDerosier opened this issue Aug 24, 2022 · 3 comments

Comments

@MarkDerosier

The instruction index, https://webassembly.github.io/spec/core/appendix/index-instructions.html, lists the binary opcode of "i32.trunc_sat_f64_u" as 0xFC 0x03.

However, the binary format section on numeric instructions, https://webassembly.github.io/spec/core/binary/instructions.html#numeric-instructions, lists "i32.trunc_sat_f64_u" as being encoded as 0xFC 3:u32.

The u32 is a link to the LEB128 section of the spec, https://webassembly.github.io/spec/core/binary/values.html#binary-int, which I think notes that 'trailing zeros' are allowed in the encoding. It explicitly mentions that 0x03 and 0x83 0x00 are both well-formed encodings of the value 3.

Since unsigned integers have multiple encodings in LEB128, can't "i32.trunc_sat_f64_u" be encoded as either 0xFC 0x03 or 0xFC 0x83 0x00 (among other encodings with trailing zeros)?
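To make the question concrete, here is a minimal Python sketch that enumerates the byte sequences in question, assuming the u32 after the prefix is read literally as unsigned LEB128 with at most 5 bytes. The helper name encode_u32_padded is hypothetical and not taken from the spec or any tool.

```python
def encode_u32_padded(value: int, length: int) -> bytes:
    """Encode `value` as unsigned LEB128 using exactly `length` bytes.

    Hypothetical helper for illustration. Assumes a u32 occupies
    between 1 and 5 bytes.
    """
    if not 0 <= value < 2**32:
        raise ValueError("value out of u32 range")
    if not 1 <= length <= 5:
        raise ValueError("a u32 occupies 1 to 5 bytes")
    if value >> (7 * length):
        raise ValueError("value does not fit in the requested length")
    out = bytearray()
    for i in range(length):
        byte = (value >> (7 * i)) & 0x7F  # next 7 bits, least significant first
        if i < length - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


# Candidate encodings of "i32.trunc_sat_f64_u" (prefix 0xFC, opcode 3)
encodings = [bytes([0xFC]) + encode_u32_padded(3, n) for n in range(1, 6)]
for enc in encodings:
    print(enc.hex(" "))
```

Under that assumption the loop yields five distinct byte sequences for the same instruction, from the 2-byte 0xFC 0x03 up to a 6-byte form.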

When the spec uses an integer constant encoded as a LEB128 u32, does it intend the shortest encoding?
Similarly, select_t requires a vector of length 1, but there are multiple ways to encode 1.

In other places in the spec, such as the alignment field in the memarg of memory instructions (https://webassembly.github.io/spec/core/binary/instructions.html#memory-instructions), it would be useful to note that although the encoded integers are small and take only a few distinct values, an encoding can still be longer than 1 byte because of trailing zeros.

Personally, I would prefer that there be only one way to encode these instructions, as it would simplify my parser.

@tlively
Member

tlively commented Aug 24, 2022

Your reading of the opcodes after the prefix being LEB128 is correct, so yes, there are multiple ways to encode these operations. Hopefully this doesn't complicate your parser too much, since LEB128s appear in so many other locations in the binary format as well.
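As a rough illustration of why this need not complicate a parser much, the Python sketch below reads the opcode after the 0xFC prefix with the same routine a parser would use for any other u32. The function names decode_u32 and read_fc_opcode are hypothetical, and the 5-byte limit and the range check on the last byte are my reading of the spec's u32 constraints.

```python
from typing import Tuple


def decode_u32(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 u32 at data[pos]; return (value, next_pos)."""
    result = 0
    for i in range(5):  # assumed limit: a u32 takes at most ceil(32 / 7) = 5 bytes
        if pos + i >= len(data):
            raise ValueError("unexpected end of input")
        byte = data[pos + i]
        if i == 4 and byte > 0x0F:
            # Fifth byte: only the low 4 bits may be set, with no continuation bit
            raise ValueError("u32 out of range or too long")
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return result, pos + i + 1
    # Not reachable: the fifth byte either returns or raises above
    raise ValueError("u32 encoding longer than 5 bytes")


def read_fc_opcode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Read a 0xFC-prefixed instruction; return (sub_opcode, next_pos)."""
    if pos >= len(data) or data[pos] != 0xFC:
        raise ValueError("expected 0xFC prefix")
    return decode_u32(data, pos + 1)


# Both encodings from the issue decode to the same sub-opcode, 3
short_form = read_fc_opcode(bytes([0xFC, 0x03]))
padded_form = read_fc_opcode(bytes([0xFC, 0x83, 0x00]))
print(short_form[0] == padded_form[0])
```

The two calls differ only in how many bytes they consume, so a parser that dispatches on the decoded value rather than on raw bytes handles both forms the same way.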

@rossberg
Member

PR #1528 adds a clarifying note to the index.

@rossberg
Member

Closing via #1528.
