pack / unpack: a flexible data encoding scheme #9829

Comments
As an extension we could consider supporting structs to be encoded directly, marked with the key |
Nice! I think it's a good idea, and it makes sense to reuse the [sort of] common format the other languages have. |
Thinking about this perhaps we should consider making the above compile-time fixed-size only. This means removing the If the expected length and the generated output length is known at compile time, then
I don't see any immediate use case which would need that dynamic behaviour; in fact, I would expect the dynamic behaviour to be handled by the caller. Also note that Python, Ruby, and Lua each support dynamic behaviour to some extent, some stretching it to provide a "semi-regexp" format. We absolutely want to avoid that here.

Should also consider replacing

Some example code follows:

Eth2 deposit contract

before:

bytes32 node = sha256(abi.encodePacked(
    sha256(abi.encodePacked(pubkey_root, withdrawal_credentials)),
    sha256(abi.encodePacked(amount, bytes24(0), signature_root))
));

after:

bytes32 node = sha256(pack("c32 c32",
    sha256(pack("c32 c32", pubkey_root, withdrawal_credentials)),
    sha256(pack("<I8 >c24 c32", deposit_amount, bytes24(0), signature_root))
));

Note due to support for endianness, there is no need for the

Could also choose to represent the padding as a literal, such as

openzeppelin/ecdsa

before:

// Check the signature length
if (signature.length != 65) {
    revert("ECDSA: invalid signature length");
}
// Divide the signature in r, s and v variables
bytes32 r;
bytes32 s;
uint8 v;
// ecrecover takes the signature parameters, and the only way to get them
// currently is to use assembly.
// solhint-disable-next-line no-inline-assembly
assembly {
    r := mload(add(signature, 0x20))
    s := mload(add(signature, 0x40))
    v := byte(0, mload(add(signature, 0x60)))
}

after:

(bytes32 r, bytes32 s, uint8 v) = unpack(signature, "c32 c32 B", (bytes32, bytes32, uint8)); |
As an alternate proposal (triggered by @ekpyron) we could consider is possible if we have #9170 (wasn't there an earlier issue for this? We discussed it as early as 2016). The above examples in order would look like this:

// Decodes 32-bit, skipping 12 bytes, and 20 characters.
require(data.length == 36);
uint32 selector = uint32(bytes4(data[:4]));
address recipient = address(bytes20(data[16:]));

bytes32 node = sha256(concat(
    sha256(concat(pubkey_root, withdrawal_credentials)),
    sha256(concat(bswap(bytes8(uint64(deposit_amount))), bytes24(0), signature_root))
));

Here we use

require(signature.length == 65);
bytes32(signature[:32])
bytes32(signature[32:64])
uint8(signature[64])

This one is actually pretty neat. |
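To make the slice-based variant concrete, here is a minimal sketch of how the three examples might look as complete functions. It assumes Solidity 0.8.5 or later, where `bytes` slices can be converted to fixed-size `bytesNN` types and `bytes.concat` is available. The contract name, the function names, and the `bswap64` helper are hypothetical stand-ins, not part of any proposal in this thread. Note that the recipient starts at offset 16 (4 selector bytes plus 12 skipped bytes), and that the last byte of a 65-byte signature sits at index 64.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.5;

// Hypothetical names throughout; a sketch, not a reference implementation.
contract SliceDecodingSketch {
    // Reads a 4-byte selector, skips 12 padding bytes, then reads a 20-byte address.
    function decodeHeader(bytes calldata data)
        public
        pure
        returns (uint32 selector, address recipient)
    {
        require(data.length == 36, "unexpected length");
        selector = uint32(bytes4(data[:4]));
        recipient = address(bytes20(data[16:36]));
    }

    // Splits a 65-byte signature into r, s and v without inline assembly.
    function splitSignature(bytes calldata signature)
        public
        pure
        returns (bytes32 r, bytes32 s, uint8 v)
    {
        require(signature.length == 65, "invalid signature length");
        r = bytes32(signature[:32]);
        s = bytes32(signature[32:64]);
        v = uint8(signature[64]);
    }

    // Assumed stand-in for a byte-swap helper: reverses the byte order of x,
    // so the returned bytes8 holds x in little-endian order.
    function bswap64(uint64 x) internal pure returns (bytes8) {
        uint64 reversed = 0;
        for (uint256 i = 0; i < 8; i++) {
            reversed = (reversed << 8) | (x & 0xff);
            x >>= 8;
        }
        return bytes8(reversed);
    }

    // Deposit-style node hash: 32 + 32 bytes, then 8 + 24 + 32 bytes.
    function depositNode(
        bytes32 pubkeyRoot,
        bytes32 withdrawalCredentials,
        uint64 depositAmount,
        bytes32 signatureRoot
    ) public pure returns (bytes32) {
        return sha256(bytes.concat(
            sha256(bytes.concat(pubkeyRoot, withdrawalCredentials)),
            sha256(bytes.concat(bswap64(depositAmount), bytes24(0), signatureRoot))
        ));
    }
}
```

The explicit end bound in `data[16:36]` is a design choice: it keeps the slice length equal to the target type, so no silent padding or truncation can occur.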
This is also related: #8772 |
The original proposal was written a while back and initially looked to support more dynamic features. If we make it entirely static, it probably is possible to drop the formatting string and rely on the types only -- which basically means this is a proposal to clarify which types |
Would it be possible to use template syntax for the format specifier, so that we can do proper type checking? I.e. This would maybe require the format to be something else than a string, but maybe not. |
I prefer the more verbose way to do it (with the added benefit that it is mainly in "userland"). The abbreviations are just very hard to remember correctly. The downside is that the offsets have to be matched, and I wonder if there is some other mechanism to do it that looks more like:
One drawback I can see is that people could be tempted to use the unpacking functions directly in function arguments, where evaluation order is not always strictly adhered to. |

This issue has been marked as stale due to inactivity for the last 90 days. |
Hi everyone! This issue has been automatically closed due to inactivity. |

Exchanging data with contracts is a crucial feature, which has been mostly covered by the ABI and hidden by the language with decoding and encoding done opaquely. Various constructs, such as proxies and layer-2 solutions, demand more control over the data, and as a result new functionality was introduced to widen the control for the users: abi.encode, abi.encodePacked, and abi.decode.

Important to note that abi.encode / abi.decode operate on the "non-packed" ABI encoding specification, while abi.encodePacked allows for a weird ruleset of packed encoding.

There have been several requests for more flexibility for decoding, including abi.decodePacked, which due to the ambiguity of the encoding cannot be done.

I propose to take a cue from other languages and consider the widely used pack / unpack functions:
Introduce two functions:
pack(<format>, <values>...) -> (bytes memory data)
unpack(<data>, <format>, <types>...) -> (<variables>...)

The format is a string literal consisting of the keys described below, where the space characters between the keys are ignored. The types argument uses the same format as abi.decode, and the number of types must match the number of "captures" provided in the format. Both of these rules exist so that an encoder/decoder can be generated at compile time.

These could be placed under abi., some new namespace, or left top level.

Example:
There are two more special symbols:
- > sets big-endian mode (this is the default)
- < sets little-endian mode

Additional rules:
- [0-9a-z][0-9a-z] is taken as a literal value to be matched
- the _NN literal signals the number of bytes to skip (or zero-pad in case of packing)
- iN and IN mean that N is a decimal number literal which defines the number of bytes of that integer type
- cN means that N number of bytes are expected, where N is a literal decimal number
- c0 means that a zero terminated C-string is expected; furthermore, it must be valid UTF-8

One of many questions: should the decoder fail (revert) if the length of input is shorter than expected? I think so, though one could argue that if it is calldata then it can be "safely" zero padded.
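The two answers to the short-input question can be compared using calldata slices as they behave in Solidity 0.8.5 and later. This is only a sketch of the trade-off, not a proposed decoder. The contract and function names are hypothetical, and the 4 + 12 + 20 byte layout is borrowed from the selector-plus-address example used elsewhere in this thread.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.5;

// Hypothetical names; illustrates strict versus zero-padded reads only.
contract ShortInputSketch {
    // Strict: the explicit end bound makes the slice itself revert
    // when fewer than 36 bytes are supplied.
    function strictRecipient(bytes calldata data) public pure returns (address) {
        return address(bytes20(data[16:36]));
    }

    // Lenient: an open-ended slice is accepted as long as at least 16 bytes
    // are supplied; converting a slice shorter than 20 bytes to bytes20
    // pads it with zeros on the right instead of reverting.
    function paddedRecipient(bytes calldata data) public pure returns (address) {
        return address(bytes20(data[16:]));
    }
}
```

With a 30-byte input, `strictRecipient` reverts while `paddedRecipient` returns an address whose last six bytes are zero, which is the kind of silent padding a reverting decoder would rule out.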
Example for decoding ABI encoded data:
This is only a rough draft and I think we have to be careful at selecting this initial list of supported keys. I think the above is a good starting point, but not something final. It could also eventually deprecate abi.encodePacked.
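For readers unfamiliar with why abi.decodePacked cannot be implemented, the following sketch shows the ambiguity of packed encoding. Dynamic values are concatenated without length prefixes, so different inputs can produce identical bytes. The contract and function names are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.5;

// Hypothetical names; demonstrates the packed-encoding ambiguity only.
contract PackedAmbiguitySketch {
    // Both calls yield the same three bytes "abc", so the original split
    // between the two arguments cannot be recovered from the output.
    function collides() public pure returns (bool) {
        bytes memory first = abi.encodePacked("a", "bc");
        bytes memory second = abi.encodePacked("ab", "c");
        return keccak256(first) == keccak256(second);
    }
}
```

A format string with fixed-size keys, as proposed in this issue, avoids the problem because every field's length is known at compile time.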