-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: transaction rlp encoding #1539
Conversation
145dd2b
to
5886366
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. This method is a lot cleaner! I didn't think wrapper solution could be this clean as in the past I got a weird taste from our
NodeId
and Enr
wrappers we used to have in the past.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, looks great! I agree this is the better approach
buf.advance(1); | ||
BlobTransaction::decode(buf).map(Self::Blob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason besides stylistic preference to use buf.advance(1)
vs passing in &buf[1..]
?
I'm pretty neutral on the choice, just curious if I missed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, yes there is. Original implementation actually doesn't work correctly according to spec.
The documentation of the Decodable::decode says: buf
must be advanced past the decoded object.
Passing &mut &buf[1..]
would not modify buf
at all. That's why we have to:
- advance
buf
by 1 to skip transaction type - pass
buf
into*Transaction::decode
which will updatebuf
as it decodes the transaction
The "legacy transaction" is different in this regard, as it doesn't use 1 byte for transaction type (legacy reasons), but it's first byte is clearly different than any of the non-legacy types (this is handled by code in TransactionId::try_from
).
So decoding logic is to peek at the first byte, figure out if it is legacy or not, and proceed accordingly.
/// The RLP encoding of this type is done in a following way: | ||
/// | ||
/// - Legacy transactions are always encoded in the same way: `rlp(tx)`. | ||
/// - Other transactions are encoded as: `rlp(type + rlp(tx))` (where `+` represents concatenation) | ||
/// - `type` is a single byte represending the transaction type | ||
/// | ||
/// The basic type differs in the way it encodes non-legacy transaction. The basic type doesn't have | ||
/// extra RLP wrapper around it, meaning it's just: `type + rlp(tx)`. | ||
pub struct TransactionWithRlpHeader(pub Transaction); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great doc
I understand the name TransactionWithRlpHeader
. I don't 100% love it, though I am not sure I've got a better one. I'll talk through my thought process for why, and offer some alternatives. Feel free to ignore it all, no response necessary. :)
For a first-time code reader to quickly grok the name, it sort of assumes that the reader is actively aware of the internal steps of RLP-encoding. (If it happens to be a byte-string, you can just toss a header in front of it, and then it's RLP-encoded).
In contrast, the docstring does a solid job of explaining: we re-encode typed transactions using RLP. This ensures that transactions are always valid RLP at the outermost layer. Whereas standard transaction encoding is explicitly not valid RLP for typed transactions. So maybe this struct could be named with this approach. Here are some names that might inspire you:
TransactionAlwaysRLP
DoublyRLPEncodedTypedTransaction
EnsureOuterRLPTransaction
TransactionForcedToRLPEncoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I also don't love my name, I like it more than alternatives that you provided.
I would also add that even with RLP header, encoding is weird and non-usual (compared to standard RLP encoding):
- the type of RLP header is different between legacy and non-legacy tx
- legacy tx has "list header"
- non-legacy tx has "byte string header"
- according to EIP-2718, non-legacy transaction doesn't have to be rlp encoded at all (see "Opaque byte array rather than an RLP array" on the linked page)
- currently they, but EIP leaves space for that to change in the future
Tangentially, this seemed like a good example for the conversation about merging standards. I don't have any conclusions here, just sharing: A heuristic I like to use for when a commit should be split in two is when the commit message includes
Squash&Merge would undo that clean separation, of course. We could split it into 2 PRs to get the same effect, though that wouldn't be my personal favorite flow. It feels like forcing some extra steps on us, just to use the github UI in a certain way. Also, the commits are clearly connected, and having them in the same PR does make some sense. (In this case, to validate that we are calculating the size correctly when using the new refactored So that brings us back to Rebase&Merge, though I think you said that doesn't include the PR #, which is something you value. I guess for me to get around that, I'd just rebase it locally and append the PR #s manually. Then you don't need the button at all. Though, I got the sense that some of us wouldn't like to use that flow locally, and maybe would prefer to just open a couple PRs and then Squash&Merge each one, which I don't see a problem with for them. As long as it doesn't take away my workflow. Happy to continue this async, as you notice examples and other pros/cons. I suppose if you really don't want to split this one into two commits, then my recommendation would be: look at it from a Release Notes and
and then put something like this in the body of the commit message:
|
While working on #1541, I realized that decoding doesn't handle some error case correctly (when "rlp list" contains extra bytes). This wasn't working correctly before either, so I would also be a fix. Regarding "merging standards", yes I totally agree that this should be 2 different PRs. I was just being lazy. I wanted to showcase how refactored code would be used for "size" as well, but in reality I should have just created another PR on top of this one. I consider the "feature" too small and of little importance to be necessarily separated into separate PR. But conceptually you are right, it should be separate commit and in my workflow that means separate PR. Separate PR also means a bit more extra work, but I think it comes with extra benefits:
With this being said, I removed "size" from this PR and will create new one once this one is merged in. |
An alternative approach to #1536. Now that I see both of them, I think I prefer this one a bit more.
What was wrong?
Transaction can be rlp encoded/decoded with (less common) or without (most common) additional rlp header.
Currently, we have to deal with less common case manually, which isn't great. As of now, we have two use cases for it: inside BlockBody and as block size inside eth_getBlockBy* calls (which is not implemented).
How was it fixed?
Added separate type
TransactionWithRlpHeader
that is just a wrapper aroundTransaction
with differentEncodable
/Decodable
implementation.To-Do