Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: transaction rlp encoding #1539

Merged
merged 3 commits into from
Oct 20, 2024

Conversation

morph-dev
Copy link
Collaborator

An alternative approach to #1536. Now that I see both of them, I think I prefer this one a bit more.

What was wrong?

Transaction can be rlp encoded/decoded with (less common) or without (most common) additional rlp header.

Currently, we have to deal with less common case manually, which isn't great. As of now, we have two use cases for it: inside BlockBody and as block size inside eth_getBlockBy* calls (which is not implemented).

How was it fixed?

Added separate type TransactionWithRlpHeader that is just a wrapper around Transaction with different Encodable/Decodable implementation.

To-Do

@morph-dev morph-dev self-assigned this Oct 19, 2024
Copy link
Member

@KolbyML KolbyML left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit: looks good. This method is a lot cleaner! I didn't think wrapper solution could be this clean as in the past I got a weird taste from our NodeId and Enr wrappers we used to have in the past.

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looks great! I agree this is the better approach

ethportal-api/src/types/execution/block_body.rs Outdated Show resolved Hide resolved
Comment on lines +115 to +116
buf.advance(1);
BlobTransaction::decode(buf).map(Self::Blob)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason besides stylistic preference to use buf.advance(1) vs passing in &buf[1..]?

I'm pretty neutral on the choice, just curious if I missed anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, yes there is. Original implementation actually doesn't work correctly according to spec.

The documentation of the Decodable::decode says: buf must be advanced past the decoded object.

Passing &mut &buf[1..] would not modify buf at all. That's why we have to:

  1. advance buf by 1 to skip transaction type
  2. pass buf into *Transaction::decode which will update buf as it decodes the transaction

The "legacy transaction" is different in this regard, as it doesn't use 1 byte for transaction type (legacy reasons), but it's first byte is clearly different than any of the non-legacy types (this is handled by code in TransactionId::try_from).
So decoding logic is to peek at the first byte, figure out if it is legacy or not, and proceed accordingly.

Comment on lines +177 to +185
/// The RLP encoding of this type is done in a following way:
///
/// - Legacy transactions are always encoded in the same way: `rlp(tx)`.
/// - Other transactions are encoded as: `rlp(type + rlp(tx))` (where `+` represents concatenation)
/// - `type` is a single byte represending the transaction type
///
/// The basic type differs in the way it encodes non-legacy transaction. The basic type doesn't have
/// extra RLP wrapper around it, meaning it's just: `type + rlp(tx)`.
pub struct TransactionWithRlpHeader(pub Transaction);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great doc

I understand the name TransactionWithRlpHeader. I don't 100% love it, though I am not sure I've got a better one. I'll talk through my thought process for why, and offer some alternatives. Feel free to ignore it all, no response necessary. :)

For a first-time code reader to quickly grok the name, it sort of assumes that the reader is actively aware of the internal steps of RLP-encoding. (If it happens to be a byte-string, you can just toss a header in front of it, and then it's RLP-encoded).

In contrast, the docstring does a solid job of explaining: we re-encode typed transactions using RLP. This ensures that transactions are always valid RLP at the outermost layer. Whereas standard transaction encoding is explicitly not valid RLP for typed transactions. So maybe this struct could be named with this approach. Here are some names that might inspire you:

  • TransactionAlwaysRLP
  • DoublyRLPEncodedTypedTransaction
  • EnsureOuterRLPTransaction
  • TransactionForcedToRLPEncoding

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I also don't love my name, I like it more than alternatives that you provided.

I would also add that even with RLP header, encoding is weird and non-usual (compared to standard RLP encoding):

  • the type of RLP header is different between legacy and non-legacy tx
    • legacy tx has "list header"
    • non-legacy tx has "byte string header"
  • according to EIP-2718, non-legacy transaction doesn't have to be rlp encoded at all (see "Opaque byte array rather than an RLP array" on the linked page)
    • currently they, but EIP leaves space for that to change in the future

@carver
Copy link
Collaborator

carver commented Oct 20, 2024

Tangentially, this seemed like a good example for the conversation about merging standards. I don't have any conclusions here, just sharing:

A heuristic I like to use for when a commit should be split in two is when the commit message includes A + B, or if it would have two different conventional commit type:s. In this case, it seems to split cleanly into two commits:

refactor: cleanly handle differential transaction RLP encoding schemas
feat: add size to eth_getBlockBy*

Squash&Merge would undo that clean separation, of course. We could split it into 2 PRs to get the same effect, though that wouldn't be my personal favorite flow. It feels like forcing some extra steps on us, just to use the github UI in a certain way. Also, the commits are clearly connected, and having them in the same PR does make some sense. (In this case, to validate that we are calculating the size correctly when using the new refactored TransactionAlwaysRLP or whatever we call it)

So that brings us back to Rebase&Merge, though I think you said that doesn't include the PR #, which is something you value. I guess for me to get around that, I'd just rebase it locally and append the PR #s manually. Then you don't need the button at all.

Though, I got the sense that some of us wouldn't like to use that flow locally, and maybe would prefer to just open a couple PRs and then Squash&Merge each one, which I don't see a problem with for them. As long as it doesn't take away my workflow.

Happy to continue this async, as you notice examples and other pros/cons.


I suppose if you really don't want to split this one into two commits, then my recommendation would be: look at it from a Release Notes and git log --oneline point of view. Users watching release notes are going to be most interested in how using trin has changed as a result of the commit. refactor: is mostly a clue that devs should be interested and users can ignore it. But then they would miss the size change. So I'd call it just:

feat: add size to eth_getBlockBy* response

and then put something like this in the body of the commit message:

Includes a refactor: cleanly handle differential transaction RLP encoding standards

@morph-dev
Copy link
Collaborator Author

morph-dev commented Oct 20, 2024

While working on #1541, I realized that decoding doesn't handle some error case correctly (when "rlp list" contains extra bytes). This wasn't working correctly before either, so I would also be a fix.


Regarding "merging standards", yes I totally agree that this should be 2 different PRs. I was just being lazy.

I wanted to showcase how refactored code would be used for "size" as well, but in reality I should have just created another PR on top of this one. I consider the "feature" too small and of little importance to be necessarily separated into separate PR. But conceptually you are right, it should be separate commit and in my workflow that means separate PR.

Separate PR also means a bit more extra work, but I think it comes with extra benefits:

  1. cleaner git history, which I value more than a bit of extra work
    • this can also be achieved using manual rebase and manually adding PR number to the commit message, but that also requires extra manual work (manual rebase also tends to lose github comments)
  2. It's better to have separate PR and discuss only feature related change (e.g. should we add more tests, etc.)
    • with original approach, it's easy to omit this type of discussion because the size of "refactoring" dominates this PR in comparison to size of "feature"

With this being said, I removed "size" from this PR and will create new one once this one is merged in.

@morph-dev morph-dev changed the title refactor: transaction rlp encoding + size in eth_getBlockBy* refactor: transaction rlp encoding Oct 20, 2024
@morph-dev morph-dev merged commit 0866446 into ethereum:master Oct 20, 2024
9 checks passed
@morph-dev morph-dev deleted the rlp_transaction_2 branch October 20, 2024 09:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants