refactor: transaction rlp encoding #1539

morph-dev · 2024-10-19T17:47:34Z

An alternative approach to #1536. Now that I see both of them, I think I prefer this one a bit more.

What was wrong?

Transaction can be rlp encoded/decoded with (less common) or without (most common) additional rlp header.

Currently, we have to deal with less common case manually, which isn't great. As of now, we have two use cases for it: inside BlockBody and as block size inside eth_getBlockBy* calls (which is not implemented).

How was it fixed?

Added separate type TransactionWithRlpHeader that is just a wrapper around Transaction with different Encodable/Decodable implementation.

To-Do

Clean up commit history and use conventional commits.

KolbyML

looks good. This method is a lot cleaner! I didn't think wrapper solution could be this clean as in the past I got a weird taste from our NodeId and Enr wrappers we used to have in the past.

carver

Yup, looks great! I agree this is the better approach

ethportal-api/src/types/execution/block_body.rs

carver · 2024-10-20T05:04:26Z

ethportal-api/src/types/execution/transaction.rs

+                buf.advance(1);
+                BlobTransaction::decode(buf).map(Self::Blob)


Is there any reason besides stylistic preference to use buf.advance(1) vs passing in &buf[1..]?

I'm pretty neutral on the choice, just curious if I missed anything.

Actually, yes there is. Original implementation actually doesn't work correctly according to spec.

The documentation of the Decodable::decode says: buf must be advanced past the decoded object.

Passing &mut &buf[1..] would not modify buf at all. That's why we have to:

advance buf by 1 to skip transaction type

pass buf into *Transaction::decode which will update buf as it decodes the transaction

The "legacy transaction" is different in this regard, as it doesn't use 1 byte for transaction type (legacy reasons), but it's first byte is clearly different than any of the non-legacy types (this is handled by code in TransactionId::try_from).
So decoding logic is to peek at the first byte, figure out if it is legacy or not, and proceed accordingly.

carver · 2024-10-20T05:06:29Z

ethportal-api/src/types/execution/transaction.rs

+/// The RLP encoding of this type is done in a following way:
+///
+/// - Legacy transactions are always encoded in the same way: `rlp(tx)`.
+/// - Other transactions are encoded as: `rlp(type + rlp(tx))` (where `+` represents concatenation)
+///   - `type` is a single byte represending the transaction type
+///
+/// The basic type differs in the way it encodes non-legacy transaction. The basic type doesn't have
+/// extra RLP wrapper around it, meaning it's just: `type + rlp(tx)`.
+pub struct TransactionWithRlpHeader(pub Transaction);


Great doc

I understand the name TransactionWithRlpHeader. I don't 100% love it, though I am not sure I've got a better one. I'll talk through my thought process for why, and offer some alternatives. Feel free to ignore it all, no response necessary. :)

For a first-time code reader to quickly grok the name, it sort of assumes that the reader is actively aware of the internal steps of RLP-encoding. (If it happens to be a byte-string, you can just toss a header in front of it, and then it's RLP-encoded).

In contrast, the docstring does a solid job of explaining: we re-encode typed transactions using RLP. This ensures that transactions are always valid RLP at the outermost layer. Whereas standard transaction encoding is explicitly not valid RLP for typed transactions. So maybe this struct could be named with this approach. Here are some names that might inspire you:

TransactionAlwaysRLP

DoublyRLPEncodedTypedTransaction

EnsureOuterRLPTransaction

TransactionForcedToRLPEncoding

While I also don't love my name, I like it more than alternatives that you provided.

I would also add that even with RLP header, encoding is weird and non-usual (compared to standard RLP encoding):

the type of RLP header is different between legacy and non-legacy tx

legacy tx has "list header"

non-legacy tx has "byte string header"

according to EIP-2718, non-legacy transaction doesn't have to be rlp encoded at all (see "Opaque byte array rather than an RLP array" on the linked page)

currently they, but EIP leaves space for that to change in the future

carver · 2024-10-20T06:29:09Z

Tangentially, this seemed like a good example for the conversation about merging standards. I don't have any conclusions here, just sharing:

A heuristic I like to use for when a commit should be split in two is when the commit message includes A + B, or if it would have two different conventional commit type:s. In this case, it seems to split cleanly into two commits:

refactor: cleanly handle differential transaction RLP encoding schemas
feat: add size to eth_getBlockBy*

Squash&Merge would undo that clean separation, of course. We could split it into 2 PRs to get the same effect, though that wouldn't be my personal favorite flow. It feels like forcing some extra steps on us, just to use the github UI in a certain way. Also, the commits are clearly connected, and having them in the same PR does make some sense. (In this case, to validate that we are calculating the size correctly when using the new refactored TransactionAlwaysRLP or whatever we call it)

So that brings us back to Rebase&Merge, though I think you said that doesn't include the PR #, which is something you value. I guess for me to get around that, I'd just rebase it locally and append the PR #s manually. Then you don't need the button at all.

Though, I got the sense that some of us wouldn't like to use that flow locally, and maybe would prefer to just open a couple PRs and then Squash&Merge each one, which I don't see a problem with for them. As long as it doesn't take away my workflow.

Happy to continue this async, as you notice examples and other pros/cons.

I suppose if you really don't want to split this one into two commits, then my recommendation would be: look at it from a Release Notes and git log --oneline point of view. Users watching release notes are going to be most interested in how using trin has changed as a result of the commit. refactor: is mostly a clue that devs should be interested and users can ignore it. But then they would miss the size change. So I'd call it just:

feat: add size to eth_getBlockBy* response

and then put something like this in the body of the commit message:

Includes a refactor: cleanly handle differential transaction RLP encoding standards

morph-dev · 2024-10-20T09:28:05Z

While working on #1541, I realized that decoding doesn't handle some error case correctly (when "rlp list" contains extra bytes). This wasn't working correctly before either, so I would also be a fix.

Regarding "merging standards", yes I totally agree that this should be 2 different PRs. I was just being lazy.

I wanted to showcase how refactored code would be used for "size" as well, but in reality I should have just created another PR on top of this one. I consider the "feature" too small and of little importance to be necessarily separated into separate PR. But conceptually you are right, it should be separate commit and in my workflow that means separate PR.

Separate PR also means a bit more extra work, but I think it comes with extra benefits:

cleaner git history, which I value more than a bit of extra work
- this can also be achieved using manual rebase and manually adding PR number to the commit message, but that also requires extra manual work (manual rebase also tends to lose github comments)
It's better to have separate PR and discuss only feature related change (e.g. should we add more tests, etc.)
- with original approach, it's easy to omit this type of discussion because the size of "refactoring" dominates this PR in comparison to size of "feature"

With this being said, I removed "size" from this PR and will create new one once this one is merged in.

morph-dev requested review from carver, njgheorghita and KolbyML October 19, 2024 17:47

morph-dev self-assigned this Oct 19, 2024

refactor: transaction rlp encoding + size in eth_getBlockBy*

5886366

morph-dev force-pushed the rlp_transaction_2 branch from 145dd2b to 5886366 Compare October 19, 2024 17:49

KolbyML approved these changes Oct 19, 2024

View reviewed changes

carver approved these changes Oct 20, 2024

View reviewed changes

morph-dev mentioned this pull request Oct 20, 2024

refactor: transaction rlp encoding #1536

Closed

1 task

morph-dev added 2 commits October 20, 2024 12:08

fix: handle rlp decoding error (wrong list length)

18681cb

revert: size in eth_getBlockBy*

8ed6ed9

morph-dev changed the title refactor: transaction rlp encoding + size in eth_getBlockBy* refactor: transaction rlp encoding Oct 20, 2024

morph-dev merged commit 0866446 into ethereum:master Oct 20, 2024
9 checks passed

morph-dev deleted the rlp_transaction_2 branch October 20, 2024 09:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: transaction rlp encoding #1539

refactor: transaction rlp encoding #1539

morph-dev commented Oct 19, 2024

KolbyML left a comment •

edited

Loading

carver left a comment

carver Oct 20, 2024

morph-dev Oct 20, 2024

carver Oct 20, 2024

morph-dev Oct 20, 2024

carver commented Oct 20, 2024

morph-dev commented Oct 20, 2024 •

edited

Loading

refactor: transaction rlp encoding #1539

refactor: transaction rlp encoding #1539

Conversation

morph-dev commented Oct 19, 2024

What was wrong?

How was it fixed?

To-Do

KolbyML left a comment • edited Loading

Choose a reason for hiding this comment

carver left a comment

Choose a reason for hiding this comment

carver Oct 20, 2024

Choose a reason for hiding this comment

morph-dev Oct 20, 2024

Choose a reason for hiding this comment

carver Oct 20, 2024

Choose a reason for hiding this comment

morph-dev Oct 20, 2024

Choose a reason for hiding this comment

carver commented Oct 20, 2024

morph-dev commented Oct 20, 2024 • edited Loading

KolbyML left a comment •

edited

Loading

morph-dev commented Oct 20, 2024 •

edited

Loading