Add e2hs file format spec #368

njgheorghita · 2025-02-21T19:42:35Z

This is a proposal for a new file storage format to be used in the History network. The goals of this format are to:

- unify pre- and post-merge data into a single format to simplify bridging logic
- include proofs, so bridges don't need to manage the overhead of generating proofs every time they perform gossip
- make post-merge receipts available, since no available era file format contains receipts

This format will require additional architecture to generate new e2hs files for each finalized epoch, as well as some bridge work to support this format.

Maybe the only concern I can think of is that the new era files will be based on a portal-defined type, rather than more "established" native Ethereum types. So, if we want to change HeaderWithProof at some point in the future, we will have to regenerate all of the files (though, imo, after the recent union removal, this type has solidified).

Any feedback on the specifics, or any pushback on using such a format, would be great. There might be other shortcomings to this idea that I haven't noticed.

KolbyML · 2025-02-21T19:53:35Z

history/e2hs.md

+    BlockTuple := CompressedHeader | CompressedBody | CompressedReceipts
+    -----
+    Version := {type: 0x3265,          data: nil}
+    CompressedHeader := {type: 0x03,   data: snappyFramed(rlp(HeaderWithProof))}


Since this is a new type it should use a unique type number

That's what I thought, but then I saw that era1 and e2ss use the same version number, so I wasn't sure what to do. But I agree it should be something unique

I think the only type that is reused is 0x03 for snappyFramed(rlp(header)), but it's the same for all 3 existing types: era, era1 and e2ss.

You define new type snappyFramed(rlp(HeaderWithProof)), so it should be different.
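To illustrate why the type number matters, here is a minimal sketch of how a reader tells entries apart. It assumes the usual e2store record header (2-byte type, 4-byte little-endian length, 2 reserved zero bytes). The value `HYPOTHETICAL_HEADER_WITH_PROOF` is made up for illustration and is not proposed anywhere in this thread.

```python
import struct

# Assumed e2store record header: type (2 bytes), length (4 bytes, little-endian),
# reserved (2 bytes, must be zero). "<" disables padding, so the size is 8 bytes.
HEADER = struct.Struct("<HIH")

VERSION = 0x3265            # serializes as the ASCII bytes "e2"
COMPRESSED_HEADER = 0x03    # existing meaning: snappyFramed(rlp(header))
# Hypothetical value for illustration only; the real number is undecided here.
HYPOTHETICAL_HEADER_WITH_PROOF = 0xFF03


def encode_entry(entry_type: int, data: bytes) -> bytes:
    """Serialize one entry as header followed by payload."""
    return HEADER.pack(entry_type, len(data), 0) + data


def read_entries(buf: bytes):
    """Yield (type, payload) pairs; the type is the only hint about the payload."""
    pos = 0
    while pos < len(buf):
        entry_type, length, reserved = HEADER.unpack_from(buf, pos)
        if reserved != 0:
            raise ValueError("reserved bytes must be zero")
        pos += HEADER.size
        yield entry_type, buf[pos:pos + length]
        pos += length
```

Since a reader only sees the type and an opaque payload, reusing 0x03 for a differently shaped payload would leave generic e2store tooling unable to tell the two apart.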

KolbyML · 2025-02-21T19:55:18Z

history/e2hs.md

+    CompressedHeader := {type: 0x03,   data: snappyFramed(rlp(HeaderWithProof))}
+    CompressedBody := {type: 0x04,     data: snappyFramed(rlp(BlockBody))}
+    CompressedReceipts := {type: 0x05, data: snappyFramed(rlp(Receipts))}
+    Accumulator := {type: 0x06,        data: hash_tree_root(List(block_hash, 8192))}


I would look into the shortfalls of historical roots and summaries. Beacon slots can be empty, so I am not sure this rigid requirement would work for the other accumulators. I am assuming this accumulator section is underspecified?

Tbh I'm not entirely sure what the best approach is for this field. I'm not even sure it's necessary, since all the headers will have accompanying proofs that can be directly verified. I need to double-check exactly how it's handled, but even in era files they have a complete list of block_roots even if there are missed slots, so we can probably just copy their approach for handling missed slots. But I'll dig into this

KolbyML · 2025-02-21T20:01:15Z

Then my final question to everyone is: does this spec belong in this repository? Here is a page that links to the specs of the 4 other pre-existing e2store formats: https://eth-clients.github.io/history-endpoints/

Most of the links point to PRs in project repos or to specific commits, so I am not sure this belongs in the portal-network-specs repo. If anything, the specs should be consolidated into a new repo for e2store formats. @arnetheduck @kdeme what do you guys think? There was also talk of better formalizing the types to avoid number reuse. I am interested in what people think on the matter.

njgheorghita · 2025-02-24T21:06:14Z

Also, just noting here that this spec will also need to handle the final, pre-merge truncated epoch, so that post-merge e2hs files align with era file boundaries

morph-dev

In general, I'm in favor of this direction

morph-dev · 2025-02-25T09:47:24Z

history/e2hs.md

+    BlockTuple := CompressedHeader | CompressedBody | CompressedReceipts
+    -----
+    Version := {type: 0x3265,          data: nil}
+    CompressedHeader := {type: 0x03,   data: snappyFramed(rlp(HeaderWithProof))}


I think the only type that is reused is 0x03 for snappyFramed(rlp(header)), but it's the same for all 3 existing types: era, era1 and e2ss.

You define new type snappyFramed(rlp(HeaderWithProof)), so it should be different.

morph-dev · 2025-02-25T09:54:00Z

history/e2hs.md

+    BlockTuple := CompressedHeader | CompressedBody | CompressedReceipts
+    -----
+    Version := {type: 0x3265,          data: nil}
+    CompressedHeader := {type: 0x03,   data: snappyFramed(rlp(HeaderWithProof))}


I would use snappyFramed(ssz(HeaderWithProof)), because we don't encode HeaderWithProof with rlp anywhere at the moment (but we do ssz-encode it). And even if you wanted to rlp-encode it, I think it wouldn't work well.

I think BlockBody and Receipts are currently encoded with both rlp and ssz, depending on the context. But if we encode HeaderWithProof with ssz, I would encode the body and receipts with it as well (for consistency).

morph-dev · 2025-02-25T09:57:55Z

history/e2hs.md

+
+```
+    e2hs := Version | BlockTuple* | OtherEntry* | Accumulator | BlockIndex
+    BlockTuple := CompressedHeader | CompressedBody | CompressedReceipts


In comparison to the era1 format, we lost total_difficulty. Is the reasoning that we don't need it now that proofs are already part of the CompressedHeader?
Would we need it in some other context?

morph-dev · 2025-02-25T10:08:07Z

history/e2hs.md

+The file format is defined as:
+
+```
+    e2hs := Version | BlockTuple* | OtherEntry* | Accumulator | BlockIndex


What is OtherEntry?

morph-dev · 2025-02-25T10:11:37Z

> Also, just noting here that this spec will also need to handle the final, pre-merge truncated epoch, so that post-merge e2hs files align with era file boundaries

Why do we have to align e2hs with era file boundaries? My understanding is that era is aligned with 8192 slots, not blocks.

So if we want one clean type, primarily for portal network usage, I would suggest that:

- each file will have exactly 8192 blocks (not slots), potentially spanning over a critical fork threshold (e.g. the merge)
- we encode everything as ssz
- we remove the accumulator and maybe some other fields if they are not needed (e.g. BlockIndex can be just the starting block number)

Basically, it would be each of the HistoryContentValue types, encoded as they are in portal-spec (ssz). And potentially some other meta-data, like starting index, total_difficulty, etc.
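As a rough illustration of the fixed "8192 blocks per file" layout suggested above, the mapping from block number to file is plain integer division. The function names below are hypothetical, and 15537394 is used as the first post-merge block on mainnet.

```python
BLOCKS_PER_FILE = 8192  # blocks (not slots) per file, as suggested above


def locate_block(block_number: int) -> tuple[int, int]:
    """Return (file_index, position_within_file) for a block number."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    return divmod(block_number, BLOCKS_PER_FILE)


def file_block_range(file_index: int) -> tuple[int, int]:
    """Return the inclusive (first_block, last_block) covered by a file."""
    first = file_index * BLOCKS_PER_FILE
    return first, first + BLOCKS_PER_FILE - 1


# The first post-merge block lands in the middle of file 1896, so under this
# scheme that file would hold both pre- and post-merge blocks.
merge_file, merge_offset = locate_block(15537394)
```

With this layout there is no truncated final pre-merge file: file 1896 simply spans the fork.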

njgheorghita · 2025-02-25T17:17:49Z

> Why do we have to align e2hs with era file boundaries? My understanding is that era is aligned with 8192 slots, not blocks.

Ahh, that's a fair point. The only strong argument I can see for "aligning" with era files is to deal with HEAD - 8192 - or non-ephemeral latest (aka we want to be able to generate a new e2hs file as soon as each epoch is finalized). My assumption is that we want to gossip this content out to the network asap (specifically HeaderWithProofs as the Bodies & Receipts will already be gossiped). But, taking your comment into account... There is no way to "align" with era files 100% unless we switch to a slot-based period post-merge (not a good idea imo). And, in terms of handling the "latest" available HeaderWithProofs... we can just handle these cases in our "latest" bridge (idk, maybe expand the "latest" bridge to gossip all ephemeral data & the latest finalized epoch HeaderWithProofs), rather than force this storage format to accommodate the edge cases. A clean, simple storage format does seem preferable.

I'm ok with ssz-ing everything, I think @KolbyML might have preferred rlp-ing receipts/bodies in a side chat?

I'm a little unclear in my understanding of the purpose of the BlockIndex. In geth's era1 spec:

> BlockIndex stores relative offsets to each compressed block entry.

I understand this to mean that a per-block index is what lets a reader jump directly to a specific block without iterating over the whole file. Though, looking at our era1 tooling, we don't really take advantage of this at all: we deserialize the whole file before indexing into a specific block tuple. This seems to work fine for our use case, but maybe it's worth leaving the indices in and improving our lookup logic?
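For reference, an index-based lookup could look roughly like the sketch below. The layout (starting number, one offset per block, then a count, each an 8-byte little-endian signed integer) is loosely modeled on the era1 description and is an assumption here, as is treating offsets as relative to the position of the index itself. The function names are hypothetical.

```python
import struct


def build_index(starting_number: int, offsets: list[int]) -> bytes:
    """Pack starting-number | offset* | count as 8-byte little-endian signed ints."""
    return struct.pack(f"<{len(offsets) + 2}q", starting_number, *offsets, len(offsets))


def lookup(index: bytes, index_pos: int, block_number: int) -> int:
    """Return the absolute file position of the entry for block_number.

    index_pos is where the index starts in the file; offsets are assumed to be
    relative to that position, so entries before the index have negative offsets.
    """
    (count,) = struct.unpack_from("<q", index, len(index) - 8)
    (start,) = struct.unpack_from("<q", index, 0)
    i = block_number - start
    if not 0 <= i < count:
        raise KeyError(block_number)
    (relative,) = struct.unpack_from("<q", index, 8 * (1 + i))
    return index_pos + relative
```

A reader would seek to the returned position and decode only that block tuple, instead of deserializing the whole file first.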

I can't think of any reason why the Accumulator is necessary for our purposes. Each header already has an accompanying proof. It's kind of nice to have a hash in the filename, and maybe this helps guard against people downloading "fake" e2hs files, but I don't think that really holds, since anyone can put a valid hash in the filename and invalid data in the file. Maybe we can change the hash in the filename to be a hash of the entire file contents, so it can be verified after download?
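A minimal sketch of the "hash of the entire file contents" idea, assuming SHA-256 and a hypothetical `<network>-<epoch>-<short-hash>.e2hs` naming scheme (neither is decided in this thread):

```python
import hashlib
from pathlib import PurePath


def short_file_hash(data: bytes, n_hex: int = 8) -> str:
    """Return the first n_hex hex characters of SHA-256 over the whole file."""
    return hashlib.sha256(data).hexdigest()[:n_hex]


def make_filename(network: str, epoch: int, data: bytes) -> str:
    """Build a hypothetical filename that embeds the content hash."""
    return f"{network}-{epoch:05d}-{short_file_hash(data)}.e2hs"


def verify(filename: str, data: bytes) -> bool:
    """Check downloaded bytes against the hash embedded in the filename."""
    expected = PurePath(filename).stem.rsplit("-", 1)[-1]
    return short_file_hash(data, len(expected)) == expected
```

Note that a truncated hash only catches corruption and casual tampering; resisting a deliberate attacker would need the full digest, obtained from a trusted source.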

KolbyML · 2025-02-25T17:22:50Z

> I'm ok with ssz-ing everything, I think @KolbyML might have preferred rlp-ing receipts/bodies in a side chat?

I just want to do whatever is less work. The nice thing about rlp is we know the format will never change. But maybe we will change our format for ssz encoding them?

I am fine either way. ssz-ing everything just seemed like more work for little to no gain. If there is a gain I think we should do it, but it wasn't apparent to me that there would be value, especially as our ssz bodies and receipts are just wrappers around rlp. It seemed pretty pointless when we already have the infrastructure in place: why create custom types for something that already has a good solution, i.e. why reinvent the wheel when nothing is broken?

For HeaderWithProof it makes way more sense for it to be in ssz, as we don't have code to encode/decode rlp for it. HeaderWithProof is far more ssz native, hence why I said what I did

morph-dev · 2025-02-25T18:11:47Z

Regarding BlockIndex, you are right. I just forgot their use case. But we should keep them in.

Regarding rlp vs ssz, I'm not strongly set one way or the other... The benefits of ssz:

- Consistency between types
- If they are ssz, they should be identical to HistoryContentValue, and can potentially be gossiped (which seems to be the main use case) directly, even without decoding

add e2hs file format spec

e136678
