Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core, eth: improve delivery speed on header requests #23105

Merged
merged 9 commits into from
Dec 7, 2021

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Jun 24, 2021

This PR reduces the amount of work we do when answering header queries, e.g. when a peer is syncing from us.

  • For some items, e.g block bodies, when we read the rlp-data from database, we plug it directly into the response package. We didn't do that for headers, but instead read headers-rlp, decode to types.Header, and re-encode to rlp. This PR changes that to keep it in RLP-form as much as possible.
  • When a node is syncing from us, it typically requests 192 contiguous headers. On master it has the following effect:
  1. For headers not in ancient: 2 db lookups. One for translating hash->number (even though the request is by number), and another for reading by hash (this latter one is sometimes cached).
  2. For headers in ancient: 1 file lookup/syscall for translating hash->number (even though the request is by number), and another for reading the header itself. After this, it also performes a hashing of the header, to ensure that the hash is what it expected.
    In this PR, I instead move the logic for "give me a sequence of blocks" into the lower layers, where the database can determine how and what to read from leveldb and/or ancients.

There are basically four types of requests; three of them are improved this way. The fourth, by hash going backwards, is more tricky to optimize. However, since we know that the gap is 0, we can look up by the parentHash, and stlil shave off all the number->hash lookups.

The gapped collection can be optimized similarly, as a follow-up, at least in three out of four cases.

@holiman
Copy link
Contributor Author

holiman commented Jun 24, 2021

This may actually make the performance a bit worse, in the case that a peer is 3-4 blocks behind us, and wants to catch up: requesting a few headers close to our tip (or beyond). Previously, we would have the headers in object-form in a header cache, whereas this PR makes them read from db. I'll look into it

@holiman
Copy link
Contributor Author

holiman commented Jun 24, 2021

This may actually make the performance a bit worse, in the case that a peer is 3-4 blocks behind us, and wants to catch up:

Pushed a commit to use the headerchain cache better.

core/headerchain.go Outdated Show resolved Hide resolved
core/headerchain.go Outdated Show resolved Hide resolved
core/rawdb/accessors_chain.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor Author

holiman commented Jul 6, 2021

Triage discussion: probably safe, but we should hold with this until after the London release.

core/types/block.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor Author

holiman commented Jul 6, 2021

TODO (not in this PR), add basefee check to sanityCheck

core/types/block.go Outdated Show resolved Hide resolved
core/types/block.go Outdated Show resolved Hide resolved
@karalabe karalabe added this to the 1.10.7 milestone Aug 3, 2021
@karalabe karalabe self-assigned this Aug 3, 2021
@karalabe karalabe modified the milestones: 1.10.7, 1.10.8 Aug 10, 2021
@holiman holiman force-pushed the faster_headers branch 2 times, most recently from 7cecb9c to 5d34b01 Compare August 13, 2021 13:00
@holiman
Copy link
Contributor Author

holiman commented Aug 13, 2021

Rebased on top of the eth/66 changes that got merged. This still does not make use of the faster ancient accessors though

@fjl fjl modified the milestones: 1.10.8, 1.10.9 Aug 24, 2021
@holiman
Copy link
Contributor Author

holiman commented Sep 8, 2021

Rebased again. Also, tests are added to check the correctness of the split leveldb/ancient loader

@holiman
Copy link
Contributor Author

holiman commented Sep 21, 2021

This is now rebased on top of #23566, which should go in first

@fjl fjl modified the milestones: 1.10.9, 1.10.10 Sep 29, 2021
@holiman holiman modified the milestones: 1.10.11, 1.10.12 Oct 20, 2021
core/types/block.go Outdated Show resolved Hide resolved
@holiman
Copy link
Contributor Author

holiman commented Oct 25, 2021

Rebased + robustified

@karalabe karalabe modified the milestones: 1.10.12, 1.10.13 Nov 8, 2021
@holiman holiman modified the milestones: 1.10.13, 1.10.14 Nov 24, 2021
@holiman
Copy link
Contributor Author

holiman commented Nov 29, 2021

Rebased (Part V)

@holiman
Copy link
Contributor Author

holiman commented Nov 30, 2021

I tested this a bit on a live node. On my node, I made it

  • When the 'new' code path was requested, it did both a old-style lookup and a new-style lookup
  • When doing so, it randomized: sometimes doing old style first, sometimes doing new-style first,
  • After pulling both answers, it compared the byte output of the two, to see if they matched.

Example:

INFO [11-30|11:28:29.065] Served block headers                     newtime=1.877586ms  oldtime=8.107161ms  order="new version first" reverse=false skip=0 amount=192 origin="{Hash:0x0000000000000000000000000000000000000000000000000000000000000000 Number:8166928}"

I then collected some stats about the performance.

// Old version first
// Total old: 1.420655265s, Total new: 131.683183ms, count 263, avg old: 5.401731ms, avg new: 500.696µs
// New version first
// Total old: 1.440066122s, Total new: 116.201786ms, count 253, avg old: 5.69196ms, avg new: 459.295µs

It doesn't matter a whole lot which handler was first, for serving 192 headers on an intel NUC with SSD, the current handler takes ~5.5ms, the one in this PR takes about 0.5ms, so one order of magnitude faster.

Also, no difference in the two responses have been found so far

@holiman
Copy link
Contributor Author

holiman commented Nov 30, 2021

After letting it run for a bit longer:

Total old: 7.335648596s, Total new: 709.078148ms, count 1208, avg old: 6.072556ms, avg new: 586.985µs

@fjl fjl self-assigned this Dec 7, 2021
@fjl fjl merged commit db03faa into ethereum:master Dec 7, 2021
ivanshukhov pushed a commit to Kapoio/go-kapoio that referenced this pull request Dec 11, 2021
This PR reduces the amount of work we do when answering header queries, e.g. when a peer
is syncing from us.

For some items, e.g block bodies, when we read the rlp-data from database, we plug it
directly into the response package. We didn't do that for headers, but instead read
headers-rlp, decode to types.Header, and re-encode to rlp. This PR changes that to keep it
in RLP-form as much as possible. When a node is syncing from us, it typically requests 192
contiguous headers. On master it has the following effect:

- For headers not in ancient: 2 db lookups. One for translating hash->number (even though
  the request is by number), and another for reading by hash (this latter one is sometimes
  cached).

- For headers in ancient: 1 file lookup/syscall for translating hash->number (even though
  the request is by number), and another for reading the header itself. After this, it
  also performes a hashing of the header, to ensure that the hash is what it expected. In
  this PR, I instead move the logic for "give me a sequence of blocks" into the lower
  layers, where the database can determine how and what to read from leveldb and/or
  ancients.

There are basically four types of requests; three of them are improved this way. The
fourth, by hash going backwards, is more tricky to optimize. However, since we know that
the gap is 0, we can look up by the parentHash, and stlil shave off all the number->hash
lookups.

The gapped collection can be optimized similarly, as a follow-up, at least in three out of
four cases.

Co-authored-by: Felix Lange <[email protected]>

(cherry picked from commit db03faa)
JacekGlen pushed a commit to JacekGlen/go-ethereum that referenced this pull request May 26, 2022
This PR reduces the amount of work we do when answering header queries, e.g. when a peer
is syncing from us.

For some items, e.g block bodies, when we read the rlp-data from database, we plug it
directly into the response package. We didn't do that for headers, but instead read
headers-rlp, decode to types.Header, and re-encode to rlp. This PR changes that to keep it
in RLP-form as much as possible. When a node is syncing from us, it typically requests 192
contiguous headers. On master it has the following effect:

- For headers not in ancient: 2 db lookups. One for translating hash->number (even though
  the request is by number), and another for reading by hash (this latter one is sometimes
  cached).
  
- For headers in ancient: 1 file lookup/syscall for translating hash->number (even though
  the request is by number), and another for reading the header itself. After this, it
  also performes a hashing of the header, to ensure that the hash is what it expected. In
  this PR, I instead move the logic for "give me a sequence of blocks" into the lower
  layers, where the database can determine how and what to read from leveldb and/or
  ancients.

There are basically four types of requests; three of them are improved this way. The
fourth, by hash going backwards, is more tricky to optimize. However, since we know that
the gap is 0, we can look up by the parentHash, and stlil shave off all the number->hash
lookups.

The gapped collection can be optimized similarly, as a follow-up, at least in three out of
four cases.

Co-authored-by: Felix Lange <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants