
core: prefetch next block state concurrently #19328

Merged 2 commits on Apr 2, 2019

Conversation

@karalabe (Member) commented Mar 25, 2019

I need the clean cache fix (#19307) in first, then this PR can be benchmarked on top.

@karalabe requested a review from holiman as a code owner on March 25, 2019 10:49
@holiman (Contributor) commented Mar 25, 2019

Interesting idea, looking forward to some benchmarks -- might go either way, since we'll effectively execute most things twice.

I'm not 100% sure that there won't be any concurrent access errors for some cache somewhere, personally.

@karalabe (Member Author) commented:

There should be no error, because this essentially emulates doing a CALL for each transaction and throwing away the results. If this fails, our RPC APIs are broken too, and so is fast sync.
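A minimal sketch of that idea, not the PR's actual code: all type and method names below (StateDB, ApplyTransaction, Prefetch) are illustrative placeholders, and the only assumption is that executing a transaction on a throwaway state copy warms the same trie/database caches the real import will later hit.

```go
// Speculative next-block execution purely for cache warming: run the peeked
// block's transactions on a throwaway copy of the state, discard every
// result, and bail out when the canonical import signals it no longer needs us.
package prefetch

import "sync/atomic"

type Transaction struct{ /* sender, recipient, payload, ... */ }

type Block struct{ Txs []*Transaction }

// StateDB stands in for a copy-on-write state whose reads pull trie nodes
// into the shared caches as a side effect.
type StateDB interface {
	Copy() StateDB                 // cheap throwaway copy
	ApplyTransaction(*Transaction) // execute like a CALL, discard the result
}

// Prefetch speculatively executes block on a copy of statedb. Nothing it
// computes is kept; the goal is only that the state touched by the block is
// already cached when the real import processes it.
func Prefetch(block *Block, statedb StateDB, interrupt *uint32) {
	throwaway := statedb.Copy()
	for _, tx := range block.Txs {
		// Stop as soon as the main import catches up and flags us down.
		if interrupt != nil && atomic.LoadUint32(interrupt) == 1 {
			return
		}
		throwaway.ApplyTransaction(tx)
	}
}
```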

As for the benchmarks, I'm genuinely curious what happens. Restarted mon08/09 with your PoC cleaner PR (09) and this one on top (08) in full sync mode. Let's see what happens.

@karalabe force-pushed the preload branch 4 times, most recently from 20a6f37 to 63b8e87, on March 25, 2019 13:40
@Matthalp-zz (Contributor) commented:

I expect this to produce a pretty nice performance win! A few pprof runs a few days ago showed that account storage trie nodes were the biggest bottleneck throughout block processing. Performing speculative execution of a block should do a good job of prefetching a lot of these nodes. I do wonder how doing the speculation at transaction-level granularity compares to block-level granularity.

I played around with a similar idea of prefetching just the transaction sender account data, recipient account data, and the recipient contract code, and didn't get outstanding results. This was for a few reasons:
(1) I tried to prefetch the data for too many blocks at once, which basically evicted too much good data from the cache. I think your approach with peek() will not run into this issue.
(2) It focused on the account state trie, when the bottleneck is the individual account storage tries.

My only nit would be to call it a Prefetcher instead of a Precacher.
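For contrast, a rough sketch of the simpler per-transaction prefetch described above; type and method names (Address, AccountReader, warmAccounts) are purely illustrative, and the assumption is an account reader whose lookups populate the caches as a side effect.

```go
// Warm only the accounts a transaction names: the sender, the recipient and
// the recipient's code. Storage slots are deliberately not touched, which is
// why this misses the real bottleneck (the per-account storage tries).
package prefetch

type Address [20]byte

type Tx struct {
	From Address
	To   *Address // nil for contract creation
}

// AccountReader stands in for any state reader whose lookups fill the cache.
type AccountReader interface {
	GetBalance(Address)
	GetNonce(Address)
	GetCode(Address)
}

func warmAccounts(txs []*Tx, db AccountReader) {
	for _, tx := range txs {
		db.GetBalance(tx.From)
		db.GetNonce(tx.From)
		if tx.To != nil {
			db.GetBalance(*tx.To)
			db.GetCode(*tx.To)
		}
	}
}
```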

@karalabe (Member Author) commented:

It's a bit hard to quantify the win; I guess the amount of read/write cache directly competes with the optimizations introduced here. For a 4GB archive sync, I think it took until block 4.xM before this PR's effect was visible. I'm running a --cache=2048 full sync now too, which is also a tad better, but not by a relevant margin. I do think this PR is a good idea, but we need to take a closer look at exactly what happens in the trie loads to tune it properly.

Regarding the bottlenecks, we've added some new metrics (a bit flawed, but useful) that show exactly what EVM execution spends its time on.
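As an illustration of the kind of timers meant here (the metric names and wiring below are made up for the example, not go-ethereum's actual ones), using the NewRegisteredTimer/Update API from the github.com/ethereum/go-ethereum/metrics package:

```go
// Break block processing time into separately registered timers so a
// dashboard can show how much of it is EVM execution versus state access.
package main

import (
	"time"

	"github.com/ethereum/go-ethereum/metrics"
)

var (
	accountReadTimer = metrics.NewRegisteredTimer("example/account/reads", nil)
	executionTimer   = metrics.NewRegisteredTimer("example/execution", nil)
)

// processBlock runs execute() and attributes the measured account-read time
// separately from the remaining (pure execution) time.
func processBlock(execute func() (accountReads time.Duration)) {
	start := time.Now()
	accountReads := execute()
	executionTimer.Update(time.Since(start) - accountReads)
	accountReadTimer.Update(accountReads)
}

func main() {
	processBlock(func() time.Duration {
		time.Sleep(5 * time.Millisecond) // pretend to execute a block
		return 2 * time.Millisecond      // pretend 2ms of that was trie reads
	})
}
```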

This PR on an archive sync:

Screenshot from 2019-03-27 10-49-11

This PR on full sync:

Screenshot from 2019-03-27 10-49-30

@karalabe (Member Author) commented:

3-day mark follow-up:

Screenshot from 2019-03-29 10-23-16

After crossing over Byzantium (and Ethereum picking up tx volume), this PR seems to produce around a 25% performance gain. Let's see how it evolves towards the head of the chain, but it looks promising. I'd hoped for a bit more really, but I'm not complaining either. Perhaps we need some closer investigation into exactly what the bottleneck is now (just to see whether this PR indeed cannot do more, or whether it can be tweaked further).

One thing to investigate is how frequently the concurrent execution is aborted prematurely. It might be that moving the interruption point around a bit gives it more time to cache useful data. Maybe not; we need a number on it.
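Getting that number could look something like the following sketch (illustrative names only, reusing the atomic interrupt flag from the earlier sketch): count how many speculative runs finish versus how many get cut short.

```go
// Count completed versus interrupted speculative runs so the abort rate can
// be charted next to the block-import timings.
package prefetch

import "sync/atomic"

var (
	prefetchCompleted   uint64 // speculative runs that processed the whole block
	prefetchInterrupted uint64 // speculative runs cut short by the real import
)

// prefetchBlock applies each (already wrapped) transaction until it either
// runs out of work or the canonical import raises the interrupt flag.
func prefetchBlock(applyTxs []func(), interrupt *uint32) {
	for _, applyTx := range applyTxs {
		if atomic.LoadUint32(interrupt) == 1 {
			atomic.AddUint64(&prefetchInterrupted, 1)
			return
		}
		applyTx() // speculative execution, results discarded
	}
	atomic.AddUint64(&prefetchCompleted, 1)
}
```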

@matthalp I did transaction-level granularity a while back, but that didn't seem to work too well, because a single tx is relatively light. Concurrent processing frequently hits the case where the "current" transaction finishes fast, so the background execution is just pointless. The block version is less "optimal", but it's a bit more stable imho.

An alternative would be to try a mixture of both: concurrently prefetch hopefully useful data from the peeked next block, and at the same time prefetch probably useful data from the current block's future transactions.

That said, benchmark, benchmark, benchmark :P It's a PITA to do these optimizations.

@karalabe changed the title from "core: pre-cache followup block concurrently" to "core: prefetch next block state concurrently" on Mar 29, 2019
@karalabe added this to the 1.9.0 milestone on Mar 29, 2019
@holiman (Contributor) commented Mar 29, 2019

It generally looks good, but I think I would prefer a CLI switch to disable this functionality. There might be reasons not to run this, such as:

  • On a CPU-constrained device, somewhat slower blocks may be preferable to executing everything twice,
  • While debugging, less parallelism is easier to reason about,
  • If there's some internal flaw/corruption, we might want to rule out this parallel executor as a source of problems.
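A standalone sketch of such a switch, using the standard library's flag package for brevity (geth's real flag handling goes through urfave/cli, and the flag name below is a placeholder, not necessarily what the PR ships):

```go
// Toy illustration of an opt-out switch for the speculative prefetcher.
package main

import (
	"flag"
	"fmt"
)

// The flag name is a placeholder; it only illustrates the shape of the switch.
var noPrefetch = flag.Bool("cache.noprefetch", false,
	"Disable heuristic state prefetching during block import")

func main() {
	flag.Parse()
	if *noPrefetch {
		// The block importer would simply never spawn the speculative
		// executor, so every block is executed exactly once.
		fmt.Println("state prefetching disabled")
		return
	}
	fmt.Println("state prefetching enabled")
}
```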

@karalabe (Member Author) commented Apr 1, 2019

Final state before the VMs went OOD: the PR was about 18 hours ahead of master at roughly 25% faster, lowering the average block execution time from 200ms to 150ms.

Screenshot from 2019-04-01 10-57-30

@karalabe (Member Author) commented Apr 1, 2019

@holiman PTAL

@holiman (Contributor) left a review comment:

LGTM

@Matthalp-zz (Contributor) left a review comment:

LGTM
