
core: prefetch next block state concurrently #19328

Merged 2 commits on Apr 2, 2019

Conversation

@karalabe (Member) commented Mar 25, 2019

I need the clean cache fix (#19307) in first, then this PR can be benchmarked on top.

@karalabe requested a review from holiman as a code owner on March 25, 2019 10:49
@holiman (Contributor) commented Mar 25, 2019

Interesting idea, looking forward to some benchmarks -- might go either way, since we'll effectively execute most things twice.

I'm not 100% sure that there won't be any concurrent access errors for some cache somewhere, personally.

@karalabe (Member Author) commented:

There should be no error, because this essentially emulates doing a CALL for each transaction and throwing away the results. If this fails, our RPC APIs are broken too, and so is fast sync.
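A minimal sketch of that idea, not the PR's actual code: all type and method names below (StateDB, ApplyTransaction, Prefetch) are illustrative placeholders, and the only assumption is that executing a transaction on a throwaway state copy warms the same trie/database caches the real import will later hit.

```go
// Speculative next-block execution purely for cache warming: run the peeked
// block's transactions on a throwaway copy of the state, discard every
// result, and bail out when the canonical import signals it no longer needs us.
package prefetch

import "sync/atomic"

type Transaction struct{ /* sender, recipient, payload, ... */ }

type Block struct{ Txs []*Transaction }

// StateDB stands in for a copy-on-write state whose reads pull trie nodes
// into the shared caches as a side effect.
type StateDB interface {
	Copy() StateDB                 // cheap throwaway copy
	ApplyTransaction(*Transaction) // execute like a CALL, discard the result
}

// Prefetch speculatively executes block on a copy of statedb. Nothing it
// computes is kept; the goal is only that the state touched by the block is
// already cached when the real import processes it.
func Prefetch(block *Block, statedb StateDB, interrupt *uint32) {
	throwaway := statedb.Copy()
	for _, tx := range block.Txs {
		// Stop as soon as the main import catches up and flags us down.
		if interrupt != nil && atomic.LoadUint32(interrupt) == 1 {
			return
		}
		throwaway.ApplyTransaction(tx)
	}
}
```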

As for the benchmarks, I'm genuinely curious what happens. Restarted mon08/09 with your PoC cleaner PR (09) and this one on top (08) in full sync mode. Let's see what happens.

@karalabe force-pushed the preload branch 4 times, most recently from 20a6f37 to 63b8e87, on March 25, 2019 13:40
@Matthalp-zz (Contributor) commented:

I expect this to produce a pretty nice performance win! A few pprof runs a few days ago showed that account storage trie nodes were the biggest bottleneck throughout block processing. Performing speculative execution of a block should do a good job of prefetching a lot of these nodes. I do wonder how doing the speculation at transaction-level granularity compares to block-level granularity.

I played around with a similar idea of prefetching just the transaction sender account data, recipient account data, and the recipient contract code, and didn't get outstanding results. This was for a few reasons:
(1) I tried to prefetch the data for too many blocks at once, which basically evicted too much good data from the cache. I think your approach with peek() will not run into this issue.
(2) It focused on the account state trie, when the bottleneck is the individual account storage tries.

My only nit would be to call it a Prefetcher instead of a Precacher.
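For contrast, a rough sketch of the simpler per-transaction prefetch described above; type and method names (Address, AccountReader, warmAccounts) are purely illustrative, and the assumption is an account reader whose lookups populate the caches as a side effect.

```go
// Warm only the accounts a transaction names: the sender, the recipient and
// the recipient's code. Storage slots are deliberately not touched, which is
// why this misses the real bottleneck (the per-account storage tries).
package prefetch

type Address [20]byte

type Tx struct {
	From Address
	To   *Address // nil for contract creation
}

// AccountReader stands in for any state reader whose lookups fill the cache.
type AccountReader interface {
	GetBalance(Address)
	GetNonce(Address)
	GetCode(Address)
}

func warmAccounts(txs []*Tx, db AccountReader) {
	for _, tx := range txs {
		db.GetBalance(tx.From)
		db.GetNonce(tx.From)
		if tx.To != nil {
			db.GetBalance(*tx.To)
			db.GetCode(*tx.To)
		}
	}
}
```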

@karalabe (Member Author) commented:

It's a bit hard to quantify the win; I guess the amount of read/write cache directly competes with the optimizations introduced here. For a 4GB archive sync, I think it took until block 4.xM before this PR's effect was visible. I'm running a --cache=2048 full sync now too, which is also a tad better, but not by a relevant margin. I do think this PR is a good idea, but we need to take a closer look at exactly what happens in the trie loads to tune it properly.

Regarding the bottlenecks, we've added some new metrics (a bit flawed, but useful) that show exactly what EVM execution spends its time on.
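As an illustration of the kind of timers meant here (the metric names and wiring below are made up for the example, not go-ethereum's actual ones), using the NewRegisteredTimer/Update API from the github.com/ethereum/go-ethereum/metrics package:

```go
// Break block processing time into separately registered timers so a
// dashboard can show how much of it is EVM execution versus state access.
package main

import (
	"time"

	"github.com/ethereum/go-ethereum/metrics"
)

var (
	accountReadTimer = metrics.NewRegisteredTimer("example/account/reads", nil)
	executionTimer   = metrics.NewRegisteredTimer("example/execution", nil)
)

// processBlock runs execute() and attributes the measured account-read time
// separately from the remaining (pure execution) time.
func processBlock(execute func() (accountReads time.Duration)) {
	start := time.Now()
	accountReads := execute()
	executionTimer.Update(time.Since(start) - accountReads)
	accountReadTimer.Update(accountReads)
}

func main() {
	processBlock(func() time.Duration {
		time.Sleep(5 * time.Millisecond) // pretend to execute a block
		return 2 * time.Millisecond      // pretend 2ms of that was trie reads
	})
}
```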

This PR on an archive sync:

Screenshot from 2019-03-27 10-49-11

This PR on full sync:

Screenshot from 2019-03-27 10-49-30

@karalabe (Member Author) commented:

3-day mark follow-up:

Screenshot from 2019-03-29 10-23-16

After crossing over Byzantium (and Ethereum picking up tx volume), this PR seems to produce around a 25% performance gain. Let's see how it evolves towards the head of the chain, but it looks promising. I'd hoped for a bit more really, but I'm not complaining either. Perhaps we need some closer investigation into exactly what the bottleneck is now (just to see whether this PR indeed cannot do more, or whether it can be tweaked further).

One thing to investigate is how frequently the concurrent execution is aborted prematurely. It might be that moving the interruption point around a bit gives it more time to cache useful data. Maybe not; we need a number on it.
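Getting that number could look something like the following sketch (illustrative names only, reusing the atomic interrupt flag from the earlier sketch): count how many speculative runs finish versus how many get cut short.

```go
// Count completed versus interrupted speculative runs so the abort rate can
// be charted next to the block-import timings.
package prefetch

import "sync/atomic"

var (
	prefetchCompleted   uint64 // speculative runs that processed the whole block
	prefetchInterrupted uint64 // speculative runs cut short by the real import
)

// prefetchBlock applies each (already wrapped) transaction until it either
// runs out of work or the canonical import raises the interrupt flag.
func prefetchBlock(applyTxs []func(), interrupt *uint32) {
	for _, applyTx := range applyTxs {
		if atomic.LoadUint32(interrupt) == 1 {
			atomic.AddUint64(&prefetchInterrupted, 1)
			return
		}
		applyTx() // speculative execution, results discarded
	}
	atomic.AddUint64(&prefetchCompleted, 1)
}
```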

@matthalp I did transaction-level granularity a while back, but that didn't seem to work too well, because a single tx is relatively light. Concurrent processing frequently hits the case where the "current" transaction finishes fast, so the background execution is just pointless. The block version is less "optimal", but it's a bit more stable imho.

An alternative would be to try a mixture of both: concurrently prefetch hopefully useful data from the peeked next block, and at the same time prefetch probably useful data from the current block's future transactions.

That said, benchmark, benchmark, benchmark :P It's a PITA to do these optimizations.

@karalabe changed the title from "core: pre-cache followup block concurrently" to "core: prefetch next block state concurrently" on Mar 29, 2019
@karalabe added this to the 1.9.0 milestone on Mar 29, 2019
@holiman (Contributor) commented Mar 29, 2019

It generally looks good, but I think I would prefer a CLI switch to disable this functionality. There might be reasons not to run this, such as:

  • On a CPU-constrained device, somewhat slower blocks may be preferable to executing everything twice,
  • While debugging, less parallelism is easier to reason about,
  • If there's some internal flaw/corruption, we might want to rule out this parallel executor as a source of problems.
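A standalone sketch of such a switch, using the standard library's flag package for brevity (geth's real flag handling goes through urfave/cli, and the flag name below is a placeholder, not necessarily what the PR ships):

```go
// Toy illustration of an opt-out switch for the speculative prefetcher.
package main

import (
	"flag"
	"fmt"
)

// The flag name is a placeholder; it only illustrates the shape of the switch.
var noPrefetch = flag.Bool("cache.noprefetch", false,
	"Disable heuristic state prefetching during block import")

func main() {
	flag.Parse()
	if *noPrefetch {
		// The block importer would simply never spawn the speculative
		// executor, so every block is executed exactly once.
		fmt.Println("state prefetching disabled")
		return
	}
	fmt.Println("state prefetching enabled")
}
```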

@karalabe (Member Author) commented Apr 1, 2019

Final state before the VMs went OOD: the PR was about 18 hours ahead of master at roughly 25% faster, lowering the average block execution time from 200ms to 150ms.

Screenshot from 2019-04-01 10-57-30

@karalabe (Member Author) commented Apr 1, 2019

@holiman PTAL

@holiman (Contributor) left a review comment:

LGTM

@Matthalp-zz (Contributor) left a review comment:

LGTM
