
Lodestar tree hashing performance #2206

Closed

dapplion opened this issue Mar 19, 2021 · 7 comments

Labels
prio-low This is nice to have. scope-performance Performance issue and ideas to improve performance.

Comments

@dapplion
Contributor

dapplion commented Mar 19, 2021

Purpose: profiling our state transition function has shown that computing the hashTreeRoot of the state and blocks takes a significant chunk of total time.

I've compared a benchmark that Lighthouse has on their tree hash lib against our tree-backed types. There seems to be some room for improvement on our side. I'm trying to benchmark their hashing lib against ours to help narrow down where the performance difference comes from.

Forced single core (ms)

| test | LH (no_cache) | LH (empty_cache) | Lodestar |
|---|---|---|---|
| minimal/100000 | 511.34 | 517.57 | 3144.3 (x6.1) |
| minimal/300000 | 1441.1 | 1550.1 | 8547.3 (x5.5) |
| mainnet/100000 | 574.10 | 524.31 | 2746.4 (x5.2) |
| mainnet/300000 | 1498.5 | 1529.3 | 8815.0 (x5.7) |

Multi-threading (8 cores) (ms)

| test | LH (no_cache) | LH (empty_cache) | Lodestar |
|---|---|---|---|
| minimal/100000 | 531.99 | 227.35 | 3144.3 |
| minimal/300000 | 1303.2 | 675.44 | 8547.3 |
| mainnet/100000 | 479.16 | 267.50 | 2746.4 |
| mainnet/300000 | 1417.3 | 711.00 | 8815.0 |

Note on profiling: a CPU profile of hashing is not useful because the sampling resolution is too low. By default the sampling interval is 1 ms, while a fast sha256 implementation can do 1000 hashes in 1 ms. I've tried setting `--cpu-prof-interval 1` to increase the sampling frequency x1000, but it seems to have no effect on the resulting profile.
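To get a feel for the scale, a quick back-of-the-envelope harness (using Node's built-in crypto as a stand-in for as-sha256; names and constants here are illustrative, not from the actual benchmark) counts how many 64-byte sha256 hashes fit inside a single 1 ms sampling interval:

```typescript
import { createHash } from "crypto";

// Stand-in benchmark: count how many sha256 hashes of a 64-byte input
// (two concatenated 32-byte roots) complete per millisecond. Node's
// built-in crypto is used here instead of as-sha256 for portability.
const input = Buffer.alloc(64, 0xab);
const n = 100_000;

const start = Date.now();
for (let i = 0; i < n; i++) {
  createHash("sha256").update(input).digest();
}
const elapsedMs = Date.now() - start;

const hashesPerMs = Math.round(n / Math.max(elapsedMs, 1));
console.log(`~${hashesPerMs} hashes per 1 ms sampling interval`);
```

Anything on the order of hundreds to thousands of hashes per sample means the default profiler interval attributes whole batches of hashes to single samples, which is why the resulting profile is not informative.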

@dapplion
Contributor Author

dapplion commented Mar 19, 2021

To profile our hashTreeRoot I've modified persistent-merkle-tree/hash.ts:4 in three ways:

| id | code | avg ms for mainnet/100000 (20 runs) |
|---|---|---|
| original | `return SHA256.digest(Buffer.concat([a, b]));` | 2922.8 |
| no concat | `return SHA256.digest(precomputed64b);` | 2615.6 |
| no hash | `return a;` | 34.55 |
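The three variants can be sketched as follows (a hedged recreation, not the exact diff: Node's built-in crypto stands in for @chainsafe/as-sha256, and `precomputed64b` is assumed to be a fixed 64-byte buffer):

```typescript
import { createHash } from "crypto";

// Stand-in for SHA256.digest from @chainsafe/as-sha256
const digest = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Fixed input for the "no concat" case, isolating the cost of Buffer.concat
const precomputed64b = Buffer.alloc(64);

// original: concatenate the two 32-byte roots, then hash
function hashOriginal(a: Buffer, b: Buffer): Buffer {
  return digest(Buffer.concat([a, b]));
}

// "no concat": hash a fixed buffer, so only the concat cost is removed
function hashNoConcat(_a: Buffer, _b: Buffer): Buffer {
  return digest(precomputed64b);
}

// "no hash": return a directly, removing the hashing cost entirely
function hashNoHash(a: Buffer, _b: Buffer): Buffer {
  return a;
}

const a = Buffer.alloc(32, 1);
const b = Buffer.alloc(32, 2);
console.log(hashOriginal(a, b).length); // 32
```

Comparing the three timings shows hashing itself dominates: removing the concat saves ~10%, while removing the hash removes ~99% of the cost.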

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

After more tests with different approaches, it seems that Buffer.concat is the fastest way to go. Alternatives:

  • Writing roots a, b directly to the wasm memory buffer.

Has almost the same performance, if not a bit worse (+5%). I have not tested memory efficiency between the two approaches.

```js
updateRoot(a, b) {
  // Write both 32-byte roots directly into the wasm input buffer,
  // skipping the intermediate Buffer.concat allocation
  const INPUT_LENGTH = this.ctx.INPUT_LENGTH;
  const input = new Uint8Array(this.ctx.memory.buffer, this.ctx.input.value, INPUT_LENGTH);
  input.set(a);
  input.set(b, 32);
  this.ctx.update(this.ctx.input.value, 64);
  return this;
}
```
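To show the call pattern without a real wasm instance, here is a pure-JS stand-in for the context (all names are illustrative; the real ctx is as-sha256's WebAssembly instance, whose update() would hash the input region rather than record it):

```typescript
// Fake wasm context: 64 bytes of "memory" starting at input offset 0
class FakeCtx {
  INPUT_LENGTH = 64;
  memory = { buffer: new ArrayBuffer(64) };
  input = { value: 0 };
  lastUpdate: [number, number] | null = null;
  update(ptr: number, len: number): void {
    this.lastUpdate = [ptr, len]; // the real wasm impl hashes memory[ptr..ptr+len]
  }
}

function updateRoot(ctx: FakeCtx, a: Uint8Array, b: Uint8Array): void {
  // Write both 32-byte roots directly into the wasm memory buffer,
  // avoiding the intermediate Buffer.concat allocation
  const input = new Uint8Array(ctx.memory.buffer, ctx.input.value, ctx.INPUT_LENGTH);
  input.set(a);
  input.set(b, 32);
  ctx.update(ctx.input.value, 64);
}

const ctx = new FakeCtx();
updateRoot(ctx, new Uint8Array(32).fill(1), new Uint8Array(32).fill(2));
const mem = new Uint8Array(ctx.memory.buffer);
console.log(mem[0], mem[32]); // 1 2
```

Despite saving the concat allocation, this path measured slightly slower, suggesting the extra JS-to-wasm boundary bookkeeping costs about as much as the allocation it avoids.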

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

Just as a curiosity, I compared as-sha256 with bcrypto/sha256 by reverting this commit: ChainSafe/persistent-merkle-tree@8460f03

| lib | avg ms for mainnet/100000 (20 runs) |
|---|---|
| as-sha256 + concat | 2862.8 |
| bcrypto + concat | 2114.85 |

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

Benchmarked Lighthouse's hashing lib ring in my environment:

```rust
// hash32_concat comes from Lighthouse's eth2_hashing crate (backed by ring);
// ProcessTime (cpu_time crate) measures CPU time rather than wall time
let start = ProcessTime::now();
for _ in 0..1_000_000 {
    let _res = hash32_concat(&hash_a, &hash_b);
}
let cpu_time: Duration = start.elapsed();
println!("{:?}", cpu_time);
```

Results compared to the JS equivalent (concat two 32-byte roots + hash):

  • Rust ring: 2.41 s for 1,000,000 hashes → 2.41 μs per hash32_concat.
  • JS as-sha256: 2.47 s for 1,000,000 hashes → 2.47 μs per hash32_concat.

So the concat + hash throughput is the same.

I've counted the number of hashes computed in each node to get the root of genesis-mainnet-100000 (15587377 bytes):

  • Lodestar: 1009104 calls to persistent-merkle-tree.hash()
  • Lighthouse: 1309081 calls total: hash() (400000) and hash32_concat() (909081)

In Lodestar's case the numbers are consistent with the hashing throughput:

  • JS as-sha256: 2.77 μs / hash (genesis-mainnet-100000: 1009104 hashes, 2800 ms)
  • JS bcrypto: 2.08 μs / hash (genesis-mainnet-100000: 1009104 hashes, 2100 ms)
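A quick arithmetic check confirms the consistency claim: multiplying the hash count by the per-hash cost reproduces the measured totals.

```typescript
// Sanity check: total time ≈ number of hashes × per-hash cost
const hashes = 1_009_104;

const asSha256Ms = (hashes * 2.77) / 1000; // μs → ms
const bcryptoMs = (hashes * 2.08) / 1000;

console.log(Math.round(asSha256Ms)); // 2795, close to the ~2800 ms measured
console.log(Math.round(bcryptoMs)); // 2099, close to the ~2100 ms measured
```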

Then, how can Lighthouse compute the root of the state 5x faster if its hashing function is not faster and it appears to compute a similar number of hashes? Note that my knowledge of the inner workings of Lighthouse's libs is limited, so I must be missing something. However:

  • I've added some println!() calls in Lighthouse, and there are 0 cache hits in their cache arena.
  • I've run all tests and benchmarks with taskset -c 0 <command> to force a single core.

@stale
stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the bot:stale label Jun 2, 2021
@dapplion dapplion self-assigned this Jun 3, 2021
@stale stale bot removed the bot:stale label Jun 3, 2021
@dapplion
Contributor Author

dapplion commented Jun 3, 2021

Assigning this to me so it doesn't get closed. I stopped the tests at the previous comment; if anyone wants to continue investigating, go for it.

@dapplion dapplion added the prio-low This is nice to have. label Jun 3, 2021
@dapplion dapplion added the scope-performance Performance issue and ideas to improve performance. label Jun 11, 2021
@dapplion dapplion removed their assignment May 12, 2022
@dapplion
Contributor Author

Closing since this issue has no actionable items
