
Lodestar tree hashing performance #2206

Closed

dapplion opened this issue Mar 19, 2021 · 7 comments

Labels
prio-low This is nice to have. scope-performance Performance issue and ideas to improve performance.

Comments

@dapplion
Contributor

dapplion commented Mar 19, 2021

Purpose: profiling our state transition function has shown that computing the hashTreeRoot of the state and blocks takes a significant chunk of total time.

I've compared a benchmark that Lighthouse has on their tree hash lib against our tree-backed types. There seems to be some room for improvement on our side. I'm trying to benchmark their hashing lib against ours to help narrow down where the performance difference comes from.

Forced single core (ms)

| test | LH (no_cache) | LH (empty_cache) | Lodestar |
|---|---|---|---|
| minimal/100000 | 511.34 | 517.57 | 3144.3 (x6.1) |
| minimal/300000 | 1441.1 | 1550.1 | 8547.3 (x5.5) |
| mainnet/100000 | 574.10 | 524.31 | 2746.4 (x5.2) |
| mainnet/300000 | 1498.5 | 1529.3 | 8815.0 (x5.7) |

Multi-threading (8 cores) (ms)

| test | LH (no_cache) | LH (empty_cache) | Lodestar |
|---|---|---|---|
| minimal/100000 | 531.99 | 227.35 | 3144.3 |
| minimal/300000 | 1303.2 | 675.44 | 8547.3 |
| mainnet/100000 | 479.16 | 267.50 | 2746.4 |
| mainnet/300000 | 1417.3 | 711.00 | 8815.0 |

Note on profiling: a CPU profile of hashing is not useful because the sampling resolution is too low. By default the sampling interval is 1 ms, while a fast sha256 implementation can do 1000 hashes in 1 ms. I've tried setting `--cpu-prof-interval 1` to increase the sampling frequency x1000, but it seems to have no effect on the resulting profile.
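To get a feel for the scale, a quick back-of-the-envelope harness (using Node's built-in crypto as a stand-in for as-sha256; names and constants here are illustrative, not from the actual benchmark) counts how many 64-byte sha256 hashes fit inside a single 1 ms sampling interval:

```typescript
import { createHash } from "crypto";

// Stand-in benchmark: count how many sha256 hashes of a 64-byte input
// (two concatenated 32-byte roots) complete per millisecond. Node's
// built-in crypto is used here instead of as-sha256 for portability.
const input = Buffer.alloc(64, 0xab);
const n = 100_000;

const start = Date.now();
for (let i = 0; i < n; i++) {
  createHash("sha256").update(input).digest();
}
const elapsedMs = Date.now() - start;

const hashesPerMs = Math.round(n / Math.max(elapsedMs, 1));
console.log(`~${hashesPerMs} hashes per 1 ms sampling interval`);
```

Anything on the order of hundreds to thousands of hashes per sample means the default profiler interval attributes whole batches of hashes to single samples, which is why the resulting profile is not informative.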

@dapplion
Contributor Author

dapplion commented Mar 19, 2021

To profile our hashTreeRoot I've modified persistent-merkle-tree/hash.ts:4 in three ways:

| id | code | avg ms for mainnet/100000 (20 runs) |
|---|---|---|
| original | `return SHA256.digest(Buffer.concat([a, b]));` | 2922.8 |
| no concat | `return SHA256.digest(precomputed64b);` | 2615.6 |
| no hash | `return a;` | 34.55 |
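The three variants can be sketched as follows (a hedged recreation, not the exact diff: Node's built-in crypto stands in for @chainsafe/as-sha256, and `precomputed64b` is assumed to be a fixed 64-byte buffer):

```typescript
import { createHash } from "crypto";

// Stand-in for SHA256.digest from @chainsafe/as-sha256
const digest = (data: Buffer): Buffer =>
  createHash("sha256").update(data).digest();

// Fixed input for the "no concat" case, isolating the cost of Buffer.concat
const precomputed64b = Buffer.alloc(64);

// original: concatenate the two 32-byte roots, then hash
function hashOriginal(a: Buffer, b: Buffer): Buffer {
  return digest(Buffer.concat([a, b]));
}

// "no concat": hash a fixed buffer, so only the concat cost is removed
function hashNoConcat(_a: Buffer, _b: Buffer): Buffer {
  return digest(precomputed64b);
}

// "no hash": return a directly, removing the hashing cost entirely
function hashNoHash(a: Buffer, _b: Buffer): Buffer {
  return a;
}

const a = Buffer.alloc(32, 1);
const b = Buffer.alloc(32, 2);
console.log(hashOriginal(a, b).length); // 32
```

Comparing the three timings shows hashing itself dominates: removing the concat saves ~10%, while removing the hash removes ~99% of the cost.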

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

After more tests with different approaches, it seems that Buffer.concat is the fastest way to go. Alternatives:

  • Writing roots a, b directly to the wasm memory buffer.

Has almost the same performance, if not a bit worse (+5%). I have not tested memory efficiency between the two approaches.

```js
updateRoot(a, b) {
  // Write both 32-byte roots directly into the wasm input buffer,
  // skipping the intermediate Buffer.concat allocation
  const INPUT_LENGTH = this.ctx.INPUT_LENGTH;
  const input = new Uint8Array(this.ctx.memory.buffer, this.ctx.input.value, INPUT_LENGTH);
  input.set(a);
  input.set(b, 32);
  this.ctx.update(this.ctx.input.value, 64);
  return this;
}
```
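To show the call pattern without a real wasm instance, here is a pure-JS stand-in for the context (all names are illustrative; the real ctx is as-sha256's WebAssembly instance, whose update() would hash the input region rather than record it):

```typescript
// Fake wasm context: 64 bytes of "memory" starting at input offset 0
class FakeCtx {
  INPUT_LENGTH = 64;
  memory = { buffer: new ArrayBuffer(64) };
  input = { value: 0 };
  lastUpdate: [number, number] | null = null;
  update(ptr: number, len: number): void {
    this.lastUpdate = [ptr, len]; // the real wasm impl hashes memory[ptr..ptr+len]
  }
}

function updateRoot(ctx: FakeCtx, a: Uint8Array, b: Uint8Array): void {
  // Write both 32-byte roots directly into the wasm memory buffer,
  // avoiding the intermediate Buffer.concat allocation
  const input = new Uint8Array(ctx.memory.buffer, ctx.input.value, ctx.INPUT_LENGTH);
  input.set(a);
  input.set(b, 32);
  ctx.update(ctx.input.value, 64);
}

const ctx = new FakeCtx();
updateRoot(ctx, new Uint8Array(32).fill(1), new Uint8Array(32).fill(2));
const mem = new Uint8Array(ctx.memory.buffer);
console.log(mem[0], mem[32]); // 1 2
```

Despite saving the concat allocation, this path measured slightly slower, suggesting the extra JS-to-wasm boundary bookkeeping costs about as much as the allocation it avoids.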

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

Just as a curiosity, I compared as-sha256 with bcrypto/sha256 by reverting this commit: ChainSafe/persistent-merkle-tree@8460f03

| lib | avg ms for mainnet/100000 (20 runs) |
|---|---|
| as-sha256 + concat | 2862.8 |
| bcrypto + concat | 2114.85 |

@dapplion
Contributor Author

dapplion commented Mar 21, 2021

Benchmarked Lighthouse's hashing lib ring in my environment:

```rust
// hash32_concat comes from Lighthouse's eth2_hashing crate (backed by ring);
// ProcessTime (cpu_time crate) measures CPU time rather than wall time
let start = ProcessTime::now();
for _ in 0..1_000_000 {
    let _res = hash32_concat(&hash_a, &hash_b);
}
let cpu_time: Duration = start.elapsed();
println!("{:?}", cpu_time);
```

Results compared to the JS equivalent (concat two 32-byte roots + hash):

  • Rust ring: 2.41 s for 1,000,000 hashes → 2.41 μs per hash32_concat.
  • JS as-sha256: 2.47 s for 1,000,000 hashes → 2.47 μs per hash32_concat.

So the concat + hash throughput is the same.

I've counted the number of hashes computed in each node to get the root of genesis-mainnet-100000 (15587377 bytes):

  • Lodestar: 1009104 calls to persistent-merkle-tree.hash()
  • Lighthouse: 1309081 calls total: hash() (400000) and hash32_concat() (909081)

In Lodestar's case the numbers are consistent with the hashing throughput:

  • JS as-sha256: 2.77 μs / hash (genesis-mainnet-100000: 1009104 hashes, 2800 ms)
  • JS bcrypto: 2.08 μs / hash (genesis-mainnet-100000: 1009104 hashes, 2100 ms)
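A quick arithmetic check confirms the consistency claim: multiplying the hash count by the per-hash cost reproduces the measured totals.

```typescript
// Sanity check: total time ≈ number of hashes × per-hash cost
const hashes = 1_009_104;

const asSha256Ms = (hashes * 2.77) / 1000; // μs → ms
const bcryptoMs = (hashes * 2.08) / 1000;

console.log(Math.round(asSha256Ms)); // 2795, close to the ~2800 ms measured
console.log(Math.round(bcryptoMs)); // 2099, close to the ~2100 ms measured
```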

Then, how can Lighthouse compute the root of the state 5x faster if its hashing function is not faster and it appears to compute a similar number of hashes? Note that my knowledge of the inner workings of Lighthouse's libs is limited, so I must be missing something. However:

  • I've added some println!() calls in Lighthouse, and there are 0 cache hits in their cache arena.
  • I've run all tests and benchmarks with taskset -c 0 <command> to force a single core.

@stale
stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the bot:stale label Jun 2, 2021
@dapplion dapplion self-assigned this Jun 3, 2021
@stale stale bot removed the bot:stale label Jun 3, 2021
@dapplion
Contributor Author

dapplion commented Jun 3, 2021

Assigning this to me so it doesn't get closed. I stopped the tests at the previous comment; if anyone wants to continue investigating, go for it.

@dapplion dapplion added the prio-low This is nice to have. label Jun 3, 2021
@dapplion dapplion added the scope-performance Performance issue and ideas to improve performance. label Jun 11, 2021
@dapplion dapplion removed their assignment May 12, 2022
@dapplion
Contributor Author

Closing since this issue has no actionable items
