Skip to content
This repository has been archived by the owner on May 18, 2018. It is now read-only.

Use tiny-keccak instead of wrapping the one-file C-lib #1

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

dvdplm
Copy link
Contributor

@dvdplm dvdplm commented May 16, 2018

@debris the background here is that during work on extracting the patricia_trie crate and making it usable with different hashing algorithms, I realised that this crate was using both the in-repo one-file-C-lib (and unsafe code to call it) and your pure Rust impl.

Why not switch it over to use tiny-keccak all-together? Performance?

Benchmarks

Original code, implements keccak() in C:

running 3 tests
test bench_keccak_256_with_empty_input   ... bench:         440 ns/iter (+/- 34)
test bench_keccak_256_with_large_input   ... bench:      13,807 ns/iter (+/- 1,099) = 296 MB/s
test bench_keccak_256_with_typical_input ... bench:         455 ns/iter (+/- 48) = 112 MB/s

This branch, implements keccak() in Rust:

running 3 tests
test bench_keccak_256_with_empty_input   ... bench:         568 ns/iter (+/- 46)
test bench_keccak_256_with_large_input   ... bench:      17,258 ns/iter (+/- 900) = 237 MB/s
test bench_keccak_256_with_typical_input ... bench:         579 ns/iter (+/- 41) = 88 MB/s

@dvdplm dvdplm requested a review from debris May 16, 2018 13:11
@dvdplm dvdplm self-assigned this May 16, 2018
@dvdplm dvdplm added the enhancement New feature or request label May 16, 2018
@dvdplm dvdplm requested a review from rphmeier May 16, 2018 13:19
Copy link
Contributor

@debris debris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not switch it over to use tiny-keccak all-together? Performance?

Imo, we can switch to tiny-keccak as rust becomes faster with almost every single release. Yes, it was only because of the performance.

I'm just not sure about creating this repo to separate keccak-hash crate. It seems a bit redundant to me :p

@folsen
Copy link

folsen commented May 17, 2018

@debris @dvdplm the performance here still kinda matters though, looking at those benchmarks the C impl is like 20% faster, and keccak is everywhere. It's literally the foundation of every single thing done everywhere :P 20% is a big deal in that context.

@dvdplm
Copy link
Contributor Author

dvdplm commented May 17, 2018

I'm just not sure about creating this repo to separate keccak-hash crate. It seems a bit redundant to me :p

I agree that without the C shenanigans it's mostly a collection of utilities. Atm patricia_trie depends on it though, and if we're to extract that (so Polkadot can use it) it needs to be extracted too.

When that work is completed and patricia_trie is generic over the hashing function used, we can revisit this and – and here I'm speculating – move the generally useful pieces into tiny-keccak (perhaps only keccak_pipe()?) and the code that uses ethereum data types – H256 – back into parity itself (or wherever it fits).

So, just to make sure, can I merge this?

/cc @rphmeier

@dvdplm
Copy link
Contributor Author

dvdplm commented May 17, 2018

@folsen it is indeed used in many places but most of the time it seems like it's used in vicinity to db lookup code so it's quite possible that 480ns vs 520ns might not matter much. I'm happy to do some more sleuthing to prove/disprove that ofc (do you know of a good piece of code I should start looking at? where we're keccak-ing a lot?)

dvdplm added a commit to dvdplm/tiny-keccak that referenced this pull request May 17, 2018
Not unrolling the outer loop seems to speed up hashing quite significally:

Original (unrolled):
```
running 3 tests
test bench_keccak_256_with_empty_input   ... bench:         557 ns/iter (+/- 46)
test bench_keccak_256_with_large_input   ... bench:      17,288 ns/iter (+/- 1,871) = 236 MB/s
test bench_keccak_256_with_typical_input ... bench:         577 ns/iter (+/- 28) = 88 MB/s
```

This branch (not unrolled):
```
running 3 tests
test bench_keccak_256_with_empty_input   ... bench:         487 ns/iter (+/- 25)
test bench_keccak_256_with_large_input   ... bench:      14,645 ns/iter (+/- 675) = 279 MB/s
test bench_keccak_256_with_typical_input ... bench:         495 ns/iter (+/- 32) = 103 MB/s
```

"Inspired" by https://github.com/RustCrypto/sponges/blob/master/keccak/src/lib.rs#L138

Running benchmarks from the `keccak-hash` crate so we can compare to the numbers [here](paritytech/keccak-hash#1).
@folsen
Copy link

folsen commented May 17, 2018

@dvdplm keccak512 is used in ethash, so is called like a thousand times to validate a block, and block validation actually consumes a lot during sync. keccak256 also used in the EcRecover builtin so can potentially be used by any smart contract however many times they want.

The big thing I'd say is ethash though, show me a benchmark of running ethash 5 million times with the C impl vs the Rust impl and if it only adds a few minutes then I'd say it's acceptable ^^

@dvdplm
Copy link
Contributor Author

dvdplm commented May 17, 2018

show me a benchmark of running ethash 5 million times with the C impl vs the Rust impl

Here's what the benchmarks in ethash looks like – not precisely what you asked for but close!

Pure rust keccak (with this patch)

test benchmarks::bench_light_compute_memmap              ... bench:     979,648 ns/iter (+/- 221,733)
test benchmarks::bench_light_compute_memory              ... bench:   1,031,110 ns/iter (+/- 210,092)
test benchmarks::bench_light_from_file_round_trip_memmap ... bench:   7,024,897 ns/iter (+/- 1,029,950)
test benchmarks::bench_light_from_file_round_trip_memory ... bench:  10,312,023 ns/iter (+/- 626,333)

Original C-impl

test benchmarks::bench_light_compute_memmap              ... bench:     956,573 ns/iter (+/- 154,330)
test benchmarks::bench_light_compute_memory              ... bench:     951,320 ns/iter (+/- 122,300)
test benchmarks::bench_light_from_file_round_trip_memmap ... bench:   7,140,199 ns/iter (+/- 1,534,008)
test benchmarks::bench_light_from_file_round_trip_memory ... bench:  10,568,266 ns/iter (+/- 646,625)

The variability for both implementations is big, which is somewhat disturbing, but if the numbers measure anything useful then I'd say they perform very similarly.

As a sanity check I ran time cargo test --release on both versions and the numbers line up, with the C impl being slightly faster, real 0m3.187s vs rusts real 0m3.367s (best out of three runs).

@newpavlov
Copy link

Shameless plug: consider moving to sha3 crate from RustCrypto?

Performance will be the same (after tiny-keccak PR with unrolling changes lands that is), plus it will work correctly on BE architectures, and in future we plan to wrap assembly implementations from KeccakCodePackage to achieve the best performance.

@folsen
Copy link

folsen commented May 18, 2018

@dvdplm yeah, the difference in ethash really doesn't look that bad, you've convinced me :P
After this change we should do a simple regression test for performance anyway to make sure but probably it's fine.

…cked – mostly for compatibility with ethash
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants