Use a more efficient hash function in the Merkle tree #4
This reminds me (Pedersen commitments) of Waters signatures and smooth projective hash functions from "Round-Optimal Privacy-Preserving Protocols with Smooth Projective Hash Functions" by Blazy et al. [1]

[1] https://link.springer.com/content/pdf/10.1007/978-3-642-28914-9_6.pdf

p.s. in "Short Blind Signatures" by Blazy et al. [3], the authors discuss a generalization of Waters signatures and group hash functions with inputs from a field Z_p rather than from GF(2).

[3] https://pdfs.semanticscholar.org/4d1e/34e9d2da9a334e3986c69ef217c2f20a93e9.pdf
Ethsnarks has implemented two relevant hash functions suitable for SNARK circuits:
Note: It could also be worth looking at accumulators and proofs of inclusion.
There are a handful of problems with accumulators (that aren't merkle trees):
There is an open problem though, related to RSA accumulators inside a SNARK: even when you don't need to do primality tests inside the SNARK, how do you efficiently perform operations like exponentiation modulo a 2048-bit RSA modulus from within the circuit's much smaller field? See:
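For reference, the basic RSA accumulator operations being discussed look like the following. This is a toy sketch under loud assumptions: the modulus is tiny for illustration (real schemes use an ~2048-bit N from a trusted setup), and elements are assumed to already be mapped to primes, which is exactly the mapping/primality step that is expensive to prove inside a SNARK.

```python
# Toy RSA accumulator sketch (illustrative only; N is far too small,
# and real schemes hash elements to primes first -- the step that is
# hard to replicate efficiently inside a SNARK circuit).
N = 3233          # toy modulus (61 * 53); real schemes use a ~2048-bit N
g = 2             # public base

def add(acc, p):
    # Accumulate a (prime-mapped) element p: acc' = acc^p mod N
    return pow(acc, p, N)

def witness(elements, p):
    # The witness for p is the accumulation of all *other* elements
    w = g
    for q in elements:
        if q != p:
            w = pow(w, q, N)
    return w

def verify(acc, p, w):
    # Membership holds iff w^p == acc (mod N)
    return pow(w, p, N) == acc

elements = [3, 5, 7]
acc = g
for e in elements:
    acc = add(acc, e)
w = witness(elements, 5)
assert verify(acc, 5, w)
```

The open problem above is that `pow(w, p, N)` is a 2048-bit modular exponentiation, while the SNARK circuit natively works over a ~254-bit prime field, so each big-integer operation must be emulated with many constraints.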
Agreed, but have you looked at ECC-based accumulators (so far I have only found one [1])? If there were more research in this area it could be interesting to see how well this can be used in conjunction with Pedersen hashes.

[1] "Distillation Codes and Applications to DoS Resistant Multicast Authentication": http://people.eecs.berkeley.edu/~tygar/papers/Distillation_code.pdf
Could you elaborate on how error-correction codes could be used as an accumulator? The paper you linked seems to use a Merkle tree, a UOWHF and RSA signatures as part of the construction, rather than using error-correction codes as a type of hash.

A good candidate for a UOWHF is something abstractly similar to Poly1305, which requires one constraint per message element. However, it is unsuitable for many uses (such as general-purpose cryptographic hashing) because it doesn't offer preimage or full collision resistance - but as a MAC or data-integrity tag, where the key is known only by the sender and receiver, it can be used securely, as it (and pretty much every other UOWHF) only offers target collision resistance.

One question I'm still pondering is whether you could use Poly1305 together with the Merkle-Damgard, Davies-Meyer or Shoup constructions to gain preimage resistance and other desirable properties which would be suitable for a Merkle tree, and which would result in a very low number of constraints (4-5 per element being hashed, rather than ~200). Currently, MiMC with an exponent of 7 and 46 rounds requires ~190 constraints per element being hashed, and Pedersen hash requires many many more (but is still comparatively very cheap compared to SHA256 and Blake). Reducing that to 5-10 per element would result in sub-second proving time for many simple circuits that do 50-100 hashes (which is awesome).

See:
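The "one constraint per message element" claim can be seen from the shape of the polynomial-evaluation hash underlying Poly1305: each step is a single multiply-and-add, which is one rank-1 constraint. A minimal sketch, using the Poly1305 prime purely for illustration (in a circuit you would use the SNARK's scalar field instead):

```python
# Sketch of a polynomial-evaluation universal hash (the idea behind
# Poly1305), over a generic prime field rather than the full Poly1305
# construction with clamping and key addition.
# Each message element costs one multiply-and-add,
#   acc = (acc + m_i) * r  (mod p),
# which is a single rank-1 constraint in a SNARK circuit.
# r is a secret key, which is why this is only target-collision
# resistant, as discussed above.
p = 2**130 - 5   # the Poly1305 prime, used here only as an example field

def poly_hash(msg, r, p=p):
    acc = 0
    for m in msg:
        acc = ((acc + m) * r) % p
    return acc
```

With the key `r` public (as it would have to be for a general-purpose hash), collisions are trivial to construct, which is the limitation the comment describes.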
My bad, I meant elliptic curve based accumulators (articulating them with Pedersen hashes should make more sense). This paper is cited quite a few times for its EC accumulator, but there isn't much at all on it actually.
All of the elliptic curve based accumulators I've seen seem to rely on bilinear pairings and have a trusted setup phase. Bilinear pairings are not very efficient within a SNARK (although Daira, O1-labs etc. are working on that) and require the construction of a pairing-friendly curve which can operate within a zkSNARK circuit. The only options so far seem to be the MNT4/MNT6 curve cycles, which are not compatible with Ethereum. If ditching Ethereum compatibility is an option you can switch over to a different curve, and then you can do pairings inside a SNARK circuit.

However, it will be more expensive (in terms of the number of constraints) to prove membership of an item in the accumulator when using one of the bilinear pairing accumulator schemes, especially compared to MiMC and the ZCash Pedersen hash scheme.

Accumulators using bilinear pairings:
Thanks for the list, I went through it and looked at the RSA ones and the others. I hoped there were more papers in the domain, but it does not appear very popular. I have found some description of the EC-based (non bilinear pairing) accumulator; it is rather simple but not useful as it is a static accumulator (c.f.
I like that paper, it starts off very specific and thorough with the initial definitions, but then gets very very wishy washy. However, the reference for
I assume they mean that a
They specify no mechanism for their elliptic curve based accumulator without bilinear pairings, but go into enough detail about RSA and symmetric accumulators (aka Bloom filters) for me to understand what they mean, and the only paper they referenced regarding elliptic curve based accumulators does not use elliptic curves in any way... nor is there any code to back up their benchmarks (lol, academia...)

My understanding is that outside of Bloom filters and Merkle trees, there is no trapdoor-free accumulator which is both space efficient and has a sub

There are interesting properties for elliptic curve group signatures though, which could possibly be abused to prove membership of a single bit with a constant-sized witness of one or two group elements without using bilinear pairings... from which you could do something like a Bloom filter. I will ponder on that one (check the MuSig paper, but it wouldn't be a dynamic accumulator...)
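For concreteness, the "symmetric accumulator" (Bloom filter) mentioned above works as sketched below. This is my illustrative construction, not one from the thread; deriving the k index functions from keyed BLAKE2b is an arbitrary choice.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a trapdoor-free 'symmetric accumulator'.
    Membership queries can return false positives but never false
    negatives, and elements cannot be removed (it is not dynamic)."""

    def __init__(self, m_bits=1024, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting BLAKE2b with the index;
        # an illustrative choice of 'k independent hash functions'.
        for i in range(self.k):
            h = hashlib.blake2b(item, salt=i.to_bytes(16, 'little')).digest()
            yield int.from_bytes(h[:8], 'little') % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # All k bits set => probably a member; any bit clear => definitely not
        return all(self.bits[pos] for pos in self._positions(item))
```

Space efficiency comes at the cost of false positives, and the witness for membership is effectively the whole bit array, which is why it doesn't directly solve the constant-witness problem discussed here.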
I completely agree, I spent quite some time looking at other citations / full papers... but couldn't find the actual paper.
Academia is not that bad... Code-wise, it is getting better (technically, I'm still in academia).
Agreed. But that is not a surprise; for soundness you need a linear cost (in the number of commitments) somewhere. I don't see that as much of a problem, as for a Merkle tree you do have to store the hashes of all commitments. If we can have a better proof (computation-wise for SNARKs) versus a hopefully-not-too-big overhead in space, that could be acceptable.
I was thinking about that last night (whether we could reuse a standard group-signature membership process as a simple accumulator). We would need a partially dynamic group signature for our purpose though (addition but no revocation of group members): https://eprint.iacr.org/2016/368.pdf
What is that paper?
MuSig:
Some papers referenced by 2016/368:
TBH I think the only options are:
The best at the moment is a Merkle tree with MiMC, followed by Pedersen hash. Pedersen hash cannot be efficiently computed on the EVM though, which leaves MiMC (~20k gas, ~200 constraints) and SHA256 (~10 gas, ~27k constraints)
Hi @HarryR, thanks for your insights, really appreciated. Where can I find more info about the ~20k gas cost of MiMC you mentioned? Thanks.
Hi Harry, @riemann89 and I had a look at your implementation and we are not sure we understand your algorithm. We cannot see any capacity, rate or squeezing phases. Could you explain the logic of your hash to us in more detail? :)
Hi @rrtoledo

This uses the Miyaguchi-Preneel one-way compression function construction. This ticket has relevant details on the decision, tradeoffs etc.: HarryR/ethsnarks#119

We weren't able to use the sponge construction because, without being able to choose the prime for the field, the rate required to have a good capacity would be far too low, or it would require 10x-100x more constraints to achieve an acceptable security level (which would be a bit pointless, no?)
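The Miyaguchi-Preneel mode over MiMC can be sketched as follows. This is a simplified illustration, not the ethsnarks implementation: the round count follows the ~46-round figure quoted earlier in the thread, and the round constants here are placeholders (real implementations derive them from a seed hash).

```python
# Sketch of MiMC (exponent 7) used in Miyaguchi-Preneel mode.
# p is the BN254 scalar field -- the prime zkSNARKs on Ethereum are
# stuck with. ROUNDS and the constants are illustrative placeholders.
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
ROUNDS = 46  # per the constraint figures quoted earlier in the thread

# Placeholder round constants; real ones come from hashing a seed.
CONSTANTS = [pow(i + 1, 3, p) for i in range(ROUNDS)]

def mimc_encrypt(x, k):
    # The MiMC permutation keyed by k: x <- (x + k + c_i)^7 mod p
    for c in CONSTANTS:
        x = pow((x + k + c) % p, 7, p)
    return (x + k) % p

def mimc_hash(inputs, iv=0):
    # Miyaguchi-Preneel: H_{i+1} = E_{H_i}(m_i) + H_i + m_i  (mod p),
    # using the running digest as the cipher key.
    h = iv
    for m in inputs:
        h = (mimc_encrypt(m, h) + h + m) % p
    return h
```

Note how the message block is folded into the key schedule rather than absorbed into a larger sponge state; that is what keeps the state to a single field element, at the cost of the capacity/rate analysis a sponge would give you.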
There are two other candidates for efficient ciphers / hash functions which are worth investigating; I am interested in how fast they would be on the EVM. Alternatives using the SIS problem and lattice cryptography could make an interesting homomorphic primitive that can be used as a CRHF: http://www.wisdom.weizmann.ac.il/~oded/COL/cfh.pdf

While the number of constraints is low (just
I haven't yet read Vision and co. We'll take some time next week and come back to you. :) As for Poseidon and co., it does look good indeed. Using affine functions for the permutation should really reduce the gas cost. What bothers me, however, is that the Hades strategy paper is not available; I'd like to read it to understand the security guarantees better.
I am not sure I understand why you say we cannot choose the prime.
With zkSNARKs on Ethereum we're stuck with a specific prime, not just any prime that fits into a 256-bit word - unless EIP-1829 or EIP-1962 gets included. The prime we're stuck with is about ~253 bits (the curve's scalar field). So, with 4 absorption rounds and 4 rounds of squeezing, with an extra 2

That's still about 6x better than SHA256 on a pure constraints-vs-constraints basis, but the constraints for MiMC are more complex (the relationship between constraints and CPU time isn't linear). And it's still 2.5x worse than the Pedersen hash scheme that ZCash uses. Oh, and on the EVM that's going to cost you a lot more than 20k gas, although it may still be cheaper than Pedersen hash.

The goal, at the time, was to find a scheme which can be used as a CRHF for a Merkle tree, which avoids
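The rate/capacity tradeoff behind "the rate would be far too low" can be made concrete with back-of-envelope arithmetic. The figures below are my illustration of the standard sponge security estimate (roughly capacity/2 bits), not numbers from the thread:

```python
# Back-of-envelope sponge arithmetic over a ~253-bit field.
# A sponge's generic security level is roughly capacity/2 bits, where
# the capacity is the part of the state never exposed to input/output.
FIELD_BITS = 253  # bits per state element (the BN254 scalar field)

def sponge_security_bits(capacity_elements):
    # Generic collision/preimage security estimate for the sponge
    return (capacity_elements * FIELD_BITS) // 2

def rate_elements(state_elements, capacity_elements):
    # Elements absorbed per permutation call
    return state_elements - capacity_elements

# Smallest useful sponge: state of 2 elements, capacity 1 => rate 1.
# Every single absorbed element pays for a full permutation, whereas
# Miyaguchi-Preneel folds the message into the key schedule instead.
```

With one capacity element you get ~126-bit generic security but only one element of rate per permutation call, so the per-element constraint cost is a full permutation; shrinking the field (if you could choose the prime) would only make the capacity, and hence the security level, worse.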
@HarryR As you have worked quite extensively on MiMC, I wonder if you can answer a few questions I have on my mind.

When talking about mimc_hash, it seems you are referring to an iteration of the MP compression function (over the MiMC encryption scheme). Why don't you call this Merkle-Damgard? Is it because you do not do the padding or finalization phase? In that case, don't we lose some security (the main attack I can see, length extension, should not apply in our case study)?

I was wondering about the case of using MiMC Miyaguchi-Preneel as a PRF. Specifically, in ZCash the tags
Yes, there is no padding or finalisation to prevent length extension attacks. You would need to wrap your input string with some kind of padding and length prefix / finalisation to gain that property. I figured that using different IVs for every instance of the hash function is sufficient to prevent those kinds of attacks because each individual use has its own domain and a fixed number of inputs. This also means there's less overhead in terms of the number of constraints.
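The fixed-arity, per-instance-IV idea can be sketched abstractly. Here `h` stands in for any fixed-arity compression function (such as the MiMC hash discussed in this thread); SHA-256 and the IV values are illustrative stand-ins, not the actual scheme:

```python
import hashlib

def h(iv: int, inputs):
    # Stand-in fixed-arity compression function keyed by an IV.
    data = iv.to_bytes(8, 'big') + b''.join(x.to_bytes(32, 'big') for x in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

# One IV per use-site, fixed by convention (illustrative values):
IV_MERKLE_NODE = 1
IV_LEAF = 2

def merkle_node(left, right):
    return h(IV_MERKLE_NODE, [left, right])  # always exactly 2 inputs

def leaf(value):
    return h(IV_LEAF, [value])               # always exactly 1 input
```

Because every call site has a fixed number of inputs and its own IV, a digest produced in one domain cannot be extended or replayed in another, which is the property that padding and a length suffix would otherwise provide.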
Yes, using a different set of round constants will create a separate domain for the hash. This would be equivalent to changing the seed used for the round constants. Using a different IV also creates a different-ish domain, but in a weaker sense. (Well, it's a different starting point within the same domain, so the work to interpolate one domain can be re-used with different IVs, but with different round constants you need to perform the work for each different set of constants.)
Closing this issue as it has been merged to develop.
See this interesting thread: zcash/zcash#2234
See MiMC: https://eprint.iacr.org/2016/492.pdf