Use a more efficient hash function in the Merkle tree #4
This reminds me (Pedersen commitments) of Waters signatures and smooth projective hash functions from "Round-Optimal Privacy-Preserving Protocols with Smooth Projective Hash Functions" by Blazy et al. [1]

[1] https://link.springer.com/content/pdf/10.1007/978-3-642-28914-9_6.pdf

p.s. in "Short Blind Signatures" by Blazy et al. [3], the authors discuss a generalization of Waters signatures and group hash functions with inputs from a field Z_p rather than from GF(2).

[3] https://pdfs.semanticscholar.org/4d1e/34e9d2da9a334e3986c69ef217c2f20a93e9.pdf
Ethsnarks has implemented two relevant hash functions suitable for SNARK circuits:
Note: It could also be worth looking at accumulators and proofs of inclusion.
There are a handful of problems with accumulators (that aren't merkle trees):
There is an open problem though, related to RSA accumulators inside a SNARK: even when you don't need to do primality tests inside the SNARK, how do you efficiently perform operations like exponentiation modulo a 2048-bit RSA modulus from within the circuit's much smaller field? See:
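For reference, the basic RSA accumulator operations being discussed look like the following. This is a toy sketch under loud assumptions: the modulus is tiny for illustration (real schemes use an ~2048-bit N from a trusted setup), and elements are assumed to already be mapped to primes, which is exactly the mapping/primality step that is expensive to prove inside a SNARK.

```python
# Toy RSA accumulator sketch (illustrative only; N is far too small,
# and real schemes hash elements to primes first -- the step that is
# hard to replicate efficiently inside a SNARK circuit).
N = 3233          # toy modulus (61 * 53); real schemes use a ~2048-bit N
g = 2             # public base

def add(acc, p):
    # Accumulate a (prime-mapped) element p: acc' = acc^p mod N
    return pow(acc, p, N)

def witness(elements, p):
    # The witness for p is the accumulation of all *other* elements
    w = g
    for q in elements:
        if q != p:
            w = pow(w, q, N)
    return w

def verify(acc, p, w):
    # Membership holds iff w^p == acc (mod N)
    return pow(w, p, N) == acc

elements = [3, 5, 7]
acc = g
for e in elements:
    acc = add(acc, e)
w = witness(elements, 5)
assert verify(acc, 5, w)
```

The open problem above is that `pow(w, p, N)` is a 2048-bit modular exponentiation, while the SNARK circuit natively works over a ~254-bit prime field, so each big-integer operation must be emulated with many constraints.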
Agreed, but have you looked at ECC-based accumulators (so far I have only found one [1])? If there were more research in this area it could be interesting to see how well this can be used in conjunction with Pedersen hashes.

[1] "Distillation Codes and Applications to DoS Resistant Multicast Authentication": http://people.eecs.berkeley.edu/~tygar/papers/Distillation_code.pdf
Could you elaborate on how error-correction codes could be used as an accumulator? The paper you linked seems to use a Merkle tree, a UOWHF and RSA signatures as part of the construction, rather than using error-correction codes as a type of hash.

A good candidate for a UOWHF is something abstractly similar to Poly1305, which requires one constraint per message element. However, it is unsuitable for many uses (such as general-purpose cryptographic hashing) because it doesn't offer preimage or full collision resistance - but as a MAC or data-integrity tag, where the key is known only by the sender and receiver, it can be used securely, as it (and pretty much every other UOWHF) only offers target collision resistance.

One question I'm still pondering is whether you could use Poly1305 together with the Merkle-Damgard, Davies-Meyer or Shoup constructions to gain preimage resistance and other desirable properties which would be suitable for a Merkle tree, and which would result in a very low number of constraints (4-5 per element being hashed, rather than ~200). Currently, MiMC with an exponent of 7 and 46 rounds requires ~190 constraints per element being hashed, and Pedersen hash requires many many more (but is still comparatively very cheap compared to SHA256 and Blake). Reducing that to 5-10 per element would result in sub-second proving time for many simple circuits that do 50-100 hashes (which is awesome).

See:
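The "one constraint per message element" claim can be seen from the shape of the polynomial-evaluation hash underlying Poly1305: each step is a single multiply-and-add, which is one rank-1 constraint. A minimal sketch, using the Poly1305 prime purely for illustration (in a circuit you would use the SNARK's scalar field instead):

```python
# Sketch of a polynomial-evaluation universal hash (the idea behind
# Poly1305), over a generic prime field rather than the full Poly1305
# construction with clamping and key addition.
# Each message element costs one multiply-and-add,
#   acc = (acc + m_i) * r  (mod p),
# which is a single rank-1 constraint in a SNARK circuit.
# r is a secret key, which is why this is only target-collision
# resistant, as discussed above.
p = 2**130 - 5   # the Poly1305 prime, used here only as an example field

def poly_hash(msg, r, p=p):
    acc = 0
    for m in msg:
        acc = ((acc + m) * r) % p
    return acc
```

With the key `r` public (as it would have to be for a general-purpose hash), collisions are trivial to construct, which is the limitation the comment describes.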
My bad, I meant elliptic curve based accumulators (articulating them with Pedersen hashes should make more sense). This paper is cited quite a few times for its EC accumulator, but there isn't much at all on it actually.
All of the elliptic curve based accumulators I've seen seem to rely on bilinear pairings and have a trusted setup phase. Bilinear pairings are not very efficient within a SNARK (although Daira, O1-labs etc. are working on that) and require the construction of a pairing-friendly curve which can operate within a zkSNARK circuit. The only options so far seem to be the MNT4/MNT6 curve cycles, which are not compatible with Ethereum. If ditching Ethereum compatibility is an option you can switch over to a different curve, and then you can do pairings inside a SNARK circuit.

However, it will be more expensive (in terms of the number of constraints) to prove membership of an item in the accumulator when using one of the bilinear pairing accumulator schemes, especially compared to MiMC and the ZCash Pedersen hash scheme.

Accumulators using bilinear pairings:
Thanks for the list, I went through it and looked at the RSA ones and the others. I hoped there were more papers in the domain, but it does not appear very popular. I have found some description of the EC-based (non bilinear pairing) accumulator; it is rather simple but not useful as it is a static accumulator (c.f.
I like that paper, it starts off very specific and thorough with the initial definitions, but then gets very very wishy washy. However, the reference for
I assume they mean that a
They specify no mechanism for their elliptic curve based accumulator without bilinear pairings, but go into enough detail about RSA and symmetric accumulators (aka Bloom filters) for me to understand what they mean, and the only paper they referenced regarding elliptic curve based accumulators does not use elliptic curves in any way... nor is there any code to back up their benchmarks (lol, academia...)

My understanding is that outside of Bloom filters and Merkle trees, there is no trapdoor-free accumulator which is both space efficient and has a sub

There are interesting properties for elliptic curve group signatures though, which could possibly be abused to prove membership of a single bit with a constant-sized witness of one or two group elements without using bilinear pairings... from which you could do something like a Bloom filter. I will ponder on that one (check the MuSig paper, but it wouldn't be a dynamic accumulator...)
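For concreteness, the "symmetric accumulator" (Bloom filter) mentioned above works as sketched below. This is my illustrative construction, not one from the thread; deriving the k index functions from keyed BLAKE2b is an arbitrary choice.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a trapdoor-free 'symmetric accumulator'.
    Membership queries can return false positives but never false
    negatives, and elements cannot be removed (it is not dynamic)."""

    def __init__(self, m_bits=1024, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)

    def _positions(self, item: bytes):
        # Derive k bit positions by salting BLAKE2b with the index;
        # an illustrative choice of 'k independent hash functions'.
        for i in range(self.k):
            h = hashlib.blake2b(item, salt=i.to_bytes(16, 'little')).digest()
            yield int.from_bytes(h[:8], 'little') % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        # All k bits set => probably a member; any bit clear => definitely not
        return all(self.bits[pos] for pos in self._positions(item))
```

Space efficiency comes at the cost of false positives, and the witness for membership is effectively the whole bit array, which is why it doesn't directly solve the constant-witness problem discussed here.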
I completely agree, I spent quite some time looking at other citations / full papers... but couldn't find the actual paper.
Academia is not that bad... Code-wise, it is getting better (technically, I'm still in academia).
Agreed. But that is not a surprise; for soundness you need a linear cost (in the number of commitments) somewhere. I don't see that as much of a problem, as for a Merkle tree you do have to store the hashes of all commitments. If we can have a better proof (computation-wise for SNARKs) versus a hopefully-not-too-big overhead in space, that could be acceptable.
I was thinking about that last night (whether we could reuse a standard group-signature membership process as a simple accumulator). We would need a partially dynamic group signature for our purpose though (addition but no revocation of group members): https://eprint.iacr.org/2016/368.pdf
What is that paper?
MuSig:
Some papers referenced by 2016/368:
TBH I think the only options are:
The best at the moment is a Merkle tree with MiMC, followed by Pedersen hash. Pedersen hash cannot be efficiently computed on the EVM though, which leaves MiMC (~20k gas, ~200 constraints) and SHA256 (~10 gas, ~27k constraints)
Hi @HarryR, thanks for your insights, really appreciated. Where can I find more info about the ~20k gas cost of MiMC you mentioned? Thanks.
Hi Harry, @riemann89 and I had a look at your implementation and we are not sure we understand your algorithm. We cannot see any capacity, rate or squeezing phases. Could you explain the logic of your hash to us in more detail? :)
Hi @rrtoledo

This uses the Miyaguchi-Preneel one-way compression function construction. This ticket has relevant details on the decision, tradeoffs etc.: HarryR/ethsnarks#119

We weren't able to use the sponge construction because, without being able to choose the prime for the field, the rate required to have a good capacity would be far too low, or it would require 10x-100x more constraints to achieve an acceptable security level (which would be a bit pointless, no?)
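The Miyaguchi-Preneel mode over MiMC can be sketched as follows. This is a simplified illustration, not the ethsnarks implementation: the round count follows the ~46-round figure quoted earlier in the thread, and the round constants here are placeholders (real implementations derive them from a seed hash).

```python
# Sketch of MiMC (exponent 7) used in Miyaguchi-Preneel mode.
# p is the BN254 scalar field -- the prime zkSNARKs on Ethereum are
# stuck with. ROUNDS and the constants are illustrative placeholders.
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
ROUNDS = 46  # per the constraint figures quoted earlier in the thread

# Placeholder round constants; real ones come from hashing a seed.
CONSTANTS = [pow(i + 1, 3, p) for i in range(ROUNDS)]

def mimc_encrypt(x, k):
    # The MiMC permutation keyed by k: x <- (x + k + c_i)^7 mod p
    for c in CONSTANTS:
        x = pow((x + k + c) % p, 7, p)
    return (x + k) % p

def mimc_hash(inputs, iv=0):
    # Miyaguchi-Preneel: H_{i+1} = E_{H_i}(m_i) + H_i + m_i  (mod p),
    # using the running digest as the cipher key.
    h = iv
    for m in inputs:
        h = (mimc_encrypt(m, h) + h + m) % p
    return h
```

Note how the message block is folded into the key schedule rather than absorbed into a larger sponge state; that is what keeps the state to a single field element, at the cost of the capacity/rate analysis a sponge would give you.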
There are two other candidates for efficient ciphers / hash functions which are worth investigating; I am interested in how fast they would be on the EVM. Alternatives using the SIS problem and lattice cryptography could make an interesting homomorphic primitive that can be used as a CRHF: http://www.wisdom.weizmann.ac.il/~oded/COL/cfh.pdf

While the number of constraints is low (just
I haven't yet read Vision and co. We'll take some time next week and come back to you. :) As for Poseidon and co., it does look good indeed. Using affine functions for the permutation should really reduce the gas cost. What bothers me, however, is that the Hades strategy paper is not available; I'd like to read it to understand the security guarantees better.
I am not sure I understand why you say we cannot choose the prime.
With zkSNARKs on Ethereum we're stuck with a specific prime, not just any prime that fits into a 256-bit word - unless EIP-1829 or EIP-1962 gets included. The prime we're stuck with is about ~253 bits (the curve's scalar field). So, with 4 absorption rounds and 4 rounds of squeezing, with an extra 2

That's still about 6x better than SHA256 on a pure constraints-vs-constraints basis, but the constraints for MiMC are more complex (the relationship between constraints and CPU time isn't linear). And it's still 2.5x worse than the Pedersen hash scheme that ZCash uses. Oh, and on the EVM that's going to cost you a lot more than 20k gas, although it may still be cheaper than Pedersen hash.

The goal, at the time, was to find a scheme which can be used as a CRHF for a Merkle tree, which avoids
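The rate/capacity tradeoff behind "the rate would be far too low" can be made concrete with back-of-envelope arithmetic. The figures below are my illustration of the standard sponge security estimate (roughly capacity/2 bits), not numbers from the thread:

```python
# Back-of-envelope sponge arithmetic over a ~253-bit field.
# A sponge's generic security level is roughly capacity/2 bits, where
# the capacity is the part of the state never exposed to input/output.
FIELD_BITS = 253  # bits per state element (the BN254 scalar field)

def sponge_security_bits(capacity_elements):
    # Generic collision/preimage security estimate for the sponge
    return (capacity_elements * FIELD_BITS) // 2

def rate_elements(state_elements, capacity_elements):
    # Elements absorbed per permutation call
    return state_elements - capacity_elements

# Smallest useful sponge: state of 2 elements, capacity 1 => rate 1.
# Every single absorbed element pays for a full permutation, whereas
# Miyaguchi-Preneel folds the message into the key schedule instead.
```

With one capacity element you get ~126-bit generic security but only one element of rate per permutation call, so the per-element constraint cost is a full permutation; shrinking the field (if you could choose the prime) would only make the capacity, and hence the security level, worse.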
@HarryR As you have worked quite extensively on MiMC, I wonder if you can answer a few questions I have on my mind.

When talking about mimc_hash, it seems you are referring to an iteration of the MP compression function (over the MiMC encryption scheme). Why don't you call this Merkle-Damgard? Is it because you do not do the padding or finalization phase? In that case, don't we lose some security (the main attack I can see, length extension, should not apply in our case study)?

I was wondering about the case of using MiMC Miyaguchi-Preneel as a PRF. Specifically, in ZCash the tags
Yes, there is no padding or finalisation to prevent length extension attacks. You would need to wrap your input string with some kind of padding and length prefix / finalisation to gain that property. I figured that using different IVs for every instance of the hash function is sufficient to prevent those kinds of attacks because each individual use has its own domain and a fixed number of inputs. This also means there's less overhead in terms of the number of constraints.
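The fixed-arity, per-instance-IV idea can be sketched abstractly. Here `h` stands in for any fixed-arity compression function (such as the MiMC hash discussed in this thread); SHA-256 and the IV values are illustrative stand-ins, not the actual scheme:

```python
import hashlib

def h(iv: int, inputs):
    # Stand-in fixed-arity compression function keyed by an IV.
    data = iv.to_bytes(8, 'big') + b''.join(x.to_bytes(32, 'big') for x in inputs)
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

# One IV per use-site, fixed by convention (illustrative values):
IV_MERKLE_NODE = 1
IV_LEAF = 2

def merkle_node(left, right):
    return h(IV_MERKLE_NODE, [left, right])  # always exactly 2 inputs

def leaf(value):
    return h(IV_LEAF, [value])               # always exactly 1 input
```

Because every call site has a fixed number of inputs and its own IV, a digest produced in one domain cannot be extended or replayed in another, which is the property that padding and a length suffix would otherwise provide.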
Yes, using a different set of round constants will create a separate domain for the hash. This would be equivalent to changing the seed used for the round constants. Using a different IV also creates a different-ish domain, but in a weaker sense. (Well, it's a different starting point within the same domain, so the work to interpolate one domain can be re-used with different IVs, but with different round constants you need to perform the work for each different set of constants.)
Closing this issue as it has been merged to develop.
See this interesting thread: zcash/zcash#2234
See MiMC: https://eprint.iacr.org/2016/492.pdf