
Effectively affine precomp A for _ecmult without inversion! #171

Closed
wants to merge 2 commits

Conversation

peterdettman
Contributor

  • Builds on top of Use Co-Z arithmetic for precomputations #41 (Co-Z arithmetic for precomputation).
  • Instead of inversion, scales all precomp A points to the same Z (usually != 1).
  • Stores the precomp A as affine by discarding the z coordinate (whilst noting the single Z value for later), which is equivalent to taking them as affine points on an isogenous curve.
  • In secp256k1_ecmult, we treat the accumulator ("r") as operating on this isogenous curve.
  • The doubling formula needs no changes; adding precomp A points is a simple _gej_add_ge_var.
  • G precomp remains the same, but each add of these requires an extra 1M to bring the accumulator temporarily back to the "default isogeny". For simplicity here I've modified _gej_add_ge_var to support this.
  • The z value of the final result needs to be scaled (1M) also.
  • Optimal at WINDOW_A=5.
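The steps above can be sketched in toy Python (my illustration with big-int field arithmetic, not the library's C code; the formulas are the standard a == 0 Jacobian ones). Once a precomputed point is given a common Z value C, we can drop C, add it as if it were affine, and repair the final result with a single multiplication of its z coordinate by C:

```python
# Toy sketch (not libsecp256k1): "effectively affine" precomputation.
# Jacobian (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3).
p = 2**256 - 2**32 - 977   # secp256k1 field prime
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def inv(a): return pow(a, p - 2, p)

def to_affine(P):
    X, Y, Z = P
    zi = inv(Z)
    return (X * zi * zi % p, Y * pow(zi, 3, p) % p)

def dbl(P):                      # Jacobian doubling, a == 0 ("dbl-2009-l")
    X, Y, Z = P
    A, B = X * X % p, Y * Y % p
    C = B * B % p
    D = 2 * ((X + B) * (X + B) - A - C) % p
    E = 3 * A % p
    X3 = (E * E - 2 * D) % p
    Y3 = (E * (D - X3) - 8 * C) % p
    return (X3, Y3, 2 * Y * Z % p)

def add_ge(P, q):                # mixed add: Jacobian P + "affine" (x2, y2)
    X1, Y1, Z1 = P
    x2, y2 = q
    Z1Z1 = Z1 * Z1 % p
    U2, S2 = x2 * Z1Z1 % p, y2 * Z1Z1 * Z1 % p
    H, R = (U2 - X1) % p, (S2 - Y1) % p
    H2 = H * H % p
    H3 = H * H2 % p
    X3 = (R * R - H3 - 2 * X1 * H2) % p
    Y3 = (R * (X1 * H2 - X3) - Y1 * H3) % p
    return (X3, Y3, Z1 * H % p)

def mult(k, g):                  # left-to-right double-and-add, g "affine"
    R = (g[0], g[1], 1)
    for bit in bin(k)[3:]:
        R = dbl(R)
        if bit == '1':
            R = add_ge(R, g)
    return R

# Reference: 11*G computed the ordinary way.
ref = to_affine(mult(11, (Gx, Gy)))

# Effectively affine: pretend the precomputed point has Z = C, discard C,
# run the identical formulas, and scale only the final z by C (1M).
C = 0x1234567890ABCDEF                        # arbitrary common Z value
Gp = (Gx * C * C % p, Gy * pow(C, 3, p) % p)  # (X, Y) with Z = C dropped
X, Y, Z = mult(11, Gp)
assert to_affine((X, Y, Z * C % p)) == ref
```

The key point the sketch demonstrates: the accumulator's intermediate values never see C; it reappears exactly once, in the final z fix-up.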

Anyway, the upshot of this is an ~5% performance improvement for bench_verify (64bit, endo=yes), over and above the Co-Z precomputation itself, so >8% total vs master. Rough math suggests it's saving ~116M vs the Co-Z arithmetic PR, which appears to make any inversion approach obsolete.

Questions welcome, as I'm not sure how to explain this in a straightforward way (as far as I know this is a novel idea). I guess it's important to understand why the isogeny works out so neatly; it's a rather nice property of secp256k1, stemming from a == 0 in the curve equation; otherwise, operating on an isogenous curve would require changes to the doubling formula. I recommend e.g. http://joye.site88.net/papers/TJ10coordblind.pdf, where this is discussed in the context of blinding (which reminds me I was meaning to PR a demo of that), albeit the a == 0 case is not explicitly called out. It may be easier just to think of it as playing games with the z values...

EDIT: I'll leave it as I wrote it, but I think the above should be talking about isomorphisms rather than isogenies.
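For concreteness, the maps in question can be written out as follows (my summary, not from the PR itself):

```latex
% The substitution (x, y) \mapsto (\lambda^{2} x, \lambda^{3} y) carries
% y^{2} = x^{3} + a x + b onto
%
%     y^{2} = x^{3} + \lambda^{4} a\, x + \lambda^{6} b ,
%
% since (\lambda^{3} y)^{2} = \lambda^{6}(x^{3} + a x + b)
%                           = (\lambda^{2} x)^{3}
%                             + \lambda^{4} a (\lambda^{2} x)
%                             + \lambda^{6} b .
% This is a curve isomorphism E_{a,b} \to E_{\lambda^{4} a,\, \lambda^{6} b}.
% The Jacobian group-law formulas reference a but never b, so when a = 0
% (as for secp256k1) the exact same formulas are valid on every curve in
% the family -- only b moves, and b never appears in the arithmetic.
```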

@gmaxwell
Contributor

It intuitively makes sense to me. This is quite awesome. I'd already thought about using an isogenous projection for blinding (having seen it mentioned in a list of blinding techniques), but scalar blinding seemed more interesting for most of our blinding needs; I never would have thought of applying it here.

Darn ECDSA: with Schnorr this would make our verification completely inversion-free.

@peterdettman
Contributor Author

Perhaps a better reference: http://joye.site88.net/papers/JT01edpa.pdf

@gmaxwell I'm still a bit giddy that this works so well. I had the z-ratio trick and the isomorphism stuff in mind for a while, and yesterday it just gelled for me. I wasn't quite sure how it would work out, but I coded it up and... 1M per G add? Awesome, indeed.

I actually now think it will work fine for a!=0 curves too, using "modified Jacobian" coordinates to avoid increasing the cost of doubling.

Why not both scalar and point blinding? Simple point blinding amounts to just an extra 2M per scalar mult, assuming the G precomp points have been moved to some random curve isomorphism in setup.
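A toy sketch of the point-blinding idea (my illustration; the name `lam` and the exact setup are assumptions, not library code). At setup, each precomputed point is moved to a secret random curve isomorphism; unblinding the Jacobian result is just a scale of its z coordinate:

```python
import secrets

# Toy sketch (not libsecp256k1): point blinding via a random curve
# isomorphism.  Map (x, y) to (lam^2 x, lam^3 y) for secret random lam;
# the blinded table then lives on y^2 = x^3 + lam^6 * b, and unblinding a
# Jacobian result only requires scaling its z coordinate by lam.
p = 2**256 - 2**32 - 977   # secp256k1 field prime
b = 7
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

lam = secrets.randbelow(p - 1) + 1      # secret blinding factor (setup)
bx = Gx * lam * lam % p                 # blinded table entry
by = Gy * pow(lam, 3, p) % p

# The blinded point satisfies the isomorphic curve equation:
assert (by * by - bx ** 3 - pow(lam, 6, p) * b) % p == 0

# Unblinding: (bx, by) read as Jacobian (bx, by, lam) is G again.
zi = pow(lam, p - 2, p)                 # lam^{-1}
assert (bx * zi * zi % p, by * pow(zi, 3, p) % p) == (Gx, Gy)
```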

@gmaxwell
Contributor

Why not both scalar and point blinding?

Not unreasonable; it's just that updating the point blinding takes a fair amount more computation, and we already have the unknown-discrete-log blinding. I'm all for implementing every reasonably simple, reasonably fast countermeasure for the secret-data handling cases. Though actual immunity to things like high-resolution power side-channel analysis seems hopeless on general-purpose computers, that's no reason to make an attacker's life easier.

@sipa
Contributor

sipa commented Dec 23, 2014

This is just awesome. It seems we're around 8% slower than Ed25519 here with it (without using GLV, with gmp).

I'm going to play around with the math, try to see if I can derive that it's correct :)

@peterdettman
Contributor Author

FYI, I managed to apply this same technique to a P-521 random-base-mult implementation (http://indigo.ie/~mscott/ws521.cpp), with a similar ~8% performance boost. That curve has a==-3, which has to be scaled for the isomorphism, so to retain doubling performance (3M+5S) the formula has to be changed to a modified-Jacobian scheme. The adds (104) then cost 7M+6S instead of 11M+5S. This saving is offset by (4M+S) per precomputed point (15) to make them "co-Z".
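To spell out the a != 0 wrinkle (my summary; the PR text quotes a 3M+5S doubling, which may be a different variant than the one below):

```latex
% Under (x, y) \mapsto (\lambda^{2} x, \lambda^{3} y) the coefficient
% scales as a \mapsto \lambda^{4} a, so a curve set up with a = -3
% loses the usual a = -3 doubling shortcut after the isomorphism.
% Modified Jacobian coordinates (X, Y, Z, T) with T = a Z^{4} sidestep
% this: one standard doubling (Cohen--Miyaji--Ono) is
%
%   M = 3X^{2} + T, \quad S = 4XY^{2}, \quad U = 8Y^{4},
%   X' = M^{2} - 2S, \quad Y' = M(S - X') - U,
%   Z' = 2YZ, \quad T' = 2UT,
%
% whose cost is independent of the value of a, so the scaled
% a' = \lambda^{4} a costs nothing extra per doubling.
```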

So we can tentatively conclude that this is applicable to any short Weierstrass form curve, with (only) a==0 being particularly efficient. I don't know enough about other curve forms to comment yet.

The question arises whether the scheme can still be used for general (aP + bQ + cR + ...), and indeed I see no fundamental obstacle, just an extra few steps involved in the precomputation to arrange for all precomp-tables to be globally co-Z.

- Selected Co-Z formulas from "Scalar Multiplication on Weierstraß Elliptic Curves from Co-Z Arithmetic" (Goundar, Joye, et al.) added as group methods with new type secp256k1_coz_t.
- Co-Z methods used for A and G point precomputations.
- WINDOW_A size increased to 6 since the precomputation is much faster per-point.
- DBLU cost: 3M+4S, ZADDU cost: 5M+2S.
- Take advantage of z-ratios from Co-Z to speed up table inversion.
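The ZADDU cost quoted in the commit message above (5M+2S) can be checked against a toy Python sketch (my illustration with big-int field arithmetic, not the C implementation; the formula is the co-Z add-with-update from the Goundar-Joye paper). Given P and Q sharing a Jacobian z, it returns P+Q together with a re-representation of P at the new z:

```python
# Toy sketch of ZADDU (co-Z addition with update), 5M + 2S total.
# P = (X1, Y1) and Q = (X2, Y2) share an implicit Jacobian Z; the result
# is P+Q = (X3, Y3) and updated P = (W1, A1), both at Z' = Z*(X1 - X2).
p = 2**256 - 2**32 - 977   # secp256k1 field prime

def zaddu(X1, Y1, X2, Y2):
    C = (X1 - X2) * (X1 - X2) % p           # 1S
    W1, W2 = X1 * C % p, X2 * C % p         # 2M
    D = (Y1 - Y2) * (Y1 - Y2) % p           # 1S
    A1 = Y1 * (W1 - W2) % p                 # 1M
    X3 = (D - W1 - W2) % p
    Y3 = ((Y1 - Y2) * (W1 - X3) - A1) % p   # 1M
    return (X3, Y3), (W1, A1)               # +1M for Z' = Z*(X1 - X2)

# Check against plain affine arithmetic, taking Z = 1 so co-Z is trivial.
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def inv(a): return pow(a, p - 2, p)

def affine_add(x1, y1, x2, y2):             # chord rule, x1 != x2
    l = (y2 - y1) * inv(x2 - x1) % p
    x3 = (l * l - x1 - x2) % p
    return x3, (l * (x1 - x3) - y1) % p

def affine_dbl(x, y):                       # tangent rule, a == 0
    l = 3 * x * x * inv(2 * y) % p
    x3 = (l * l - 2 * x) % p
    return x3, (l * (x - x3) - y) % p

x2, y2 = affine_dbl(Gx, Gy)                 # 2G
(S, T), (U, V) = zaddu(Gx, Gy, x2, y2)
Zp = (Gx - x2) % p                          # new common z (old Z was 1)
zi = inv(Zp); zi2 = zi * zi % p; zi3 = zi2 * zi % p
assert (S * zi2 % p, T * zi3 % p) == affine_add(Gx, Gy, x2, y2)  # G+2G = 3G
assert (U * zi2 % p, V * zi3 % p) == (Gx, Gy)                    # updated P
```

Note how the z-ratio Z'/Z = (X1 - X2) falls out of the addition itself; that is the quantity the last commit bullet exploits to speed up the table inversion.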
@sipa
Contributor

sipa commented Dec 27, 2014

I'm going to add some comments and unit tests on top of this, if you're done making changes.

@sipa
Contributor

sipa commented Feb 13, 2015

See rebased/refactored version in #210.

@sipa sipa closed this Feb 13, 2015