
Effectively affine precomp A for _ecmult without inversion! #171

Closed
wants to merge 2 commits

Conversation

peterdettman
Contributor

  • Builds on top of Use Co-Z arithmetic for precomputations #41 (Co-Z arithmetic for precomputation).
  • Instead of inversion, scales all precomp A points to the same Z (usually != 1).
  • Stores the precomp A as affine by discarding the z coordinate (whilst noting the single Z value for later), which is equivalent to taking them as affine points on an isogenous curve.
  • In secp256k1_ecmult, we treat the accumulator ("r") as operating on this isogenous curve.
  • The doubling formula needs no changes; adding precomp A points is a simple _gej_add_ge_var.
  • G precomp remains the same, but each add of these requires an extra 1M to bring the accumulator temporarily back to the "default isogeny". For simplicity here I've modified _gej_add_ge_var to support this.
  • The z value of the final result needs to be scaled (1M) also.
  • Optimal at WINDOW_A=5.
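The steps above can be sketched in toy Python (my illustration with big-int field arithmetic, not the library's C code; the formulas are the standard a == 0 Jacobian ones). Once a precomputed point is given a common Z value C, we can drop C, add it as if it were affine, and repair the final result with a single multiplication of its z coordinate by C:

```python
# Toy sketch (not libsecp256k1): "effectively affine" precomputation.
# Jacobian (X, Y, Z) represents the affine point (X/Z^2, Y/Z^3).
p = 2**256 - 2**32 - 977   # secp256k1 field prime
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def inv(a): return pow(a, p - 2, p)

def to_affine(P):
    X, Y, Z = P
    zi = inv(Z)
    return (X * zi * zi % p, Y * pow(zi, 3, p) % p)

def dbl(P):                      # Jacobian doubling, a == 0 ("dbl-2009-l")
    X, Y, Z = P
    A, B = X * X % p, Y * Y % p
    C = B * B % p
    D = 2 * ((X + B) * (X + B) - A - C) % p
    E = 3 * A % p
    X3 = (E * E - 2 * D) % p
    Y3 = (E * (D - X3) - 8 * C) % p
    return (X3, Y3, 2 * Y * Z % p)

def add_ge(P, q):                # mixed add: Jacobian P + "affine" (x2, y2)
    X1, Y1, Z1 = P
    x2, y2 = q
    Z1Z1 = Z1 * Z1 % p
    U2, S2 = x2 * Z1Z1 % p, y2 * Z1Z1 * Z1 % p
    H, R = (U2 - X1) % p, (S2 - Y1) % p
    H2 = H * H % p
    H3 = H * H2 % p
    X3 = (R * R - H3 - 2 * X1 * H2) % p
    Y3 = (R * (X1 * H2 - X3) - Y1 * H3) % p
    return (X3, Y3, Z1 * H % p)

def mult(k, g):                  # left-to-right double-and-add, g "affine"
    R = (g[0], g[1], 1)
    for bit in bin(k)[3:]:
        R = dbl(R)
        if bit == '1':
            R = add_ge(R, g)
    return R

# Reference: 11*G computed the ordinary way.
ref = to_affine(mult(11, (Gx, Gy)))

# Effectively affine: pretend the precomputed point has Z = C, discard C,
# run the identical formulas, and scale only the final z by C (1M).
C = 0x1234567890ABCDEF                        # arbitrary common Z value
Gp = (Gx * C * C % p, Gy * pow(C, 3, p) % p)  # (X, Y) with Z = C dropped
X, Y, Z = mult(11, Gp)
assert to_affine((X, Y, Z * C % p)) == ref
```

The key point the sketch demonstrates: the accumulator's intermediate values never see C; it reappears exactly once, in the final z fix-up.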

Anyway, the upshot of this is an ~5% performance improvement for bench_verify (64bit, endo=yes), over and above the Co-Z precomputation itself, so >8% total vs master. Rough math suggests it's saving ~116M vs the Co-Z arithmetic PR, which appears to make any inversion approach obsolete.

Questions welcome, as I'm not sure how to explain this in a straightforward way (as far as I know this is a novel idea). I guess it's important to understand why the isogeny works out so neatly; it's a rather nice property of secp256k1, stemming from a == 0 in the curve equation; otherwise, operating on an isogenous curve would require changes to the doubling formula. I recommend e.g. http://joye.site88.net/papers/TJ10coordblind.pdf, where this is discussed in the context of blinding (which reminds me I was meaning to PR a demo of that), albeit the a == 0 case is not explicitly called out. It may be easier just to think of it as playing games with the z values...

EDIT: I'll leave it as I wrote it, but I think the above should be talking about isomorphisms rather than isogenies.
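For concreteness, the maps in question can be written out as follows (my summary, not from the PR itself):

```latex
% The substitution (x, y) \mapsto (\lambda^{2} x, \lambda^{3} y) carries
% y^{2} = x^{3} + a x + b onto
%
%     y^{2} = x^{3} + \lambda^{4} a\, x + \lambda^{6} b ,
%
% since (\lambda^{3} y)^{2} = \lambda^{6}(x^{3} + a x + b)
%                           = (\lambda^{2} x)^{3}
%                             + \lambda^{4} a (\lambda^{2} x)
%                             + \lambda^{6} b .
% This is a curve isomorphism E_{a,b} \to E_{\lambda^{4} a,\, \lambda^{6} b}.
% The Jacobian group-law formulas reference a but never b, so when a = 0
% (as for secp256k1) the exact same formulas are valid on every curve in
% the family -- only b moves, and b never appears in the arithmetic.
```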

@gmaxwell
Contributor

It intuitively makes sense to me. This is quite awesome. I'd already thought about using an isogenous projection for blinding (having seen it mentioned in a list of blinding techniques), but scalar blinding seemed more interesting for most of our blinding needs; I never would have thought of applying it here.

Darn ECDSA: with Schnorr this would make our verification completely inversion-free.

@peterdettman
Contributor Author

Perhaps a better reference: http://joye.site88.net/papers/JT01edpa.pdf

@gmaxwell I'm still a bit giddy that this works so well. I had the z-ratio trick and the isomorphism stuff in mind for a while, and yesterday it just gelled for me. I wasn't quite sure how it would work out, but I coded it up and... 1M per G add? Awesome, indeed.

I actually now think it will work fine for a!=0 curves too, using "modified Jacobian" coordinates to avoid increasing the cost of doubling.

Why not both scalar and point blinding? Simple point blinding amounts to just an extra 2M per scalar mult, assuming the G precomp points have been moved to some random curve isomorphism in setup.
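A toy sketch of the point-blinding idea (my illustration; the name `lam` and the exact setup are assumptions, not library code). At setup, each precomputed point is moved to a secret random curve isomorphism; unblinding the Jacobian result is just a scale of its z coordinate:

```python
import secrets

# Toy sketch (not libsecp256k1): point blinding via a random curve
# isomorphism.  Map (x, y) to (lam^2 x, lam^3 y) for secret random lam;
# the blinded table then lives on y^2 = x^3 + lam^6 * b, and unblinding a
# Jacobian result only requires scaling its z coordinate by lam.
p = 2**256 - 2**32 - 977   # secp256k1 field prime
b = 7
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

lam = secrets.randbelow(p - 1) + 1      # secret blinding factor (setup)
bx = Gx * lam * lam % p                 # blinded table entry
by = Gy * pow(lam, 3, p) % p

# The blinded point satisfies the isomorphic curve equation:
assert (by * by - bx ** 3 - pow(lam, 6, p) * b) % p == 0

# Unblinding: (bx, by) read as Jacobian (bx, by, lam) is G again.
zi = pow(lam, p - 2, p)                 # lam^{-1}
assert (bx * zi * zi % p, by * pow(zi, 3, p) % p) == (Gx, Gy)
```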

@gmaxwell
Contributor

Why not both scalar and point blinding?

Not unreasonable; it's just that updating the point blinding takes a fair amount more computation, and we already have the unknown-discrete-log blinding. I'm all for implementing every reasonably simple, reasonably fast countermeasure for the secret-data handling cases. Though actual immunity to things like high-resolution power side-channel analysis seems hopeless on general-purpose computers, that's no reason to make an attacker's life easier.

@sipa
Contributor

sipa commented Dec 23, 2014

This is just awesome. It seems we're around 8% slower than Ed25519 here with it (without using GLV, with gmp).

I'm going to play around with the math, try to see if I can derive that it's correct :)

@peterdettman
Contributor Author

FYI, I managed to apply this same technique to a P-521 random-base-mult implementation (http://indigo.ie/~mscott/ws521.cpp), with a similar ~8% performance boost. That curve has a==-3, which has to be scaled for the isomorphism, so to retain doubling performance (3M+5S) the formula has to be changed to a modified-Jacobian scheme. The adds (104) then cost 7M+6S instead of 11M+5S. This saving is offset by (4M+S) per precomputed point (15) to make them "co-Z".
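To spell out the a != 0 wrinkle (my summary; the PR text quotes a 3M+5S doubling, which may be a different variant than the one below):

```latex
% Under (x, y) \mapsto (\lambda^{2} x, \lambda^{3} y) the coefficient
% scales as a \mapsto \lambda^{4} a, so a curve set up with a = -3
% loses the usual a = -3 doubling shortcut after the isomorphism.
% Modified Jacobian coordinates (X, Y, Z, T) with T = a Z^{4} sidestep
% this: one standard doubling (Cohen--Miyaji--Ono) is
%
%   M = 3X^{2} + T, \quad S = 4XY^{2}, \quad U = 8Y^{4},
%   X' = M^{2} - 2S, \quad Y' = M(S - X') - U,
%   Z' = 2YZ, \quad T' = 2UT,
%
% whose cost is independent of the value of a, so the scaled
% a' = \lambda^{4} a costs nothing extra per doubling.
```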

So we can tentatively conclude that this is applicable to any short Weierstrass form curve, with (only) a==0 being particularly efficient. I don't know enough about other curve forms to comment yet.

The question arises whether the scheme can still be used for general (aP + bQ + cR + ...), and indeed I see no fundamental obstacle, just an extra few steps involved in the precomputation to arrange for all precomp-tables to be globally co-Z.

- Selected Co-Z formulas from "Scalar Multiplication on Weierstraß Elliptic Curves from Co-Z Arithmetic" (Goundar, Joye, et al.) added as group methods with new type secp256k1_coz_t.
- Co-Z methods used for A and G point precomputations.
- WINDOW_A size increased to 6 since the precomputation is much faster per-point.
- DBLU cost: 3M+4S, ZADDU cost: 5M+2S.
- Take advantage of z-ratios from Co-Z to speed up table inversion.
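The ZADDU cost quoted in the commit message above (5M+2S) can be checked against a toy Python sketch (my illustration with big-int field arithmetic, not the C implementation; the formula is the co-Z add-with-update from the Goundar-Joye paper). Given P and Q sharing a Jacobian z, it returns P+Q together with a re-representation of P at the new z:

```python
# Toy sketch of ZADDU (co-Z addition with update), 5M + 2S total.
# P = (X1, Y1) and Q = (X2, Y2) share an implicit Jacobian Z; the result
# is P+Q = (X3, Y3) and updated P = (W1, A1), both at Z' = Z*(X1 - X2).
p = 2**256 - 2**32 - 977   # secp256k1 field prime

def zaddu(X1, Y1, X2, Y2):
    C = (X1 - X2) * (X1 - X2) % p           # 1S
    W1, W2 = X1 * C % p, X2 * C % p         # 2M
    D = (Y1 - Y2) * (Y1 - Y2) % p           # 1S
    A1 = Y1 * (W1 - W2) % p                 # 1M
    X3 = (D - W1 - W2) % p
    Y3 = ((Y1 - Y2) * (W1 - X3) - A1) % p   # 1M
    return (X3, Y3), (W1, A1)               # +1M for Z' = Z*(X1 - X2)

# Check against plain affine arithmetic, taking Z = 1 so co-Z is trivial.
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def inv(a): return pow(a, p - 2, p)

def affine_add(x1, y1, x2, y2):             # chord rule, x1 != x2
    l = (y2 - y1) * inv(x2 - x1) % p
    x3 = (l * l - x1 - x2) % p
    return x3, (l * (x1 - x3) - y1) % p

def affine_dbl(x, y):                       # tangent rule, a == 0
    l = 3 * x * x * inv(2 * y) % p
    x3 = (l * l - 2 * x) % p
    return x3, (l * (x - x3) - y) % p

x2, y2 = affine_dbl(Gx, Gy)                 # 2G
(S, T), (U, V) = zaddu(Gx, Gy, x2, y2)
Zp = (Gx - x2) % p                          # new common z (old Z was 1)
zi = inv(Zp); zi2 = zi * zi % p; zi3 = zi2 * zi % p
assert (S * zi2 % p, T * zi3 % p) == affine_add(Gx, Gy, x2, y2)  # G+2G = 3G
assert (U * zi2 % p, V * zi3 % p) == (Gx, Gy)                    # updated P
```

Note how the z-ratio Z'/Z = (X1 - X2) falls out of the addition itself; that is the quantity the last commit bullet exploits to speed up the table inversion.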
@sipa
Contributor

sipa commented Dec 27, 2014

I'm going to add some comments and unit tests on top of this, if you're done making changes.

@sipa
Contributor

sipa commented Feb 13, 2015

See rebased/refactored version in #210.

@sipa sipa closed this Feb 13, 2015