-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Field/Modular arithmetic design to support efficient arithmetic on any moduli (including even) #2
Comments
sounds all very nice (not sure what the advantage of this approach - ergonomics? performance? correctness? - but I guess it is worth exploring), I found it odd (as a trained mathematician) to hear about |
Oops good point, I was on the implementation details and you're right. I'll change Ergonomics-wise, we can expose just PrimeField and the Modular type since they are likely to be the most used for starter. Performance-wise it should be a win since RNS bases are very often used to accelerate cryptography in hardware ASIC/FPGA. Correctness-wise, what do you mean? I'm not out to write an incorrect library 😅 |
👍 asbolutely, defintely sounded like a unintended slip while focusing on other stuff 😄
yes, you are right that field of power of a prime do exist but they are not the ring of integers module a prime power
seems appropriate
absolutely trust you on that :)
mmh, probably I meant something more like safe-ness? the concept I was trying to express was helping the user of the library to avoid using the wrong algorithm for a given context (similar but not exactly like we protect the user to sum a float and an int) anyway sounds this approach could give an improvement on all fronts! |
Oh that would be just internal details, external API would not expose all that, especially initially when we figure stuff out. |
Some benchmarks with a 381-bit moduli. Note: My modular reduction is constant-time for the algorithm (i.e. always does iterations to cover the worst cases, even if # To be placed in the "./build/" folder
# of unmerged branch
# https://github.com/mratsim/constantine/tree/mont_repr_refactor
import
../benchmarks/[bench_blueprint],
../constantine/config/[curves, type_ff],
../constantine/arithmetic/[bigints, finite_fields],
../constantine/io/io_bigints,
../helpers/prng_unsafe
proc report(op, desc: string, start, stop: MonoTime, startClk, stopClk: int64, iters: int) =
let ns = inNanoseconds((stop-start) div iters)
let throughput = 1e9 / float64(ns)
when SupportsGetTicks:
echo &"{op:<70} {desc:<18} {throughput:>15.3f} ops/s {ns:>9} ns/op {(stopClk - startClk) div iters:>9} CPU cycles (approx)"
else:
echo &"{op:<70} {desc:<18} {throughput:>15.3f} ops/s {ns:>9} ns/op"
template bench(op: string, desc: string, iters: int, body: untyped): untyped =
block:
measure(iters, startTime, stopTime, startClk, stopClk, body)
report(op, desc, startTime, stopTime, startClk, stopClk, iters)
# warmup
proc warmup*() =
# Warmup - make sure cpu is on max perf
let start = cpuTime()
var foo = 123
for i in 0 ..< 300_000_000:
foo += i*i mod 456
foo = foo mod 789
# Compiler shouldn't optimize away the results as cpuTime rely on sideeffects
let stop = cpuTime()
echo &"Warmup: {stop - start:>4.4f} s, result {foo} (displayed to avoid compiler optimizing warmup away)\n"
warmup()
proc mulModBench*(iters: int) =
# BLS12-381 prime
const m = BigInt[381].fromHex"0x1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab"
var a = rng.random_unsafe(BigInt[381])
a.reduce(a, m)
var b = rng.random_unsafe(BigInt[381])
b.reduce(b, m)
var r: BigInt[381]
var t: BigInt[762]
# mod has no branches but use hardware DIV
bench("Mul + constant-time+hardware mod (a.k.a always worst-case)", "BLS12-381", iters):
t.prod(a, b)
r.reduce(t, m)
let x = rng.random_unsafe(Fp[BLS12_381])
let y = rng.random_unsafe(Fp[BLS12_381])
var z: Fp[BLS12_381]
bench("Montgomery multiplication", "BLS12-381", iters):
z.prod(x, y)
bench("BigInt <-> Montgomery roundtrip", "BLS12-381", iters):
r.fromField(z)
z.fromBig(r)
mulModBench(10000) |
See Implementing fast modular exponentiation - a guide - status-im/nim-stint#126 And implementation: mratsim/constantine#242 (comment) Speed is at least 85% of GMP of modular exponentiation, the most CPU intensive modular arithmetic primitive. It confirms that splitting between odd modulus and power-of-two modulus and recombining when needed via the Chinese Remainder Theorem is the way to go. |
edited: Field == Modular arithmetic for prime modulus only
Some ideas of modular arithmetic design.
Many algorithms are only applicable to prime moduli (Legendre symbol, Fermat's Little Theorem, Square roots ...) or odd moduli (Jacobi symbol, GCD and Modular inversion via extended Euclid, Montgomery arithmetic). There are also specialized routines for binary extension fields (mod 2ᵏ) and ternary extension fields (mod 3ᵏ) but there is nothing for generic even moduli.
What we can do is represent generic modulus on a dual base power-of-two 2ᵏ and odd similar to the Residue Number System representation. The results of all operations can be recombined using the Chinese Remainder Theorem.
Concretely we would have
Operations would then have the form:
With the convention for
sum
for operation that modify a buffer andadd
for out-of-place functions.Alternatively we can have the Modulus object be carried by Modular elements to allow convenient operators:
This would require making it a ref object.
The text was updated successfully, but these errors were encountered: