
Fast modular inversion #172

Merged · 14 commits · Feb 10, 2022
Conversation

mratsim (Owner) commented Feb 8, 2022

This implements fast constant-time modular inversion.

Preliminary benchmarks, without Assembly

[benchmark screenshots omitted from this export]

On BLS12-381, this is almost 8x faster than Niels Möller's algorithm (the constant-time inversion used in GMP) and than Fermat's-little-theorem inversion with addition chains.
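For readers unfamiliar with the approach, below is a minimal, variable-time Python sketch of Bernstein-Yang inversion via the divstep iteration. The function name and structure are illustrative, not Constantine's code; a production implementation batches 62 divsteps into word-sized transition matrices and runs a fixed, precomputed number of iterations so the control flow is data-independent (constant-time).

```python
def safegcd_modinv(x, m):
    """Compute x^-1 mod m for odd m with gcd(x, m) == 1.

    Variable-time reference sketch of Bernstein-Yang (safegcd);
    NOT constant-time: the loop length depends on the input.
    """
    delta, f, g = 1, m, x % m
    # Invariants maintained below: f ≡ d*x (mod m), g ≡ e*x (mod m)
    d, e = 0, 1
    inv2 = pow(2, -1, m)  # halving mod m (m is odd), Python 3.8+

    while g != 0:
        if delta > 0 and g & 1:
            # Swap-and-subtract divstep
            delta, f, g = 1 - delta, g, (g - f) // 2
            d, e = e, ((e - d) * inv2) % m
        elif g & 1:
            # Add-and-halve divstep
            delta, f, g = 1 + delta, f, (g + f) // 2
            e = ((e + d) * inv2) % m
        else:
            # g even: just halve
            delta, f, g = 1 + delta, f, g // 2
            e = (e * inv2) % m

    # Now f == ±gcd(x, m) == ±1 and d*x ≡ f (mod m),
    # so (d*f)*x ≡ f² ≡ 1 (mod m).
    assert abs(f) == 1
    return (d * f) % m
```

For example, `safegcd_modinv(3, 7)` returns 5, since 3·5 = 15 ≡ 1 (mod 7).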

mratsim (Owner, Author) commented Feb 8, 2022

Discussion of chosen algorithm

There have been 3 papers on fast inversion in the past 3 years:

  • Bernstein-Yang inversion:

  • Pornin's inversion:
Discussion

This PR implements Bernstein-Yang inversion; there is a sketch of Pornin's inversion at:

Correctly and efficiently implementing Pornin's for generic primes is actually tricky:

  • L22: (u, v) ← (uf₀ + vg₀ mod m, uf₁ + vg₁ mod m)
    This requires efficient modular reduction. That holds for generalized Mersenne primes
    like secp256k1 or ED25519, but not for BLS12-381.
    Given that Pornin's approach uses 31-bit divsteps instead of Bernstein-Yang's 62-bit
    divsteps (on 64-bit hardware), a slow reduction has twice the impact.
  • BLST's authors delayed the modular reduction, but this triggered
    an edge case found by fuzzing: supranational/blst@fd45352#commitcomment-66068518
    In the past, another edge case was raised:
  • An efficient implementation requires:
    1. assembly for cmov in the inner loop and for leading-zero count
    2. fast or delayed/batched modular reduction
    3. an extra bit in the high word for negative integers, making it unsuitable for secp256k1 or P256
       when using a saturated representation.
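As a quick illustration of the saturated-representation point (my own check, not from the PR): secp256k1's prime fills all 256 bits of four 64-bit limbs, leaving no spare high bit for a sign, while BLS12-381's 381-bit prime leaves 3 spare bits in six 64-bit limbs.

```python
# Well-known published moduli for both curves' base fields
p_secp256k1 = 2**256 - 2**32 - 977
p_bls12_381 = int(
    "1a0111ea397fe69a4b1ba7b6434bacd764774b84f38512bf"
    "6730d2a0f6b0f6241eabfffeb153ffffb9feffffffffaaab", 16)

print(p_secp256k1.bit_length())  # 256 -> saturated: 4x64-bit limbs, no spare bit
print(p_bls12_381.bit_length())  # 381 -> 3 spare bits in 6x64-bit limbs
```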

In particular, the inner loop needs to be as streamlined as possible; cmov and leading-zero count are platform-dependent intrinsics, and their absence makes the inner loop slow in pure Nim/C.
Regarding point 2, delayed/batched modular reduction alone is doable, but Pornin's method also relies on an approximation of the inputs that must be corrected at regular intervals and at the end of the computation. Given the edge cases that surfaced in BLST, delaying modular reduction AND correcting the approximation AND doing both in constant time seems fraught with peril.
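To make the cmov/lzcount point concrete, here is an illustrative Python sketch of the branchless building blocks (names are mine; "constant-time" here means data-independent control flow — the PR's argument is precisely that without assembly these primitives are slow and compiler-dependent in Nim/C):

```python
MASK64 = (1 << 64) - 1  # simulate 64-bit words in Python's big ints

def ct_mask(bit):
    """All-ones 64-bit mask if bit == 1, zero if bit == 0. No branches."""
    return (-bit) & MASK64

def ct_cmov(cond_bit, a, b):
    """Return a if cond_bit == 1 else b, via masking instead of branching."""
    m = ct_mask(cond_bit)
    return (b ^ (m & (a ^ b))) & MASK64

def ct_lzcount(x, bits=64):
    """Leading-zero count that scans every bit unconditionally,
    so the operation count is independent of x."""
    n = 0
    seen_one = 0
    for i in range(bits - 1, -1, -1):
        seen_one |= (x >> i) & 1   # latches to 1 at the first set bit
        n += 1 - seen_one          # counts zeros before that bit
    return n
```

For example, `ct_cmov(1, 5, 9)` yields 5, `ct_cmov(0, 5, 9)` yields 9, and `ct_lzcount(1)` yields 63. On hardware, `cmov` and `lzcnt` do this in one instruction each, which is why the inner loop wants assembly.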

Successfully merging this pull request may close these issues.

Implement fast inversion for public data