Add Support for Preallocating Entry Slice in Batch Verifier #16

patrick-ogrady · 2023-12-08T23:17:39Z

Before:

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreateBatch/1-12  	 5439469	       214.8 ns/op	      96 B/op	       2 allocs/op
BenchmarkCreateBatch/8-12  	  704911	      1714 ns/op	    1104 B/op	      12 allocs/op
BenchmarkCreateBatch/64-12 	   90453	     13074 ns/op	    9680 B/op	      71 allocs/op
BenchmarkCreateBatch/1024-12         	    5697	    204843 ns/op	  155089 B/op	    1035 allocs/op
BenchmarkCreateBatch/4096-12         	    1401	    859972 ns/op	  892369 B/op	    4111 allocs/op
BenchmarkCreateBatch/16384-12        	     316	   3727873 ns/op	 5094872 B/op	   16405 allocs/op
PASS
ok  	github.com/hdevalence/ed25519consensus	10.901s

After (~71% reduction in bytes allocated, ~13% reduction in runtime on batch of 16384):

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreatePreallocatedBatch/1-12  	 5431094	       213.9 ns/op	      96 B/op	       2 allocs/op
BenchmarkCreatePreallocatedBatch/8-12  	  762730	      1585 ns/op	     704 B/op	       9 allocs/op
BenchmarkCreatePreallocatedBatch/64-12 	   95875	     12557 ns/op	    6144 B/op	      65 allocs/op
BenchmarkCreatePreallocatedBatch/1024-12         	    5967	    199347 ns/op	   90112 B/op	    1025 allocs/op
BenchmarkCreatePreallocatedBatch/4096-12         	    1501	    805202 ns/op	  360449 B/op	    4097 allocs/op
BenchmarkCreatePreallocatedBatch/16384-12        	     374	   3237073 ns/op	 1441795 B/op	   16385 allocs/op
PASS
ok  	github.com/hdevalence/ed25519consensus	10.878s

codecov-commenter · 2023-12-10T18:47:34Z

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (7232f88) 77.52% compared to head (a53962b) 70.00%.

❗ Current head a53962b differs from pull request most recent head 78b9a57. Consider uploading reports for the commit 78b9a57 to get more accurate results

Files	Patch %	Lines
batch.go	64.00%	5 Missing and 4 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
- Coverage   77.52%   70.00%   -7.53%     
==========================================
  Files           2        2              
  Lines          89       90       +1     
==========================================
- Hits           69       63       -6     
- Misses         10       15       +5     
- Partials       10       12       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

FiloSottile · 2024-01-08T17:20:51Z

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

patrick-ogrady · 2024-01-09T21:33:44Z

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Awesome work on the further optimization (especially avoiding the heap on Add) ❤️ . I tried to tread as lightly as I could on my pass but maybe I should've been more ambitious lol...thanks for taking the time!

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

patrick-ogrady · 2024-01-09T21:38:31Z

batch_test.go

+				b.StartTimer()
+				v := NewBatchVerifier()
+				for j := 0; j < n; j++ {
+					v.Add(pub, msg, sig)


Previously, I created a new public key per item. I felt that was a better representation of what is actually going on in a typical batch (it may not affect the current code but could affect future optimization effectiveness).

Done! I was a bit worried about the overhead of calling StartTimer/StopTimer in such a hot loop, but the benchmarks look good.

I don't want to pregenerate all the keys and signatures, because assigning them to a slice causes them to escape, making it harder to observe the escape analysis of the arguments of Add.

FiloSottile · 2024-01-18T17:37:16Z

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

I generally prefer to make benchmarks for whole operations that consumers are likely to perform discretely. That avoids over-indexing on a sub-benchmark once it's fast enough and it doesn't move the overall needle much. Different parts that contribute to the running time can be visualized with flame graphs. In this case, I don't think any user will do batch creation without batch verification, or reuse the batch for multiple verifications, so I think an overall benchmark is the right metric.

patrick-ogrady added 3 commits December 8, 2023 15:13

Add support for preallocating batch

97fc320

update README

16586cd

cleanup var declaration

d8a485d

patrick-ogrady mentioned this pull request Dec 8, 2023

[crypto/ed25519] Migrate to hdevalence/ed25519consensus ava-labs/hypersdk#645

Merged

1 task

patrick-ogrady mentioned this pull request Dec 12, 2023

[ed25519] Re-Add Support for Batch Pre-Allocation ava-labs/hypersdk#652

Open

patrick-ogrady commented Jan 9, 2024

View reviewed changes

FiloSottile added 2 commits January 18, 2024 17:17

ed25519consensus: improve batch benchmarks

d877647

ed25519consensus: reduce batch allocations

78b9a57

FiloSottile force-pushed the preallocate-batch branch from 85dd807 to 78b9a57 Compare January 18, 2024 17:34

FiloSottile merged commit 15731de into hdevalence:main Jan 18, 2024
1 check passed

patrick-ogrady mentioned this pull request Jan 21, 2024

[crypto/ed25519] Preallocate batch size for verification ava-labs/hypersdk#695

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Support for Preallocating Entry Slice in Batch Verifier #16

Add Support for Preallocating Entry Slice in Batch Verifier #16

patrick-ogrady commented Dec 8, 2023 •

edited

Loading

codecov-commenter commented Dec 10, 2023 •

edited

Loading

FiloSottile commented Jan 8, 2024

patrick-ogrady commented Jan 9, 2024 •

edited

Loading

patrick-ogrady Jan 9, 2024

FiloSottile Jan 18, 2024

FiloSottile commented Jan 18, 2024

Add Support for Preallocating Entry Slice in Batch Verifier #16

Add Support for Preallocating Entry Slice in Batch Verifier #16

Conversation

patrick-ogrady commented Dec 8, 2023 • edited Loading

codecov-commenter commented Dec 10, 2023 • edited Loading

Codecov Report

FiloSottile commented Jan 8, 2024

patrick-ogrady commented Jan 9, 2024 • edited Loading

patrick-ogrady Jan 9, 2024

Choose a reason for hiding this comment

FiloSottile Jan 18, 2024

Choose a reason for hiding this comment

FiloSottile commented Jan 18, 2024

patrick-ogrady commented Dec 8, 2023 •

edited

Loading

codecov-commenter commented Dec 10, 2023 •

edited

Loading

patrick-ogrady commented Jan 9, 2024 •

edited

Loading