Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Preallocating Entry Slice in Batch Verifier #16

Merged
merged 5 commits into from
Jan 18, 2024

Conversation

patrick-ogrady
Copy link
Contributor

@patrick-ogrady patrick-ogrady commented Dec 8, 2023

Before:

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreateBatch/1-12  	 5439469	       214.8 ns/op	      96 B/op	       2 allocs/op
BenchmarkCreateBatch/8-12  	  704911	      1714 ns/op	    1104 B/op	      12 allocs/op
BenchmarkCreateBatch/64-12 	   90453	     13074 ns/op	    9680 B/op	      71 allocs/op
BenchmarkCreateBatch/1024-12         	    5697	    204843 ns/op	  155089 B/op	    1035 allocs/op
BenchmarkCreateBatch/4096-12         	    1401	    859972 ns/op	  892369 B/op	    4111 allocs/op
BenchmarkCreateBatch/16384-12        	     316	   3727873 ns/op	 5094872 B/op	   16405 allocs/op
PASS
ok  	github.com/hdevalence/ed25519consensus	10.901s

After (~71% reduction in bytes allocated, ~13% reduction in runtime on batch of 16384):

goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
BenchmarkCreatePreallocatedBatch/1-12  	 5431094	       213.9 ns/op	      96 B/op	       2 allocs/op
BenchmarkCreatePreallocatedBatch/8-12  	  762730	      1585 ns/op	     704 B/op	       9 allocs/op
BenchmarkCreatePreallocatedBatch/64-12 	   95875	     12557 ns/op	    6144 B/op	      65 allocs/op
BenchmarkCreatePreallocatedBatch/1024-12         	    5967	    199347 ns/op	   90112 B/op	    1025 allocs/op
BenchmarkCreatePreallocatedBatch/4096-12         	    1501	    805202 ns/op	  360449 B/op	    4097 allocs/op
BenchmarkCreatePreallocatedBatch/16384-12        	     374	   3237073 ns/op	 1441795 B/op	   16385 allocs/op
PASS
ok  	github.com/hdevalence/ed25519consensus	10.878s

@codecov-commenter
Copy link

codecov-commenter commented Dec 10, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (7232f88) 77.52% compared to head (a53962b) 70.00%.

❗ Current head a53962b differs from pull request most recent head 78b9a57. Consider uploading reports for the commit 78b9a57 to get more accurate results

Files Patch % Lines
batch.go 64.00% 5 Missing and 4 partials ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
- Coverage   77.52%   70.00%   -7.53%     
==========================================
  Files           2        2              
  Lines          89       90       +1     
==========================================
- Hits           69       63       -6     
- Misses         10       15       +5     
- Partials       10       12       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@FiloSottile
Copy link
Collaborator

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@patrick-ogrady
Copy link
Contributor Author

patrick-ogrady commented Jan 9, 2024

Thank you for your patience! I pushed a few further changes that cut the allocations down and let the Add inputs avoid escaping to the heap as well. Let me know if they look good and I'll merge and cut a release.

go: go1.21.5
goos: darwin
goarch: arm64
pkg: github.com/hdevalence/ed25519consensus
                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/op       │   sec/op     vs base               │
Batch/1-8                         69.62µ ± 1%   70.02µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         243.3µ ± 1%   243.9µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        1.579m ± 1%   1.578m ± 0%       ~ (p=0.739 n=10)
Batch/1024-8                      24.60m ± 4%   24.59m ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      101.1m ± 0%   101.2m ± 2%       ~ (p=0.436 n=10)
Batch/16384-8                     480.2m ± 2%   479.3m ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.239 n=10)
PreallocatedBatch/8-8             243.0µ ± 1%   242.4µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            1.607m ± 6%   1.577m ± 3%  -1.81% (p=0.015 n=10)
PreallocatedBatch/1024-8          24.61m ± 2%   24.56m ± 0%  -0.23% (p=0.002 n=10)
PreallocatedBatch/4096-8          101.1m ± 0%   101.2m ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         479.8m ± 6%   479.3m ± 2%       ~ (p=0.436 n=10)
Verification-8                    48.40µ ± 2%   48.26µ ± 1%       ~ (p=0.436 n=10)
geomean                           3.913m        3.907m       -0.16%

                          │ v0.1.0-4-g31ef22e │      v0.1.0-4-g31ef22e-dirty       │
                          │      sec/sig      │   sec/sig    vs base               │
Batch/1-8                         69.62µ ± 1%   70.01µ ± 0%  +0.56% (p=0.011 n=10)
Batch/8-8                         30.42µ ± 1%   30.48µ ± 0%       ~ (p=0.631 n=10)
Batch/64-8                        24.66µ ± 1%   24.66µ ± 0%       ~ (p=0.697 n=10)
Batch/1024-8                      24.02µ ± 4%   24.01µ ± 3%       ~ (p=0.684 n=10)
Batch/4096-8                      24.69µ ± 0%   24.70µ ± 2%       ~ (p=0.448 n=10)
Batch/16384-8                     29.31µ ± 2%   29.25µ ± 4%       ~ (p=0.529 n=10)
PreallocatedBatch/1-8             69.95µ ± 0%   69.88µ ± 0%       ~ (p=0.247 n=10)
PreallocatedBatch/8-8             30.37µ ± 1%   30.30µ ± 1%       ~ (p=0.631 n=10)
PreallocatedBatch/64-8            25.10µ ± 6%   24.65µ ± 3%  -1.81% (p=0.014 n=10)
PreallocatedBatch/1024-8          24.03µ ± 2%   23.98µ ± 0%  -0.22% (p=0.002 n=10)
PreallocatedBatch/4096-8          24.68µ ± 0%   24.72µ ± 3%       ~ (p=0.739 n=10)
PreallocatedBatch/16384-8         29.29µ ± 6%   29.25µ ± 2%       ~ (p=0.424 n=10)
geomean                           31.17µ        31.12µ       -0.15%

                          │ v0.1.0-4-g31ef22e │        v0.1.0-4-g31ef22e-dirty         │
                          │       B/op        │     B/op      vs base                  │
Batch/1-8                      5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
Batch/8-8                      31.17Ki ± 0%     32.45Ki ± 0%   +4.11% (p=0.000 n=10)
Batch/64-8                     247.7Ki ± 0%     257.4Ki ± 0%   +3.90% (p=0.000 n=10)
Batch/1024-8                   3.636Mi ± 0%     4.006Mi ± 0%  +10.18% (p=0.000 n=10)
Batch/4096-8                   14.71Mi ± 0%     16.28Mi ± 0%  +10.69% (p=0.000 n=10)
Batch/16384-8                  60.16Mi ± 0%     68.72Mi ± 0%  +14.24% (p=0.000 n=10)
PreallocatedBatch/1-8          5.516Ki ± 0%     5.594Ki ± 0%   +1.42% (p=0.000 n=10)
PreallocatedBatch/8-8          30.78Ki ± 0%     31.25Ki ± 0%   +1.52% (p=0.000 n=10)
PreallocatedBatch/64-8         244.3Ki ± 0%     246.9Ki ± 0%   +1.09% (p=0.000 n=10)
PreallocatedBatch/1024-8       3.574Mi ± 0%     3.621Mi ± 0%   +1.31% (p=0.000 n=10)
PreallocatedBatch/4096-8       14.20Mi ± 0%     14.37Mi ± 0%   +1.16% (p=0.000 n=10)
PreallocatedBatch/16384-8      56.67Mi ± 0%     57.31Mi ± 0%   +1.13% (p=0.000 n=10)
Verification-8                   0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²                  +3.92%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                          │ v0.1.0-4-g31ef22e │       v0.1.0-4-g31ef22e-dirty        │
                          │     allocs/op     │ allocs/op   vs base                  │
Batch/1-8                        9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
Batch/8-8                        26.00 ± 0%     11.00 ± 0%  -57.69% (p=0.000 n=10)
Batch/64-8                      141.00 ± 0%     14.00 ± 0%  -90.07% (p=0.000 n=10)
Batch/1024-8                   2065.00 ± 0%     19.00 ± 0%  -99.08% (p=0.000 n=10)
Batch/4096-8                   8213.00 ± 0%     23.00 ± 0%  -99.72% (p=0.000 n=10)
Batch/16384-8                 32795.00 ± 0%     29.00 ± 0%  -99.91% (p=0.000 n=10)
PreallocatedBatch/1-8            9.000 ± 0%     8.000 ± 0%  -11.11% (p=0.000 n=10)
PreallocatedBatch/8-8           23.000 ± 0%     8.000 ± 0%  -65.22% (p=0.000 n=10)
PreallocatedBatch/64-8         135.000 ± 0%     8.000 ± 0%  -94.07% (p=0.000 n=10)
PreallocatedBatch/1024-8      2055.000 ± 0%     8.000 ± 0%  -99.61% (p=0.000 n=10)
PreallocatedBatch/4096-8      8199.000 ± 0%     8.000 ± 0%  -99.90% (p=0.000 n=10)
PreallocatedBatch/16384-8    32775.000 ± 0%     8.000 ± 0%  -99.98% (p=0.000 n=10)
Verification-8                   0.000 ± 0%     0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                     ²               -97.02%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Awesome work on the further optimization (especially avoiding the heap on Add) ❤️ . I tried to tread as lightly as I could on my pass but maybe I should've been more ambitious lol...thanks for taking the time!

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

b.StartTimer()
v := NewBatchVerifier()
for j := 0; j < n; j++ {
v.Add(pub, msg, sig)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, I created a new public key per item. I felt that was a better representation of what is actually going on in a typical batch (it may not affect the current code but could affect future optimization effectiveness).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I was a bit worried about the overhead of calling StartTimer/StopTimer in such a hot loop, but the benchmarks look good.

I don't want to pregenerate all the keys and signatures, because assigning them to a slice causes them to escape, making it harder to observe the escape analysis of the arguments of Add.

@FiloSottile
Copy link
Collaborator

Only nit, I preferred to see the batch creation time separate from the batch verification time (just to get a better sense of where time was spent but eh).

I generally prefer to make benchmarks for whole operations that consumers are likely to perform discretely. That avoids over-indexing on a sub-benchmark once it's fast enough and it doesn't move the overall needle much. Different parts that contribute to the running time can be visualized with flame graphs. In this case, I don't think any user will do batch creation without batch verification, or reuse the batch for multiple verifications, so I think an overall benchmark is the right metric.

@FiloSottile FiloSottile merged commit 15731de into hdevalence:main Jan 18, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants