perf: adds avx512 vector ops for koalabear and babybear fields #568

Merged Dec 19, 2024 (65 commits)

@gbotrel (Collaborator) commented Dec 8, 2024

Description

The assembly is readable and a breeze to work with after doing the same thing for multi-word moduli.

There are probably a couple of performance aberrations to correct and some optimizations still to do.

Still need to compare vec::mul with the awesome (and well-documented :) ) work in Plonky3: https://github.com/Plonky3/Plonky3/blob/20256720b683897b634393dadcf8afab43101cb7/monty-31/src/x86_64_avx512/packing.rs#L319
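
For context on what a vectorized `add` has to compute in each 32-bit lane, here is a minimal scalar sketch in Go. It assumes the BabyBear modulus q = 2^31 - 2^27 + 1 and inputs already reduced to [0, q). The names `addMod` and `addVec` are hypothetical, and this is not the assembly from this PR; it only illustrates the add-then-unsigned-min reduction that SIMD code for 31-bit fields commonly uses.

```go
package main

import "fmt"

// q is the BabyBear modulus 2^31 - 2^27 + 1 (assumption: the same sketch
// applies to KoalaBear by swapping the constant).
const q uint32 = 2013265921

// addMod returns (a + b) mod q for a, b in [0, q).
// It computes t = a + b and u = t - q, then keeps the unsigned minimum.
// If t < q, the subtraction wraps to a large value and t is kept;
// otherwise u is the reduced result.
func addMod(a, b uint32) uint32 {
	t := a + b // cannot overflow: a, b < 2^31
	u := t - q // wraps around when t < q
	if u < t {
		return u
	}
	return t
}

// addVec sets res[i] = a[i] + b[i] mod q, one lane at a time.
// All three slices are assumed to have the same length.
func addVec(res, a, b []uint32) {
	for i := range res {
		res[i] = addMod(a[i], b[i])
	}
}

func main() {
	a := []uint32{1, q - 1, q - 1, 5}
	b := []uint32{2, 1, q - 1, q - 5}
	res := make([]uint32, len(a))
	addVec(res, a, b)
	fmt.Println(res) // [3 0 2013265919 0]
}
```

Because both operands fit in 31 bits, the sum never overflows a 32-bit lane, so a vector version can plausibly use one add, one subtract, and one unsigned min per lane. Whether the kernels in this PR follow exactly that sequence is not stated in the description.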

Benchmark example

benchmark                                      old ns/op     new ns/op     delta
BenchmarkVectorOps/add_256-32                  141           14.7          -89.58%
BenchmarkVectorOps/sub_256-32                  282           14.8          -94.75%
BenchmarkVectorOps/scalarMul_256-32            299           62.8          -79.02%
BenchmarkVectorOps/sum_256-32                  224           30.5          -86.36%
BenchmarkVectorOps/innerProduct_256-32         496           74.9          -84.91%
BenchmarkVectorOps/mul_256-32                  299           66.8          -77.66%
BenchmarkVectorOps/add_512-32                  285           25.8          -90.96%
BenchmarkVectorOps/sub_512-32                  568           26.0          -95.43%
BenchmarkVectorOps/scalarMul_512-32            601           125           -79.23%
BenchmarkVectorOps/sum_512-32                  479           41.3          -91.39%
BenchmarkVectorOps/innerProduct_512-32         993           128           -87.12%
BenchmarkVectorOps/mul_512-32                  601           137           -77.22%
BenchmarkVectorOps/add_65536-32                39938         7742          -80.61%
BenchmarkVectorOps/sub_65536-32                80858         7730          -90.44%
BenchmarkVectorOps/scalarMul_65536-32          83196         16021         -80.74%
BenchmarkVectorOps/sum_65536-32                58330         2849          -95.12%
BenchmarkVectorOps/innerProduct_65536-32       133302        13518         -89.86%
BenchmarkVectorOps/mul_65536-32                86508         16781         -80.60%
BenchmarkVectorOps/add_524288-32               318606        68041         -78.64%
BenchmarkVectorOps/sub_524288-32               639760        68476         -89.30%
BenchmarkVectorOps/scalarMul_524288-32         664143        127940        -80.74%
BenchmarkVectorOps/sum_524288-32               464263        23079         -95.03%
BenchmarkVectorOps/innerProduct_524288-32      1068978       108061        -89.89%
BenchmarkVectorOps/mul_524288-32               689172        133001        -80.70%
BenchmarkVectorOps/add_1048576-32              638737        138234        -78.36%
BenchmarkVectorOps/sub_1048576-32              1282222       138241        -89.22%
BenchmarkVectorOps/scalarMul_1048576-32        1331820       256007        -80.78%
BenchmarkVectorOps/sum_1048576-32              924642        46221         -95.00%
BenchmarkVectorOps/innerProduct_1048576-32     2138163       215971        -89.90%
BenchmarkVectorOps/mul_1048576-32              1379577       266948        -80.65%
BenchmarkVectorOps/add_2097152-32              1298483       403597        -68.92%
BenchmarkVectorOps/sub_2097152-32              2599115       398351        -84.67%
BenchmarkVectorOps/scalarMul_2097152-32        2675315       514020        -80.79%
BenchmarkVectorOps/sum_2097152-32              1846965       92459         -94.99%
BenchmarkVectorOps/innerProduct_2097152-32     4279783       433708        -89.87%
BenchmarkVectorOps/mul_2097152-32              2801868       557975        -80.09%
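
To read the delta column as a speedup factor, the arithmetic can be sketched as follows. The helper names `delta` and `speedup` are hypothetical, and the inputs are the rounded `sub_256` timings from the table above, so recomputed deltas for other rows may differ in the last digit from the printed ones.

```go
package main

import "fmt"

// delta returns the relative change from oldNs to newNs, in percent.
// A negative value means the new code is faster.
func delta(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns how many times faster the new timing is.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

func main() {
	// sub_256 row: 282 ns/op before, 14.8 ns/op after.
	fmt.Printf("delta: %.2f%%, speedup: %.1fx\n",
		delta(282, 14.8), speedup(282, 14.8))
}
```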

@gbotrel gbotrel requested review from Tabaie and yelhousni December 8, 2024 17:55
Base automatically changed from experiment/31bits to master December 10, 2024 19:00
@gbotrel gbotrel merged commit cb03f64 into master Dec 19, 2024
5 checks passed
@gbotrel gbotrel deleted the perf/f31_avx branch December 19, 2024 16:29