
Wrapping-up benchmarks #1093

Merged — 15 commits merged into main from benchmarks-wrapup on Feb 9, 2025
Conversation

@KtorZ (Member) commented Feb 8, 2025

Preview

CLI

```
Usage: aiken bench [OPTIONS] [DIRECTORY] [ENV]

Arguments:
  [DIRECTORY]  Path to project
  [ENV]        Environment to use for benchmarking

Options:
      --seed <SEED>                          An initial seed to initialize the pseudo-random generator for
                                             benchmarks

      --max-size <MAX_SIZE>                  The maximum size to benchmark with. Note that this does
                                             not necessarily equate to the number of measurements actually
                                             performed but controls the maximum size given to a Sampler
                                             [default: 30]

  -m, --match-benchmarks <MATCH_BENCHMARKS>  Only run benchmarks if they match any of these strings

  -e, --exact-match                          This is meant to be used with `--match-benchmarks`. It forces
                                             benchmark names to match exactly

  -f, --trace-filter <TRACE_FILTER>          Filter traces to be included in the generated program(s).
                                             [possible values: user-defined, compiler-generated, all]

  -t, --trace-level <TRACE_LEVEL>            Choose the verbosity level of traces:
                                             [default: silent]
                                             [possible values: silent, compact, verbose]
```
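
For illustration only, here is a minimal sketch (not part of this PR) of driving the command from a script. The flags are taken from the help text above; the assumption that redirected output yields the JSON report follows the description further down.

```python
# Hypothetical driver: run `aiken bench` and parse its JSON report.
# Assumes the aiken binary is on PATH and that benchmark points are emitted
# as JSON when the output is redirected (i.e. not attached to a terminal).
import json
import subprocess

def run_bench(directory: str = ".", seed: int = 42, max_size: int = 50,
              only: str = "bytearray") -> dict:
    result = subprocess.run(
        ["aiken", "bench", "--seed", str(seed), "--max-size", str(max_size),
         "-m", only, directory],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    report = run_bench()
    print(f"seed={report['seed']}, benchmarks={len(report['benchmarks'])}")
```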

Defining benchmarks

(screenshot of benchmark definitions omitted)

Running benchmarks

(screenshot of a benchmark run omitted)
bytearray_length.json

```json
{
  "benchmarks": [
    {
      "name": "bytearray_length",
      "module": "tests",
      "measures": [
        {
          "size": 0,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 1,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 2,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 3,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 4,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 5,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 6,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 7,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 8,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 9,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 10,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 11,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 12,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 13,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 14,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 15,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 16,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 17,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 18,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 19,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 20,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 21,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 22,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 23,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 24,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 25,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 26,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 27,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 28,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 29,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 30,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 31,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 32,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 33,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 34,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 35,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 36,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 37,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 38,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 39,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 40,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 41,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 42,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 43,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 44,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 45,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 46,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 47,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 48,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 49,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 50,
          "memory": 942,
          "cpu": 170342
        }
      ]
    }
  ],
  "seed": 91686802
}
```
(screenshot omitted)
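
As a quick illustration of the report format (a sketch, not shipped with this PR), the fields shown above — the top-level `seed` and, per benchmark, `name`, `module` and a list of `measures` with `size`, `memory` and `cpu` — can be summarised with a few lines of Python:

```python
# Minimal sketch: summarise a report such as bytearray_length.json.
import json

with open("bytearray_length.json") as f:
    report = json.load(f)

print(f"seed: {report['seed']}")
for bench in report["benchmarks"]:
    sizes = [m["size"] for m in bench["measures"]]
    cpus = [m["cpu"] for m in bench["measures"]]
    mems = [m["memory"] for m in bench["measures"]]
    print(f"{bench['module']}.{bench['name']}: "
          f"sizes {min(sizes)}..{max(sizes)}, "
          f"cpu {min(cpus)}..{max(cpus)}, "
          f"mem {min(mems)}..{max(mems)}")
```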
bytearray_comparison.json

```json
{
  "benchmarks": [
    {
      "name": "bytearray_comparison",
      "module": "tests",
      "measures": [
        { "size": 0, "memory": 3293, "cpu": 823816 },
        { "size": 1, "memory": 3293, "cpu": 824386 },
        { "size": 2, "memory": 3293, "cpu": 824994 },
        { "size": 3, "memory": 3293, "cpu": 825602 },
        { "size": 4, "memory": 3293, "cpu": 826210 },
        { "size": 5, "memory": 3293, "cpu": 826818 },
        { "size": 6, "memory": 3293, "cpu": 827426 },
        { "size": 7, "memory": 3293, "cpu": 828034 },
        { "size": 8, "memory": 3293, "cpu": 828642 },
        { "size": 9, "memory": 3293, "cpu": 829250 },
        { "size": 10, "memory": 3293, "cpu": 829858 },
        { "size": 11, "memory": 3293, "cpu": 830466 },
        { "size": 12, "memory": 3293, "cpu": 831074 },
        { "size": 13, "memory": 3293, "cpu": 831682 },
        { "size": 14, "memory": 3293, "cpu": 832290 },
        { "size": 15, "memory": 3293, "cpu": 832898 },
        { "size": 16, "memory": 3293, "cpu": 833506 },
        { "size": 17, "memory": 3293, "cpu": 834114 },
        { "size": 18, "memory": 3293, "cpu": 834722 },
        { "size": 19, "memory": 3293, "cpu": 835330 },
        { "size": 20, "memory": 3293, "cpu": 835938 },
        { "size": 21, "memory": 3293, "cpu": 836546 },
        { "size": 22, "memory": 3293, "cpu": 837154 },
        { "size": 23, "memory": 3293, "cpu": 837762 },
        { "size": 24, "memory": 3293, "cpu": 838370 },
        { "size": 25, "memory": 3293, "cpu": 838978 },
        { "size": 26, "memory": 3293, "cpu": 839586 },
        { "size": 27, "memory": 3293, "cpu": 840194 },
        { "size": 28, "memory": 3293, "cpu": 840802 },
        { "size": 29, "memory": 3293, "cpu": 841410 },
        { "size": 30, "memory": 3293, "cpu": 842018 },
        { "size": 31, "memory": 3293, "cpu": 842626 },
        { "size": 32, "memory": 3293, "cpu": 843234 },
        { "size": 33, "memory": 3293, "cpu": 843842 },
        { "size": 34, "memory": 3293, "cpu": 844450 },
        { "size": 35, "memory": 3293, "cpu": 845058 },
        { "size": 36, "memory": 3293, "cpu": 845666 },
        { "size": 37, "memory": 3293, "cpu": 846274 },
        { "size": 38, "memory": 3293, "cpu": 846882 },
        { "size": 39, "memory": 3293, "cpu": 847490 },
        { "size": 40, "memory": 3293, "cpu": 848098 },
        { "size": 41, "memory": 3293, "cpu": 848706 },
        { "size": 42, "memory": 3293, "cpu": 849314 },
        { "size": 43, "memory": 3293, "cpu": 849922 },
        { "size": 44, "memory": 3293, "cpu": 850530 },
        { "size": 45, "memory": 3293, "cpu": 851138 },
        { "size": 46, "memory": 3293, "cpu": 851746 },
        { "size": 47, "memory": 3293, "cpu": 852354 },
        { "size": 48, "memory": 3293, "cpu": 852962 },
        { "size": 49, "memory": 3293, "cpu": 853570 },
        { "size": 50, "memory": 3293, "cpu": 854178 }
      ]
    }
  ],
  "seed": 3601959169
}
```
(three screenshots omitted)
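
The comparison benchmark's cpu grows linearly with size while memory stays flat. As a sketch of the curve interpolation mentioned in the changelog below as possible future work (not implemented in this PR), one could fit a few candidate curves to the measures:

```python
# Sketch: fit constant, linear and quadratic curves to the (size, cpu)
# measures and report the residual of each. A real selection would have to
# penalise extra parameters (a higher degree never fits worse); this is
# purely illustrative.
import json
import numpy as np

with open("bytearray_comparison.json") as f:
    report = json.load(f)

measures = report["benchmarks"][0]["measures"]
sizes = np.array([m["size"] for m in measures], dtype=float)
cpus = np.array([m["cpu"] for m in measures], dtype=float)

for degree, label in [(0, "constant"), (1, "linear"), (2, "quadratic")]:
    coeffs = np.polyfit(sizes, cpus, degree)
    residual = float(np.sum((np.polyval(coeffs, sizes) - cpus) ** 2))
    print(f"{label:9s} fit: coefficients={coeffs}, residual={residual:.1f}")
```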

Error cases

In sampler: (screenshot of the error output omitted)
In bench: (screenshot of the error output omitted)

Changelog

  • 📍 remove unnecessary intermediate variables
    Introduced in some previous commits, so basically reverting that.

  • 📍 remove duplicate entry in CHANGELOG
    likely due to a wrong merge conflict resolution.

  • 📍 actually fail if a (seeded) sampler returns None
    This is not supposed to happen, as only a replayed sampler/fuzzer can stop.

  • 📍 minor aesthetic changes in test framework.

  • 📍 refactor and fix benchmark type-checking
    Fixes:

    • Do not allow bench with no arguments; this causes a compiler panic down the line otherwise.

    • Do not force the return value to be a boolean or void. We do not actually control what's returned by a benchmark, so anything really works here.

    Refactor:

    • Re-use code between test and bench type-checking, especially the bits related to gathering information about the via arguments. There's quite a lot, and simply copy-pasting everything would likely cause issues and discrepancies at the first change.
  • 📍 Add additional test to check for Sampler alias formatting.

  • 📍 fixup aesthetics

  • 📍 more aesthetic changes.
    In particular, using a concrete enum instead of a string to avoid an unnecessary incomplete pattern-match, and remove superfluous comments.

  • 📍 fuse together bench & test runners, and collect all bench measures.
    This commit removes some duplication between bench and test runners, as well as fixing the results coming out of running benchmarks.

    Running benchmarks is expected to yield multiple measures, one for each iteration. For now, it'll suffice to show results for each size; but eventually, we'll possibly try to interpolate results with different curves and pick the best candidate (a curve-fitting sketch appears after the comparison report above).

  • 📍 rework sizing of benchmarks, taking measures at different points
    The idea is to get a good sample of measures from running benchmarks with various sizes, so one can get an idea of how well a function performs at various sizes.

    Given that size can be made arbitrarily large, and that we currently report all benchmarks, I installed a Fibonacci heuristic to gather data points from 0 to the max size using an increasing stepping (see the sketch after this list).

    Defined as a trait, as I already anticipate we might need different sizing strategies, likely driven by the user via a command-line option; but for now, this will do.

  • 📍 remove unnecessary intermediate variables
    Introduced in some previous commits, so basically reverting that.

  • 📍 rework benchmarks output
    Going for a terminal plot, for now, as this was the original idea and it is immediately visual. All benchmark points can also be obtained as JSON when redirecting the output, like for tests. So all-in-all, we provide a flexible output which should be useful. Whether it is the best we can do, time (and people/users) will tell.

  • 📍 fix benchmark output when either the sampler or bench fails
    This is likely even better than what was done for property testing. We shall revise that one perhaps one day.

  • 📍 Update CHANGELOG w.r.t benchmarks
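
The sketch referenced in the sizing entry above — a hypothetical Python rendering of a Fibonacci stepping from 0 to the max size; the actual strategy lives behind a Rust trait in the compiler and may pick different points:

```python
# Hypothetical illustration of a Fibonacci stepping from 0 to max_size:
# the gap between consecutive measurement sizes grows like the Fibonacci
# sequence, so small sizes are sampled densely and large ones sparsely.
def fibonacci_sizes(max_size: int) -> list[int]:
    sizes, current = [], 0
    step, next_step = 1, 1
    while current <= max_size:
        sizes.append(current)
        current += step
        step, next_step = next_step, step + next_step
    return sizes

print(fibonacci_sizes(30))  # [0, 1, 2, 4, 7, 12, 20]
```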

KtorZ merged commit 94246bd into main on Feb 9, 2025 (13 checks passed) and deleted the benchmarks-wrapup branch.