
Wrapping-up benchmarks #1093

Merged — 15 commits merged into main from benchmarks-wrapup on Feb 9, 2025
Conversation

@KtorZ (Member) commented Feb 8, 2025

Preview

CLI

```
Usage: aiken bench [OPTIONS] [DIRECTORY] [ENV]

Arguments:
  [DIRECTORY]  Path to project
  [ENV]        Environment to use for benchmarking

Options:
      --seed <SEED>                          An initial seed to initialize the pseudo-random generator for
                                             benchmarks

      --max-size <MAX_SIZE>                  The maximum size to benchmark with. Note that this does
                                             not necessarily equate to the number of measurements actually
                                             performed but controls the maximum size given to a Sampler
                                             [default: 30]

  -m, --match-benchmarks <MATCH_BENCHMARKS>  Only run benchmarks if they match any of these strings

  -e, --exact-match                          This is meant to be used with `--match-benchmarks`. It forces
                                             benchmark names to match exactly

  -f, --trace-filter <TRACE_FILTER>          Filter traces to be included in the generated program(s).
                                             [possible values: user-defined, compiler-generated, all]

  -t, --trace-level <TRACE_LEVEL>            Choose the verbosity level of traces:
                                             [default: silent]
                                             [possible values: silent, compact, verbose]
```
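
For illustration only, here is a minimal sketch (not part of this PR) of driving the command from a script. The flags are taken from the help text above; the assumption that redirected output yields the JSON report follows the description further down.

```python
# Hypothetical driver: run `aiken bench` and parse its JSON report.
# Assumes the aiken binary is on PATH and that benchmark points are emitted
# as JSON when the output is redirected (i.e. not attached to a terminal).
import json
import subprocess

def run_bench(directory: str = ".", seed: int = 42, max_size: int = 50,
              only: str = "bytearray") -> dict:
    result = subprocess.run(
        ["aiken", "bench", "--seed", str(seed), "--max-size", str(max_size),
         "-m", only, directory],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

if __name__ == "__main__":
    report = run_bench()
    print(f"seed={report['seed']}, benchmarks={len(report['benchmarks'])}")
```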

Defining benchmarks

(screenshot of benchmark definitions omitted)

Running benchmarks

(screenshot of a benchmark run omitted)
bytearray_length.json

```json
{
  "benchmarks": [
    {
      "name": "bytearray_length",
      "module": "tests",
      "measures": [
        {
          "size": 0,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 1,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 2,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 3,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 4,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 5,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 6,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 7,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 8,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 9,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 10,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 11,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 12,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 13,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 14,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 15,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 16,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 17,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 18,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 19,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 20,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 21,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 22,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 23,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 24,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 25,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 26,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 27,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 28,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 29,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 30,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 31,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 32,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 33,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 34,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 35,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 36,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 37,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 38,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 39,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 40,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 41,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 42,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 43,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 44,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 45,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 46,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 47,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 48,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 49,
          "memory": 942,
          "cpu": 170342
        },
        {
          "size": 50,
          "memory": 942,
          "cpu": 170342
        }
      ]
    }
  ],
  "seed": 91686802
}
```
(screenshot omitted)
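
As a quick illustration of the report format (a sketch, not shipped with this PR), the fields shown above — the top-level `seed` and, per benchmark, `name`, `module` and a list of `measures` with `size`, `memory` and `cpu` — can be summarised with a few lines of Python:

```python
# Minimal sketch: summarise a report such as bytearray_length.json.
import json

with open("bytearray_length.json") as f:
    report = json.load(f)

print(f"seed: {report['seed']}")
for bench in report["benchmarks"]:
    sizes = [m["size"] for m in bench["measures"]]
    cpus = [m["cpu"] for m in bench["measures"]]
    mems = [m["memory"] for m in bench["measures"]]
    print(f"{bench['module']}.{bench['name']}: "
          f"sizes {min(sizes)}..{max(sizes)}, "
          f"cpu {min(cpus)}..{max(cpus)}, "
          f"mem {min(mems)}..{max(mems)}")
```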
bytearray_comparison.json

```json
{
  "benchmarks": [
    {
      "name": "bytearray_comparison",
      "module": "tests",
      "measures": [
        { "size": 0, "memory": 3293, "cpu": 823816 },
        { "size": 1, "memory": 3293, "cpu": 824386 },
        { "size": 2, "memory": 3293, "cpu": 824994 },
        { "size": 3, "memory": 3293, "cpu": 825602 },
        { "size": 4, "memory": 3293, "cpu": 826210 },
        { "size": 5, "memory": 3293, "cpu": 826818 },
        { "size": 6, "memory": 3293, "cpu": 827426 },
        { "size": 7, "memory": 3293, "cpu": 828034 },
        { "size": 8, "memory": 3293, "cpu": 828642 },
        { "size": 9, "memory": 3293, "cpu": 829250 },
        { "size": 10, "memory": 3293, "cpu": 829858 },
        { "size": 11, "memory": 3293, "cpu": 830466 },
        { "size": 12, "memory": 3293, "cpu": 831074 },
        { "size": 13, "memory": 3293, "cpu": 831682 },
        { "size": 14, "memory": 3293, "cpu": 832290 },
        { "size": 15, "memory": 3293, "cpu": 832898 },
        { "size": 16, "memory": 3293, "cpu": 833506 },
        { "size": 17, "memory": 3293, "cpu": 834114 },
        { "size": 18, "memory": 3293, "cpu": 834722 },
        { "size": 19, "memory": 3293, "cpu": 835330 },
        { "size": 20, "memory": 3293, "cpu": 835938 },
        { "size": 21, "memory": 3293, "cpu": 836546 },
        { "size": 22, "memory": 3293, "cpu": 837154 },
        { "size": 23, "memory": 3293, "cpu": 837762 },
        { "size": 24, "memory": 3293, "cpu": 838370 },
        { "size": 25, "memory": 3293, "cpu": 838978 },
        { "size": 26, "memory": 3293, "cpu": 839586 },
        { "size": 27, "memory": 3293, "cpu": 840194 },
        { "size": 28, "memory": 3293, "cpu": 840802 },
        { "size": 29, "memory": 3293, "cpu": 841410 },
        { "size": 30, "memory": 3293, "cpu": 842018 },
        { "size": 31, "memory": 3293, "cpu": 842626 },
        { "size": 32, "memory": 3293, "cpu": 843234 },
        { "size": 33, "memory": 3293, "cpu": 843842 },
        { "size": 34, "memory": 3293, "cpu": 844450 },
        { "size": 35, "memory": 3293, "cpu": 845058 },
        { "size": 36, "memory": 3293, "cpu": 845666 },
        { "size": 37, "memory": 3293, "cpu": 846274 },
        { "size": 38, "memory": 3293, "cpu": 846882 },
        { "size": 39, "memory": 3293, "cpu": 847490 },
        { "size": 40, "memory": 3293, "cpu": 848098 },
        { "size": 41, "memory": 3293, "cpu": 848706 },
        { "size": 42, "memory": 3293, "cpu": 849314 },
        { "size": 43, "memory": 3293, "cpu": 849922 },
        { "size": 44, "memory": 3293, "cpu": 850530 },
        { "size": 45, "memory": 3293, "cpu": 851138 },
        { "size": 46, "memory": 3293, "cpu": 851746 },
        { "size": 47, "memory": 3293, "cpu": 852354 },
        { "size": 48, "memory": 3293, "cpu": 852962 },
        { "size": 49, "memory": 3293, "cpu": 853570 },
        { "size": 50, "memory": 3293, "cpu": 854178 }
      ]
    }
  ],
  "seed": 3601959169
}
```
(three screenshots omitted)
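
The comparison benchmark's cpu grows linearly with size while memory stays flat. As a sketch of the curve interpolation mentioned in the changelog below as possible future work (not implemented in this PR), one could fit a few candidate curves to the measures:

```python
# Sketch: fit constant, linear and quadratic curves to the (size, cpu)
# measures and report the residual of each. A real selection would have to
# penalise extra parameters (a higher degree never fits worse); this is
# purely illustrative.
import json
import numpy as np

with open("bytearray_comparison.json") as f:
    report = json.load(f)

measures = report["benchmarks"][0]["measures"]
sizes = np.array([m["size"] for m in measures], dtype=float)
cpus = np.array([m["cpu"] for m in measures], dtype=float)

for degree, label in [(0, "constant"), (1, "linear"), (2, "quadratic")]:
    coeffs = np.polyfit(sizes, cpus, degree)
    residual = float(np.sum((np.polyval(coeffs, sizes) - cpus) ** 2))
    print(f"{label:9s} fit: coefficients={coeffs}, residual={residual:.1f}")
```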

Error cases

In sampler: (screenshot of the error output omitted)
In bench: (screenshot of the error output omitted)

Changelog

  • 📍 remove unnecessary intermediate variables
    Introduced in some previous commits, so basically reverting that.

  • 📍 remove duplicate entry in CHANGELOG
    likely due to a wrong merge conflict resolution.

  • 📍 actually fail if a (seeded) sampler returns None
    This is not supposed to happen, as only a replayed sampler/fuzzer can stop.

  • 📍 minor aesthetic changes in test framework.

  • 📍 refactor and fix benchmark type-checking
    Fixes:

    • Do not allow bench with no arguments; this causes a compiler panic down the line otherwise.

    • Do not force the return value to be a boolean or void. We do not actually control what's returned by a benchmark, so anything really works here.

    Refactor:

    • Re-use code between test and bench type-checking, especially the bits related to gathering information about the via arguments. There's quite a lot, and simply copy-pasting everything would likely cause issues and discrepancies at the first change.
  • 📍 Add additional test to check for Sampler alias formatting.

  • 📍 fixup aesthetics

  • 📍 more aesthetic changes.
    In particular, using a concrete enum instead of a string to avoid an unnecessary incomplete pattern-match, and remove superfluous comments.

  • 📍 fuse together bench & test runners, and collect all bench measures.
    This commit removes some duplication between bench and test runners, as well as fixing the results coming out of running benchmarks.

    Running benchmarks is expected to yield multiple measures, one for each iteration. For now, it'll suffice to show results for each size; but eventually, we'll possibly try to interpolate results with different curves and pick the best candidate (a curve-fitting sketch appears after the comparison report above).

  • 📍 rework sizing of benchmarks, taking measures at different points
    The idea is to get a good sample of measures from running benchmarks with various sizes, so one can get an idea of how well a function performs at various sizes.

    Given that size can be made arbitrarily large, and that we currently report all benchmarks, I installed a Fibonacci heuristic to gather data points from 0 to the max size using an increasing stepping (see the sketch after this list).

    Defined as a trait, as I already anticipate we might need different sizing strategies, likely driven by the user via a command-line option; but for now, this will do.

  • 📍 remove unnecessary intermediate variables
    Introduced in some previous commits, so basically reverting that.

  • 📍 rework benchmarks output
    Going for a terminal plot, for now, as this was the original idea and it is immediately visual. All benchmark points can also be obtained as JSON when redirecting the output, like for tests. So all-in-all, we provide a flexible output which should be useful. Whether it is the best we can do, time (and people/users) will tell.

  • 📍 fix benchmark output when either the sampler or bench fails
    This is likely even better than what was done for property testing. We shall revise that one perhaps one day.

  • 📍 Update CHANGELOG w.r.t benchmarks
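
The sketch referenced in the sizing entry above — a hypothetical Python rendering of a Fibonacci stepping from 0 to the max size; the actual strategy lives behind a Rust trait in the compiler and may pick different points:

```python
# Hypothetical illustration of a Fibonacci stepping from 0 to max_size:
# the gap between consecutive measurement sizes grows like the Fibonacci
# sequence, so small sizes are sampled densely and large ones sparsely.
def fibonacci_sizes(max_size: int) -> list[int]:
    sizes, current = [], 0
    step, next_step = 1, 1
    while current <= max_size:
        sizes.append(current)
        current += step
        step, next_step = next_step, step + next_step
    return sizes

print(fibonacci_sizes(30))  # [0, 1, 2, 4, 7, 12, 20]
```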

KtorZ merged commit 94246bd into main on Feb 9, 2025 (13 checks passed) and deleted the benchmarks-wrapup branch.