change markdown output in benchmark PR comments #2693
could you provide links/images to see the difference before and after this PR?
Compute Benchmarks level_zero run (with params: ):
We can just run the benchmark to see. I just triggered it.
Compute Benchmarks level_zero run ():
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Improved 10 (threshold 2.00%)
Regressed 21 (threshold 2.00%)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7): 99.763%
Relative perf in group (17): 99.671%
Relative perf in group SinKernelGraph (4): 100.088%
Relative perf in group SubmitGraph (3): 98.015%
Relative perf in group ExecGraph (3): 100.280%
Relative perf in group SubmitKernel CPU count (3): 100.000%
Velocity Bench
Relative perf in group (8): 101.045%
SYCL-Bench
Relative perf in group (54): cannot calculate
llama.cpp bench
Relative perf in group (6): cannot calculate
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7): 98.691%
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7): 98.130%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7): 100.952%
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7): 99.236%
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4): 96.922%
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4): 102.096%
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7): 98.810%
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7): 99.405%
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4): 100.438%
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4): 99.735%
Details
Benchmark details contain too many chars to display
# Generate the row with the best value highlighted
# Generate the row with all the results from saved runs specified by
# --compare,
# Highight the best value in the row with data
misspell Highight
still a misspell ;d
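For readers following the snippet under review, here is a minimal sketch of what "generate the row with the best value highlighted" could look like. It is not the PR's code: the function name `format_row`, the `lower_is_better` flag, the three-decimal formatting, and the use of `**bold**` for emphasis are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def format_row(label: str,
               values: Sequence[Optional[float]],
               lower_is_better: bool = True) -> str:
    """Build one markdown table row and emphasize the best value.

    Hypothetical helper: `values` holds one measurement per compared run,
    with None where a run has no data for this benchmark.
    """
    present = [v for v in values if v is not None]
    best = None
    if present:
        # Assumption: smaller is better (e.g. time); flip for throughput.
        best = min(present) if lower_is_better else max(present)

    cells = []
    for v in values:
        if v is None:
            cells.append("-")               # no data for this run
        elif v == best:
            cells.append(f"**{v:.3f}**")    # emphasize the best value
        else:
            cells.append(f"{v:.3f}")
    return "| " + " | ".join([label] + cells) + " |"


print(format_row("SubmitKernel", [2.0, 1.5, None]))
```

Ties are all emphasized in this sketch, which seems reasonable when two runs report the same value.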
2e8d039
to
09ce9ac
Compare
grouping doesn't work for some of the benchmarks:
Relative perf in group (17): 99.671%
# If data is collected from already saved results,
# the content is parsed as strings
if isinstance(res.env, str):
how does this improve the existing way of printing env vars?
My OCD couldn't stand empty `Environment variables` sections, if you are asking about the introduced `if`s.
If you are asking about `ast.literal_eval`, I added it for results that were not calculated during the script run but had already been saved. This function lets us access the elements of the dictionary of environment variables, which is otherwise parsed from JSON as a string. Maybe we could change something in `Benchmark.from_json()` instead.
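A minimal sketch of the idea described above, assuming the saved `env` field arrives as the string form of a Python dict. The helper names `normalize_env` and `format_env_section` are hypothetical and not taken from the PR; only `ast.literal_eval` is the real standard-library call mentioned in the comment.

```python
import ast
from typing import Dict, Union


def normalize_env(env: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return env vars as a dict, whether freshly computed or loaded from disk.

    Hypothetical helper: saved results are assumed to store the dict as its
    string form, e.g. "{'ZE_DEBUG': '1'}".
    """
    if isinstance(env, str):
        # literal_eval only accepts Python literals, so it is safer than eval.
        parsed = ast.literal_eval(env) if env.strip() else {}
        if not isinstance(parsed, dict):
            raise ValueError(f"expected a dict literal, got {type(parsed).__name__}")
        return parsed
    return dict(env)


def format_env_section(env: Union[str, Dict[str, str]]) -> str:
    """Render the section, or nothing at all when there are no variables."""
    env_vars = normalize_env(env)
    if not env_vars:
        return ""  # skip the empty "Environment variables" section
    lines = ["Environment variables:"]
    lines += [f"{key}={value}" for key, value in sorted(env_vars.items())]
    return "\n".join(lines)


print(format_env_section("{'ZE_DEBUG': '1'}"))
```

Doing the conversion once, at load time, would keep the output code free of `isinstance` checks, which is roughly what the `Benchmark.from_json()` remark points at.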
def get_relative_perf_summary(group_size: int, diffs_product: int,
                              root_for_geometric_mean: int, group_name: str):
geomean is complicated to calculate. I'd just replace the : xx % change with (improved X, regressed Y)
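To make the trade-off concrete, here is a small sketch of both options: a geometric mean of per-benchmark ratios and the simpler improved/regressed count. The function names, the convention that a ratio above 1.0 means "faster than baseline", and the 2% default threshold (taken from the bot output in this thread) are assumptions, not the PR's implementation.

```python
import math
from typing import Optional, Sequence, Tuple


def geometric_mean(ratios: Sequence[Optional[float]]) -> Optional[float]:
    """Geometric mean of positive ratios, computed via logs for stability.

    Returns None when nothing usable is present, which corresponds to the
    "cannot calculate" case in the summary.
    """
    usable = [r for r in ratios if r is not None and r > 0]
    if not usable:
        return None
    return math.exp(sum(math.log(r) for r in usable) / len(usable))


def count_changes(ratios: Sequence[Optional[float]],
                  threshold: float = 0.02) -> Tuple[int, int]:
    """Count benchmarks that moved by more than `threshold` either way.

    Assumption: ratio = baseline / current, so > 1.0 is an improvement.
    """
    improved = sum(1 for r in ratios if r is not None and r > 1 + threshold)
    regressed = sum(1 for r in ratios if r is not None and r < 1 - threshold)
    return improved, regressed


ratios = [1.05, 0.90, 1.00, None]
print(geometric_mean(ratios), count_changes(ratios))
```

Summing logarithms avoids the overflow or underflow that a running product can hit with many benchmarks, and the count needs no special handling beyond skipping missing values.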
These benchmarks don't have an explicit group assigned (example without assigned, example with assigned); the default explicit_group is an empty string. I'm going to change the default group name to "Ungrouped".
09ce9ac
to
3b3e942
Compare
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Improved 23 (threshold 2.00%)
Regressed 27 (threshold 2.00%)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7)
Relative perf in group (17)
Relative perf in group SinKernelGraph (4)
Relative perf in group SubmitGraph (3)
Relative perf in group ExecGraph (3)
Relative perf in group SubmitKernel CPU count (3)
Velocity Bench
Relative perf in group (8)
SYCL-Bench
Relative perf in group (54)
llama.cpp bench
Relative perf in group (6)
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4)
Details
Benchmark details contain too many chars to display
3b3e942
to
37a046c
Compare
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Improved 16 (threshold 2.00%)
Regressed 17 (threshold 2.00%)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7)
Relative perf in group (17)
Relative perf in group SinKernelGraph (4)
Relative perf in group SubmitGraph (3)
Relative perf in group ExecGraph (3)
Relative perf in group SubmitKernel CPU count (3)
Velocity Bench
Relative perf in group Ungrouped (8)
SYCL-Bench
Relative perf in group Ungrouped (53)
llama.cpp bench
Relative perf in group Ungrouped (6)
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4)
Details
Benchmark details contain too many chars to display
scripts/benchmarks/benches/result.py
Outdated
@@ -18,7 +18,7 @@ class Result:
     stdout: str
     passed: bool = True
     unit: str = ""
-    explicit_group: str = ""
+    explicit_group: str = "Ungrouped"
The html output interprets anything other than "" as a group (see https://github.com/oneapi-src/unified-runtime/blob/main/scripts/benchmarks/output_html.py#L117). And every explicit group is shown together on a bar chart.
So this needs to stay as "".
My suggestion is to use "Others" in the markdown output when "" is specified.
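One way to read this suggestion, sketched under assumptions: keep `explicit_group` as `""` in the data model so the HTML output is unaffected, and substitute a display name only when rendering markdown. The trimmed `Result` class and the helper names below are hypothetical stand-ins, not the project's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Result:
    """Trimmed, hypothetical stand-in for the real Result class."""
    label: str
    explicit_group: str = ""  # stays empty so other outputs see "no group"


def markdown_group_name(explicit_group: str, fallback: str = "Other") -> str:
    """Pick the name shown in markdown; the stored value is left untouched."""
    return explicit_group if explicit_group else fallback


def group_for_markdown(results: Iterable[Result]) -> Dict[str, List[str]]:
    """Bucket benchmark labels by the group name displayed in markdown."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for result in results:
        groups[markdown_group_name(result.explicit_group)].append(result.label)
    return dict(groups)


results = [Result("api_overhead", "SubmitKernel"), Result("memcpy"), Result("hashtable")]
print(group_for_markdown(results))
```

Because the fallback is applied at render time, any code that tests `explicit_group != ""` keeps working unchanged.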
lgtm, just a couple of nits...
scripts/benchmarks/main.py
Outdated
parser.add_argument("--dry-run", help='Do not run any actual benchmarks', action="store_true", default=False)
parser.add_argument("--compute-runtime", nargs='?', const=options.compute_runtime_tag, help="Fetch and build compute runtime")
parser.add_argument("--iterations-stddev", type=int, help="Max number of iterations of the loop calculating stddev after completed benchmark runs", default=options.iterations_stddev)
parser.add_argument("--build-igc", help="Build IGC from source instead of using the OS-installed version", action="store_true", default=options.build_igc)
parser.add_argument("--relative-perf", type=str, help="The name of the results which should be used as a baseline for metrics calculation", default=options.current_run_name)
parser.add_argument("--new-base-name", help="New name of the default baseline to compare", type=str, default='')
Hm, if we need this let's just remove the default compare. E.g., when nothing is specified, we don't compare at all.
This will eliminate the need for this option.
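A rough sketch of that behaviour with `argparse`, assuming `--compare` is declared with `action="append"` and no default, so that an empty command line means "no comparison". The `--compare` flag itself comes from the README; the parser layout and the helper `names_to_compare` are assumptions for illustration only.

```python
import argparse
from typing import List, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="benchmark runner (sketch)")
    # No implicit baseline: the list is only populated when the user asks.
    parser.add_argument("--compare", type=str, action="append", default=None,
                        help="Name of a stored result to compare against; "
                             "may be given more than once")
    return parser


def names_to_compare(argv: Sequence[str]) -> List[str]:
    """Return the requested comparisons, or an empty list for 'do not compare'."""
    args = build_parser().parse_args(argv)
    return args.compare or []


print(names_to_compare([]))                                   # nothing to compare
print(names_to_compare(["--compare", "baseline", "--compare", "previous"]))
```

With no default in place, the first `--compare` name can simply act as the baseline, so a separate option for renaming the default baseline would no longer be needed.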
(x.diff is not None, x.diff), reverse=True)

# Geometric mean calculation
product = 1.0
this appears to be unused?
37a046c
to
7f8ea38
Compare
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Improved 20 (threshold 2.00%)
Regressed 14 (threshold 2.00%)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7)
Relative perf in group Ungrouped (17)
Relative perf in group SinKernelGraph (4)
Relative perf in group SubmitGraph (3)
Relative perf in group ExecGraph (3)
Relative perf in group SubmitKernel CPU count (3)
Velocity Bench
Relative perf in group Ungrouped (8)
SYCL-Bench
Relative perf in group Ungrouped (54)
llama.cpp bench
Relative perf in group Ungrouped (6)
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4)
Details
Benchmark details contain too many chars to display
7f8ea38
to
46a6bac
Compare
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Improved 21 (threshold 2.00%)
Regressed 11 (threshold 2.00%)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7)
Relative perf in group Other (17)
Relative perf in group SinKernelGraph (4)
Relative perf in group SubmitGraph (3)
Relative perf in group ExecGraph (3)
Relative perf in group SubmitKernel CPU count (3)
Velocity Bench
Relative perf in group Other (8)
SYCL-Bench
Relative perf in group Other (54)
llama.cpp bench
Relative perf in group Other (6)
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4)
Details
Benchmark details contain too many chars to display
36ff0a5
to
7028a34
Compare
7028a34
to
bbd9097
Compare
scripts/benchmarks/README.md
Outdated
@@ -37,11 +37,16 @@ By default, the benchmark results are not stored. To store them, use the option

To compare a benchmark run with a previously stored result, use the option `--compare <name>`. You can compare with more than one result.

above there's a sentence "By default, all benchmark runs are compared against `baseline`, which is a well-established set of the latest data." which should now be gone, I believe.
scripts/benchmarks/README.md
Outdated
## Output formats
You can display the results in the form of an HTML file by using `--output-html` and a markdown file by using `--output-markdown`. Due to character limits for posting PR comments, the final content of the markdown file might be reduced. In order to obtain the full markdown output, use `--output-markdown full`.


one redundant empty line
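The size limiting described in the README text could be sketched roughly as below. This is an illustration under assumptions, not the script's code: `build_markdown` and its parameters are hypothetical names, and the 65536-character default is an assumed comment limit that should be checked against the platform. The notice string is the one the benchmark bot prints in this thread.

```python
TOO_LONG_NOTICE = "Benchmark details contain too many chars to display"


def build_markdown(summary: str,
                   details: str,
                   max_chars: int = 65536,  # assumed PR comment limit
                   full: bool = False) -> str:
    """Join summary and details, dropping details if the result is too long.

    `full=True` mirrors the idea behind `--output-markdown full`: always
    keep everything, e.g. when writing to a file instead of a PR comment.
    """
    complete = summary + "\n" + details
    if full or len(complete) <= max_chars:
        return complete
    # Keep the summary, which is the most useful part, and say what is missing.
    return summary + "\n" + TOO_LONG_NOTICE


print(build_markdown("Improved 21, Regressed 11", "x" * 100, max_chars=50))
```

Dropping the details as a whole, rather than cutting the text mid-table, keeps the posted comment valid markdown.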
- add an option for limiting markdown content size
- calculate relative performance with different baselines
- calculate relative performance using only already saved data
- group results according to suite names and explicit groups
- add multiple data columns if multiple --compare specified
bbd9097
to
73a774b
Compare
Compute Benchmarks level_zero run (with params: --output-markdown):
Compute Benchmarks level_zero run (--output-markdown):
Summary (Emphasized values are the best results)
Performance change in benchmark groups
Compute Benchmarks
Relative perf in group SubmitKernel (7)
Relative perf in group Other (17)
Relative perf in group SinKernelGraph (4)
Relative perf in group SubmitGraph (3)
Relative perf in group ExecGraph (3)
Relative perf in group SubmitKernel CPU count (3)
Velocity Bench
Relative perf in group Other (8)
SYCL-Bench
Relative perf in group Other (53)
llama.cpp bench
Relative perf in group Other (6)
UMF
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (7)
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (7)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 (4)
Relative perf in group alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 (4)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 (7)
Relative perf in group multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 (7)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 (4)
Relative perf in group multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 (4)
Details
Benchmark details contain too many chars to display
- add an option for limiting markdown content size
- calculate relative performance with different baselines
- calculate relative performance using only already saved data
- group results according to suite names and explicit groups
- add multiple data columns if multiple --compare specified
An example of the previous output design