-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve benchmark setup #1323
Improve benchmark setup #1323
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several empty files for distribution tests
From cuda, there are two part of profiling: nsight system and nsight compute. (not sure aboth the other vendor)
I do not know the reason for splitting, but nsight system is for tracing (timeline) and nsight compute is for profiling kernels.
For tracing, annotations on the repetition should give a better timeline overview after the warmup.
For profiling kernels, the cuProfilerStart and cuProfilerStop or filter by annotation should help here.
It's also based on that we have some workspace avoiding reallocation and skipping some operations in the second run.
Only profiling the first run may lead us always see the reallocation overhead.
The distributed benchmarks are tested now as well. The -profile flag is meant as a shortcut, if users are interested in the difference between hot and cold calls, they can still see the individual |
a2f11db
to
715f4ab
Compare
I feel like this PR mixes quite a few things, which could stand as their own PR. I think splitting this up into 3 PRs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. currently the pull request has some content from the other pull request and is not rebased
@@ -272,7 +272,7 @@ if(GINKGO_BUILD_TESTS) | |||
endif() | |||
if(GINKGO_BUILD_BENCHMARKS) | |||
find_package(gflags 2.2.2 QUIET) | |||
find_package(RapidJSON 1.1.0 QUIET) | |||
find_package(nlohmann_json 3.9.1 QUIET) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason for picking 3.9.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I needed this particular minimum version to provide ordered_json
support
value.Accept(writer); | ||
return os; | ||
} | ||
using json = nlohmann::ordered_json; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it mainly for testing purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RapidJSON has the same property, I wanted to preserve it, since it also makes the output more stable.
6ab159f
to
22e12e9
Compare
9cdc64a
to
8b07b48
Compare
Codecov ReportPatch has no changes to coverable lines. 📢 Thoughts on this report? Let us know!. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few minor comments left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but don't forget to update the CI.
This reverts commit 0dab762. Additionally replaces the JSON test case output by their description
they are sometimes implementation-dependent for libstdc++ types
- rename 'determinize' -> 'sanitize' - use empty struct for empty benchmark state - use version tag instead of commit ID - use std::endl where appropriate Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
- remove unnecessary stdin in tests - simplify validate_config - consistently use pointer members instead of reference members Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
1335b29
to
a725d3c
Compare
add_benchmark_test(multi_vector_distributed) | ||
add_benchmark_test(spmv_distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it had the issue from unstable output from MPI. Is it solved now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the instability came from the fact that multiple ranks were printing output. This is now fixed thanks to the do_print
variables that are set everywhere.
[ | ||
{ | ||
"size": 125, | ||
"size": 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size meaning is changed now? from matrix size to stecil point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, before the matrix dimensions were written into the size
field, now they are being written into rows
and cols
to avoid overwriting the input size specified for the stencil.
DEBUG: begin components::aos_to_soa | ||
DEBUG: end components::aos_to_soa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aos_to_soa
-> fill_array
+ copy
+ convert_idxs_to_ptrs
any idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this has to do with changing the benchmark from using matrix_data
to using device_matrix_data
, so the AOS-SOA conversion only happens once.
- don't install nlohmann-json - simplify code - improve config description formatting Co-authored-by: Yuhsiang M. Tsai <yhmtsai@gmail.com>
17389b0
to
7b482dc
Compare
Kudos, SonarCloud Quality Gate passed!
|
I needed to find something lightweight to do to get my mind off the Cholesky issues, so here we are 😄