Add rapids_test allowing projects to run gpu tests in parallel #328
This is a cool feature. I am guessing we want this to parallelize RAPIDS tests in CI? One concern is that percentages may not be the most useful metric. Some tests might allocate 8 GB of memory regardless of the GPU size, meaning that the percentage is different on a 32 GB GPU than a 48 GB GPU.
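To make the concern concrete, here is a small illustrative calculation. Python is used only for the arithmetic, and the helper name `percent_of_gpu` is hypothetical rather than part of rapids-cmake. It shows how a fixed 8 GB allocation maps to a different whole-number percentage depending on the GPU's total memory.

```python
import math


def percent_of_gpu(required_gb: float, gpu_total_gb: float) -> int:
    """Return the smallest whole percentage of the GPU that covers required_gb.

    Hypothetical helper for illustration only; not part of rapids-cmake.
    """
    if required_gb <= 0 or gpu_total_gb <= 0:
        raise ValueError("sizes must be positive")
    if required_gb > gpu_total_gb:
        raise ValueError("the test does not fit on this GPU")
    # Multiply before dividing so exact ratios stay exact, then round up.
    return math.ceil(required_gb * 100 / gpu_total_gb)


if __name__ == "__main__":
    for total_gb in (32, 48):
        print(f"8 GB on a {total_gb} GB GPU -> {percent_of_gpu(8, total_gb)}%")
```

Under these assumptions the same test would need to request 25% on the 32 GB card but only 17% on the 48 GB card, which is the mismatch described above.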
testing/test/init-activate_multi_allocations_same_gpu/CMakeLists.txt
Yes, and other NVIDIA projects. This first step should allow some projects like Thrust to have correct parallel support now.
So this comes back to the missing part needed for RAPIDS. There is no hard requirement when configuring and building RAPIDS projects that you have an actual GPU on the machine. Therefore we shouldn't expect the test harness code to start hard erroring when it executes in these environments. RAPIDS CI runs the configure && build steps without any GPU devices so we want to gracefully fall back to presuming
This comes back to the missing component. What rapids-cmake needs to offer is some executable / process for CI structures like RAPIDS to re-detect the number of GPUs on the machine before
I am currently trying to figure out an approach for this, but didn't want to hold up the initial PR
I wanted to keep the usage as simple as possible. I am worried that we use
Another issue with using
d65f91c to aa41d6e
b22e3d1 to 1091830
a75c984 to a892495
@robertmaynard I started a review a while ago (probably a month ago) and didn't click submit. I am submitting those comments now -- forgive me if the comments are outdated.
endif()

# verify that percent is inside the allowed bounds
if(percent GREATER 100 OR percent LESS 1 OR (NOT percent MATCHES "^[0-9]+$"))
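For readers less familiar with CMake's `if()` semantics, the condition above rejects anything that is not a plain integer between 1 and 100. A rough Python equivalent of the acceptance rule is sketched below. The function name `is_valid_percent` is hypothetical, and the sketch only mirrors the quoted line, not any surrounding logic in the PR.

```python
import re

# Same pattern as the CMake check: one or more ASCII digits and nothing else.
_DIGITS_ONLY = re.compile(r"[0-9]+")


def is_valid_percent(value: str) -> bool:
    """Return True when value is a whole number from 1 to 100 inclusive.

    Hypothetical mirror of the quoted CMake condition, which errors out
    when this function would return False.
    """
    if _DIGITS_ONLY.fullmatch(value) is None:
        return False  # rejects "", "abc", "12.5", "-5"
    return 1 <= int(value) <= 100


if __name__ == "__main__":
    for candidate in ("1", "100", "0", "101", "12.5", "abc"):
        print(candidate, is_valid_percent(candidate))
```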
Sounds good. I'm fine with whatever way you want to resolve this.
Approving ops-codeowner file changes
82997a9 to 7961f8c
This PR is pretty hefty. I skimmed it again (most code was familiar) and had only a few minor suggestions. I'll approve this and you can handle my suggestions as you wish.
Introduces rapids_test functionality to allow tests executed via ctest -j to properly resource share GPUs. This is done by having tests state how many GPU allocations they require, and uses CTest's internal job scheduler to properly load balance.
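As background, CTest's resource allocation feature reads a JSON resource specification file and matches it against each test's `RESOURCE_GROUPS` property. The sketch below builds such a file in Python and estimates how many tests could run at once. It assumes each GPU is exposed as 100 slots so that a percentage maps directly to slots. That mapping and the helper names `make_resource_spec` and `max_concurrent_tests` are assumptions for illustration, not something this PR's description confirms.

```python
import json


def make_resource_spec(gpu_count: int, slots_per_gpu: int = 100) -> dict:
    """Build a CTest resource specification (format version 1.0) as a dict.

    Assumption: one "gpus" entry per device, each with slots_per_gpu slots.
    """
    if gpu_count < 0:
        raise ValueError("gpu_count must be non-negative")
    gpus = [{"id": str(i), "slots": slots_per_gpu} for i in range(gpu_count)]
    return {"version": {"major": 1, "minor": 0}, "local": [{"gpus": gpus}]}


def max_concurrent_tests(spec: dict, slots_per_test: int) -> int:
    """Upper bound on tests that fit at once if each needs slots_per_test
    slots on a single GPU (a simplification of what a scheduler would do)."""
    if slots_per_test <= 0:
        raise ValueError("slots_per_test must be positive")
    return sum(gpu["slots"] // slots_per_test for gpu in spec["local"][0]["gpus"])


if __name__ == "__main__":
    spec = make_resource_spec(gpu_count=2)
    print(json.dumps(spec, indent=2))
    print("tests at 25% each:", max_concurrent_tests(spec, 25))
```

A file like this is what `ctest --resource-spec-file <path> -j <N>` consumes. The capacity estimate here is only a back-of-the-envelope bound, not CTest's actual scheduling algorithm.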
611f4c0 to ff710b8
Squashed everything down to a single commit and removed all the unneeded changes thanks to #378 being merged.
This is really cool!
I have a general question about properties that came up a few times in this review: can you set_property with any arbitrary property string, even if it doesn't already exist? Do you not need to define_property first in those cases?
I haven't reviewed the tests yet, will do that in a subsequent pass.
This reverts commit 642abb3.
LGTM! It would be nice if we could somehow template the testing files with preprocessor macros or something since there's so much duplication (50% of this PR seems like it's just the license headers) but other than that everything looks good.
Co-authored-by: Vyas Ramasubramani <vyas.ramasubramani@gmail.com>
e53dcc0 to e6a8403
/merge
Description
Introduces rapids_test functionality to allow tests executed via ctest -j to properly resource share GPUs.
This is done by having tests state how many GPU allocations they require, and uses CTest's internal job scheduler to properly load balance.
Checklist
cmake-format.json is up to date with these changes.
include_guard(GLOBAL)