-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ginkgo common kernels by Kokkos #12
Conversation
a78d60d
to
344707d
Compare
--------- Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
39ff35a
to
af450d5
Compare
Preventing error messages of the form: error: An extended __host__ __device__ lambda cannot be defined inside a generic lambda expression("operator()").
My setup:
Current status: The most fundamental thing which doesn't work in the described setup is the runtime flexibility. If I compile the setup, the runtime check precice/src/mapping/device/GinkgoRBFKernels.cpp Lines 43 to 44 in 576dce9
always returns false, even if I use the reference executor. The test suite runs all available Ginkgo configurations. If I permute the indices for the column-major access, the reference executor tests pass and work nicely (they would then also work if the if branching with the dynamic casting would work). However, the cuda executor tests don't work, unfortunately. This puzzles me a little bit at the moment as the executed kernel (a) should use the correct if branch and (b) should be the same as the reference executor. I ran the following test
and printed out the rbf system matrix as well as the polynomial matrix (using the following code snippet) {
auto tmp = gko::clone(_hostExecutor, _rbfSystemMatrix);
std::cout << "System matrix" << std::endl;
for (Eigen::Index i = 0; i < tmp->get_size()[0]; ++i) {
for (Eigen::Index j = 0; j < tmp->get_size()[1]; ++j) {
std::cout << tmp->at(i, j) << " ";
}
std::cout << std::endl;
}
} Comparing the very first test this suite executes, the polynomial matrix seems to be correct. matrix Q
0 0 1
1 0 1
1 1 1
0 1 1
matrix V
0 0 1
System matrix
1 0.0277778 0.0277778 0.0277778
0.0277778 1 1 1
0.0277778 1 1 1
0.0277778 1 1 1
Output vertices
0
0
Evaluation matrix
0.0277778 0.0277778 0.0277778 0.0277778
~/precice/src/mapping/tests/RadialBasisFctHelper.hpp(88): error: in "MappingTests/GinkgoRadialBasisFunctionSolver/cuSolver/MapCompactPolynomialC0": check value == 1.0 has failed [0.87500000000000178 != 1]. Relative difference exceeds tolerance [0.142857 > 1e-09]
Failure occurred in a following context:
Test context represents "Serial" and runs on rank 0 out of 1. Initialized: { Events} whereas for the reference executor I get (with the permuted indices/ the 'correct' if branch) the correct matrices: matrix Q
0 0 1
1 0 1
1 1 1
0 1 1
matrix V
0 0 1
System matrix
1 0.0277778 0 0.0277778
0.0277778 1 0.0277778 0
0 0.0277778 1 0.0277778
0.0277778 0 0.0277778 1
Output vertices
0 0
Evaluation matrix
1 0.0277778 0 0.0277778 |
@davidscn I've looked a bit into the code and got it running on an AMD machine. Here are some notes to get it running with HIP:
Regarding the runtime check:
This will always be false, if Kokkos is build with a GPU device enabled. Since you are now using kokkos, maybe it would make sense to remove the runtime executor choice for the |
Thanks!
Yes, this was on the radar, we only recently updated our minimum CMake version to a compatible one.
You mean you compiled Ginkgo with GCC, whereas you compiled preCICE and Kokkos using amdclang? You are surprised that the overall setup is working? At least in your docs you write, that GCC is supported for Ginkgo.
Are you sure about that? Eigen is header-only and anyway compiled into preCICE, such that Eigen and preCICE are compatible by construction. Boost data structures are not interfacing with the device memory or similar. I could imagine that using a Boost compiled with another compiler could work. Having the same for Kokkos, Ginkgo and preCICE makes sense for me.
Hm I have to think a bit about it. Having the reference executor for testing and validation has definitely value. |
Yes, I've used different compilers for Ginkgo and preCICE+Kokkos. The issue is not using GCC for Ginkgo, but linking something that was compiled with GCC to something that was compiled with clang. In the past I've had some quite annoying linker errors because of that.
Recompiling boost (also with amdclang) was necessary, since it's linked against. Regarding eigen, I'm not sure if it was necessary, but since I had to reinstall boost, I just also did the same with eigen (I installed both through spack). |
I see, thanks for the clarification. Regarding
I also just realized, that you ported the PP variables indicating the supported backend already to the Kokkos setup. Running the Ginkgo reference would essentially require to compile Kokkos with CPU support only, right? There is no chance to have the reference executor for reference testing next to another executor?! |
So I've made both the reference executor and device executor work at the same time. This basically requires some more templates for the kernels, since now the kernels can be run either on the host or device. |
Thanks a lot, that's more than appreciated! I will integrate this next week. Do you have the HIP patch by chance available anyway? Otherwise, I would just copy-implement it from Ginkgo's main |
Sure, here is the cmake patch BTW, I noticed that the test doesn't succeed for the |
On issue with the hip QR solver I've found so far, is that there are unnecessary/wrong transposes of |
Ok, thanks again! I will check it, the tests should be correct and the transposing/column-major vs row-major is at the moment a bit messy. I will try to clean it up and then open a PR towards main preCICE. I will ping you then again and (potentially) ask for a review on the critical spots. |
Main changes of this PR
These changes are compatible with
https://github.com/ginkgo-project/ginkgo/tree/kokkos-extensionginkgo-project/ginkgo#1358 and a (sufficiently recent) Kokkos version, I used version 4.1.