Parallelizing pcgsolve with openmp #1154
Comments
Actually a good start would be to re-run this with finer timers, for instance using Kokkos-tools simple-kernel-timer. You just need to build the tool and then set: |
I will start looking at this... |
@zLi90 What size problem were you running on to get these timings? How sparse is the matrix? What was the convergence tolerance? How many iterations were you running? And what size of problem do you ultimately want to be able to run? |
@jennloe We are running a 3D transient subsurface flow model. This test case has 40x24x32=30720 grid cells, resulting in a 7-diagonal matrix. The tolerance is 1e-8 and it takes about 18 pcg iterations per time step (we ran a total of 900 seconds with a time step dt=60s). We hope to be able to run large-scale hydrological simulations that contain 1 million grid cells or more. Thanks! |
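For a sense of scale, here is a minimal sketch that counts the stored nonzeros of a 7-diagonal (7-point stencil) matrix on a structured grid and estimates its CRS storage. The function names are hypothetical, and the boundary treatment (boundary cells simply drop the neighbors outside the grid) and the index/value widths (32-bit indices, 64-bit values) are assumptions rather than details given in the thread.

```cpp
#include <cstddef>
#include <cstdint>

// Count stored nonzeros of a 7-point-stencil matrix on an nx*ny*nz grid.
// Assumption: non-periodic boundaries, so a boundary cell has fewer
// off-diagonal entries than an interior cell.
std::size_t seven_point_nnz(std::size_t nx, std::size_t ny, std::size_t nz) {
  std::size_t nnz = 0;
  for (std::size_t k = 0; k < nz; ++k) {
    for (std::size_t j = 0; j < ny; ++j) {
      for (std::size_t i = 0; i < nx; ++i) {
        nnz += 1;                 // diagonal entry
        if (i > 0) nnz += 1;      // -x neighbor
        if (i + 1 < nx) nnz += 1; // +x neighbor
        if (j > 0) nnz += 1;      // -y neighbor
        if (j + 1 < ny) nnz += 1; // +y neighbor
        if (k > 0) nnz += 1;      // -z neighbor
        if (k + 1 < nz) nnz += 1; // +z neighbor
      }
    }
  }
  return nnz;
}

// Approximate CRS storage in bytes.
// Assumption: double values, 32-bit column indices, 32-bit row offsets.
std::size_t crs_bytes(std::size_t rows, std::size_t nnz) {
  return nnz * (sizeof(double) + sizeof(std::int32_t)) +
         (rows + 1) * sizeof(std::int32_t);
}
```

Under these assumptions, `seven_point_nnz(40, 24, 32)` gives 209024 nonzeros and `crs_bytes(30720, 209024)` gives about 2.5 MiB, so the test matrix is small compared with the 1-million-cell target.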
Hi @jennloe , just wondering if you have figured out something? Do you have any suggestions on what we could try next? Thanks! |
I'm sorry; I have not had more time to look at this, and I will not have time in the next few weeks. Thank you for your patience. @brian-kelley would you be able to look into this scaling? |
I like Luc's comment above. @zLi90 can you enable the finer timers please? |
@srajama1 @lucbv Thanks for the suggestion! I have tried the simple-kernel-timer, but I am not sure how to decode the output file (or perhaps I did something wrong?). For example, the first few lines of the output file are mostly unreadable bytes, with only labels like these legible: Kokkos::Impl::host_space_deepcopy_double, Kokkos::View::initialization [Diag for Seq SOR], Kokkos::View::initialization [RHS], Kokkos::View::initialization [Vvoid], Kokkos::View::initialization [cg::Ap], Kokkos::View::initialization [cg::p] |
Ok I found the reader provided. This is useful indeed! Here are the top 5 time-consuming sections: with 1 thread:
with 2 threads:
It seems that the last two items (Axpby and dot) don't scale well (from 0.2s to 0.14s)? I believe they belong to the pcgsolve of KokkosKernels. |
@zLi90 Do you compile with any BLAS enabled? We have implementations of the BLAS routines in Kokkos Kernels. However, one should use a vendor BLAS when available. What is the platform and compiler combination? |
@srajama1 I am using Mac+gcc. I just use the default settings when building kokkos kernels. Do you mean I should add -DKokkosKernels_ENABLE_TPL_BLAS=ON when building? |
We are using the provided pcg solver in kokkoskernels for our hydrological model. We found that when we activate openmp, the scaling does not look good.
We simply call
run_pcg<Kokkos::OpenMP>(domsub.nCellDomain, statesub, domsub, timers);
and in run_pcg, we call:
KokkosKernels::Experimental::Example::pcgsolve(kh, matA, vecB, vecX, cg_iteration_limit, cg_iteration_tolerance, &cg_result, true, clusterSize, useSequential);
Kokkos::fence();
The pcgsolve call takes about 90% of the computational time, and it doesn't scale well when we add threads. But since it is provided by kokkoskernels, it is difficult for us to find out what the reason might be. We were wondering if there are additional steps we are missing that could improve the scaling performance of the pcg solver?
We also want to ask if you have any recommendations on choosing a linear system solver (perhaps also a nonlinear solver) that is compatible with kokkos and has good scaling performance when using openmp and mpi?
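For readers unfamiliar with what a preconditioned CG solve spends its time on, here is a minimal serial sketch in plain C++ that shows the three kernel types involved per iteration: a sparse matrix-vector product, dot products, and axpby updates. This is not the KokkosKernels pcgsolve implementation. It swaps in a simple Jacobi (diagonal) preconditioner and a relative-residual stopping test as assumptions, and all names (CrsMatrix, spmv, dot, axpby, pcg_jacobi) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal CRS (compressed row storage) matrix; illustrative only.
struct CrsMatrix {
  std::size_t n = 0;                 // number of rows
  std::vector<std::size_t> row_ptr;  // size n + 1
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<double> val;           // value of each nonzero
};

// y = A * x
void spmv(const CrsMatrix& A, const std::vector<double>& x,
          std::vector<double>& y) {
  for (std::size_t i = 0; i < A.n; ++i) {
    double sum = 0.0;
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      sum += A.val[k] * x[A.col_idx[k]];
    y[i] = sum;
  }
}

// Inner product of two vectors of equal length.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// y = alpha * x + beta * y
void axpby(double alpha, const std::vector<double>& x, double beta,
           std::vector<double>& y) {
  for (std::size_t i = 0; i < y.size(); ++i) y[i] = alpha * x[i] + beta * y[i];
}

// Jacobi-preconditioned CG for a symmetric positive definite A.
// Uses x as the initial guess and returns the number of iterations performed.
int pcg_jacobi(const CrsMatrix& A, const std::vector<double>& b,
               std::vector<double>& x, int max_iter, double tol) {
  const std::size_t n = A.n;
  std::vector<double> inv_diag(n, 1.0), r(n), z(n), p(n), Ap(n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      if (A.col_idx[k] == i) inv_diag[i] = 1.0 / A.val[k];

  spmv(A, x, Ap);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];    // r = b - A x
  for (std::size_t i = 0; i < n; ++i) z[i] = inv_diag[i] * r[i];
  p = z;
  double rz = dot(r, z);
  const double b_norm = std::sqrt(dot(b, b));

  int it = 0;
  while (it < max_iter && std::sqrt(dot(r, r)) > tol * b_norm) {
    spmv(A, p, Ap);                      // sparse matrix-vector product
    const double alpha = rz / dot(p, Ap);
    axpby(alpha, p, 1.0, x);             // x = x + alpha * p
    axpby(-alpha, Ap, 1.0, r);           // r = r - alpha * A p
    for (std::size_t i = 0; i < n; ++i) z[i] = inv_diag[i] * r[i];
    const double rz_new = dot(r, z);
    axpby(1.0, z, rz_new / rz, p);       // p = z + beta * p
    rz = rz_new;
    ++it;
  }
  return it;
}
```

The dot and axpby loops do very little arithmetic per byte of memory traffic, which is one plausible reason why such kernels can show limited thread scaling on small problems.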
Here I attached a figure that shows the speedup of our model (the blue dots) when using pcgsolve. Thank you so much in advance!