Performance

Paul Kuberry edited this page Mar 6, 2020 · 4 revisions

The Compadre toolkit takes advantage of on-node parallelism to efficiently solve many quadratic programs in parallel. These programs are formulated using Generalized Moving Least Squares (GMLS).

Currently, we use LAPACK to solve the resulting linear systems (via LU, QR, or SVD) on a CPU, and cuSOLVER with cuBLAS when solving on an NVIDIA GPU.

Thread-safety with LAPACK:

On the CPU, when wrapping a parallel_for around a LAPACK routine, we have found that some builds of LAPACK are not thread-safe and will corrupt the shared memory used by other problems. This can often be detected using the first test in the library, which checks for thread safety. If your build is not thread-safe but you would still like to use it, the additional CMake variable "-DLAPACK_DECLARED_THREADSAFE:BOOL=OFF" will cause a serial for loop to wrap LAPACK calls. This heavily degrades performance, but produces the correct solution.
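As a minimal sketch, the variable can be passed at configure time as shown here. The source and build paths are placeholders (assumptions, not taken from the Compadre documentation), and any other options your build needs are omitted.

```shell
# Placeholder paths (assumptions); substitute your own checkout and build directory.
COMPADRE_SRC=/path/to/compadre
COMPADRE_BUILD=/path/to/compadre/build

# Declare the linked LAPACK as NOT thread-safe, so LAPACK calls are serialized.
cmake -S "$COMPADRE_SRC" -B "$COMPADRE_BUILD" \
  -DLAPACK_DECLARED_THREADSAFE:BOOL=OFF
```

The -S and -B options require CMake 3.13 or newer; with older versions, run cmake from inside the build directory and pass the source path as the last argument.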

Alternatively, either Trilinos or Kokkos Kernels can be built against a different LAPACK installation that is thread-safe. When configuring Kokkos Kernels, this means using the CMake variable "-DKokkosKernels_LAPACK_ROOT:STRING=/some/directory/containing/threadsafe/lapack".
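A hedged sketch of such a Kokkos Kernels configure step follows. The source and build paths are placeholders (assumptions), the LAPACK directory is the illustrative one from the text, and the other options a real Kokkos Kernels build needs (such as the Kokkos install location) are left out.

```shell
# Placeholder paths (assumptions); substitute your own.
KK_SRC=/path/to/kokkos-kernels
KK_BUILD=/path/to/kokkos-kernels/build

# Point Kokkos Kernels at a LAPACK installation known to be thread-safe.
cmake -S "$KK_SRC" -B "$KK_BUILD" \
  -DKokkosKernels_LAPACK_ROOT:STRING=/some/directory/containing/threadsafe/lapack
```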

Thread-spawning with LAPACK:

LAPACK routines will often spawn their own additional threads when solving a problem. Since the user has likely already specified the desired number of threads using "--kokkos-threads=N", these additional threads effectively oversubscribe the system. One way to prevent this is to set an environment variable when calling executables that use Compadre, like so:

>> OMP_NUM_THREADS=1 ./some_executable_using_compadre --kokkos-threads=16
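If you launch such executables often, the same idea can be wrapped in a small shell function. This is only a sketch: run_capped is a hypothetical helper name, and the MKL and OpenBLAS variables are assumptions that only matter if your LAPACK comes from one of those libraries.

```shell
# Hypothetical helper: run a command with library-internal threading capped at 1,
# leaving the thread count requested from Kokkos on the command line untouched.
run_capped() {
  OMP_NUM_THREADS=1 \
  MKL_NUM_THREADS=1 \
  OPENBLAS_NUM_THREADS=1 \
  "$@"
}
# OMP_NUM_THREADS:      OpenMP-threaded LAPACK/BLAS builds (from the text above)
# MKL_NUM_THREADS:      assumption, relevant only when linked against Intel MKL
# OPENBLAS_NUM_THREADS: assumption, relevant only when linked against OpenBLAS
```

It would then be invoked as "run_capped ./some_executable_using_compadre --kokkos-threads=16". The variables are set only for that one command, so the rest of the shell session is unaffected.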