Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BsrMatrix SpMV specializations incorrectly defined for cuSPARSE TPL #1417

Closed
cwpearson opened this issue May 20, 2022 · 2 comments
Closed

BsrMatrix SpMV specializations incorrectly defined for cuSPARSE TPL #1417

cwpearson opened this issue May 20, 2022 · 2 comments

Comments

@cwpearson
Copy link
Contributor

The SPMV_MV_BSRMATRIX struct template has three defaulted parameters at the end: integerScalarType, tpl_spec_avail, and eti_spec_avail

// declaration
template <class AT, class AO, class AD, class AM, class AS, class XT, class XL,
class XD, class XM, class YT, class YL, class YD, class YM,
const bool integerScalarType =
std::is_integral<typename std::decay<AT>::type>::value,
bool tpl_spec_avail = spmv_mv_bsrmatrix_tpl_spec_avail<
AT, AO, AD, AM, AS, XT, XL, XD, XM, YT, YL, YD, YM>::value,
bool eti_spec_avail = spmv_mv_bsrmatrix_eti_spec_avail<
AT, AO, AD, AM, AS, XT, XL, XD, XM, YT, YL, YD, YM>::value>
struct SPMV_MV_BSRMATRIX {
typedef BsrMatrix<AT, AO, AD, AM, AS> AMatrix;
typedef Kokkos::View<XT, XL, XD, XM> XVector;
typedef Kokkos::View<YT, YL, YD, YM> YVector;
typedef typename YVector::non_const_value_type YScalar;
static void spmv_mv_bsrmatrix(
const KokkosKernels::Experimental::Controls &controls, const char mode[],
const YScalar &alpha, const AMatrix &A, const XVector &x,
const YScalar &beta, const YVector &y);
};

In the SpMV interface, there is some logic that is intended to control whether to use the "native" KK implementation or a TPL:

if (useFallback) {
// Explicitly call the non-TPL SPMV_BSRMATRIX implementation
std::string label =
"KokkosSparse::spmv[NATIVE,BSMATRIX," +
Kokkos::ArithTraits<
typename AMatrix_Internal::non_const_value_type>::name() +
"]";
Kokkos::Profiling::pushRegion(label);
Experimental::Impl::SPMV_MV_BSRMATRIX<
typename AMatrix_Internal::const_value_type,
typename AMatrix_Internal::const_ordinal_type,
typename AMatrix_Internal::device_type,
typename AMatrix_Internal::memory_traits,
typename AMatrix_Internal::const_size_type,
typename XVector_Internal::const_value_type**,
typename XVector_Internal::array_layout,
typename XVector_Internal::device_type,
typename XVector_Internal::memory_traits,
typename YVector_Internal::value_type**,
typename YVector_Internal::array_layout,
typename YVector_Internal::device_type,
typename YVector_Internal::memory_traits,
false>::spmv_mv_bsrmatrix(controls, mode, alpha, A_i, x_i, beta, y_i);
Kokkos::Profiling::popRegion();
} else {
Experimental::Impl::SPMV_MV_BSRMATRIX<
typename AMatrix_Internal::const_value_type,
typename AMatrix_Internal::const_ordinal_type,
typename AMatrix_Internal::device_type,
typename AMatrix_Internal::memory_traits,
typename AMatrix_Internal::const_size_type,
typename XVector_Internal::const_value_type**,
typename XVector_Internal::array_layout,
typename XVector_Internal::device_type,
typename XVector_Internal::memory_traits,
typename YVector_Internal::value_type**,
typename YVector_Internal::array_layout,
typename YVector_Internal::device_type,
typename YVector_Internal::memory_traits>::spmv_mv_bsrmatrix(controls,
mode,
alpha, A_i,
x_i, beta,
y_i);
}

It does this by setting the first of those defaulted parameters to false when it wants to use the native version.
This is consistent with other spec structs, e.g. SPMV_BSRMATRIX, where tpl_spec_available is the first defaulted parameter. However SPMV_BSR_MATRIX has integerScalarType as the first parameter, so really that logic is not selecting the non-TPL implementation, but the non-integer implementation here:

struct SPMV_MV_BSRMATRIX<AT, AO, AD, AM, AS, XT, XL, XD, XM, YT, YL, YD, YM,

The logic to select the TPL or not, needs to be changed to something like this:

  if (useFallback) {
    // Explicitly call the non-TPL SPMV_BSRMATRIX implementation
    std::string label =
        "KokkosSparse::spmv[NATIVE,BSMATRIX," +
        Kokkos::ArithTraits<
            typename AMatrix_Internal::non_const_value_type>::name() +
        "]";
    Kokkos::Profiling::pushRegion(label);
    Experimental::Impl::SPMV_MV_BSRMATRIX<
        typename AMatrix_Internal::const_value_type,
        typename AMatrix_Internal::const_ordinal_type,
        typename AMatrix_Internal::device_type,
        typename AMatrix_Internal::memory_traits,
        typename AMatrix_Internal::const_size_type,
        typename XVector_Internal::const_value_type**,
        typename XVector_Internal::array_layout,
        typename XVector_Internal::device_type,
        typename XVector_Internal::memory_traits,
        typename YVector_Internal::value_type**,
        typename YVector_Internal::array_layout,
        typename YVector_Internal::device_type,
        typename YVector_Internal::memory_traits,
        std::is_integral<typename AMatrix_Internal::const_value_type>::value,
        false>::spmv_mv_bsrmatrix(controls, mode, alpha, A_i, x_i, beta, y_i);
    Kokkos::Profiling::popRegion();
  } else {
    std::cerr << __FILE__ << ":" << __LINE__ << ": not native.\n";
    Experimental::Impl::SPMV_MV_BSRMATRIX<
        typename AMatrix_Internal::const_value_type,
        typename AMatrix_Internal::const_ordinal_type,
        typename AMatrix_Internal::device_type,
        typename AMatrix_Internal::memory_traits,
        typename AMatrix_Internal::const_size_type,
        typename XVector_Internal::const_value_type**,
        typename XVector_Internal::array_layout,
        typename XVector_Internal::device_type,
        typename XVector_Internal::memory_traits,
        typename YVector_Internal::value_type**,
        typename YVector_Internal::array_layout,
        typename YVector_Internal::device_type,
        typename YVector_Internal::memory_traits,
        std::is_integral<typename AMatrix_Internal::const_value_type>::value
        >::spmv_mv_bsrmatrix(controls,
                                                                     mode,
                                                                     alpha, A_i,
                                                                     x_i, beta,
                                                                     y_i);
  }

(note the inclusion of std::is_integral for the appropriate template parameter)

Furthermore, the CUSPARSE TPL is only marked as available when integerScalarType is true:

Kokkos::MemoryTraits<Kokkos::Unmanaged>, true> { \

This is entirely inconsistent with the actual implementation, which wants floats or doubles:

/* perform the actual SpMV operation */
if ((std::is_same<int, offset_type>::value) &&
(std::is_same<int, entry_type>::value)) {
if (std::is_same<value_type, float>::value) {
KOKKOS_CUSPARSE_SAFE_CALL(cusparseSbsrmm(
cusparseHandle, dirA, myCusparseOperation,
CUSPARSE_OPERATION_NON_TRANSPOSE, A.numRows(), colx, A.numCols(),
A.nnz(), reinterpret_cast<float const*>(&alpha), descrA,
reinterpret_cast<float const*>(A.values.data()),
A.graph.row_map.data(), A.graph.entries.data(), A.blockDim(),
reinterpret_cast<float const*>(x.data()), ldx,
reinterpret_cast<float const*>(&beta),
reinterpret_cast<float*>(y.data()), ldy));
} else if (std::is_same<value_type, double>::value) {
KOKKOS_CUSPARSE_SAFE_CALL(cusparseDbsrmm(
cusparseHandle, dirA, myCusparseOperation,
CUSPARSE_OPERATION_NON_TRANSPOSE, A.numRows(), colx, A.numCols(),
A.nnz(), reinterpret_cast<double const*>(&alpha), descrA,
reinterpret_cast<double const*>(A.values.data()),
A.graph.row_map.data(), A.graph.entries.data(), A.blockDim(),
reinterpret_cast<double const*>(x.data()), ldx,
reinterpret_cast<double const*>(&beta),
reinterpret_cast<double*>(y.data()), ldy));
} else if (std::is_same<value_type, Kokkos::complex<float>>::value) {
KOKKOS_CUSPARSE_SAFE_CALL(cusparseCbsrmm(
cusparseHandle, dirA, myCusparseOperation,
CUSPARSE_OPERATION_NON_TRANSPOSE, A.numRows(), colx, A.numCols(),
A.nnz(), reinterpret_cast<cuComplex const*>(&alpha), descrA,
reinterpret_cast<cuComplex const*>(A.values.data()),
A.graph.row_map.data(), A.graph.entries.data(), A.blockDim(),
reinterpret_cast<cuComplex const*>(x.data()), ldx,
reinterpret_cast<cuComplex const*>(&beta),
reinterpret_cast<cuComplex*>(y.data()), ldy));
} else if (std::is_same<value_type, Kokkos::complex<double>>::value) {
KOKKOS_CUSPARSE_SAFE_CALL(cusparseZbsrmm(
cusparseHandle, dirA, myCusparseOperation,
CUSPARSE_OPERATION_NON_TRANSPOSE, A.numRows(), colx, A.numCols(),
A.nnz(), reinterpret_cast<cuDoubleComplex const*>(&alpha), descrA,
reinterpret_cast<cuDoubleComplex const*>(A.values.data()),
A.graph.row_map.data(), A.graph.entries.data(), A.blockDim(),
reinterpret_cast<cuDoubleComplex const*>(x.data()), ldx,
reinterpret_cast<cuDoubleComplex const*>(&beta),
reinterpret_cast<cuDoubleComplex*>(y.data()), ldy));
} else {

So, the avail macro also needs to be changed:

#define KOKKOSSPARSE_SPMV_MV_BSRMATRIX_TPL_SPEC_AVAIL_CUSPARSE(                \
    SCALAR, ORDINAL, OFFSET, XL, YL, MEMSPACE)                                 \
  template <>                                                                  \
  struct spmv_mv_bsrmatrix_tpl_spec_avail<                                     \
      const SCALAR, const ORDINAL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,     \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, const OFFSET, const SCALAR*,    \
      XL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,                              \
      Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::RandomAccess>, SCALAR*, \
      YL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,                              \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, false> {                         \
    enum : bool { value = true };                                              \
  };

note false at the end of the template params instead of true, indicating the TPL is only available for non-integer scalar types

Furthermore, the multivector view types will be const SCALAR ** and SCALAR **, so actually the macro definition should be something like

#define KOKKOSSPARSE_SPMV_MV_BSRMATRIX_TPL_SPEC_AVAIL_CUSPARSE(                \
    SCALAR, ORDINAL, OFFSET, XL, YL, MEMSPACE)                                 \
  template <>                                                                  \
  struct spmv_mv_bsrmatrix_tpl_spec_avail<                                     \
      const SCALAR, const ORDINAL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,     \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, const OFFSET, const SCALAR**,    \
      XL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,                              \
      Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::RandomAccess>, SCALAR**, \
      YL, Kokkos::Device<Kokkos::Cuda, MEMSPACE>,                              \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, false> {                         \
    enum : bool { value = true };                                              \
  };

Finally, the cuSparse specialization should exist for non-integer scalar types, instead of integer as defined here:

Kokkos::MemoryTraits<Kokkos::Unmanaged>, true, true, COMPILE_LIBRARY> { \

So it should be

#define KOKKOSSPARSE_SPMV_MV_CUSPARSE(SCALAR, ORDINAL, OFFSET, LAYOUT, SPACE,  \
                                      COMPILE_LIBRARY)                         \
  template <>                                                                  \
  struct SPMV_MV_BSRMATRIX<                                                    \
      SCALAR const, ORDINAL const, Kokkos::Device<Kokkos::Cuda, SPACE>,        \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, OFFSET const, SCALAR const**,   \
      LAYOUT, Kokkos::Device<Kokkos::Cuda, SPACE>,                             \
      Kokkos::MemoryTraits<Kokkos::Unmanaged | Kokkos::RandomAccess>,          \
      SCALAR**, LAYOUT, Kokkos::Device<Kokkos::Cuda, SPACE>,                   \
      Kokkos::MemoryTraits<Kokkos::Unmanaged>, false, true, COMPILE_LIBRARY> { \

note false, true, COMPILE_LIBRARY instead of true, true, COMPILE_LIBRARY

Also need to change KokkosSparse_spmv_bsrmatrix_spec.hpp eti avail macro to use cuSparse even if an ETI is available:

KOKKOSSPARSE_SPMV_MV_CUSPARSE(double, int, int, Kokkos::LayoutLeft,
                              Kokkos::CudaSpace,
                              true)
KOKKOSSPARSE_SPMV_MV_CUSPARSE(double, int, int, Kokkos::LayoutLeft,
                              Kokkos::CudaSpace,
                              false)

true and false instead of KOKKOSKERNELS_IMPL_COMPILE_LIBRARY

Now, the TPL selection logic actually discriminates using TPL availability, AND there TPL available is correctly indicated for SPMV_MV_BSRMATRIX for cuSparse, AND the TPL specialization actually exists properly for cuSparse. Prior to these changes the cuSparse code is compiled but always unreachable.

@cwpearson
Copy link
Contributor Author

@lucbv @uhetmaniuk

@cwpearson
Copy link
Contributor Author

Fixed in #1418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant