Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fbcsr kernels for Cuda and OpenMP #775

Merged
merged 48 commits into from
Oct 26, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
48 commits
Select commit Hold shift + click to select a range
75f503c
added some cusparse block bindings
Slaedr Mar 31, 2021
e20fae1
fixed cusparse block bindings after rebase
Slaedr Mar 31, 2021
e072f50
added fixed-block matrix generation [test fails]
Slaedr Jan 18, 2021
8703917
fb matrix generation tested
Slaedr Jan 19, 2021
00a54f3
fixed to cusparse block bindings and cuda fbcsr kernels after rebase
Slaedr Mar 31, 2021
611e213
cusparse bindings of block operations now take the block sizes as inp…
Slaedr Jan 27, 2021
91a987b
added fbcsr transpose kernel and enabled sorting kernel for cuda
Slaedr Feb 2, 2021
4bbcc60
removed dependence of fb matrix generator on block factorization kernels
Slaedr Mar 31, 2021
a587c67
added function to generate random square Fbcsr matrix for testing pur…
Slaedr Mar 31, 2021
6bc6c5e
enabled block size 7 in Fbcsr sort reference kernel and transpose cud…
Slaedr Mar 31, 2021
db7cc15
fbcsr matrix generator is generalized for non-square matrices
Slaedr Mar 31, 2021
561d66e
added many openmp Fbcsr matrix kernels
Slaedr Mar 31, 2021
65e3814
several fixes to cusparse block wrappers
Slaedr Mar 31, 2021
a972876
fixed assertion issue in omp fbcsr extract diagonal
Slaedr Feb 15, 2021
f7bcf1e
cusparse spmv for fbcsr
Slaedr Feb 15, 2021
44e052b
Removed bsrmm
Slaedr Feb 15, 2021
5ea4723
now using accessor in fb matrix generator test
Slaedr Mar 31, 2021
2f753ef
omp fbcsr kernels now use col-major accessor
Slaedr Mar 31, 2021
cd11eb1
fixed bug in common block transpose
Slaedr Mar 4, 2021
7c18da4
addressed rename to block_col_major accessor in fbcsr omp kernels and…
Slaedr Mar 31, 2021
877481c
added inline keyword in fb matrix generator helper functions, reduced…
Slaedr Mar 31, 2021
5fe5a23
second attempt to fix MSVC build issue
Slaedr Mar 7, 2021
e9fa3db
added helper function to convert some numbers to std array of request…
Slaedr Mar 31, 2021
2d26a21
fixed omp fbcsr kernels for accessor rearrangement
Slaedr Mar 31, 2021
9eedddf
top-level routine for generating random Fbcsr matrices is now being t…
Slaedr Mar 31, 2021
05539bc
added documentation for fbcsr matrix generation functions
Slaedr Mar 31, 2021
b0072c3
fixed cusparse block bindings to use column-major blocks
Slaedr Mar 31, 2021
2a74117
fbcsr apply_impls now use precision_dispatch
Slaedr May 17, 2021
50d9207
added spmv and advanced spmv to omp fbcsr
Slaedr May 17, 2021
92d3680
enabled fbcsr cuda advanced apply and added all value types to cuda f…
Slaedr May 18, 2021
d83cc17
syn value_list is used for compiling block sizes
Slaedr May 18, 2021
5ec70fb
reused conj-transpose and max_nnz_per_row kernels from csr for fbcsr in
Slaedr May 25, 2021
60569fa
minor fix to omp sorting tests
Slaedr May 25, 2021
eff416a
removed dependence of fb_matrix_generator on reference kernels
Slaedr Jun 22, 2021
fd01a57
added missing omp test
Slaedr Jun 23, 2021
bfc9634
completed tests
Slaedr Jun 26, 2021
3113d0f
rebased and fixed issues
Slaedr Oct 4, 2021
037b4d4
Format files
ginkgo-bot Oct 4, 2021
854ca45
Review comments, improved and fixed a bug in tests
Slaedr Oct 4, 2021
4b57f0c
review comments
Slaedr Oct 5, 2021
0c30369
fixed pointer mode issue in cuda fbcsr advanced apply
Slaedr Oct 6, 2021
af36b2c
Review comments
Slaedr Oct 18, 2021
433fddd
added missing semicolons flagged by cuda 10.1 in generic kernel launc…
Slaedr Oct 18, 2021
88c064d
semicolon fix in hip common reduction kernels and minor whitespace fixes
Slaedr Oct 19, 2021
010279a
New algorithm for cuda block transpose, and multiple RHS are now supp…
Slaedr Oct 20, 2021
3d9312d
fixed linker issues by removing core calls from fbcsr cuda kernels
Slaedr Oct 20, 2021
079c52c
renamed block transpose functions
Slaedr Oct 22, 2021
3fa1b4e
fixed missing include
Slaedr Oct 22, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 75 additions & 0 deletions common/cuda_hip/matrix/fbcsr_kernels.hpp.inc
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
/*******************************<GINKGO LICENSE>******************************
Copyright (c) 2017-2021, the Ginkgo authors
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************<GINKGO LICENSE>*******************************/

namespace kernel {


template <int mat_blk_sz, int subwarp_size, typename ValueType,
typename IndexType>
__global__ __launch_bounds__(default_block_size) void transpose_blocks(
const IndexType nbnz, ValueType* const values)
{
const auto total_subwarp_count =
thread::get_subwarp_num_flat<subwarp_size, IndexType>();
const IndexType begin_blk =
thread::get_subwarp_id_flat<subwarp_size, IndexType>();

auto thread_block = group::this_thread_block();
auto subwarp_grp = group::tiled_partition<subwarp_size>(thread_block);
const int sw_threadidx = subwarp_grp.thread_rank();

constexpr int mat_blk_sz_2{mat_blk_sz * mat_blk_sz};
constexpr int num_entries_per_thread{(mat_blk_sz_2 - 1) / subwarp_size + 1};
ValueType orig_vals[num_entries_per_thread];

for (auto ibz = begin_blk; ibz < nbnz; ibz += total_subwarp_count) {
for (int i = sw_threadidx; i < mat_blk_sz_2; i += subwarp_size) {
orig_vals[i / subwarp_size] = values[ibz * mat_blk_sz_2 + i];
}
subwarp_grp.sync();

for (int i = 0; i < num_entries_per_thread; i++) {
const int orig_pos = i * subwarp_size + sw_threadidx;
if (orig_pos >= mat_blk_sz_2) {
break;
}
const int orig_row = orig_pos % mat_blk_sz;
const int orig_col = orig_pos / mat_blk_sz;
const int new_pos = orig_row * mat_blk_sz + orig_col;
values[ibz * mat_blk_sz_2 + new_pos] = orig_vals[i];
}
subwarp_grp.sync();
}
}


} // namespace kernel
64 changes: 64 additions & 0 deletions core/base/block_sizes.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
/*******************************<GINKGO LICENSE>******************************
Copyright (c) 2017-2021, the Ginkgo authors
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
******************************<GINKGO LICENSE>*******************************/

#ifndef GKO_CORE_BASE_BLOCK_SIZES_HPP_
#define GKO_CORE_BASE_BLOCK_SIZES_HPP_


#include <ginkgo/config.hpp>
#include <ginkgo/core/synthesizer/containers.hpp>


namespace gko {
namespace fixedblock {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be under cuda/hip matrix/fb_csr there?
like jacobi generate stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be used by all backends, so it cannot be cuda or hip. Further, any algorithm that uses static fixed-size blocks, like the ParBILU that I was working on, will also use this. So I decided to have a common fixedblock namespace for such common things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense.
Is ParBILU for blockCSR or different format?

Copy link
Contributor Author

@Slaedr Slaedr Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least initially, ParBILU will only be for Fbcsr.



/**
* @def GKO_FIXED_BLOCK_CUSTOM_SIZES
* Optionally-defined comma-separated list of fixed block sizes to compile.
*/
#ifdef GKO_FIXED_BLOCK_CUSTOM_SIZES
/**
* A compile-time list of block sizes for which dedicated fixed-block matrix
* and corresponding preconditioner kernels should be compiled.
*/
using compiled_kernels = syn::value_list<int, GKO_FIXED_BLOCK_CUSTOM_SIZES>;
#else
using compiled_kernels = syn::value_list<int, 2, 3, 4, 7>;
#endif


} // namespace fixedblock
} // namespace gko


#endif // GKO_CORE_BASE_BLOCK_SIZES_HPP_
14 changes: 14 additions & 0 deletions core/base/utils.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,20 @@ std::shared_ptr<const Dest> convert_to_with_sorting(
skip_sorting);
}

/**
* Converts the given arguments into an array of entries of the requested
* template type.
*
* @tparam T The requested type of entries in the output array.
*
* @param args Entities to be filled into an array after casting to type T.
*/
template <typename T, typename... Args>
constexpr std::array<T, sizeof...(Args)> to_std_array(Args&&... args)
{
return {static_cast<T>(args)...};
}


} // namespace gko

Expand Down
21 changes: 14 additions & 7 deletions core/matrix/fbcsr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#include <ginkgo/core/base/exception_helpers.hpp>
#include <ginkgo/core/base/executor.hpp>
#include <ginkgo/core/base/math.hpp>
#include <ginkgo/core/base/precision_dispatch.hpp>
#include <ginkgo/core/base/utils.hpp>
#include <ginkgo/core/matrix/dense.hpp>
#include <ginkgo/core/matrix/identity.hpp>
Expand Down Expand Up @@ -155,14 +156,17 @@ template <typename ValueType, typename IndexType>
void Fbcsr<ValueType, IndexType>::apply_impl(const LinOp* const b,
LinOp* const x) const
{
using Dense = Dense<ValueType>;
if (auto b_fbcsr = dynamic_cast<const Fbcsr<ValueType, IndexType>*>(b)) {
// if b is a FBCSR matrix, we need an SpGeMM
GKO_NOT_SUPPORTED(b_fbcsr);
} else {
Comment on lines 159 to 162
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precision_dispatch_real_complex also throw the error when input not dense, so this part is unncessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't want the spmv kernel to be called when b is an Fbcsr. Fbcsr is convertible to Dense, so I guess I need the check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think temporary_conversion is implemented by dynamic_cast. Maybe @upsj can correct me.
when all dynamic_cast<*dense>(fbcsr) are failed, it throws the error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we check against the exact type, not ConvertibleTo

// otherwise we assume that b is dense and compute a SpMV/SpMM
this->get_executor()->run(
fbcsr::make_spmv(this, as<Dense>(b), as<Dense>(x)));
precision_dispatch_real_complex<ValueType>(
[this](auto dense_b, auto dense_x) {
this->get_executor()->run(
fbcsr::make_spmv(this, dense_b, dense_x));
},
b, x);
}
}

Expand All @@ -173,7 +177,6 @@ void Fbcsr<ValueType, IndexType>::apply_impl(const LinOp* const alpha,
const LinOp* const beta,
LinOp* const x) const
{
using Dense = Dense<ValueType>;
if (auto b_fbcsr = dynamic_cast<const Fbcsr<ValueType, IndexType>*>(b)) {
// if b is a FBCSR matrix, we need an SpGeMM
GKO_NOT_SUPPORTED(b_fbcsr);
Expand All @@ -182,9 +185,13 @@ void Fbcsr<ValueType, IndexType>::apply_impl(const LinOp* const alpha,
GKO_NOT_SUPPORTED(b_ident);
} else {
// otherwise we assume that b is dense and compute a SpMV/SpMM
this->get_executor()->run(
fbcsr::make_advanced_spmv(as<Dense>(alpha), this, as<Dense>(b),
as<Dense>(beta), as<Dense>(x)));
precision_dispatch_real_complex<ValueType>(
[this](auto dense_alpha, auto dense_b, auto dense_beta,
auto dense_x) {
this->get_executor()->run(fbcsr::make_advanced_spmv(
dense_alpha, this, dense_b, dense_beta, dense_x));
},
alpha, b, beta, x);
}
}

Expand Down
5 changes: 4 additions & 1 deletion core/synthesizer/implementation_selection.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,10 @@ namespace syn {
_name(::gko::syn::value_list<int, Rest...>(), is_eligible, \
int_args, type_args, std::forward<InferredArgs>(args)...); \
} \
}
} \
static_assert(true, \
"This assert is used to counter the false positive extra " \
"semi-colon warnings")

#define GKO_ENABLE_IMPLEMENTATION_CONFIG_SELECTION(_name, _callable) \
template <typename Predicate, bool... BoolArgs, int... IntArgs, \
Expand Down
1 change: 1 addition & 0 deletions core/test/utils/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,6 @@ ginkgo_create_test(array_generator_test)
ginkgo_create_test(assertions_test)
ginkgo_create_test(matrix_generator_test)
ginkgo_create_test(matrix_utils_test)
ginkgo_create_test(fb_matrix_generator_test)
ginkgo_create_test(unsort_matrix_test)
ginkgo_create_test(value_generator_test)
Loading