-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add extract_diagonal for all matrix formats #563
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I mainly have a few nits and a request: For robustness, could you test with dense stride > 1 instead? This way, we can see that all the stride computations are correct.
Also for your CUDA/HIP code, you always write if (idx < ...) { ... }
. It can be simpler to use if (idx >= ...) { return; }
instead since this doesn't need indentation and makes sure that the kernel exits and no code is executed afterwards.
common/matrix/csr_kernels.hpp.inc
Outdated
const auto orig_idx = i + orig_row_ptrs[row]; | ||
if (orig_idx < nnz) { | ||
if (orig_col_idxs[orig_idx] == row) { | ||
diag[row * diag_stride] = orig_values[orig_idx]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can return early here
diag[row * diag_stride] = orig_values[orig_idx]; | |
diag[row * diag_stride] = orig_values[orig_idx]; | |
return; |
(same with all similar kernels)
common/matrix/hybrid_kernels.hpp.inc
Outdated
template <typename ValueType, typename IndexType> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't you just use the COO/ELL kernels here?
common/matrix/sellp_kernels.hpp.inc
Outdated
const auto tid_in_warp = warp_tile.thread_rank(); | ||
|
||
for (size_type sellp_ind = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
for (size_type sellp_ind = | |
for (auto sellp_ind = |
Maybe you could also extract origin_slice_sets[slice_id(+1)] or the complete loop bounds into variables to simplify the loop header.
if (global_row < diag_size) { | ||
if (orig_col_idxs[sellp_ind] == global_row && | ||
orig_values[sellp_ind] != zero<ValueType>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency: Maybe compare to zero in both ELL/SELLP? Or leave the comparison out in both?
cuda/test/matrix/csr_kernels.cpp
Outdated
@@ -664,4 +664,21 @@ TEST_F(Csr, SortUnsortedMatrixIsEquivalentToRef) | |||
} | |||
|
|||
|
|||
TEST_F(Csr, ExtractDiagonalIsquivalentToRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TEST_F(Csr, ExtractDiagonalIsquivalentToRef) | |
TEST_F(Csr, ExtractDiagonalIsEquivalentToRef) |
include/ginkgo/core/matrix/ell.hpp
Outdated
* | ||
* @param diag the vector into which the diagonal will be written | ||
*/ | ||
void extract_diagonal(Dense<value_type> *diag) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
void extract_diagonal(Dense<value_type> *diag) const; | |
void extract_diagonal(Dense<ValueType> *diag) const; |
include/ginkgo/core/matrix/sellp.hpp
Outdated
value_type val_at(size_type row, size_type slice_set, | ||
size_type idx) const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, clang-format did this. Should I leave it or change it back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also use clang-format 8?
omp/matrix/sellp_kernels.cpp
Outdated
#pragma omp parallel for | ||
for (size_type slice = 0; slice < slice_num; slice++) { | ||
for (size_type row = 0; | ||
row < slice_size && slice_size * slice + row < diag_size; row++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a rather complicated loop condition. Can it be simplified, maybe?
omp/test/matrix/coo_kernels.cpp
Outdated
@@ -222,4 +222,21 @@ TEST_F(Coo, ConvertToCsrIsEquivalentToRef) | |||
} | |||
|
|||
|
|||
TEST_F(Coo, ExtractDiagonalIsquivalentToRef) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TEST_F(Coo, ExtractDiagonalIsquivalentToRef) | |
TEST_F(Coo, ExtractDiagonalIsEquivalentToRef) |
reference/matrix/sellp_kernels.cpp
Outdated
#include <iostream> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <iostream> |
66c753d
to
dd9e558
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts on the Mixin
3a31aa5
to
e0608a6
Compare
cuda/matrix/hybrid_kernels.cu
Outdated
|
||
|
||
#include "common/matrix/coo_kernels.hpp.inc" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
include/ginkgo/core/base/lin_op.hpp
Outdated
@@ -51,6 +51,11 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |||
|
|||
namespace gko { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include/ginkgo/core/base/lin_op.hpp
Outdated
namespace matrix { | ||
template <typename ValueType> | ||
class Dense; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace matrix { | |
template <typename ValueType> | |
class Dense; | |
} | |
namespace matrix { | |
template <typename ValueType> | |
class Dense; | |
} |
GKO_ASSERT_MTX_NEAR(diag, | ||
l({{1.}, {2.}, {1.2}}), r<TypeParam>::value); | ||
// clang-format on | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you add a non-square matrix test
8b308b1
to
dd5d432
Compare
Codecov Report
@@ Coverage Diff @@
## develop #563 +/- ##
===========================================
+ Coverage 92.86% 92.88% +0.02%
===========================================
Files 303 303
Lines 21115 21331 +216
===========================================
+ Hits 19608 19814 +206
- Misses 1507 1517 +10
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
common/matrix/dense_kernels.hpp.inc
Outdated
size_type problem_size, const ValueType *__restrict__ orig, | ||
size_type stride_orig, ValueType *__restrict__ diag, size_type stride_diag) | ||
{ | ||
constexpr auto warps_per_block = default_block_size / config::warp_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warps_per_block
is unused
common/matrix/sellp_kernels.hpp.inc
Outdated
if (orig_col_idxs[sellp_ind] == global_row && | ||
orig_values[sellp_ind] != zero<ValueType>()) { | ||
diag[global_row * diag_stride] = orig_values[sellp_ind]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it can also return earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that because the slices are stored column-major and there is one warp per slice, it can't return early here. Depending on the slice size, it can happen that one thread finds multiple diagonal entries.
cuda/matrix/hybrid_kernels.cu
Outdated
@@ -64,6 +64,7 @@ namespace hybrid { | |||
|
|||
constexpr int default_block_size = 512; | |||
constexpr int warps_in_block = 4; | |||
constexpr int spmv_block_size = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not needed, right?
302beb8
to
9a54459
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
include/ginkgo/core/matrix/dense.hpp
Outdated
@@ -262,6 +265,7 @@ class Dense : public EnableLinOp<Dense<ValueType>>, | |||
return values_.get_const_data(); | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -242,6 +243,8 @@ class Dense : public EnableLinOp<Dense<ValueType>>, | |||
std::unique_ptr<LinOp> inverse_column_permute( | |||
const Array<int64> *inverse_permutation_indices) const override; | |||
|
|||
std::unique_ptr<Diagonal<ValueType>> extract_diagonal() const override; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include/ginkgo/core/matrix/sellp.hpp
Outdated
value_type val_at(size_type row, size_type slice_set, | ||
size_type idx) const noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also use clang-format 8?
4a1b6ec
to
5c64ad1
Compare
Kudos, SonarCloud Quality Gate passed!
|
Release 1.3.0 of Ginkgo. The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues. Supported systems and requirements: + For all platforms, cmake 3.9+ + Linux and MacOS + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+ + clang: 3.9+ + Intel compiler: 2017+ + Apple LLVM: 8.0+ + CUDA module: CUDA 9.0+ + HIP module: ROCm 2.8+ + Windows + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+ + Microsoft Visual Studio: VS 2017 15.7+ + CUDA module: CUDA 9.0+, Microsoft Visual Studio + OpenMP module: MinGW or Cygwin. The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues). Additions: + Add paper for Journal of Open Source Software (JOSS). [#479](#479) + Add a DiagonalExtractable interface. [#563](#563) + Add a new diagonal Matrix Format. [#580](#580) + Add Cuda11 support. [#603](#603) + Add information output after CMake configuration. [#610](#610) + Add a new preconditioner export example. [#595](#595) + Add a new cuda-memcheck CI job. [#592](#592) Changes: + Use unified memory in CUDA debug builds. [#621](#621) + Improve `BENCHMARKING.md` with more detailed info. [#619](#619) + Use C++14 standard instead of C++11. [#611](#611) + Update the Ampere sm information and CudaArchitectureSelector. [#588](#588) Fixes: + Fix documentation warnings and errors. [#624](#624) + Fix warnings for diagonal matrix format. [#622](#622) + Fix criterion factory parameters in CUDA. [#586](#586) + Fix the norm-type in the examples. [#612](#612) + Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617) + Fix the example's exec_map by creating the executor only if requested. [#602](#602) + Fix some CMake warnings. [#614](#614) + Fix Windows building documentation. [#601](#601) + Warn when CXX and CUDA host compiler do not match. [#607](#607) + Fix reduce_add, prefix_sum, and doc-build. [#593](#593) + Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591) + Fix allocator in sellp read. [#589](#589) + Fix the CAS with HIP and NVIDIA backends. [#585](#585) Deletions: + Remove unused preconditioner parameter in LowerTrs. [#587](#587) Related PR: #625
The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues. Supported systems and requirements: + For all platforms, cmake 3.9+ + Linux and MacOS + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+ + clang: 3.9+ + Intel compiler: 2017+ + Apple LLVM: 8.0+ + CUDA module: CUDA 9.0+ + HIP module: ROCm 2.8+ + Windows + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+ + Microsoft Visual Studio: VS 2017 15.7+ + CUDA module: CUDA 9.0+, Microsoft Visual Studio + OpenMP module: MinGW or Cygwin. The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues). Additions: + Add paper for Journal of Open Source Software (JOSS). [#479](#479) + Add a DiagonalExtractable interface. [#563](#563) + Add a new diagonal Matrix Format. [#580](#580) + Add Cuda11 support. [#603](#603) + Add information output after CMake configuration. [#610](#610) + Add a new preconditioner export example. [#595](#595) + Add a new cuda-memcheck CI job. [#592](#592) Changes: + Use unified memory in CUDA debug builds. [#621](#621) + Improve `BENCHMARKING.md` with more detailed info. [#619](#619) + Use C++14 standard instead of C++11. [#611](#611) + Update the Ampere sm information and CudaArchitectureSelector. [#588](#588) Fixes: + Fix documentation warnings and errors. [#624](#624) + Fix warnings for diagonal matrix format. [#622](#622) + Fix criterion factory parameters in CUDA. [#586](#586) + Fix the norm-type in the examples. [#612](#612) + Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617) + Fix the example's exec_map by creating the executor only if requested. [#602](#602) + Fix some CMake warnings. [#614](#614) + Fix Windows building documentation. [#601](#601) + Warn when CXX and CUDA host compiler do not match. [#607](#607) + Fix reduce_add, prefix_sum, and doc-build. [#593](#593) + Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591) + Fix allocator in sellp read. [#589](#589) + Fix the CAS with HIP and NVIDIA backends. [#585](#585) Deletions: + Remove unused preconditioner parameter in LowerTrs. [#587](#587) Related PR: #627
This PR introduces extraction of the diagonal (which is needed by openCARP) into a vector for all matrix formats.
TODO