Add extract_diagonal for all matrix formats #563

fritzgoebel · 2020-06-09T13:19:47Z

This PR introduces extraction of the diagonal (which is needed by openCARP) into a vector for all matrix formats.
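For orientation, this is what extraction amounts to for the simplest sparse format. It is a host-side sketch for COO storage. It assumes plain `std::vector` index and value arrays with in-range indices. `coo_extract_diagonal` is a hypothetical name, not Ginkgo's kernel or API. The output has min(rows, cols) entries, and diagonal positions without a stored entry stay zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not Ginkgo's kernel): copies every stored
// entry with row == col into the output. Diagonal positions without a stored
// entry stay zero. Assumes all indices are within the matrix bounds.
std::vector<double> coo_extract_diagonal(
    std::size_t num_rows, std::size_t num_cols,
    const std::vector<std::size_t> &row_idxs,
    const std::vector<std::size_t> &col_idxs,
    const std::vector<double> &values)
{
    assert(row_idxs.size() == values.size() &&
           col_idxs.size() == values.size());
    // A rectangular matrix has min(rows, cols) diagonal entries.
    std::vector<double> diag(std::min(num_rows, num_cols), 0.0);
    for (std::size_t nz = 0; nz < values.size(); ++nz) {
        if (row_idxs[nz] == col_idxs[nz]) {
            diag[row_idxs[nz]] = values[nz];
        }
    }
    return diag;
}
```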

TODO

Merge Add diagonal matrix format for diagonal scaling #580
Rebase and update to return diagonal matrix format.

upsj

LGTM! I mainly have a few nits and a request: For robustness, could you test with dense stride > 1 instead? This way, we can see that all the stride computations are correct.
Also, in your CUDA/HIP code you always write `if (idx < ...) { ... }`. It can be simpler to use `if (idx >= ...) { return; }` instead. This avoids an extra level of indentation and guarantees that the thread exits, so none of the code after the guard is executed.
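The following host-side sketch shows the suggested guard style. It emulates a kernel launch with a plain loop over thread indices. Zero-filling a strided diagonal output is only an illustrative task, and `zero_diag_thread` and `launch_zero_diag` are hypothetical names. The emulated grid is deliberately larger than the output, as a rounded-up launch would be. The guard is therefore the only thing that stops the surplus threads from writing out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of a hypothetical one-thread-per-entry kernel that zero-fills a
// strided diagonal output. The early-return guard keeps the real work at one
// indentation level, and threads past the end do nothing at all.
void zero_diag_thread(std::size_t idx, std::size_t diag_size,
                      std::vector<double> &diag, std::size_t diag_stride)
{
    if (idx >= diag_size) {
        return;
    }
    diag[idx * diag_stride] = 0.0;
}

// Host-side stand-in for a kernel launch: a rounded-up grid usually has more
// threads than entries, which is exactly what the guard protects against.
void launch_zero_diag(std::size_t num_threads, std::size_t diag_size,
                      std::vector<double> &diag, std::size_t diag_stride)
{
    // The output must hold diag_size entries spaced diag_stride apart.
    assert(diag_size == 0 ||
           diag.size() >= (diag_size - 1) * diag_stride + 1);
    for (std::size_t idx = 0; idx < num_threads; ++idx) {
        zero_diag_thread(idx, diag_size, diag, diag_stride);
    }
}
```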

upsj · 2020-06-09T13:21:59Z

common/matrix/csr_kernels.hpp.inc

+            const auto orig_idx = i + orig_row_ptrs[row];
+            if (orig_idx < nnz) {
+                if (orig_col_idxs[orig_idx] == row) {
+                    diag[row * diag_stride] = orig_values[orig_idx];


I think you can return early here

Suggested change

- diag[row * diag_stride] = orig_values[orig_idx];
+ diag[row * diag_stride] = orig_values[orig_idx];
+ return;

(same with all similar kernels)
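The suggested early return can be sketched on the host as a per-row scan that stops at the first diagonal hit. The sketch assumes CSR rows without duplicate column indices; sorting is not required. `csr_row_diagonal` and `csr_extract_diagonal` are hypothetical helpers. A GPU version would split each row across threads instead of looping sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-row routine: scan one CSR row and stop as soon as the
// diagonal entry has been written, mirroring the suggested `return;`.
// Assumes each column appears at most once per row.
void csr_row_diagonal(std::size_t row, const std::vector<std::size_t> &row_ptrs,
                      const std::vector<std::size_t> &col_idxs,
                      const std::vector<double> &values,
                      std::vector<double> &diag)
{
    for (auto nz = row_ptrs[row]; nz < row_ptrs[row + 1]; ++nz) {
        if (col_idxs[nz] == row) {
            diag[row] = values[nz];
            return;  // no later entry of this row can be on the diagonal
        }
    }
    // No stored diagonal entry: diag[row] keeps its initial zero.
}

std::vector<double> csr_extract_diagonal(
    std::size_t num_cols, const std::vector<std::size_t> &row_ptrs,
    const std::vector<std::size_t> &col_idxs, const std::vector<double> &values)
{
    assert(!row_ptrs.empty());
    const auto num_rows = row_ptrs.size() - 1;
    std::vector<double> diag(std::min(num_rows, num_cols), 0.0);
    for (std::size_t row = 0; row < diag.size(); ++row) {
        csr_row_diagonal(row, row_ptrs, col_idxs, values, diag);
    }
    return diag;
}
```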

upsj · 2020-06-09T13:25:09Z

common/matrix/hybrid_kernels.hpp.inc

Couldn't you just use the COO/ELL kernels here?
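The following sketch illustrates the reuse idea. A hybrid matrix is an ELL part plus a COO overflow part, so the diagonal can be assembled by running one routine per part on a single zero-initialized output. This relies on two assumptions: each entry is stored in exactly one of the two parts, and ELL padding has value zero. The structs and function names are hypothetical stand-ins, not Ginkgo's `Hybrid` internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ELL part: column-major, slot k of row r at k * num_rows + r.
struct EllPart {
    std::size_t num_rows;
    std::size_t max_nnz_per_row;
    std::vector<std::size_t> col_idxs;
    std::vector<double> values;
};

// Hypothetical COO part holding the entries that did not fit into ELL.
struct CooPart {
    std::vector<std::size_t> row_idxs;
    std::vector<std::size_t> col_idxs;
    std::vector<double> values;
};

// Writes the ELL part's diagonal entries; zero-valued padding is skipped.
void ell_diagonal_into(const EllPart &ell, std::vector<double> &diag)
{
    for (std::size_t row = 0; row < diag.size(); ++row) {
        for (std::size_t k = 0; k < ell.max_nnz_per_row; ++k) {
            const auto idx = k * ell.num_rows + row;
            if (ell.col_idxs[idx] == row && ell.values[idx] != 0.0) {
                diag[row] = ell.values[idx];
            }
        }
    }
}

// Writes the COO part's diagonal entries.
void coo_diagonal_into(const CooPart &coo, std::vector<double> &diag)
{
    for (std::size_t nz = 0; nz < coo.values.size(); ++nz) {
        if (coo.row_idxs[nz] == coo.col_idxs[nz]) {
            diag[coo.row_idxs[nz]] = coo.values[nz];
        }
    }
}

// Hybrid = ELL + COO: run both per-part routines on one zeroed output.
std::vector<double> hybrid_extract_diagonal(std::size_t num_cols,
                                            const EllPart &ell,
                                            const CooPart &coo)
{
    std::vector<double> diag(std::min(ell.num_rows, num_cols), 0.0);
    ell_diagonal_into(ell, diag);
    coo_diagonal_into(coo, diag);
    return diag;
}
```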

upsj · 2020-06-09T13:25:36Z

common/matrix/sellp_kernels.hpp.inc

+    const auto tid_in_warp = warp_tile.thread_rank();
+
+    for (size_type sellp_ind =


nit:

Suggested change

- for (size_type sellp_ind =
+ for (auto sellp_ind =

Maybe you could also extract origin_slice_sets[slice_id] and origin_slice_sets[slice_id + 1], or the complete loop bounds, into variables to simplify the loop header.
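To illustrate the loop-bound suggestion, here is a sequential sketch of SELL-P extraction with the slice bounds pulled into named variables. The layout is an assumption. Slice `s` spans the column positions `slice_sets[s]` to `slice_sets[s + 1] - 1`. Each position `c` stores its `slice_size` entries contiguously, so row `r` of the slice sits at `c * slice_size + r`. Padding is assumed to have value zero, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential SELL-P extraction. Assumed layout: slice s covers
// column positions slice_sets[s] .. slice_sets[s + 1] - 1, and position c
// stores slice_size entries contiguously (row r of the slice at
// c * slice_size + r). Padding entries are assumed to have value zero.
std::vector<double> sellp_extract_diagonal(
    std::size_t num_rows, std::size_t num_cols, std::size_t slice_size,
    const std::vector<std::size_t> &slice_sets,
    const std::vector<std::size_t> &col_idxs,
    const std::vector<double> &values)
{
    assert(!slice_sets.empty());
    assert(col_idxs.size() == slice_sets.back() * slice_size);
    const auto diag_size = std::min(num_rows, num_cols);
    std::vector<double> diag(diag_size, 0.0);
    const auto slice_num = slice_sets.size() - 1;
    for (std::size_t slice = 0; slice < slice_num; ++slice) {
        // Named bounds keep the loop headers short and readable.
        const auto pos_begin = slice_sets[slice];
        const auto pos_end = slice_sets[slice + 1];
        for (auto pos = pos_begin; pos < pos_end; ++pos) {
            for (std::size_t local_row = 0; local_row < slice_size;
                 ++local_row) {
                const auto row = slice * slice_size + local_row;
                const auto idx = pos * slice_size + local_row;
                // Rows past the diagonal (including padded rows) are skipped.
                if (row < diag_size && col_idxs[idx] == row &&
                    values[idx] != 0.0) {
                    diag[row] = values[idx];
                }
            }
        }
    }
    return diag;
}
```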

upsj · 2020-06-09T13:30:08Z

common/matrix/sellp_kernels.hpp.inc

+        if (global_row < diag_size) {
+            if (orig_col_idxs[sellp_ind] == global_row &&
+                orig_values[sellp_ind] != zero<ValueType>()) {


For consistency: Maybe compare to zero in both ELL/SELLP? Or leave the comparison out in both?
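The sketch below shows why the zero comparison (or an early return) can matter. It assumes that ELL padding slots hold column index 0 and value zero; this is an assumption about the padding convention, not something stated in the PR. Under that assumption, a padding slot in row 0 looks like a diagonal entry, so scanning without the check overwrites `diag[0]` with zero. Stopping at the first match would avoid the problem as well. `ell_extract_diagonal` and its `skip_zeros` switch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ELL diagonal extraction over column-major storage (slot k of
// row r at k * num_rows + r). Padding slots are ASSUMED to hold column index 0
// and value 0, so in row 0 they look like a diagonal entry.
std::vector<double> ell_extract_diagonal(
    std::size_t num_rows, std::size_t num_cols, std::size_t max_nnz_per_row,
    const std::vector<std::size_t> &col_idxs, const std::vector<double> &values,
    bool skip_zeros)
{
    assert(col_idxs.size() == num_rows * max_nnz_per_row &&
           values.size() == col_idxs.size());
    std::vector<double> diag(std::min(num_rows, num_cols), 0.0);
    for (std::size_t row = 0; row < diag.size(); ++row) {
        for (std::size_t k = 0; k < max_nnz_per_row; ++k) {
            const auto idx = k * num_rows + row;
            // Without the zero check (or an early return after the first
            // hit), a later padding slot can overwrite diag[0] with zero.
            if (col_idxs[idx] == row && (!skip_zeros || values[idx] != 0.0)) {
                diag[row] = values[idx];
            }
        }
    }
    return diag;
}
```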

upsj · 2020-06-09T14:17:27Z

cuda/test/matrix/csr_kernels.cpp

@@ -664,4 +664,21 @@ TEST_F(Csr, SortUnsortedMatrixIsEquivalentToRef)
 }


+TEST_F(Csr, ExtractDiagonalIsquivalentToRef)


Suggested change

- TEST_F(Csr, ExtractDiagonalIsquivalentToRef)
+ TEST_F(Csr, ExtractDiagonalIsEquivalentToRef)

upsj · 2020-06-09T14:20:46Z

include/ginkgo/core/matrix/ell.hpp

+     *
+     * @param diag  the vector into which the diagonal will be written
+     */
+    void extract_diagonal(Dense<value_type> *diag) const;


Suggested change

- void extract_diagonal(Dense<value_type> *diag) const;
+ void extract_diagonal(Dense<ValueType> *diag) const;

upsj · 2020-06-09T14:21:04Z

include/ginkgo/core/matrix/sellp.hpp

+    value_type val_at(size_type row, size_type slice_set,
+                      size_type idx) const noexcept


formatting?

Yes, clang-format did this. Should I leave it or change it back?

Do you also use clang-format 8?

upsj · 2020-06-09T14:22:41Z

omp/matrix/sellp_kernels.cpp

+#pragma omp parallel for
+    for (size_type slice = 0; slice < slice_num; slice++) {
+        for (size_type row = 0;
+             row < slice_size && slice_size * slice + row < diag_size; row++) {


That's a rather complicated loop condition. Can it be simplified, maybe?
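Here is one possible simplification. It assumes that the last slices may be partial or may lie entirely past the diagonal. The number of relevant rows per slice is computed once, so the inner loop has a single bound. `rows_in_slice` and `visited_rows` are hypothetical helpers. The `#pragma omp parallel for` is ignored when compiling without OpenMP, and each iteration writes distinct elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of rows of `slice` that lie in the diagonal range [0, diag_size).
// The last slices may be partial or lie entirely past the diagonal.
std::size_t rows_in_slice(std::size_t slice, std::size_t slice_size,
                          std::size_t diag_size)
{
    const auto first_row = slice * slice_size;
    return first_row >= diag_size ? 0
                                  : std::min(slice_size, diag_size - first_row);
}

// Marks every row the simplified loop nest visits; the inner loop header now
// has a single, precomputed bound.
std::vector<int> visited_rows(std::size_t slice_num, std::size_t slice_size,
                              std::size_t diag_size)
{
    std::vector<int> visited(diag_size, 0);
#pragma omp parallel for
    for (std::size_t slice = 0; slice < slice_num; ++slice) {
        const auto num_rows = rows_in_slice(slice, slice_size, diag_size);
        for (std::size_t row = 0; row < num_rows; ++row) {
            visited[slice * slice_size + row] += 1;
        }
    }
    return visited;
}
```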

upsj · 2020-06-09T14:23:09Z

omp/test/matrix/coo_kernels.cpp

@@ -222,4 +222,21 @@ TEST_F(Coo, ConvertToCsrIsEquivalentToRef)
 }


+TEST_F(Coo, ExtractDiagonalIsquivalentToRef)


Suggested change

- TEST_F(Coo, ExtractDiagonalIsquivalentToRef)
+ TEST_F(Coo, ExtractDiagonalIsEquivalentToRef)

upsj · 2020-06-09T14:23:41Z

reference/matrix/sellp_kernels.cpp

+#include <iostream>
+


Suggested change

- #include <iostream>

thoasm

My thoughts on the Mixin

include/ginkgo/core/matrix/coo.hpp

include/ginkgo/core/base/lin_op.hpp

common/matrix/ell_kernels.hpp.inc

yhmtsai · 2020-06-15T09:54:42Z

cuda/matrix/hybrid_kernels.cu



+#include "common/matrix/coo_kernels.hpp.inc"


do you need this here?

yhmtsai · 2020-06-15T09:56:52Z

include/ginkgo/core/base/lin_op.hpp

@@ -51,6 +51,11 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 namespace gko {



Suggested change

yhmtsai · 2020-06-15T09:57:38Z

include/ginkgo/core/base/lin_op.hpp

+namespace matrix {
+template <typename ValueType>
+class Dense;
+}


Suggested change

namespace matrix {

template <typename ValueType>

class Dense;

}

namespace matrix {

template <typename ValueType>

class Dense;

}

reference/matrix/ell_kernels.cpp

yhmtsai · 2020-06-15T10:06:16Z

reference/test/matrix/dense_kernels.cpp

+    GKO_ASSERT_MTX_NEAR(diag,
+                        l({{1.}, {2.}, {1.2}}), r<TypeParam>::value);
+    // clang-format on
+}


It would be great if you added a non-square matrix test.
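Here is a sketch of what such a test could check. It also covers the earlier request for a stride larger than the number of columns. It uses one wide and one tall matrix, with padding filled by a sentinel value so that a wrong stride computation would be visible. `dense_extract_diagonal` is a hypothetical host-side helper using row-major storage, not the Ginkgo kernel or its test macros.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense extraction over row-major storage whose rows are padded
// to `stride` >= num_cols elements. Only min(rows, cols) entries exist.
std::vector<double> dense_extract_diagonal(std::size_t num_rows,
                                           std::size_t num_cols,
                                           std::size_t stride,
                                           const std::vector<double> &values)
{
    assert(stride >= num_cols);
    assert(values.size() >= num_rows * stride);
    const auto diag_size = std::min(num_rows, num_cols);
    std::vector<double> diag(diag_size);
    for (std::size_t i = 0; i < diag_size; ++i) {
        diag[i] = values[i * stride + i];  // a wrong stride would hit padding
    }
    return diag;
}
```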

Stale review

codecov · 2020-06-29T21:39:52Z

Codecov Report

Merging #563 into develop will increase coverage by 0.02%.
The diff coverage is 95.37%.

@@             Coverage Diff             @@
##           develop     #563      +/-   ##
===========================================
+ Coverage    92.86%   92.88%   +0.02%     
===========================================
  Files          303      303              
  Lines        21115    21331     +216     
===========================================
+ Hits         19608    19814     +206     
- Misses        1507     1517      +10

Impacted Files	Coverage Δ
core/device_hooks/common_kernels.inc.cpp	`0.00% <0.00%> (ø)`
include/ginkgo/core/matrix/coo.hpp	`94.73% <ø> (ø)`
include/ginkgo/core/matrix/csr.hpp	`47.72% <ø> (ø)`
include/ginkgo/core/matrix/dense.hpp	`97.70% <ø> (ø)`
include/ginkgo/core/matrix/ell.hpp	`100.00% <ø> (ø)`
include/ginkgo/core/matrix/hybrid.hpp	`94.11% <ø> (ø)`
include/ginkgo/core/matrix/sellp.hpp	`89.74% <ø> (ø)`
core/matrix/coo.cpp	`97.61% <100.00%> (+0.32%)`	⬆️
core/matrix/csr.cpp	`98.55% <100.00%> (+0.07%)`	⬆️
core/matrix/dense.cpp	`99.21% <100.00%> (+0.02%)`	⬆️
... and 26 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a45c3f...5c64ad1.

yhmtsai

LGTM

yhmtsai · 2020-07-06T11:39:09Z

common/matrix/dense_kernels.hpp.inc

+    size_type problem_size, const ValueType *__restrict__ orig,
+    size_type stride_orig, ValueType *__restrict__ diag, size_type stride_diag)
+{
+    constexpr auto warps_per_block = default_block_size / config::warp_size;


warps_per_block is unused

yhmtsai · 2020-07-06T11:43:54Z

common/matrix/sellp_kernels.hpp.inc

+            if (orig_col_idxs[sellp_ind] == global_row &&
+                orig_values[sellp_ind] != zero<ValueType>()) {
+                diag[global_row * diag_stride] = orig_values[sellp_ind];


It can also return early.

I'm pretty sure that, because the slices are stored column-major and there is one warp per slice, it can't return early here. Depending on the slice size, one thread can find multiple diagonal entries.
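The argument can be checked with a small model of the warp-per-slice traversal. The model assumes that lane `l` visits entries `l`, `l + warp_size`, `l + 2 * warp_size`, ... of the slice's column-major storage, so the row of entry `i` is `i % slice_size`. `rows_visited_by_lane` is a hypothetical helper. Take a warp size of 32 and, for example, a slice size of 64. Lane 0 then visits rows 0 and 32, so returning after the first diagonal hit would skip the second row.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical model of one warp per SELL-P slice: lane `lane` walks the
// slice's column-major entries with stride `warp_size`. Entry i belongs to
// row i % slice_size of the slice, so a lane may visit several rows.
std::set<std::size_t> rows_visited_by_lane(std::size_t lane,
                                           std::size_t warp_size,
                                           std::size_t slice_size,
                                           std::size_t slice_length)
{
    assert(lane < warp_size);
    std::set<std::size_t> rows;
    const auto num_entries = slice_size * slice_length;
    for (auto i = lane; i < num_entries; i += warp_size) {
        rows.insert(i % slice_size);
    }
    return rows;
}
```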

yhmtsai · 2020-07-06T11:46:31Z

cuda/matrix/hybrid_kernels.cu

@@ -64,6 +64,7 @@ namespace hybrid {

 constexpr int default_block_size = 512;
 constexpr int warps_in_block = 4;
+constexpr int spmv_block_size = 0;


it is not needed, right?

include/ginkgo/core/base/lin_op.hpp

yhmtsai

LGTM

yhmtsai · 2020-08-20T08:50:48Z

include/ginkgo/core/matrix/dense.hpp

@@ -262,6 +265,7 @@ class Dense : public EnableLinOp<Dense<ValueType>>,
        return values_.get_const_data();
    }

+


Suggested change

yhmtsai · 2020-08-20T08:51:55Z

include/ginkgo/core/matrix/dense.hpp

@@ -242,6 +243,8 @@ class Dense : public EnableLinOp<Dense<ValueType>>,
    std::unique_ptr<LinOp> inverse_column_permute(
        const Array<int64> *inverse_permutation_indices) const override;

+    std::unique_ptr<Diagonal<ValueType>> extract_diagonal() const override;
+


Suggested change
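The final declaration returns a new `Diagonal` object instead of writing into a `Dense` vector. The release notes describe the feature as a `DiagonalExtractable` interface. The sketch below shows that abstract-class shape using hypothetical toy types (`ToyDiagonal`, `ToyDiagonalExtractable`, `ToyDense`). It is a simplified illustration, not Ginkgo's actual class hierarchy.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy diagonal matrix: just the diagonal values.
template <typename ValueType>
class ToyDiagonal {
public:
    explicit ToyDiagonal(std::vector<ValueType> values)
        : values_(std::move(values))
    {}

    const std::vector<ValueType> &get_values() const { return values_; }

private:
    std::vector<ValueType> values_;
};

// Abstract interface: every format that supports extraction implements it.
template <typename ValueType>
class ToyDiagonalExtractable {
public:
    virtual ~ToyDiagonalExtractable() = default;

    virtual std::unique_ptr<ToyDiagonal<ValueType>> extract_diagonal()
        const = 0;
};

// Hypothetical dense format (row-major, no padding) implementing it.
template <typename ValueType>
class ToyDense : public ToyDiagonalExtractable<ValueType> {
public:
    ToyDense(std::size_t rows, std::size_t cols, std::vector<ValueType> values)
        : rows_(rows), cols_(cols), values_(std::move(values))
    {
        assert(values_.size() == rows_ * cols_);
    }

    std::unique_ptr<ToyDiagonal<ValueType>> extract_diagonal() const override
    {
        const auto n = rows_ < cols_ ? rows_ : cols_;
        std::vector<ValueType> diag(n);
        for (std::size_t i = 0; i < n; ++i) {
            diag[i] = values_[i * cols_ + i];
        }
        return std::unique_ptr<ToyDiagonal<ValueType>>(
            new ToyDiagonal<ValueType>(std::move(diag)));
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<ValueType> values_;
};
```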

yhmtsai · 2020-08-20T08:53:22Z

include/ginkgo/core/matrix/sellp.hpp

+    value_type val_at(size_type row, size_type slice_set,
+                      size_type idx) const noexcept


Do you also use clang-format 8?

reference/matrix/sellp_kernels.cpp

omp/test/matrix/sellp_kernels.cpp

reference/test/matrix/csr_kernels.cpp

sonarqubecloud · 2020-08-21T10:18:31Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
22 Code Smells

63.5% Coverage
4.5% Duplication

The version of Java (1.8.0_121) you have used to run this analysis is deprecated and we will stop accepting it from October 2020. Please update to at least Java 11.

Release 1.3.0 of Ginkgo.

The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.3.0. This release brings CUDA 11 support, changes the default C++ standard to be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for diagonal extraction, significantly improves the CMake configuration output format, adds the Ginkgo paper which got accepted into the Journal of Open Source Software (JOSS), and fixes multiple issues.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

Additions:
+ Add paper for Journal of Open Source Software (JOSS). [#479](#479)
+ Add a DiagonalExtractable interface. [#563](#563)
+ Add a new diagonal Matrix Format. [#580](#580)
+ Add Cuda11 support. [#603](#603)
+ Add information output after CMake configuration. [#610](#610)
+ Add a new preconditioner export example. [#595](#595)
+ Add a new cuda-memcheck CI job. [#592](#592)

Changes:
+ Use unified memory in CUDA debug builds. [#621](#621)
+ Improve `BENCHMARKING.md` with more detailed info. [#619](#619)
+ Use C++14 standard instead of C++11. [#611](#611)
+ Update the Ampere sm information and CudaArchitectureSelector. [#588](#588)

Fixes:
+ Fix documentation warnings and errors. [#624](#624)
+ Fix warnings for diagonal matrix format. [#622](#622)
+ Fix criterion factory parameters in CUDA. [#586](#586)
+ Fix the norm-type in the examples. [#612](#612)
+ Fix the WAW race in OpenMP is_sorted_by_column_index. [#617](#617)
+ Fix the example's exec_map by creating the executor only if requested. [#602](#602)
+ Fix some CMake warnings. [#614](#614)
+ Fix Windows building documentation. [#601](#601)
+ Warn when CXX and CUDA host compiler do not match. [#607](#607)
+ Fix reduce_add, prefix_sum, and doc-build. [#593](#593)
+ Fix find_library(cublas) issue on machines installing multiple cuda. [#591](#591)
+ Fix allocator in sellp read. [#589](#589)
+ Fix the CAS with HIP and NVIDIA backends. [#585](#585)

Deletions:
+ Remove unused preconditioner parameter in LowerTrs. [#587](#587)

Related PR: #625

A second copy of the same release notes references Related PR: #627.

fritzgoebel requested review from upsj, pratikvn, thoasm, yhmtsai, hartwiganzt and tcojean June 9, 2020 13:19

fritzgoebel self-assigned this Jun 9, 2020

upsj reviewed Jun 9, 2020


yhmtsai mentioned this pull request Jun 9, 2020

Amgx parallel graph match reference and fill_seq_array #528

Merged

2 tasks

fritzgoebel force-pushed the add_get_diag branch from 66c753d to dd9e558 Compare June 10, 2020 14:57

thoasm previously requested changes Jun 10, 2020


include/ginkgo/core/matrix/coo.hpp (outdated)

include/ginkgo/core/base/lin_op.hpp (outdated)

include/ginkgo/core/base/lin_op.hpp (outdated)

include/ginkgo/core/base/lin_op.hpp (outdated)

fritzgoebel force-pushed the add_get_diag branch 4 times, most recently from 3a31aa5 to e0608a6 Compare June 15, 2020 08:54

yhmtsai requested changes Jun 15, 2020


thoasm self-requested a review June 16, 2020 12:37

fritzgoebel force-pushed the add_get_diag branch 2 times, most recently from 8b308b1 to dd5d432 Compare June 29, 2020 15:40

yhmtsai approved these changes Jul 6, 2020


yhmtsai mentioned this pull request Jul 7, 2020

Add diagonal matrix format for diagonal scaling #580

Merged

fritzgoebel added the 1:ST:ready-for-review This PR is ready for review label Aug 19, 2020

fritzgoebel force-pushed the add_get_diag branch from 302beb8 to 9a54459 Compare August 20, 2020 08:15

yhmtsai approved these changes Aug 20, 2020


upsj approved these changes Aug 20, 2020


omp/test/matrix/sellp_kernels.cpp (outdated)

reference/test/matrix/csr_kernels.cpp (outdated)

fritzgoebel added 17 commits August 21, 2020 09:05

ab8d8ee Add extract_diagonal for dense
cebee3d Add extract_matrix for CSR
84c915e Add extract_diagonal for COO
fa8ae88 Add extract_diagonal for ELL
88ae29c Add extract_diagonal for SELLP
846ae5c add extract_diagonal for Hybrid
51175ed make DiagonalExtractable mixin
7167dff Apply suggestions from code review
a731812 Switch from mixin to abstract class
0815cfd fix hip sellp test
1eccdb0 Apply suggestions from code review
f9cc628 Fix CUDA and HIP extract_diagonal.
08d4820 Review comments
37c0594 Add description of extract_diagonal
11a5c08 rebase and use diagonal matrix format as output to extract_diagonal
8e6798f Remove diagonal stride
5c64ad1 code review

fritzgoebel force-pushed the add_get_diag branch from 4a1b6ec to 5c64ad1 Compare August 21, 2020 07:05

fritzgoebel merged commit bc432c4 into develop Aug 24, 2020

fritzgoebel deleted the add_get_diag branch August 24, 2020 07:21

tcojean added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Aug 24, 2020

upsj mentioned this pull request May 5, 2021

Adding Functionality needed by openCARP #555

Closed
