Adds distributed support for several solvers #976

MarcelKoch · 2022-02-24T08:38:41Z

This PR enables the use of distributed matrices and vectors (#971 and #961) in the following iterative solvers:

~~Bicg~~
Bicgstab
Cg
Cgs
Fcg
Ir

Currently not supported are:

Bicg
[cb_]Gmres
Idr
Multigrid
Lower/Upper_trs

With the changes in #861 it should also be possible to enable distributed systems for [cb_]gmres. I haven't looked into Idr much, but I guess the issue there is that reductions are merged with other operations into a single kernel launch and thus can't use global communication.
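To illustrate the reduction issue mentioned for Idr, here is a minimal sketch in plain C++, not Ginkgo code. The names `local_vector`, `local_dot`, `all_reduce_sum` and `distributed_dot` are hypothetical, and the all-reduce is only simulated by summing per-rank partial results. It shows that a global dot product needs a communication step between the local reduction and any use of the result.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the rows of a vector owned by one rank.
using local_vector = std::vector<double>;

// What a single rank can compute on its own (e.g. inside one kernel launch).
double local_dot(const local_vector& x, const local_vector& y)
{
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}

// Simulated sum all-reduce: combines the per-rank partial results.
// In a real MPI program this would be a communication step between kernels.
double all_reduce_sum(const std::vector<double>& partial_per_rank)
{
    return std::accumulate(partial_per_rank.begin(), partial_per_rank.end(),
                           0.0);
}

// Global dot product: local reduction on every rank, then communication.
double distributed_dot(const std::vector<local_vector>& x_parts,
                       const std::vector<local_vector>& y_parts)
{
    std::vector<double> partial;
    for (std::size_t rank = 0; rank < x_parts.size(); ++rank) {
        partial.push_back(local_dot(x_parts[rank], y_parts[rank]));
    }
    return all_reduce_sum(partial);
}
```

If a kernel computes `local_dot` and consumes the value in the same launch, each rank only ever sees its own partial sum, so there is no point at which the all-reduce could be inserted.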

Distributed and non-distributed data are handled by additional dispatch routines that extend precision_dispatch_real_complex, together with helper routines that extract the underlying dense matrix from either a distributed or a dense vector. The residual norm stopping criteria implementation now uses a similar dispatch approach.
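The dispatch idea can be sketched roughly as follows. This is a simplified illustration in plain C++ and does not reproduce the routines added in this PR: the types `LinOpLike`, `DenseLike`, `DistributedLike` and the functions `get_local_part`, `dispatch_dense_or_distributed` and `local_size` are all hypothetical stand-ins. The dispatch routine picks the concrete vector type at runtime, and the helper overloads let the same generic code reach the local dense data in both cases.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT Ginkgo's classes.
struct LinOpLike {
    virtual ~LinOpLike() = default;
};

struct DenseLike : LinOpLike {
    std::vector<double> values;
    explicit DenseLike(std::vector<double> v) : values(std::move(v)) {}
};

struct DistributedLike : LinOpLike {
    DenseLike local;  // the rows owned by this rank
    explicit DistributedLike(std::vector<double> v) : local(std::move(v)) {}
};

// Helper overloads: return the underlying dense data for either vector type.
inline const DenseLike* get_local_part(const DenseLike* v) { return v; }
inline const DenseLike* get_local_part(const DistributedLike* v)
{
    return &v->local;
}

// Runtime dispatch: call fn with the concrete (distributed or dense) type.
template <typename Function>
auto dispatch_dense_or_distributed(const LinOpLike* op, Function fn)
{
    if (auto dist = dynamic_cast<const DistributedLike*>(op)) {
        return fn(dist);
    }
    if (auto dense = dynamic_cast<const DenseLike*>(op)) {
        return fn(dense);
    }
    throw std::invalid_argument("unsupported vector type");
}

// Generic code written once; it works for both types via get_local_part.
inline std::size_t local_size(const LinOpLike* op)
{
    return dispatch_dense_or_distributed(
        op, [](auto* v) { return get_local_part(v)->values.size(); });
}
```

The generic lambda is instantiated once per vector type, so code that only touches local data does not need to be duplicated for the distributed case.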

This also contains some fixes regarding the doxygen documentation for the other distributed classes, which I will not add to the previous PRs.

Partially addresses #907.

Todos:

add solver tests
add [cb_]gmres (wait for Simplify GMRES kernels #861)
~~add Idr~~ postponed for now

Main contributions are from @upsj and @pratikvn.

tcojean

LGTM! I mostly have small comments.

examples/distributed-solver/doc/results.dox

examples/distributed-solver/doc/kind

include/ginkgo/core/base/precision_dispatch.hpp

core/stop/residual_norm.cpp

test/mpi/solver/solver.cpp

core/solver/bicg.cpp

test/mpi/solver/solver.cpp

pratikvn

LGTM!

core/stop/residual_norm.cpp

include/ginkgo/core/base/mpi.hpp

MarcelKoch · 2022-08-26T15:45:25Z

format!

tcojean

LGTM

tcojean · 2022-09-23T14:34:21Z

core/solver/bicg.cpp

@@ -42,6 +42,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <ginkgo/core/base/utils.hpp>


+#include "core/distributed/helpers.hpp"


tcojean · 2022-09-23T14:35:38Z

include/ginkgo/core/solver/bicg.hpp

+    void apply_dense_impl(const gko::matrix::Dense<ValueType>* b,
+                          gko::matrix::Dense<ValueType>* x) const;


Also unneeded?

tcojean · 2022-09-23T14:38:51Z

include/ginkgo/ginkgo.hpp

-#include <ginkgo/core/solver/lower_trs.hpp>
 #include <ginkgo/core/solver/multigrid.hpp>
 #include <ginkgo/core/solver/solver_base.hpp>
 #include <ginkgo/core/solver/solver_traits.hpp>
-#include <ginkgo/core/solver/upper_trs.hpp>


Shouldn't these be kept?

These were moved to triangular.hpp in develop; after rebasing, that header should be included here instead.

I don't think I will rebase this PR or #1054, so I will fix it

fritzgoebel

LGTM, nice work!

fritzgoebel · 2022-09-26T07:00:19Z

core/solver/idr.cpp

@@ -42,6 +42,8 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #include <ginkgo/core/solver/solver_base.hpp>


+#include "core/components/fill_array_kernels.hpp"


I think this is unused

fritzgoebel · 2022-09-26T07:13:23Z

include/ginkgo/ginkgo.hpp

-#include <ginkgo/core/solver/lower_trs.hpp>
 #include <ginkgo/core/solver/multigrid.hpp>
 #include <ginkgo/core/solver/solver_base.hpp>
 #include <ginkgo/core/solver/solver_traits.hpp>
-#include <ginkgo/core/solver/upper_trs.hpp>


These were moved to triangular.hpp in develop; after rebasing, that header should be included here instead.

examples/distributed-solver/distributed-solver.cpp

MarcelKoch · 2022-09-26T08:24:11Z

format!

ginkgo-bot · 2022-09-27T07:36:58Z

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 912 Removed, 0 Changed (1 filtered out), 1018 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

This PR will enable using distributed matrices and vector (#971 and #961) in the following iterative solvers:

- Bicgstab
- Cg
- Cgs
- Fcg
- Ir

Currently not supported are:

- Bicg
- [cb_]Gmres
- Idr
- Multigrid
- Lower/Upper_trs

The handling of the distributed/non-distributed data is done via additional dispatch routines that expand on precision_dispatch_real_complex, and helper routines to extract the underlying dense matrix from either a distributed or dense vector. Also, the residual norm stopping criteria implementation has been changed to also use a similar dispatch approach.

This also contains some fixes regarding the doxygen documentation for the other distributed classes.

Related PR: #976

This PR will add basic, distributed data structures (matrix and vector), and enable some solvers for these types. This PR contains the following PRs:

- #961
- #971
- #976
- #985
- #1007
- #1030
- #1054

# Additional Changes

- moves new types into experimental namespace
- moves existing Partition class into experimental namespace
- moves existing mpi namespace into experimental namespace
- makes generic_scoped_device_id_guard destructor noexcept by terminating if restoring the original device id fails
- switches to blocking communication in the SpMV if OpenMPI version 4.0.x is used
- disables Horeka mpi tests and uses nla-gpu instead

Related PR: #1133

Advertise release 1.5.0 and last changes

+ Add changelog,
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:

- MPI-based multi-node support for all matrix formats and most solvers;
- full DPC++/SYCL support,
- functionality and interface for GPU-resident sparse direct solvers,
- an interface for wrapping solvers with scaling and reordering applied,
- a new algebraic Multigrid solver/preconditioner,
- improved mixed-precision support,
- support for device matrix assembly,

and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:

+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:

+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110), [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982), [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))

Deprecations and important changes:

+ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)).
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)).
+ Drop official support for old CUDA < 9.2 ([#887](#887))

Improved performance additions:

+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)).
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))

Fixes:

+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))

Other additions:

+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)).
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initialization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1116))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903), [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))

MarcelKoch added this to the Ginkgo 1.5.0 milestone Feb 24, 2022

MarcelKoch self-assigned this Feb 24, 2022

ginkgo-bot added the reg:build, reg:documentation, and reg:example labels Feb 24, 2022

MarcelKoch force-pushed the distributed-solvers branch from 9fb7ce5 to 1341500 on February 24, 2022 09:03

MarcelKoch added the 1:ST:need-feedback and 1:ST:WIP labels and removed the 1:ST:ready-for-review label Feb 24, 2022

MarcelKoch force-pushed the new-distributed-matrix branch from a9b37cb to 4acb343 on February 28, 2022 14:31

MarcelKoch force-pushed the distributed-solvers branch from dbffde7 to 9122166 on February 28, 2022 14:32

MarcelKoch force-pushed the new-distributed-matrix branch from 4acb343 to cdd5bf9 on March 1, 2022 08:33

MarcelKoch force-pushed the distributed-solvers branch from 9122166 to 4f09d10 on March 1, 2022 08:34

upsj mentioned this pull request Mar 4, 2022

Create a clean distributed-ginkgo branch #907

Open

8 tasks

MarcelKoch force-pushed the new-distributed-matrix branch from cdd5bf9 to d98f4b1 on March 10, 2022 14:31

MarcelKoch force-pushed the new-distributed-matrix branch 2 times, most recently from dfbfb8a to d050278 on April 1, 2022 07:16

MarcelKoch force-pushed the new-distributed-matrix branch from cd61a1d to a293a5a on April 21, 2022 12:49

MarcelKoch force-pushed the distributed-solvers branch from 1dc6d94 to d923dbf on April 21, 2022 15:51

MarcelKoch force-pushed the new-distributed-matrix branch from a26e06d to 77a0638 on April 22, 2022 09:54

MarcelKoch force-pushed the distributed-solvers branch 3 times, most recently from b432578 to 3b13196 on April 22, 2022 13:50

tcojean approved these changes Aug 26, 2022

pratikvn approved these changes Aug 26, 2022

core/stop/residual_norm.cpp (outdated, resolved)

include/ginkgo/core/base/mpi.hpp (resolved)

MarcelKoch and others added 3 commits August 26, 2022 10:55

adds distributed example kind

1d8d148

Co-authored-by: Terry Cojean <terry.cojean@kit.edu>

removes template apply_impl of Bicg

af1e7ca

Co-authored-by: Terry Cojean <terry.cojean@kit.edu>

review updates:

c05f6a1

- documentation
- simplified any_is_complex check
- move is_distributed

Co-authored-by: Terry Cojean <terry.cojean@kit.edu>
Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>

ginkgo-bot and others added 5 commits August 26, 2022 15:46

Format files

0f58c62

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

fixes residual_norm precision dispatch for non-mpi

f504af1

adds test with different partition types

6dc6667

removes special case if no non-local matrix

33fd976

previously this could lead to divergence between the processes and subsequent deadlocks
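The general hazard behind this kind of fix can be illustrated with a toy model. This is plain C++, not MPI and not Ginkgo's code, and it is not necessarily the exact mechanism that was fixed here. The assumption is that a collective operation only completes if every rank enters it; `ranks_entering` and `all_ranks_agree` are hypothetical helper names. If each rank decides from its own local data whether to enter, ranks with no non-local entries can take a different path than the others.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each rank, decide whether it would enter a communication step.
// With the special case enabled, ranks whose non-local part is empty skip it.
std::vector<bool> ranks_entering(const std::vector<std::size_t>& non_local_size,
                                 bool special_case_empty)
{
    std::vector<bool> enters;
    for (auto size : non_local_size) {
        enters.push_back(!(special_case_empty && size == 0));
    }
    return enters;
}

// Toy criterion: the step can only complete if all ranks decide the same.
bool all_ranks_agree(const std::vector<bool>& enters)
{
    for (std::size_t rank = 1; rank < enters.size(); ++rank) {
        if (enters[rank] != enters[0]) {
            return false;  // some ranks would wait for others that never come
        }
    }
    return true;
}
```

In this model, removing the special case makes every rank take the same path regardless of its local data, which is the property that avoids a hang.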

frees mpi request and makes it move-only

be0983f
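A move-only owning wrapper of the kind this commit title describes can be sketched as below. This is an illustration under assumptions, not the class from include/ginkgo/core/base/mpi.hpp: `request_like`, `free_handle` and `freed_count` are hypothetical, and an `int` stands in for the MPI request handle so the sketch runs without MPI. Deleting the copy operations guarantees that the handle is freed exactly once.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for a handle and its free function (not MPI calls).
inline int& freed_count()
{
    static int count = 0;
    return count;
}
inline void free_handle(int /*handle*/) { ++freed_count(); }

class request_like {
public:
    explicit request_like(int handle) : handle_{handle}, owns_{true} {}

    // Copying would lead to freeing the same handle twice, so forbid it.
    request_like(const request_like&) = delete;
    request_like& operator=(const request_like&) = delete;

    // Moving transfers ownership; the moved-from object frees nothing.
    request_like(request_like&& other) noexcept
        : handle_{other.handle_}, owns_{std::exchange(other.owns_, false)}
    {}
    request_like& operator=(request_like&& other) noexcept
    {
        if (this != &other) {
            release();
            handle_ = other.handle_;
            owns_ = std::exchange(other.owns_, false);
        }
        return *this;
    }

    ~request_like() { release(); }

    int get() const { return handle_; }

private:
    void release()
    {
        if (owns_) {
            free_handle(handle_);
            owns_ = false;
        }
    }

    int handle_;
    bool owns_;
};

static_assert(!std::is_copy_constructible<request_like>::value,
              "wrapper must not be copyable");
static_assert(std::is_move_constructible<request_like>::value,
              "wrapper must be movable");
```

After `request_like b{std::move(a)};` only `b` frees the handle when it goes out of scope.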

tcojean approved these changes Sep 23, 2022

fritzgoebel approved these changes Sep 26, 2022

review updates:

06f9221

- remove unnecessary includes

Co-authored-by: Terry Cojean <terry.cojean@kit.edu>
Co-authored-by: Fritz Goebel <fritz.goebel@kit.edu>

Format files

cae4b88

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

MarcelKoch added the 1:ST:ready-to-merge label and removed the 1:ST:ready-for-review label Sep 26, 2022

Merge branch 'distributed-develop' into distributed-solvers

8113be4

MarcelKoch merged commit 8b5785d into distributed-develop Sep 28, 2022

MarcelKoch deleted the distributed-solvers branch September 28, 2022 15:20

MarcelKoch mentioned this pull request Oct 5, 2022

Add distributed capabilities #1133

Merged
