Add an MPI layer, some bindings, CMake setup and MPI test framework #908

Merged: 43 commits into distributed-develop from mpi-bindings on Nov 27, 2021

Conversation

pratikvn
Member

This PR aims to add the MPI base framework to Ginkgo in an experimental distributed-develop branch.

Partially closes #907

@pratikvn pratikvn added 1:ST:WIP This PR is a work in progress. Not ready for review. mod:mpi This is related to the MPI module labels Oct 22, 2021
@pratikvn pratikvn self-assigned this Oct 22, 2021
@ginkgo-bot ginkgo-bot added mod:core This is related to the core module. reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system. reg:testing This is related to testing. labels Oct 22, 2021
Member

@upsj upsj left a comment


I think there are a few areas that need some more discussion before we can merge this, namely:

  • ginkgo_mpi as a separate shared library probably won't work consistently, especially with the deep integration into solvers.
  • declaring MPI functionality in headers with GKO_BUILD_MPI could lead to a lot of issues with binary incompatibility; I think we should consider including these files/their contents only conditionally. For an example, check distributed-matrix
  • personally I prefer to have the wrappers available as function templates in the headers, since they can then be easily extended to custom types

@codecov

codecov bot commented Oct 22, 2021

Codecov Report

Merging #908 (37b8a44) into distributed-develop (2a34e0e) will decrease coverage by 1.08%.
The diff coverage is 90.00%.


@@                   Coverage Diff                   @@
##           distributed-develop     #908      +/-   ##
=======================================================
- Coverage                94.39%   93.30%   -1.09%     
=======================================================
  Files                      453      459       +6     
  Lines                    36907    37578     +671     
=======================================================
+ Hits                     34838    35063     +225     
- Misses                    2069     2515     +446     
Impacted Files Coverage Δ
core/test/utils.hpp 100.00% <ø> (ø)
include/ginkgo/core/base/exception_helpers.hpp 90.90% <ø> (ø)
include/ginkgo/core/base/types.hpp 92.59% <ø> (ø)
third_party/gtest/gtest_mpi_listener.cpp 55.72% <55.72%> (ø)
include/ginkgo/core/base/mpi.hpp 92.39% <92.39%> (ø)
core/mpi/exception.cpp 100.00% <100.00%> (ø)
core/test/mpi/base/bindings.cpp 100.00% <100.00%> (ø)
core/test/mpi/base/communicator.cpp 100.00% <100.00%> (ø)
core/test/mpi/base/exception_helpers.cpp 100.00% <100.00%> (ø)
include/ginkgo/core/base/exception.hpp 100.00% <100.00%> (ø)
... and 9 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2a34e0e...37b8a44.

@pratikvn pratikvn mentioned this pull request Oct 22, 2021
@pratikvn
Member Author

@upsj, including these files only conditionally has some side effects:

  1. You cannot compile the code base once and then switch between the MPI and non-MPI cases. For example, gko::distributed::Matrix would not even be defined when MPI is off. I think this would cause more issues, because you then have non-existent symbols instead of hooks like we currently use in Ginkgo (see the sketch after this list). AFAIK, being able to write both MPI and non-MPI code paths against the same library is a desired feature in many applications.
  2. In some solvers, it might be necessary to switch between algorithms or do something else for MPI, which again would not be possible because the symbols are no longer defined.
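For reference, a minimal, self-contained sketch of the hook pattern meant in point 1; the names (GKO_HAVE_MPI, not_compiled, dist_matrix) are illustrative stand-ins, not Ginkgo's actual macros or classes:

#include <stdexcept>

#ifdef GKO_HAVE_MPI
#include <mpi.h>
#endif

// thrown by the stubs when MPI support was not compiled in
class not_compiled : public std::runtime_error {
public:
    not_compiled() : std::runtime_error("MPI support was not compiled in") {}
};

class dist_matrix {
public:
#ifdef GKO_HAVE_MPI
    // real implementation that uses the communicator
    void read_distributed(/* matrix data, partition */) { /* ... */ }
#else
    // hook: the symbol exists, but using it fails at runtime
    void read_distributed(/* matrix data, partition */) { throw not_compiled{}; }
#endif
};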

Can you elaborate on what you mean by possible binary incompatibilities?

@pratikvn
Member Author

ginkgo_mpi as a separate shared library probably won't work consistently, especially with the deep integration into solvers.

I am not sure what else we can do here. I think putting the MPI symbols into ginkgo_core conditionally is not good either, and I don't think we can put them anywhere else. What kind of problems do you foresee with ginkgo_mpi as a separate shared library?

@pratikvn
Member Author

personally I prefer to have the wrappers available as function templates in the headers, since they can then be easily extended to custom types

With the current approach, the user-facing wrappers are also templated. This should allow for easy extension to custom types, with automatic mpi_type deduction and unwrapping of the data of custom types such as Arrays, maybe inside a specific instantiation.

Also, extension to custom types usually needs specific overloads, because you usually do not need to specify counts as you do for the POD types, so the function signature needs to be different in my opinion.
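To illustrate the point about differing signatures, a hedged sketch in plain MPI (illustrative names, std::vector standing in for gko::Array): a POD-level wrapper takes an explicit count, while a container overload deduces pointer and count from the object itself.

#include <mpi.h>
#include <vector>

// POD-level wrapper: pointer plus explicit count.
inline void send(const double* data, int count, int dest, int tag, MPI_Comm comm)
{
    MPI_Send(data, count, MPI_DOUBLE, dest, tag, comm);
}

// Container overload: data pointer and count come from the object itself,
// so the signature necessarily differs from the POD version.
inline void send(const std::vector<double>& data, int dest, int tag, MPI_Comm comm)
{
    send(data.data(), static_cast<int>(data.size()), dest, tag, comm);
}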

@upsj
Member

upsj commented Oct 25, 2021

@pratikvn yes, unfortunately I see no nice solution to this dilemma - distributed matrix and vector need MPI functionality, i.e. they need to store an MPI communicator, i.e. the type MPI_Comm must be known in the headers.
If you were to declare MPI_Comm = int, sizeof(MPI_Comm) would be wrong compared to an actual MPI implementation (and may even differ between implementations), which leads to subtle breakage when trying to pass an MPI-enabled type into a non-MPI-enabled Ginkgo. That is a simple example of an ABI break that can easily occur in such a setting.
There are ways around this (the pimpl idiom), but those also require some place where the actual MPI type can be used, so we don't really get a clean separation.
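As a concrete, non-Ginkgo illustration of such an ABI break: MPICH defines MPI_Comm as an int handle while Open MPI defines it as a struct pointer, so any class storing a communicator changes its size and member offsets depending on the definition used at compile time.

#include <cstdint>

#ifdef USE_DUMMY_MPI
using comm_t = int;                   // dummy/MPICH-style handle: 4 bytes
#else
struct ompi_communicator_t;
using comm_t = ompi_communicator_t*;  // Open MPI-style handle: 8 bytes on LP64
#endif

struct distributed_vector_header {
    comm_t comm;        // size differs between the two builds
    std::int64_t size;  // ... so this member's offset shifts as well
};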

If you compile Ginkgo without MPI, the corresponding headers are empty thanks to the #ifdef, so you can't even compile against such a library if you try to use the distributed parts of Ginkgo; there is no potential for missing symbols here.

If you tried to pass a distributed vector (which you got by hacking around in the types) to a non-distributed solver, you'd simply get a runtime error, since it expects Dense instead of distributed::Vector.

@upsj
Member

upsj commented Oct 25, 2021

With the current approach, the user-facing wrappers are also templated. This should allow for easy extension to custom types, with automatic mpi_type deduction and unwrapping of the data of custom types such as Arrays, maybe inside a specific instantiation.

It sounds to me like you are talking about the function declarations here, but without a function definition available in the header, you as a user cannot instantiate the function for custom types, even if they have nicely corresponding MPI types.
If we provide such wrapper declarations, but limit their definitions/instantiations to our internally used value and index types, how useful are they as part of the public interface then?
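A hedged sketch of the difference (illustrative names, not the PR's API): only a definition in the header lets a user instantiate the wrapper for a custom type, for example by specializing a datatype mapping.

#include <mpi.h>

// user-extensible mapping from C++ types to MPI datatypes
template <typename T>
MPI_Datatype mpi_type();

template <>
inline MPI_Datatype mpi_type<double>()
{
    return MPI_DOUBLE;
}

// Definition lives in the header, so a user can specialize mpi_type<> for a
// custom POD type and call this wrapper without touching the library.
template <typename T>
void all_reduce_sum(T* data, int count, MPI_Comm comm)
{
    MPI_Allreduce(MPI_IN_PLACE, data, count, mpi_type<T>(), MPI_SUM, comm);
}

// With only a declaration in the header and explicit instantiations for the
// library's own value types in a .cpp file, the same call for a custom type
// would fail at link time.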

@upsj
Member

upsj commented Oct 25, 2021

I am not sure what else we can do here. I think putting the MPI symbols into ginkgo_core conditionally is not good either, and I don't think we can put them anywhere else. What kind of problems do you foresee with ginkgo_mpi as a separate shared library?

The issue is the interaction with solvers - we need three things: a dispatch that takes care of the distinction between Dense and distributed Vector, an unpacking function that returns the underlying local Dense vector for kernel calls, and a reduction function that can be used to allreduce the results of custom reductions like in CGS GMRES. All of them need distributed::Vector to be a complete type with a known binary layout etc., so for the aforementioned reasons, we cannot simply provide a dummy distributed::Vector with the correct binary layout, because the layout can differ between MPI implementations.
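A very rough, self-contained sketch of those three pieces, using placeholder types (base_vec, dense_wrapper, dist_vec are not Ginkgo classes) just to show where the complete distributed type is needed:

#include <mpi.h>

struct dense_vec { /* local values */ };

struct base_vec {
    virtual ~base_vec() = default;
};

struct dense_wrapper : base_vec {
    dense_vec local;
};

struct dist_vec : base_vec {
    dense_vec local;
    MPI_Comm comm;
};

// Unpacking: kernels always operate on the local dense part.
inline dense_vec& local_part(base_vec& v)
{
    if (auto* d = dynamic_cast<dist_vec*>(&v)) {
        return d->local;
    }
    return dynamic_cast<dense_wrapper&>(v).local;
}

// Dispatch + reduction: a no-op on a single node, an allreduce otherwise.
inline void reduce_sum(base_vec& v, double* value)
{
    if (auto* d = dynamic_cast<dist_vec*>(&v)) {
        MPI_Allreduce(MPI_IN_PLACE, value, 1, MPI_DOUBLE, MPI_SUM, d->comm);
    }
}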

So with fundamental parts of the distributed integration not being part of ginkgo_mpi, I am not sure what we would otherwise gain from this approach.

Don't get me wrong, it is quite possible that there is a clean solution to implement this, but so far I haven't seen anything close to it that doesn't require large amounts of boilerplate code or complicate user interaction.

@pratikvn pratikvn added the 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. label Oct 26, 2021
@pratikvn
Member Author

Maybe I need to clarify what I meant. We need to decide on a couple of things which will probably push us towards one decision or the other.

  1. Do we allow users to write code such as this:
std::ifstream stream("fname");
auto data = gko::read_raw<>(stream);

auto mat = gko::matrix::Csr<>::create(exec); // single exec
auto dist_mat = gko::distributed::Matrix<Csr<>>::create(exec, comm); // distributed object

auto partition = gko::Partition::build(exec, ranges); // build partition

if (is_distributed()) {  // some way to query whether distributed functionality was compiled in
    dist_mat->read_distributed(data, partition);
} else {
    mat->read(data);
}

If we do, then we need a dummy implementation. In mpi-base-dist-mat, we did provide the dummy symbols and it worked fine as far as I could see. There may be issues we simply have not run into yet. Of course, as you say, there is some boilerplate involved, which we have for most of Ginkgo anyway.

This needs some input from @tcojean and @fritzgoebel as they have some experience/input from the OpenCARP side.

  2. Can we completely separate the distributed and the non-distributed solvers and preconditioners? For example, is there no place where we would have the same core, but a branch to select between a distributed and a non-distributed algorithm?
// Some core algorithm
// Do common stuff with LinOp's (both non-distributed and distributed have that as base type)

if (is_distributed()) {
// Call MPI functions or work on distributed objects with dynamic_casts.
} else {
// Do single executor stuff.
}

// Do common stuff with LinOp's (both non-distributed and distributed have that as base type)

In other words, can we guarantee that the core algorithm will be the same for the non-distributed and distributed cases for all current and future algorithms? I think this would be a pretty strict restriction.

Both of the above will also work with #ifdef, but will require re-compilation every time the other path is required. If a dummy distributed implementation is provided, then when MPI is off, only the non-distributed path works, but the distributed code still compiles. With MPI switched on, you have the choice to run either non-distributed or distributed without re-compiling.
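For comparison, the #ifdef variant, reusing the hypothetical interface from the snippet in point 1 above (GKO_BUILD_MPI is the macro name mentioned earlier in this thread; the snippet is only illustrative): the distributed branch simply does not exist in a non-MPI build, so switching paths means recompiling.

std::ifstream stream("fname");
auto data = gko::read_raw<>(stream);

#if GKO_BUILD_MPI
auto dist_mat = gko::distributed::Matrix<Csr<>>::create(exec, comm);
dist_mat->read_distributed(data, partition);
#else
auto mat = gko::matrix::Csr<>::create(exec);
mat->read(data);
#endif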

@tcojean
Member

tcojean commented Oct 27, 2021

I'm not yet in a position to give much feedback on all of these issues, but I think the #ifdef way might be fine as well. Users would probably need to recompile their own code anyway when enabling or disabling MPI (there shouldn't be many use cases where Ginkgo is the only distributed code in the stack?). But the questions of how many code changes are required to go from standard Ginkgo to distributed Ginkgo, and whether distributed and non-distributed Ginkgo code can live together from a user perspective, are important ones.

@pratikvn
Member Author

To collect the summary of our discussions and decisions:

  1. We go with having no distributed functionality when MPI is off. The distributed files (MPI bindings, distributed classes) are all not compiled when MPI is off.
    a. We try to unify the core algorithms as much as possible (for most solvers that we use this should be straightforward); for preconditioners, in case we need different execution paths, we duplicate the code and enable conditional compilation.
  2. Distributed functionality (distributed::Matrix<> and distributed::Vector and their functions) will be within the ginkgo_core library. To separate the MPI bindings and functions, a separate library, ginkgo_mpi, will be added which only contains wrappers for MPI functionality. This allows us to see MPI as a backend of sorts, and keeps a separation between the core library (Ginkgo-related functionality) and external functionality (through wrappers).
  3. To allow the wrappers for MPI functions to work with user-defined custom types, move the wrappers to public headers with templated definitions.

@upsj and @MarcelKoch, please feel free to correct me if I am missing something.

@MarcelKoch
Member

1. We go with having no distributed functionality when MPI is off. The distributed files (MPI bindings, distributed classes) are all not compiled when MPI is off.

Just to clarify, the executor specific kernels of some distributed classes/functions will still be compiled into the corresponding device library regardless of the MPI status, since these kernels do not require MPI in any way.

@pratikvn pratikvn added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Nov 2, 2021
MarcelKoch previously approved these changes Nov 3, 2021
Member

@MarcelKoch MarcelKoch left a comment


LGTM, thanks for adding all of this, especially the CI. I have some mostly smaller remarks below. Larger issues, like using value semantics for most MPI wrapper classes or using fully templated functions, can be addressed at a later point.

* for our purposes. As the class or object goes out of scope, the communicator
* is freed.
*/
class communicator : public EnableSharedCreateMethod<communicator> {
Member


One slight issue with using value semantics is that it requires a call to MPI_Comm_dup, which is a collective operation, so the overhead would probably increase.
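To make the cost concrete, a hedged sketch (not the PR's communicator class) of what strict value semantics would imply: every copy goes through MPI_Comm_dup, which is collective over all ranks of the communicator.

#include <mpi.h>

class value_comm {
public:
    explicit value_comm(MPI_Comm comm) { MPI_Comm_dup(comm, &comm_); }

    // deep copy: a collective call on every copy construction/assignment
    value_comm(const value_comm& other) { MPI_Comm_dup(other.comm_, &comm_); }

    value_comm& operator=(const value_comm& other)
    {
        if (this != &other) {
            MPI_Comm_free(&comm_);
            MPI_Comm_dup(other.comm_, &comm_);
        }
        return *this;
    }

    ~value_comm() { MPI_Comm_free(&comm_); }

    MPI_Comm get() const { return comm_; }

private:
    MPI_Comm comm_;
};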

Contributor

@Slaedr Slaedr left a comment


LGTM! Nice job with the communicator wrapper. I noticed one mistake in the tests, and there are other relatively minor comments.

Member

@upsj upsj left a comment


The communicator implementation looks good for the owning case; the non-owning case and its interaction with owning communicators still has a few rough edges to iron out. They are mainly visible in the assignment operators, but also, to a lesser degree, in the constructors:

Copy-assignment a = b:

                 | a owning   | a non-owning
  b owning       | duplicate  | duplicate
  b non-owning   | duplicate? | duplicate?

Move-assignment a = std::move(b) should never duplicate, but should b be empty afterwards, if it was non-owning? Should a be turned from owning to non-owning by this operation?

Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
@upsj
Member

upsj commented Nov 25, 2021

Pratik and I discussed this behavior conundrum and, following the design of Boost.MPI, came to the conclusion that it makes sense to take care of conditional ownership + duplication only at the application-Ginkgo boundary, i.e. the communicator constructor, and to store the communicator in a shared_ptr with a custom deleter to avoid unnecessary duplication when we have many distributed objects.

@tcojean
Member

tcojean commented Nov 25, 2021

What are the implications of this? What I understand from this is that you would then:

  • Decide owning/non-owning only at construction time (basically, communicators created by Ginkgo are owning, ones coming in from MPI are non-owning)
  • Do not change the owning/non-owning property on copy/move, maybe even remove that capability.

Is there something I misunderstood?

@upsj
Member

upsj commented Nov 25, 2021

Yes, basically this means that there are two states for the shared_ptr<MPI_Comm> internally: ones with a plain deleter (non-owning) that just deletes the MPI_Comm pointer, and ones with a deleter that calls MPI_Comm_free and deletes the pointer afterwards. This state is kept during copy construction and assignment (move construction and assignment are disabled/handled by copy construction and assignment).
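A minimal sketch of that two-deleter design (illustrative names and factory functions, not the merged implementation): copies of the wrapper share the MPI_Comm, and only the deleter attached at construction decides whether MPI_Comm_free is called at the end.

#include <mpi.h>

#include <memory>

class communicator {
public:
    // non-owning: wrap an application-provided communicator, never free it
    static communicator wrap(MPI_Comm comm)
    {
        return communicator{std::shared_ptr<MPI_Comm>{
            new MPI_Comm{comm}, [](MPI_Comm* ptr) { delete ptr; }}};
    }

    // owning: duplicate once at the boundary, free when the last copy is gone
    static communicator duplicate(MPI_Comm comm)
    {
        MPI_Comm dup;
        MPI_Comm_dup(comm, &dup);
        return communicator{std::shared_ptr<MPI_Comm>{
            new MPI_Comm{dup}, [](MPI_Comm* ptr) {
                MPI_Comm_free(ptr);
                delete ptr;
            }}};
    }

    MPI_Comm get() const { return *comm_; }

private:
    explicit communicator(std::shared_ptr<MPI_Comm> comm) : comm_{std::move(comm)} {}

    std::shared_ptr<MPI_Comm> comm_;
};

Copying such a wrapper copies the shared_ptr and thus keeps the deleter chosen at construction, which matches the "state is kept during copy construction and assignment" behavior described above.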

Member

@upsj upsj left a comment


LGTM! Just a few remaining small nits:

  • Formatting: we usually use a single empty line between member functions; that's a bit inconsistent in the file right now
  • Caching rank/size/local rank inside the communicator
  • is_gpu_aware is vendor-specific, since we can now compile CUDA and ROCm simultaneously, and maybe DPC++ in the future as well?
  • you removed the request type; I think that could still be really useful with appropriate member functions .wait() and .test(), and maybe static member functions wait_all() and test_all()?

Comment on lines +194 to +195
GKO_ASSERT(*comm != MPI_COMM_NULL);
GKO_ASSERT_NO_MPI_ERRORS(MPI_Comm_free(comm));
Member


Just a note, not necessarily something we need to change: this throws/aborts inside a destructor, which may not be ideal. MPI_Comm_free probably also checks against MPI_COMM_NULL (considering how eager the MPI implementations I checked are to abort on slight issues).

Suggested change
GKO_ASSERT(*comm != MPI_COMM_NULL);
GKO_ASSERT_NO_MPI_ERRORS(MPI_Comm_free(comm));
if (MPI_Comm_free(comm) != MPI_SUCCESS) {
    // report the error without throwing from the destructor, similar to CudaExecutor::raw_free
}

Member

@MarcelKoch MarcelKoch left a comment


LGTM, thanks for the excellent work. I have only a minor remark on the constness of the member functions: nearly every member function of communicator and window can be marked const. The underlying MPI object is always copied, and I think very few calls are documented to change the MPI object.

Also, I second @upsj's suggestion to add a request type, to really make it consistent.
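For reference, a rough sketch of what such a request type could look like (illustrative only, not the interface that was eventually merged): a thin wrapper around MPI_Request with wait()/test() members and a static wait_all().

#include <mpi.h>

#include <vector>

class request {
public:
    // handle to pass to non-blocking MPI calls, e.g. MPI_Isend(..., req.get())
    MPI_Request* get() { return &req_; }

    MPI_Status wait()
    {
        MPI_Status status;
        MPI_Wait(&req_, &status);
        return status;
    }

    bool test()
    {
        int flag = 0;
        MPI_Test(&req_, &flag, MPI_STATUS_IGNORE);
        return flag != 0;
    }

    static void wait_all(std::vector<request>& reqs)
    {
        for (auto& req : reqs) {
            req.wait();
        }
    }

private:
    MPI_Request req_ = MPI_REQUEST_NULL;
};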

pratikvn and others added 2 commits November 27, 2021 09:58
Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Nov 27, 2021
@pratikvn pratikvn merged commit a5f5e93 into distributed-develop Nov 27, 2021
@pratikvn pratikvn deleted the mpi-bindings branch November 27, 2021 20:28
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs, 0 Vulnerabilities, 0 Security Hotspots, 0 Code Smells (all rated A)

No Coverage information
No Duplication information

@pratikvn pratikvn restored the mpi-bindings branch November 29, 2021 11:30
@pratikvn pratikvn deleted the mpi-bindings branch November 29, 2021 13:51
pratikvn added a commit that referenced this pull request Nov 29, 2021
This PR merges the basic distributed capability into Ginkgo, namely

1. MPI layer (#908)
2. Partition class and kernels (#909)

The discussion regarding the interfaces for the above functionalities can be found in the linked PRs above.

Related PR: #932
tcojean added a commit that referenced this pull request Nov 12, 2022
Advertise release 1.5.0 and last changes

+ Add changelog,
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
- MPI-based multi-node support for all matrix formats and most solvers,
- full DPC++/SYCL support,
- functionality and interface for GPU-resident sparse direct solvers,
- an interface for wrapping solvers with scaling and reordering applied,
- a new algebraic Multigrid solver/preconditioner,
- improved mixed-precision support,
- support for device matrix assembly,

and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110),  [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982),  [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))


Deprecations and important changes:
+ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)).
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)).
+ Drop official support for old CUDA < 9.2 ([#887](#887))


Improved performance additions:
+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)).
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))


Fixes:
+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))


Other additions:
+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)).
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initialization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1175))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903),  [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))