
Distributed pgm #1403

Merged
merged 11 commits into develop from distributed_pgm on May 9, 2024

Conversation

yhmtsai
Member

@yhmtsai yhmtsai commented Sep 6, 2023

This PR adds distributed PGM support.
The distributed PGM only builds the aggregation map locally, i.e. there is no aggregation across ranks. The coarsening proceeds as follows (a conceptual sketch of the local part follows the TODO list below):

  1. build the aggregation map locally
  2. communicate the aggregation map
  3. aggregate the off-diagonal (non-local) part
  4. create the coarse matrix

There are two new constructors for the distributed matrix.

  1. create the matrix from existing mapping information

TODO:

  • it uses dispatch for now; move the new constructor of the distributed matrix to another PR
  • check the local-only matrix more carefully
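As a rough illustration of what steps 1 and 4 mean for the local part (a standalone sketch, not the PR's implementation; the example matrix and the choice of aggregates are made up): with a piecewise-constant (unsmoothed-aggregation) prolongation, as used by PGM, a coarse entry is simply the sum of all fine entries whose row and column fall into the corresponding pair of aggregates. Steps 2 and 3 apply the same relabeling to the non-local columns once the owning ranks' aggregation indices have been communicated.

#include <cstddef>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

int main()
{
    // Fine local matrix in COO form (made-up 4x4 example).
    std::vector<int> row{0, 0, 1, 1, 2, 2, 3, 3};
    std::vector<int> col{0, 1, 0, 1, 2, 3, 2, 3};
    std::vector<double> val{2, -1, -1, 2, 2, -1, -1, 2};

    // Step 1: local aggregation map (fine row -> local coarse aggregate).
    // Rows {0, 1} form aggregate 0, rows {2, 3} form aggregate 1.
    std::vector<int> agg{0, 0, 1, 1};

    // Step 4 (local part): the coarse entry (I, J) is the sum of all fine
    // entries (i, j) with agg[i] == I and agg[j] == J.
    std::map<std::pair<int, int>, double> coarse;
    for (std::size_t k = 0; k < val.size(); ++k) {
        coarse[{agg[row[k]], agg[col[k]]}] += val[k];
    }

    for (const auto& [ij, v] : coarse) {
        std::cout << "A_c(" << ij.first << ", " << ij.second << ") = " << v
                  << '\n';
    }
}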

@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:reference This is related to the reference module. mod:hip This is related to the HIP module. type:multigrid This is related to multigrid labels Sep 6, 2023
Member

@MarcelKoch MarcelKoch left a comment

My initial impression is that we really need to update our distributed matrix constructors. Allowing them to be created from pre-existing communication data + LinOps seems crucial and should be the first step (probably in another PR).

Comment on lines 590 to 586
explicit Matrix(std::shared_ptr<const Executor> exec,
mpi::communicator comm, dim<2> size,
std::shared_ptr<LinOp> local_linop);
Member

What is this used for? If you are using only a local LinOp without communication (i.e. no send/recv sizes, offsets, etc.), then you can just use the LinOp directly.

@upsj upsj requested review from greole and upsj September 18, 2023 12:38
Collaborator

@greole greole left a comment

Some quick comments

{
using csr_type = matrix::Csr<ValueType, IndexType>;
#if GINKGO_BUILD_MPI
if (auto matrix = std::dynamic_pointer_cast<
Collaborator

This implementation is quite lengthy; maybe you can split it up into several sub-methods to make the if/else clause easier to read.

if (auto matrix = std::dynamic_pointer_cast<
const experimental::distributed::MatrixBase<IndexType>>(
system_matrix_)) {
// only work for the square local matrix
Collaborator

This comment seems to be out of place. I cannot see a check for the "squareness" of the matrix here.

@@ -227,10 +237,238 @@ void Pgm<ValueType, IndexType>::generate()

// Construct the coarse matrix
// TODO: improve it
Collaborator

Suggested change
// TODO: improve it

This should either be removed or made more specific about how to improve it.

communicate(matrix, agg_, non_local_agg);
// generate non_local_col_map
non_local_agg.set_executor(exec->get_master());
array<IndexType> non_local_col_map(exec->get_master(), non_local_size);
Member

I think computing the non_local_col_map here can be merged with a similar part in Matrix::read_distributed. In both cases, what we are doing is: for an index sequence that is segmented by keys (the target rank), we map each index into the interval [0, U), where U is the number of unique indices in that segment. I will extract the relevant parts; that should simplify the following code a lot.

Member

The branch compress-indices contains the relevant part of read_distributed extracted. Using the part_ids as keys and the non_local_agg as indices should give you a mapping from the fine non-local columns to the coarse non-local columns. Since the output will also contain the part_ids for the coarse unique columns, it should also simplify computing the gather_idxs and recv_sizes. I would guess that both can be computed nearly identically to the approach in read_distributed.
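Below is a small standalone sketch of the mapping described above (the helper name and the within-segment ordering convention are assumptions; this is not the compress-indices branch itself): for an index sequence segmented by keys such as the part_ids, every index is replaced by a value in [0, U), where U is the number of unique indices in its segment.

#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// For each segment (all entries sharing a key, e.g. the owning part_id),
// replace every index by its rank among the unique indices of that segment,
// i.e. a value in [0, U). Here the rank follows the sorted order of the
// unique indices.
std::vector<int> compress_by_key(const std::vector<int>& keys,
                                 const std::vector<int>& indices)
{
    assert(keys.size() == indices.size());
    // Collect the unique indices per key; std::map keeps them sorted.
    std::map<int, std::map<int, int>> unique_per_key;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        unique_per_key[keys[i]].emplace(indices[i], 0);
    }
    // Number the unique indices of every segment consecutively from 0.
    for (auto& segment : unique_per_key) {
        int rank = 0;
        for (auto& index_and_rank : segment.second) {
            index_and_rank.second = rank++;
        }
    }
    std::vector<int> result(indices.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        result[i] = unique_per_key[keys[i]][indices[i]];
    }
    return result;
}

// Example: with part_ids as keys and non_local_agg as indices,
// compress_by_key({1, 1, 2, 2, 2}, {7, 7, 4, 9, 4}) yields {0, 0, 0, 1, 0}.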

non_local_col_map.get_data()[index.get_data()[i]] =
renumber.get_data()[i];
}
// get new recv_size and recv_offsets
Member

I think you can also easily compute the send_sizes/offsets here, instead of communicating them.

Member

@pratikvn pratikvn left a comment

In general, LGTM. But I think some of Marcel's comments regarding the simplifications and the unification with the distributed code have not been addressed yet?

I am also not sure about the merge order here. Unlike distributed multigrid, this heavily depends on the distributed stack, so maybe that should be merged first?

@@ -311,6 +311,22 @@ GKO_INSTANTIATE_FOR_EACH_NON_COMPLEX_VALUE_AND_INDEX_TYPE(
GKO_DECLARE_PGM_ASSIGN_TO_EXIST_AGG);


template <typename IndexType>
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
Member

nit:

Suggested change
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
void gather_indices(std::shared_ptr<const DefaultExecutor> exec,

Member Author

We also use gather_index in Dense, so I keep the same name for consistency.

Comment on lines +110 to +148
this->set_size(size);
local_mtx_ = local_linop;
non_local_mtx_ = non_local_linop;
recv_offsets_ = recv_offsets;
recv_sizes_ = recv_sizes;
recv_gather_idxs_ = recv_gather_idxs;
// build send information from recv copy
// exchange step 1: determine recv_sizes, send_sizes, send_offsets
std::partial_sum(recv_sizes_.begin(), recv_sizes_.end(),
recv_offsets_.begin() + 1);
comm.all_to_all(exec, recv_sizes_.data(), 1, send_sizes_.data(), 1);
std::partial_sum(send_sizes_.begin(), send_sizes_.end(),
send_offsets_.begin() + 1);
send_offsets_[0] = 0;
recv_offsets_[0] = 0;

// exchange step 2: exchange gather_idxs from receivers to senders
auto use_host_buffer = mpi::requires_host_buffer(exec, comm);
if (use_host_buffer) {
recv_gather_idxs_.set_executor(exec->get_master());
gather_idxs_.clear();
gather_idxs_.set_executor(exec->get_master());
}
gather_idxs_.resize_and_reset(send_offsets_.back());
comm.all_to_all_v(use_host_buffer ? exec->get_master() : exec,
recv_gather_idxs_.get_const_data(), recv_sizes_.data(),
recv_offsets_.data(), gather_idxs_.get_data(),
send_sizes_.data(), send_offsets_.data());
if (use_host_buffer) {
gather_idxs_.set_executor(exec);
recv_gather_idxs_.set_executor(exec);
}

one_scalar_.init(exec, dim<2>{1, 1});
one_scalar_->fill(one<value_type>());
}
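For readers unfamiliar with the exchange pattern in the constructor above, here is a standalone sketch using plain MPI instead of Ginkgo's mpi::communicator wrapper (the toy receive pattern is made up): the receiving side knows how much it wants from each rank, an all-to-all of the counts tells every sender how much it has to provide, and an all-to-all-v then ships the receivers' gather indices over to the senders.

#include <mpi.h>

#include <numeric>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    int num_ranks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

    // Toy receive pattern: every rank wants exactly one index from each rank.
    std::vector<int> recv_sizes(num_ranks, 1);
    std::vector<int> recv_offsets(num_ranks + 1, 0);
    std::partial_sum(recv_sizes.begin(), recv_sizes.end(),
                     recv_offsets.begin() + 1);

    // Exchange step 1: every sender learns how many indices each receiver
    // wants from it.
    std::vector<int> send_sizes(num_ranks, 0);
    MPI_Alltoall(recv_sizes.data(), 1, MPI_INT, send_sizes.data(), 1, MPI_INT,
                 MPI_COMM_WORLD);
    std::vector<int> send_offsets(num_ranks + 1, 0);
    std::partial_sum(send_sizes.begin(), send_sizes.end(),
                     send_offsets.begin() + 1);

    // Exchange step 2: ship the receivers' gather indices to the senders.
    std::vector<int> recv_gather_idxs(recv_offsets.back(), rank);
    std::vector<int> gather_idxs(send_offsets.back(), 0);
    MPI_Alltoallv(recv_gather_idxs.data(), recv_sizes.data(),
                  recv_offsets.data(), MPI_INT, gather_idxs.data(),
                  send_sizes.data(), send_offsets.data(), MPI_INT,
                  MPI_COMM_WORLD);

    MPI_Finalize();
}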
Member

Do we merge this PR first, or the distributed stack first?

Collaborator

@greole greole left a comment

Some comments from my side. Also, do we have an example or test demonstrating the use of distributed PGM in combination with the distributed multigrid solver?

Comment on lines 109 to 112
// the rank 2 part of non local matrix of rank 1 are reordered.
// [0 -1 -2 0], 1st and 3rd are aggregated to the first group but the rest
// are aggregated to the second group. Thus, the aggregated result should be
// [-2 -1] not [-1, -2]
Collaborator

I find this comment quite hard to understand.

Member Author

I have updated it a bit; I hope it is clearer now.

@@ -533,6 +585,31 @@ class Matrix
mpi::communicator comm, dim<2> size,
std::shared_ptr<LinOp> local_linop);

/**
* Creates a distributed matrix from existing local and non-local LinOps and
* the corresponding mapping.
Collaborator

What exactly is the mapping here? I don't see it in the list of parameters either.

@@ -89,6 +88,62 @@ Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
local_mtx_ = local_linop;
}

template <typename ValueType, typename LocalIndexType, typename GlobalIndexType>
Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
Collaborator

This introduces a new way to create and fill distributed matrices compared to the read_distributed approach. I find that a bit confusing, because we don't really communicate why in one situation one uses Matrix::create(..) followed by Matrix::read_distributed(), vs. Matrix::create(..) and passing the local and non-local matrices directly.

@yhmtsai yhmtsai force-pushed the distributed_pgm branch from b795413 to abeee26 Compare May 5, 2024 14:44
@MarcelKoch MarcelKoch mentioned this pull request May 6, 2024
4 tasks
@yhmtsai yhmtsai force-pushed the distributed_pgm branch from abeee26 to 1d19eb8 Compare May 6, 2024 20:40
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label May 6, 2024
@yhmtsai yhmtsai requested review from greole and pratikvn May 6, 2024 21:39
Collaborator

@greole greole left a comment

LGTM.

Comment on lines 177 to 182
#if __cplusplus < 201703L
using ReturnType = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
Collaborator

Maybe add a quick comment on why this is needed here; otherwise it will be hard to maintain after a while.
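Some background that could go into such a comment (context only, not part of the PR's diff; the #else branch below is an assumption about what the non-legacy path would look like): std::result_of was deprecated in C++17 and removed in C++20, while its replacement std::invoke_result_t only exists from C++17 on, so the guard keeps the dispatch helper building across language standards.

#include <memory>
#include <type_traits>

// Hypothetical illustration of the guarded alias; Func, K and Args mirror the
// template parameters of the dispatch helper quoted above.
template <typename Func, typename K, typename... Args>
struct dispatch_return {
#if __cplusplus < 201703L
    // Pre-C++17: std::invoke_result_t is not available yet.
    using type = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
#else
    // C++17 and later: std::result_of is deprecated (removed in C++20).
    using type = std::invoke_result_t<Func, std::shared_ptr<K>, Args...>;
#endif
};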

Member

@pratikvn pratikvn left a comment

LGTM! A minor concern about implicitly assuming that an array is on the host. I think an assertion GKO_ASSERT(exec == exec->get_master()) would probably be sufficient in those functions.

@@ -242,6 +252,7 @@ class Matrix
friend class EnableDistributedPolymorphicObject<Matrix, LinOp>;
friend class Matrix<next_precision<ValueType>, LocalIndexType,
GlobalIndexType>;
friend class multigrid::Pgm<ValueType, LocalIndexType>;
Member

This is a bit weird to me. I guess this is because of the need to access internal members of Matrix? I guess this is also temporary then, and will probably be removed later?

Member Author

Yes, because the previous approach would require an additional feature class with many functions. Because we only use these in PGM currently and there are some future changes planned, we use friend to keep them internal for now. Once more than one class needs this, we should definitely think about how to extract it into a feature class or always use dispatch.

Comment on lines +324 to +333
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
size_type num_res, const IndexType* orig,
const IndexType* gather_map, IndexType* result)
{
for (size_type i = 0; i < num_res; ++i) {
result[i] = orig[gather_map[i]];
}
}

GKO_INSTANTIATE_FOR_EACH_INDEX_TYPE(GKO_DECLARE_PGM_GATHER_INDEX);
Member

I think this kernel could also be useful in other places. Maybe move it to reference/components?
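If the kernel is moved into reference/components as suggested, it could also pick up the host-executor assertion mentioned earlier in this review. A sketch only; bundling the two suggestions and the exact placement of the assertion are assumptions:

// Sketch of gather_index as a shared reference component. The loop
// dereferences raw pointers, so the data is implicitly assumed to live on
// the host; the assertion suggested above makes that assumption explicit.
template <typename IndexType>
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
                  size_type num_res, const IndexType* orig,
                  const IndexType* gather_map, IndexType* result)
{
    GKO_ASSERT(exec == exec->get_master());
    for (size_type i = 0; i < num_res; ++i) {
        result[i] = orig[gather_map[i]];
    }
}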

@yhmtsai yhmtsai force-pushed the distributed_pgm branch 2 times, most recently from a872e36 to 636bc71 Compare May 8, 2024 08:01
@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels May 8, 2024
@yhmtsai yhmtsai force-pushed the distributed_pgm branch from a10beb2 to 8667fa1 Compare May 9, 2024 08:00
@yhmtsai yhmtsai merged commit 69fdc85 into develop May 9, 2024
12 of 15 checks passed
@yhmtsai yhmtsai deleted the distributed_pgm branch May 9, 2024 12:23

Quality Gate passed

Issues: 0 new issues, 0 accepted issues
Measures: 0 security hotspots; no data about coverage; no data about duplication

See analysis details on SonarCloud

Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:reference This is related to the reference module. reg:build This is related to the build system. type:distributed-functionality type:multigrid This is related to multigrid

5 participants