
Distributed pgm #1403

Merged
merged 11 commits into develop from distributed_pgm on May 9, 2024

Conversation

yhmtsai
Member

@yhmtsai yhmtsai commented Sep 6, 2023

This PR adds distributed PGM support.
The distributed PGM only builds the aggregation map locally, i.e. there is no aggregation across ranks. The coarsening proceeds as follows (a conceptual sketch of the local part follows the TODO list below):

  1. build the aggregation map locally
  2. communicate the aggregation map
  3. aggregate the off-diagonal (non-local) part
  4. create the coarse matrix

There are two new constructors for the distributed matrix.

  1. create the matrix from existing mapping information

TODO:

  • it uses dispatch for now; move the new constructor of the distributed matrix to another PR
  • check the local-only matrix more carefully
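As a rough illustration of what steps 1 and 4 mean for the local part (a standalone sketch, not the PR's implementation; the example matrix and the choice of aggregates are made up): with a piecewise-constant (unsmoothed-aggregation) prolongation, as used by PGM, a coarse entry is simply the sum of all fine entries whose row and column fall into the corresponding pair of aggregates. Steps 2 and 3 apply the same relabeling to the non-local columns once the owning ranks' aggregation indices have been communicated.

#include <cstddef>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

int main()
{
    // Fine local matrix in COO form (made-up 4x4 example).
    std::vector<int> row{0, 0, 1, 1, 2, 2, 3, 3};
    std::vector<int> col{0, 1, 0, 1, 2, 3, 2, 3};
    std::vector<double> val{2, -1, -1, 2, 2, -1, -1, 2};

    // Step 1: local aggregation map (fine row -> local coarse aggregate).
    // Rows {0, 1} form aggregate 0, rows {2, 3} form aggregate 1.
    std::vector<int> agg{0, 0, 1, 1};

    // Step 4 (local part): the coarse entry (I, J) is the sum of all fine
    // entries (i, j) with agg[i] == I and agg[j] == J.
    std::map<std::pair<int, int>, double> coarse;
    for (std::size_t k = 0; k < val.size(); ++k) {
        coarse[{agg[row[k]], agg[col[k]]}] += val[k];
    }

    for (const auto& [ij, v] : coarse) {
        std::cout << "A_c(" << ij.first << ", " << ij.second << ") = " << v
                  << '\n';
    }
}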

@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:reference This is related to the reference module. mod:hip This is related to the HIP module. type:multigrid This is related to multigrid labels Sep 6, 2023
Member

@MarcelKoch MarcelKoch left a comment

My initial impression is that we really need to update our distributed matrix constructors. Allowing them to be created from pre-existing communication data + LinOps seems crucial and should be the first step (probably in another PR).

Comment on lines 590 to 586
explicit Matrix(std::shared_ptr<const Executor> exec,
mpi::communicator comm, dim<2> size,
std::shared_ptr<LinOp> local_linop);
Member

What is this used for? If you are using only a local LinOp without communication (i.e. no send/recv sizes, offsets, etc.), then you can just use the LinOp directly.

@upsj upsj requested review from greole and upsj September 18, 2023 12:38
Collaborator

@greole greole left a comment

Some quick comments

{
using csr_type = matrix::Csr<ValueType, IndexType>;
#if GINKGO_BUILD_MPI
if (auto matrix = std::dynamic_pointer_cast<
Collaborator

This implementation is quite lengthy; maybe you can split it up into several sub-methods to make the if/else clause easier to read.

if (auto matrix = std::dynamic_pointer_cast<
const experimental::distributed::MatrixBase<IndexType>>(
system_matrix_)) {
// only work for the square local matrix
Collaborator

This comment seems to be out of place. I cannot see a check for the "squareness" of the matrix here.

@@ -227,10 +237,238 @@ void Pgm<ValueType, IndexType>::generate()

// Construct the coarse matrix
// TODO: improve it
Collaborator

Suggested change
// TODO: improve it

This should either be removed or made more specific about how to improve it.

communicate(matrix, agg_, non_local_agg);
// generate non_local_col_map
non_local_agg.set_executor(exec->get_master());
array<IndexType> non_local_col_map(exec->get_master(), non_local_size);
Member

I think computing the non_local_col_map here can be merged with a similar part in Matrix::read_distributed. In both cases, what we are doing is: for an index sequence that is segmented by keys (the target rank), we map each index into the interval [0, U), where U is the number of unique indices in that segment. I will extract the relevant parts; that should simplify the following code a lot.

Member

The branch compress-indices contains the relevant part of read_distributed extracted. Using the part_ids as keys and the non_local_agg as indices should give you a mapping from the fine non-local columns to the coarse non-local columns. Since the output will also contain the part_ids for the coarse unique columns, it should also simplify computing the gather_idxs and recv_sizes. I would guess that both can be computed nearly identically to the approach in read_distributed.
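Below is a small standalone sketch of the mapping described above (the helper name and the within-segment ordering convention are assumptions; this is not the compress-indices branch itself): for an index sequence segmented by keys such as the part_ids, every index is replaced by a value in [0, U), where U is the number of unique indices in its segment.

#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// For each segment (all entries sharing a key, e.g. the owning part_id),
// replace every index by its rank among the unique indices of that segment,
// i.e. a value in [0, U). Here the rank follows the sorted order of the
// unique indices.
std::vector<int> compress_by_key(const std::vector<int>& keys,
                                 const std::vector<int>& indices)
{
    assert(keys.size() == indices.size());
    // Collect the unique indices per key; std::map keeps them sorted.
    std::map<int, std::map<int, int>> unique_per_key;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        unique_per_key[keys[i]].emplace(indices[i], 0);
    }
    // Number the unique indices of every segment consecutively from 0.
    for (auto& segment : unique_per_key) {
        int rank = 0;
        for (auto& index_and_rank : segment.second) {
            index_and_rank.second = rank++;
        }
    }
    std::vector<int> result(indices.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        result[i] = unique_per_key[keys[i]][indices[i]];
    }
    return result;
}

// Example: with part_ids as keys and non_local_agg as indices,
// compress_by_key({1, 1, 2, 2, 2}, {7, 7, 4, 9, 4}) yields {0, 0, 0, 1, 0}.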

non_local_col_map.get_data()[index.get_data()[i]] =
renumber.get_data()[i];
}
// get new recv_size and recv_offsets
Member

I think you can also easily compute the send_sizes/offsets here, instead of communicating them.

Member

@pratikvn pratikvn left a comment

In general, LGTM. But I think some of Marcel's comments regarding the simplifications and the unification with the distributed code have not been addressed yet?

I am also not sure about the merge order here. Unlike distributed multigrid, this heavily depends on the distributed stack, so maybe that should be merged first?

@@ -311,6 +311,22 @@ GKO_INSTANTIATE_FOR_EACH_NON_COMPLEX_VALUE_AND_INDEX_TYPE(
GKO_DECLARE_PGM_ASSIGN_TO_EXIST_AGG);


template <typename IndexType>
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
Member

nit:

Suggested change
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
void gather_indices(std::shared_ptr<const DefaultExecutor> exec,

Member Author

We also use gather_index in Dense, so I keep the same name for consistency.

Comment on lines +110 to +148
this->set_size(size);
local_mtx_ = local_linop;
non_local_mtx_ = non_local_linop;
recv_offsets_ = recv_offsets;
recv_sizes_ = recv_sizes;
recv_gather_idxs_ = recv_gather_idxs;
// build send information from recv copy
// exchange step 1: determine recv_sizes, send_sizes, send_offsets
std::partial_sum(recv_sizes_.begin(), recv_sizes_.end(),
recv_offsets_.begin() + 1);
comm.all_to_all(exec, recv_sizes_.data(), 1, send_sizes_.data(), 1);
std::partial_sum(send_sizes_.begin(), send_sizes_.end(),
send_offsets_.begin() + 1);
send_offsets_[0] = 0;
recv_offsets_[0] = 0;

// exchange step 2: exchange gather_idxs from receivers to senders
auto use_host_buffer = mpi::requires_host_buffer(exec, comm);
if (use_host_buffer) {
recv_gather_idxs_.set_executor(exec->get_master());
gather_idxs_.clear();
gather_idxs_.set_executor(exec->get_master());
}
gather_idxs_.resize_and_reset(send_offsets_.back());
comm.all_to_all_v(use_host_buffer ? exec->get_master() : exec,
recv_gather_idxs_.get_const_data(), recv_sizes_.data(),
recv_offsets_.data(), gather_idxs_.get_data(),
send_sizes_.data(), send_offsets_.data());
if (use_host_buffer) {
gather_idxs_.set_executor(exec);
recv_gather_idxs_.set_executor(exec);
}

one_scalar_.init(exec, dim<2>{1, 1});
one_scalar_->fill(one<value_type>());
}
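For readers unfamiliar with the exchange pattern in the constructor above, here is a standalone sketch using plain MPI instead of Ginkgo's mpi::communicator wrapper (the toy receive pattern is made up): the receiving side knows how much it wants from each rank, an all-to-all of the counts tells every sender how much it has to provide, and an all-to-all-v then ships the receivers' gather indices over to the senders.

#include <mpi.h>

#include <numeric>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int rank = 0;
    int num_ranks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_ranks);

    // Toy receive pattern: every rank wants exactly one index from each rank.
    std::vector<int> recv_sizes(num_ranks, 1);
    std::vector<int> recv_offsets(num_ranks + 1, 0);
    std::partial_sum(recv_sizes.begin(), recv_sizes.end(),
                     recv_offsets.begin() + 1);

    // Exchange step 1: every sender learns how many indices each receiver
    // wants from it.
    std::vector<int> send_sizes(num_ranks, 0);
    MPI_Alltoall(recv_sizes.data(), 1, MPI_INT, send_sizes.data(), 1, MPI_INT,
                 MPI_COMM_WORLD);
    std::vector<int> send_offsets(num_ranks + 1, 0);
    std::partial_sum(send_sizes.begin(), send_sizes.end(),
                     send_offsets.begin() + 1);

    // Exchange step 2: ship the receivers' gather indices to the senders.
    std::vector<int> recv_gather_idxs(recv_offsets.back(), rank);
    std::vector<int> gather_idxs(send_offsets.back(), 0);
    MPI_Alltoallv(recv_gather_idxs.data(), recv_sizes.data(),
                  recv_offsets.data(), MPI_INT, gather_idxs.data(),
                  send_sizes.data(), send_offsets.data(), MPI_INT,
                  MPI_COMM_WORLD);

    MPI_Finalize();
}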
Member

Do we merge this PR first, or the distributed stack first?

Collaborator

@greole greole left a comment

Some comments from my side. Also, do we have an example or test demonstrating the use of distributed PGM in combination with the distributed multigrid solver?

Comment on lines 109 to 112
// the rank 2 part of non local matrix of rank 1 are reordered.
// [0 -1 -2 0], 1st and 3rd are aggregated to the first group but the rest
// are aggregated to the second group. Thus, the aggregated result should be
// [-2 -1] not [-1, -2]
Collaborator

I find this comment quite hard to understand.

Member Author

I have updated it a bit; I hope it is clearer now.

@@ -533,6 +585,31 @@ class Matrix
mpi::communicator comm, dim<2> size,
std::shared_ptr<LinOp> local_linop);

/**
* Creates a distributed matrix from existing local and non-local LinOps and
* the corresponding mapping.
Collaborator

What exactly is the mapping here? I don't see it in the list of parameters either.

@@ -89,6 +88,62 @@ Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
local_mtx_ = local_linop;
}

template <typename ValueType, typename LocalIndexType, typename GlobalIndexType>
Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
Collaborator

This introduces a new way to create and fill distributed matrices compared to the read_distributed approach. I find that a bit confusing, because we don't really communicate why in one situation one uses Matrix::create(..) followed by Matrix::read_distributed(), vs. Matrix::create(..) and passing the local and non-local matrices directly.

@yhmtsai yhmtsai force-pushed the distributed_pgm branch from b795413 to abeee26 Compare May 5, 2024 14:44
@MarcelKoch MarcelKoch mentioned this pull request May 6, 2024
4 tasks
@yhmtsai yhmtsai force-pushed the distributed_pgm branch from abeee26 to 1d19eb8 Compare May 6, 2024 20:40
@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label May 6, 2024
@yhmtsai yhmtsai requested review from greole and pratikvn May 6, 2024 21:39
Collaborator

@greole greole left a comment

LGTM.

Comment on lines 177 to 182
#if __cplusplus < 201703L
using ReturnType = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
Collaborator

Maybe add a quick comment on why this is needed here; otherwise it will be hard to maintain after a while.
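Some background that could go into such a comment (context only, not part of the PR's diff; the #else branch below is an assumption about what the non-legacy path would look like): std::result_of was deprecated in C++17 and removed in C++20, while its replacement std::invoke_result_t only exists from C++17 on, so the guard keeps the dispatch helper building across language standards.

#include <memory>
#include <type_traits>

// Hypothetical illustration of the guarded alias; Func, K and Args mirror the
// template parameters of the dispatch helper quoted above.
template <typename Func, typename K, typename... Args>
struct dispatch_return {
#if __cplusplus < 201703L
    // Pre-C++17: std::invoke_result_t is not available yet.
    using type = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
#else
    // C++17 and later: std::result_of is deprecated (removed in C++20).
    using type = std::invoke_result_t<Func, std::shared_ptr<K>, Args...>;
#endif
};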

Member

@pratikvn pratikvn left a comment

LGTM! A minor concern about implicitly assuming that an array is on the host. I think an assertion GKO_ASSERT(exec == exec->get_master()) would probably be sufficient in those functions.

@@ -242,6 +252,7 @@ class Matrix
friend class EnableDistributedPolymorphicObject<Matrix, LinOp>;
friend class Matrix<next_precision<ValueType>, LocalIndexType,
GlobalIndexType>;
friend class multigrid::Pgm<ValueType, LocalIndexType>;
Member

This is a bit weird to me. I guess this is because of the need to access internal members of Matrix? I guess this is also temporary then, and will probably be removed later?

Member Author

Yes, because the previous approach would require an additional feature class with many functions. Because we only use these in PGM currently and there are some future changes planned, we use friend to keep them internal for now. Once more than one class needs this, we should definitely think about how to extract it into a feature class or always use dispatch.

Comment on lines +324 to +333
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
size_type num_res, const IndexType* orig,
const IndexType* gather_map, IndexType* result)
{
for (size_type i = 0; i < num_res; ++i) {
result[i] = orig[gather_map[i]];
}
}

GKO_INSTANTIATE_FOR_EACH_INDEX_TYPE(GKO_DECLARE_PGM_GATHER_INDEX);
Member

I think this kernel could also be useful in other places. Maybe move it to reference/components?
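If the kernel is moved into reference/components as suggested, it could also pick up the host-executor assertion mentioned earlier in this review. A sketch only; bundling the two suggestions and the exact placement of the assertion are assumptions:

// Sketch of gather_index as a shared reference component. The loop
// dereferences raw pointers, so the data is implicitly assumed to live on
// the host; the assertion suggested above makes that assumption explicit.
template <typename IndexType>
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
                  size_type num_res, const IndexType* orig,
                  const IndexType* gather_map, IndexType* result)
{
    GKO_ASSERT(exec == exec->get_master());
    for (size_type i = 0; i < num_res; ++i) {
        result[i] = orig[gather_map[i]];
    }
}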

@yhmtsai yhmtsai force-pushed the distributed_pgm branch 2 times, most recently from a872e36 to 636bc71 Compare May 8, 2024 08:01
@yhmtsai yhmtsai added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels May 8, 2024
@yhmtsai yhmtsai force-pushed the distributed_pgm branch from a10beb2 to 8667fa1 Compare May 9, 2024 08:00
@yhmtsai yhmtsai merged commit 69fdc85 into develop May 9, 2024
12 of 15 checks passed
@yhmtsai yhmtsai deleted the distributed_pgm branch May 9, 2024 12:23

Quality Gate passed

Issues: 0 new issues, 0 accepted issues
Measures: 0 security hotspots; no data about coverage; no data about duplication

See analysis details on SonarCloud

Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:reference This is related to the reference module. reg:build This is related to the build system. type:distributed-functionality type:multigrid This is related to multigrid

5 participants