Distributed pgm #1403
Conversation
My initial impression is that we really need to update our distributed matrix constructors. Allowing construction from pre-existing communication data + linops seems crucial and should be the first step (probably in another PR).
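For reference, a hedged sketch of what such a constructor could look like — the parameter names, types, and order here are assumptions for illustration, not a final API:

// Sketch only: construct a distributed matrix from pre-existing
// communication data + linops. All parameter names/types are assumptions.
explicit Matrix(std::shared_ptr<const Executor> exec, mpi::communicator comm,
                dim<2> size, std::shared_ptr<LinOp> local_linop,
                std::shared_ptr<LinOp> non_local_linop,
                const std::vector<comm_index_type>& recv_sizes,
                const std::vector<comm_index_type>& recv_offsets,
                const array<local_index_type>& recv_gather_idxs);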
explicit Matrix(std::shared_ptr<const Executor> exec,
                mpi::communicator comm, dim<2> size,
                std::shared_ptr<LinOp> local_linop);
What is this used for? If you are using only a local linop without communication (i.e. no send/recv sizes, offsets, etc.) then you can just use the linop directly.
Some quick comments
core/multigrid/pgm.cpp
Outdated
{
    using csr_type = matrix::Csr<ValueType, IndexType>;
#if GINKGO_BUILD_MPI
    if (auto matrix = std::dynamic_pointer_cast<
This implementation is quite lengthy; maybe you can split it up into several sub-methods to make the if/else clause easier to read.
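For illustration, one possible shape of that split — the helper names generate_distributed/generate_local are hypothetical:

// Hypothetical refactoring sketch: keep generate() short and move the two
// code paths into separate helper methods.
#if GINKGO_BUILD_MPI
    if (auto matrix = std::dynamic_pointer_cast<
            const experimental::distributed::MatrixBase<IndexType>>(
            system_matrix_)) {
        this->generate_distributed(matrix);  // hypothetical helper
        return;
    }
#endif  // GINKGO_BUILD_MPI
    this->generate_local(system_matrix_);  // hypothetical helper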
core/multigrid/pgm.cpp
Outdated
if (auto matrix = std::dynamic_pointer_cast<
        const experimental::distributed::MatrixBase<IndexType>>(
        system_matrix_)) {
    // only work for the square local matrix
This comment seems to be out of place. I don't see any check for the "squareness" of the matrix here.
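If the intent is to require a square local matrix, an explicit check would make that visible. A hedged sketch, assuming the local operator is accessible at this point (the accessor name is an assumption):

// Sketch only: make the "square local matrix" requirement explicit.
// get_local_matrix() is assumed to be reachable here.
GKO_ASSERT_IS_SQUARE_MATRIX(matrix->get_local_matrix());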
@@ -227,10 +237,238 @@ void Pgm<ValueType, IndexType>::generate()

    // Construct the coarse matrix
    // TODO: improve it
// TODO: improve it
Either this should be removed, or made more specific about how to improve it.
core/multigrid/pgm.cpp
Outdated
communicate(matrix, agg_, non_local_agg);
// generate non_local_col_map
non_local_agg.set_executor(exec->get_master());
array<IndexType> non_local_col_map(exec->get_master(), non_local_size);
I think computing the non_local_col_map here can be merged with a similar part in Matrix::read_distributed. In both cases, what we are doing is: for an index sequence that is segmented by keys (the target rank), we map each index into the interval [0, U), where U is the number of unique indices in that segment. I will extract the relevant parts. That should simplify the following code a lot.
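To make the described operation concrete, here is a small standalone illustration; this is not the proposed helper, and the ordering of the compressed indices within a segment (first-occurrence vs. sorted) is left open:

// Illustration: for an index sequence segmented by keys (e.g. target ranks),
// map each index into [0, U), where U is the number of unique indices in its
// segment. First-occurrence order is used here purely for simplicity.
#include <cstddef>
#include <unordered_map>
#include <vector>

std::vector<int> compress_segmented_indices(const std::vector<int>& keys,
                                            const std::vector<int>& indices)
{
    std::vector<int> result(indices.size());
    // key -> (original index -> compressed index)
    std::unordered_map<int, std::unordered_map<int, int>> seen;
    for (std::size_t i = 0; i < indices.size(); ++i) {
        auto& segment = seen[keys[i]];
        auto it = segment.find(indices[i]);
        if (it == segment.end()) {
            // first time this index appears in its segment: assign the next id
            it = segment.emplace(indices[i],
                                 static_cast<int>(segment.size())).first;
        }
        result[i] = it->second;
    }
    return result;
}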
The branch compress-indices contains the relevant part of read_distributed extracted. Using the part_id as keys and the non_local_agg as indices should give you a mapping from the fine non-local columns to the coarse non-local columns. Since the output will also contain the part_ids for the coarse unique columns, it should also simplify computing the gather_idxs and recv_sizes. I would guess that both can be computed nearly identically to the approach in read_distributed.
core/multigrid/pgm.cpp
Outdated
non_local_col_map.get_data()[index.get_data()[i]] =
    renumber.get_data()[i];
}
// get new recv_size and recv_offsets
I think you can also easily compute the send_sizes/offsets here, instead of communicating them.
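One way to read this suggestion, as a hedged sketch: each coarse send size can be derived locally from the fine-level gather indices and the aggregation map by counting unique aggregated indices per rank segment, instead of exchanging the sizes. All names below are illustrative, not the PR's actual variables:

// Sketch of the idea: coarse send size towards rank r = number of unique
// agg values among the fine gather indices in r's segment.
#include <cstddef>
#include <set>
#include <vector>

std::vector<int> coarse_send_sizes(const std::vector<int>& fine_send_sizes,
                                   const std::vector<int>& fine_send_offsets,
                                   const std::vector<int>& gather_idxs,
                                   const std::vector<int>& agg)
{
    std::vector<int> result(fine_send_sizes.size());
    for (std::size_t r = 0; r < fine_send_sizes.size(); ++r) {
        std::set<int> unique_coarse;
        for (int i = fine_send_offsets[r];
             i < fine_send_offsets[r] + fine_send_sizes[r]; ++i) {
            unique_coarse.insert(agg[gather_idxs[i]]);
        }
        result[r] = static_cast<int>(unique_coarse.size());
    }
    return result;
}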
In general, LGTM. But I think some of Marcel's comments regarding the simplifications and unification with the distributed code have not yet been addressed?
I am also not sure about the merge order here. Unlike distributed multigrid, this heavily depends on the distributed stack, so maybe that should be merged first?
@@ -311,6 +311,22 @@ GKO_INSTANTIATE_FOR_EACH_NON_COMPLEX_VALUE_AND_INDEX_TYPE(
    GKO_DECLARE_PGM_ASSIGN_TO_EXIST_AGG);


template <typename IndexType>
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
nit:
- void gather_index(std::shared_ptr<const DefaultExecutor> exec,
+ void gather_indices(std::shared_ptr<const DefaultExecutor> exec,
We also use gather_index in dense, so I kept the same name.
    this->set_size(size);
    local_mtx_ = local_linop;
    non_local_mtx_ = non_local_linop;
    recv_offsets_ = recv_offsets;
    recv_sizes_ = recv_sizes;
    recv_gather_idxs_ = recv_gather_idxs;
    // build send information from recv copy
    // exchange step 1: determine recv_sizes, send_sizes, send_offsets
    std::partial_sum(recv_sizes_.begin(), recv_sizes_.end(),
                     recv_offsets_.begin() + 1);
    comm.all_to_all(exec, recv_sizes_.data(), 1, send_sizes_.data(), 1);
    std::partial_sum(send_sizes_.begin(), send_sizes_.end(),
                     send_offsets_.begin() + 1);
    send_offsets_[0] = 0;
    recv_offsets_[0] = 0;

    // exchange step 2: exchange gather_idxs from receivers to senders
    auto use_host_buffer = mpi::requires_host_buffer(exec, comm);
    if (use_host_buffer) {
        recv_gather_idxs_.set_executor(exec->get_master());
        gather_idxs_.clear();
        gather_idxs_.set_executor(exec->get_master());
    }
    gather_idxs_.resize_and_reset(send_offsets_.back());
    comm.all_to_all_v(use_host_buffer ? exec->get_master() : exec,
                      recv_gather_idxs_.get_const_data(), recv_sizes_.data(),
                      recv_offsets_.data(), gather_idxs_.get_data(),
                      send_sizes_.data(), send_offsets_.data());
    if (use_host_buffer) {
        gather_idxs_.set_executor(exec);
        recv_gather_idxs_.set_executor(exec);
    }

    one_scalar_.init(exec, dim<2>{1, 1});
    one_scalar_->fill(one<value_type>());
}
Do we merge this PR first, or the distributed stack first?
Some comments from my side. Also, do we have an example or test demonstrating the use of distributed PGM in combination with the distributed multigrid solver?
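For reference, a hedged sketch of what such a test or example could look like; exec, comm, matrix_data, partition, b, and x are assumed to be set up beforehand, and the factory parameters are illustrative, not taken from this PR:

// Sketch: distributed PGM inside a Multigrid solve on a distributed matrix.
using ValueType = double;
using LocalIndexType = gko::int32;
using GlobalIndexType = gko::int64;
using dist_mtx =
    gko::experimental::distributed::Matrix<ValueType, LocalIndexType,
                                           GlobalIndexType>;

// build the distributed system matrix from its data and partition
auto A = gko::share(dist_mtx::create(exec, comm));
A->read_distributed(matrix_data, partition);

// multigrid solver with PGM as the coarsening (mg_level) factory
auto solver =
    gko::solver::Multigrid::build()
        .with_mg_level(
            gko::multigrid::Pgm<ValueType, LocalIndexType>::build().on(exec))
        .with_criteria(
            gko::stop::Iteration::build().with_max_iters(100u).on(exec))
        .on(exec)
        ->generate(A);
solver->apply(b, x);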
test/mpi/multigrid/pgm.cpp
Outdated
// the rank 2 part of non local matrix of rank 1 are reordered.
// [0 -1 -2 0], 1st and 3rd are aggregated to the first group but the rest
// are aggregated to the second group. Thus, the aggregated result should be
// [-2 -1] not [-1, -2]
I find this comment quite hard to understand.
I tried to update it a little bit. I hope it is better now.
@@ -533,6 +585,31 @@ class Matrix
    mpi::communicator comm, dim<2> size,
    std::shared_ptr<LinOp> local_linop);

/**
 * Creates distributed matrix with existent local and non-local LinOp and
 * the corresponding mapping.
What exactly is the mapping here? I don't find it in the list of params either?
@@ -89,6 +88,62 @@ Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
    local_mtx_ = local_linop;
}

template <typename ValueType, typename LocalIndexType, typename GlobalIndexType>
Matrix<ValueType, LocalIndexType, GlobalIndexType>::Matrix(
This introduces a new way to create and fill distributed matrices compared to the read_distributed approach. I find that a bit confusing, because we don't really communicate why in one situation one uses Matrix::create(..) followed by Matrix::read_distributed() vs. Matrix::create(..) and passing the local and non-local matrices directly.
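To spell out the two paths being contrasted (sketch only; the argument list of the second variant follows this PR's new constructor as I understand it and may still change):

// (1) existing path: create an empty distributed matrix, then read data
auto A1 = dist_mtx::create(exec, comm);
A1->read_distributed(matrix_data, partition);

// (2) new path in this PR: wrap already-built local/non-local operators
//     together with the receive metadata (parameter names are assumptions)
auto A2 = dist_mtx::create(exec, comm, size, local_linop, non_local_linop,
                           recv_sizes, recv_offsets, recv_gather_idxs);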
LGTM.
core/base/dispatch_helper.hpp
Outdated
#if __cplusplus < 201703L
    using ReturnType = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
Maybe add a quick comment on why this is needed here. Otherwise, after a while it will be hard to maintain.
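A possible form of that comment, shown together with the C++17 alternative it guards against (sketch; whether the #else branch belongs here is up to the author):

// std::result_of is deprecated in C++17 and removed in C++20, so it is only
// used as a fallback for pre-C++17 builds; C++17 and newer provide
// std::invoke_result_t instead.
#if __cplusplus < 201703L
    using ReturnType = std::result_of_t<Func(std::shared_ptr<K>, Args...)>;
#else
    using ReturnType = std::invoke_result_t<Func, std::shared_ptr<K>, Args...>;
#endif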
LGTM! A minor concern about implicitly assuming that an array is on the host. I think an assertion GKO_ASSERT(exec == exec->get_master()) would probably be sufficient in those functions.
@@ -242,6 +252,7 @@ class Matrix
    friend class EnableDistributedPolymorphicObject<Matrix, LinOp>;
    friend class Matrix<next_precision<ValueType>, LocalIndexType,
                        GlobalIndexType>;
    friend class multigrid::Pgm<ValueType, LocalIndexType>;
This is a bit weird to me. I guess this is because of the need to access internal members of Matrix? I guess this is also temporary then, and will probably be removed later?
Yes, because the previous approach would require an additional feature class with many functions. Because we currently only use these in PGM and there are some future changes planned, we use friend to keep them internal. When we have more than one class needing this, we should definitely think about how to extract them into a feature class or always use dispatch.
void gather_index(std::shared_ptr<const DefaultExecutor> exec,
                  size_type num_res, const IndexType* orig,
                  const IndexType* gather_map, IndexType* result)
{
    for (size_type i = 0; i < num_res; ++i) {
        result[i] = orig[gather_map[i]];
    }
}

GKO_INSTANTIATE_FOR_EACH_INDEX_TYPE(GKO_DECLARE_PGM_GATHER_INDEX);
I think this kernel could also be useful in other places. Maybe move it to reference/components?
Co-authored-by: Gregor Olenik <gregor.olenik@web.de>
Co-authored-by: Pratik Nayak <pratikvn@protonmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
This PR adds distributed PGM support.
The distributed PGM only creates the aggregation map locally; that is, there is no aggregation across ranks.
There are two new constructors for the distributed matrix.
TODO:
- move the new constructors of the distributed matrix to another PR
- check the local-only matrix more carefully