enable Half in mpi #1759
Conversation
I'm mainly concerned about using device buffers for the custom operations, and maybe moving the operations into a private header.
I would suggest removing the heap allocation for predefined MPI ops; the rest looks good.
} // namespace detail


using op_manager = std::shared_ptr<MPI_Op>;
Maybe just store the MPI_Op in a struct, so that you don't need to allocate/free anything for predefined MPI_Ops.
You could also keep the unique_ptr for the custom op, and only use the struct for the predefined ones.
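For illustration, a minimal sketch of such a wrapper, assuming the MPI_Op is stored by value together with an ownership flag; the names (op_wrapper, is_custom_) are hypothetical and this is not the code merged in this PR:

#include <mpi.h>

// Stores MPI_Op by value: predefined operations need no heap allocation,
// and only custom operations are freed in the destructor.
class op_wrapper {
public:
    // Wrap a predefined operation (MPI_MIN, MPI_SUM, ...): nothing to free.
    explicit op_wrapper(MPI_Op predefined) : op_(predefined), is_custom_(false) {}

    // Create and own a custom operation from a user callback.
    explicit op_wrapper(MPI_User_function* fn) : op_(MPI_OP_NULL), is_custom_(true)
    {
        MPI_Op_create(fn, 1, &op_);
    }

    // Move-only, so a custom operation is freed exactly once.
    op_wrapper(const op_wrapper&) = delete;
    op_wrapper& operator=(const op_wrapper&) = delete;
    op_wrapper(op_wrapper&& other) noexcept
        : op_(other.op_), is_custom_(other.is_custom_)
    {
        other.is_custom_ = false;
    }

    ~op_wrapper()
    {
        if (is_custom_) {
            MPI_Op_free(&op_);
        }
    }

    MPI_Op get() const { return op_; }

private:
    MPI_Op op_;
    bool is_custom_;
};

With this layout the predefined path incurs no allocation at all, while the custom path still releases its MPI_Op exactly once.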
I have changed it to a class. Is this what you had in mind?
yes, that looks like a good approach.
Some using ... statements are unused, but otherwise LGTM!
core/test/mpi/base/bindings.cpp
Outdated
namespace detail {


template <typename ValueType>
inline void min(void* input, void* output, int* len, MPI_Datatype* datatype)
{
    ValueType* input_ptr = static_cast<ValueType*>(input);
    ValueType* output_ptr = static_cast<ValueType*>(output);
    for (int i = 0; i < *len; i++) {
        if (input_ptr[i] < output_ptr[i]) {
            output_ptr[i] = input_ptr[i];
        }
    }
}


} // namespace detail


using gko::experimental::mpi::op_manager;

template <typename ValueType,
          std::enable_if_t<std::is_arithmetic_v<ValueType>>* = nullptr>
inline op_manager min()
{
    return op_manager(
        []() {
            MPI_Op* operation = new MPI_Op;
            *operation = MPI_MIN;
            return operation;
        }(),
        [](MPI_Op* op) { delete op; });
}

template <typename ValueType,
          std::enable_if_t<!std::is_arithmetic_v<ValueType>>* = nullptr>
inline op_manager min()
{
    return op_manager(
        []() {
            MPI_Op* operation = new MPI_Op;
            MPI_Op_create(&detail::min<ValueType>, 1, operation);
            return operation;
        }(),
        [](MPI_Op* op) {
            MPI_Op_free(op);
            delete op;
        });
}
Can be moved to mpi_op.hpp, or entirely removed? Or is there a reason to have it only here?
I only implement min here because we currently only use it here.
// OpenMPI 5.0 has support via MPIX_C_FLOAT16, and MPICH v3.4a1 has MPIX_C_FLOAT16
// Only OpenMPI supports complex half
// TODO: use the native type when MPI is configured with the half feature
GKO_REGISTER_MPI_TYPE(half, MPI_UNSIGNED_SHORT);
GKO_REGISTER_MPI_TYPE(std::complex<half>, MPI_FLOAT);
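For reference, a hedged sketch of how the TODO might eventually be resolved, assuming the native type could be selected with a preprocessor check; whether MPIX_C_FLOAT16 is exposed as a macro is implementation-specific, so this is an illustration rather than the merged code:

// Prefer a native float16 type when the MPI implementation exposes one,
// otherwise keep the same-sized stand-in plus custom reduction operations.
#if defined(MPIX_C_FLOAT16)
GKO_REGISTER_MPI_TYPE(half, MPIX_C_FLOAT16);
#else
GKO_REGISTER_MPI_TYPE(half, MPI_UNSIGNED_SHORT);
#endif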
We will also need to consider whether other MPI implementations natively support half if we only want to rely on the native support. Supercomputers use their own variants: Cray-MPICH (maybe similar to MPICH), Intel MPI, etc.
I have discussed it with @MarcelKoch. I think we will go for the custom implementation now, and later we might check whether to add native support. I had tried something in deb12f0, and it already shows that the native support is quite inconsistent between OpenMPI and MPICH.
LGTM, only two small nits left.
This PR enables half precision in a distributed environment by adding custom operations.
One-sided operations like accumulate and fetch_and_op do not support custom operations.
Note: newer versions of MPI might support half precision natively (also for one-sided operations) if the administrator builds them with a compiler that supports native half precision and enables the corresponding option.
TODO:
put the custom operation in gko::comm? It does not grow along with the number of nodes -> create/free when necessary
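For illustration, a minimal standalone sketch of the custom-operation approach described above, using the element-wise min from the reviewed diff. It assumes gko::half from <ginkgo/ginkgo.hpp> is constructible from float and comparable; the names (min_func, half_min) are hypothetical:

#include <mpi.h>
#include <vector>
#include <ginkgo/ginkgo.hpp>

// Element-wise minimum with the MPI_User_function signature.
template <typename ValueType>
void min_func(void* in, void* inout, int* len, MPI_Datatype*)
{
    auto in_ptr = static_cast<ValueType*>(in);
    auto inout_ptr = static_cast<ValueType*>(inout);
    for (int i = 0; i < *len; i++) {
        if (in_ptr[i] < inout_ptr[i]) {
            inout_ptr[i] = in_ptr[i];
        }
    }
}

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Custom, commutative reduction operation for gko::half.
    MPI_Op half_min;
    MPI_Op_create(&min_func<gko::half>, 1, &half_min);

    std::vector<gko::half> local{gko::half(rank + 1.0f)};
    std::vector<gko::half> result(1);
    // half is registered against the same-sized MPI_UNSIGNED_SHORT, so the
    // custom operation reinterprets the buffers as gko::half.
    MPI_Allreduce(local.data(), result.data(), 1, MPI_UNSIGNED_SHORT, half_min,
                  MPI_COMM_WORLD);

    MPI_Op_free(&half_min);
    MPI_Finalize();
}

Because the datatype is only a same-sized stand-in, predefined operations such as MPI_MIN cannot be used for these buffers; the custom operation is what makes the reduction arithmetically correct for half.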