enable Half in mpi #1759
Conversation
I'm mainly concerned about using device buffers for the custom operations, and maybe moving the operations into a private header.
I would suggest removing the heap allocation for predefined MPI ops; the rest looks good.
} // namespace detail


using op_manager = std::shared_ptr<MPI_Op>;
Maybe just store the MPI_Op in a struct, so that you don't need to allocate/free anything for predefined MPI_Ops.
You could also keep the unique_ptr for the custom op, and only use the struct for the predefined ones.
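For illustration, a minimal sketch of such a wrapper, assuming the MPI_Op is stored by value together with an ownership flag; the names (op_wrapper, is_custom_) are hypothetical and this is not the code merged in this PR:

#include <mpi.h>

// Stores MPI_Op by value: predefined operations need no heap allocation,
// and only custom operations are freed in the destructor.
class op_wrapper {
public:
    // Wrap a predefined operation (MPI_MIN, MPI_SUM, ...): nothing to free.
    explicit op_wrapper(MPI_Op predefined) : op_(predefined), is_custom_(false) {}

    // Create and own a custom operation from a user callback.
    explicit op_wrapper(MPI_User_function* fn) : op_(MPI_OP_NULL), is_custom_(true)
    {
        MPI_Op_create(fn, 1, &op_);
    }

    // Move-only, so a custom operation is freed exactly once.
    op_wrapper(const op_wrapper&) = delete;
    op_wrapper& operator=(const op_wrapper&) = delete;
    op_wrapper(op_wrapper&& other) noexcept
        : op_(other.op_), is_custom_(other.is_custom_)
    {
        other.is_custom_ = false;
    }

    ~op_wrapper()
    {
        if (is_custom_) {
            MPI_Op_free(&op_);
        }
    }

    MPI_Op get() const { return op_; }

private:
    MPI_Op op_;
    bool is_custom_;
};

With this layout the predefined path incurs no allocation at all, while the custom path still releases its MPI_Op exactly once.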
I have changed it to a class. Is this what you had in mind?
yes, that looks like a good approach.
Some using ... statements are unused, but otherwise LGTM!
core/test/mpi/base/bindings.cpp
Outdated
namespace detail {


template <typename ValueType>
inline void min(void* input, void* output, int* len, MPI_Datatype* datatype)
{
    ValueType* input_ptr = static_cast<ValueType*>(input);
    ValueType* output_ptr = static_cast<ValueType*>(output);
    for (int i = 0; i < *len; i++) {
        if (input_ptr[i] < output_ptr[i]) {
            output_ptr[i] = input_ptr[i];
        }
    }
}


} // namespace detail


using gko::experimental::mpi::op_manager;

template <typename ValueType,
          std::enable_if_t<std::is_arithmetic_v<ValueType>>* = nullptr>
inline op_manager min()
{
    return op_manager(
        []() {
            MPI_Op* operation = new MPI_Op;
            *operation = MPI_MIN;
            return operation;
        }(),
        [](MPI_Op* op) { delete op; });
}

template <typename ValueType,
          std::enable_if_t<!std::is_arithmetic_v<ValueType>>* = nullptr>
inline op_manager min()
{
    return op_manager(
        []() {
            MPI_Op* operation = new MPI_Op;
            MPI_Op_create(&detail::min<ValueType>, 1, operation);
            return operation;
        }(),
        [](MPI_Op* op) {
            MPI_Op_free(op);
            delete op;
        });
}
Can be moved to mpi_op.hpp, or entirely removed? Or is there a reason to have it only here?
I only implement min here because we currently only use it here.
// OpenMPI 5.0 has support via MPIX_C_FLOAT16, and MPICH v3.4a1 has MPIX_C_FLOAT16
// Only OpenMPI supports complex half
// TODO: use the native type when MPI is configured with the half feature
GKO_REGISTER_MPI_TYPE(half, MPI_UNSIGNED_SHORT);
GKO_REGISTER_MPI_TYPE(std::complex<half>, MPI_FLOAT);
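For reference, a hedged sketch of how the TODO might eventually be resolved, assuming the native type could be selected with a preprocessor check; whether MPIX_C_FLOAT16 is exposed as a macro is implementation-specific, so this is an illustration rather than the merged code:

// Prefer a native float16 type when the MPI implementation exposes one,
// otherwise keep the same-sized stand-in plus custom reduction operations.
#if defined(MPIX_C_FLOAT16)
GKO_REGISTER_MPI_TYPE(half, MPIX_C_FLOAT16);
#else
GKO_REGISTER_MPI_TYPE(half, MPI_UNSIGNED_SHORT);
#endif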
We will also need to consider whether other MPI implementations natively support half if we only want to rely on the native support. Supercomputers use their own variants: Cray-MPICH (maybe similar to MPICH), Intel MPI, etc.
I have discussed it with @MarcelKoch. I think we will go for the custom implementation now, and later we might check whether to add native support. I had tried something in deb12f0, and it already shows that the native support is quite inconsistent between OpenMPI and MPICH.
LGTM, only two small nits left.
This PR enables half precision in a distributed environment by adding custom operations.
One-sided operations like accumulate and fetch_and_op do not support custom operations.
Note: newer versions of MPI might support half precision natively (also for one-sided operations) if the administrator builds them with a compiler that supports native half precision and enables the corresponding option.
TODO:
put the custom operation in gko::comm? It does not grow along with the number of nodes -> create/free when necessary
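For illustration, a minimal standalone sketch of the custom-operation approach described above, using the element-wise min from the reviewed diff. It assumes gko::half from <ginkgo/ginkgo.hpp> is constructible from float and comparable; the names (min_func, half_min) are hypothetical:

#include <mpi.h>
#include <vector>
#include <ginkgo/ginkgo.hpp>

// Element-wise minimum with the MPI_User_function signature.
template <typename ValueType>
void min_func(void* in, void* inout, int* len, MPI_Datatype*)
{
    auto in_ptr = static_cast<ValueType*>(in);
    auto inout_ptr = static_cast<ValueType*>(inout);
    for (int i = 0; i < *len; i++) {
        if (in_ptr[i] < inout_ptr[i]) {
            inout_ptr[i] = in_ptr[i];
        }
    }
}

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Custom, commutative reduction operation for gko::half.
    MPI_Op half_min;
    MPI_Op_create(&min_func<gko::half>, 1, &half_min);

    std::vector<gko::half> local{gko::half(rank + 1.0f)};
    std::vector<gko::half> result(1);
    // half is registered against the same-sized MPI_UNSIGNED_SHORT, so the
    // custom operation reinterprets the buffers as gko::half.
    MPI_Allreduce(local.data(), result.data(), 1, MPI_UNSIGNED_SHORT, half_min,
                  MPI_COMM_WORLD);

    MPI_Op_free(&half_min);
    MPI_Finalize();
}

Because the datatype is only a same-sized stand-in, predefined operations such as MPI_MIN cannot be used for these buffers; the custom operation is what makes the reduction arithmetically correct for half.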