In a setup (using OpenMPI 4.1.3) with more than 14,000 processes, we noticed an unusually long initialization time. While investigating this, we found that ~60 consecutive calls to MPI_Group_difference involving a group that contained all processes of the run took several minutes. I suspect that the implementation of ompi_group_dense_overlap (used by MPI_Group_difference) is suboptimal for such cases, because it appears to use an algorithm with O(n²) time complexity.
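As a rough sense of scale, assume each call compares every member of one ~14,000-process group with every member of the other. This is an assumption about the shape of the algorithm, not a measured operation count:

$$
n^2 \approx (1.4 \times 10^{4})^2 = 1.96 \times 10^{8} \text{ comparisons per call}, \qquad 60 \times 1.96 \times 10^{8} \approx 1.2 \times 10^{10} \text{ in total.}
$$

By contrast, an O(n log n) approach would need on the order of $n \log_2 n \approx 1.4 \times 10^{4} \times 13.8 \approx 1.9 \times 10^{5}$ operations per call.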
We were able to replicate similar functionality using a collective MPI_Allreduce, which was many times faster, even though MPI_Group_difference is a local operation.
A more sophisticated algorithm (for example, one based on sorted lists of each group's processes) should be able to improve the performance significantly.
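A minimal sketch of the sorted-list idea, assuming each group member can be identified by a plain int (for example its rank in MPI_COMM_WORLD). Open MPI itself works with per-process structures rather than ints, and both function names below are hypothetical, not Open MPI APIs. The naive version illustrates an O(n²) all-pairs pattern; the sorted version sorts a copy of the second group once and binary-searches it, while still emitting members in the order of the first group, as MPI_Group_difference requires.

```c
#include <stdlib.h>

/* Sketch only: group members are assumed to be plain ints (e.g. ranks in
 * MPI_COMM_WORLD). Both function names are hypothetical, not Open MPI APIs. */

/* qsort/bsearch comparator for ints, written to avoid overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Quadratic baseline: for every member of g1, scan all of g2.
 * Cost is O(n1 * n2), i.e. O(n^2) when both groups hold all processes. */
static int group_difference_naive(const int *g1, int n1,
                                  const int *g2, int n2, int *out)
{
    int k = 0;
    for (int i = 0; i < n1; i++) {
        int found = 0;
        for (int j = 0; j < n2 && !found; j++)
            found = (g1[i] == g2[j]);
        if (!found)
            out[k++] = g1[i];
    }
    return k;
}

/* Sorted variant: sort a copy of g2 once, then binary-search it for each
 * member of g1. Cost is O(n2 log n2 + n1 log n2). Members of g1 are emitted
 * in their original order. out must hold at least n1 ints.
 * Returns the result size, or -1 if the temporary copy cannot be allocated. */
static int group_difference_sorted(const int *g1, int n1,
                                   const int *g2, int n2, int *out)
{
    size_t len = n2 > 0 ? (size_t)n2 : 1;  /* avoid malloc(0) */
    int *sorted = malloc(len * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    for (int j = 0; j < n2; j++)
        sorted[j] = g2[j];
    qsort(sorted, (size_t)n2, sizeof *sorted, cmp_int);

    int k = 0;
    for (int i = 0; i < n1; i++) {
        /* Keep g1[i] only if it does not occur in g2. */
        if (bsearch(&g1[i], sorted, (size_t)n2, sizeof *sorted, cmp_int) == NULL)
            out[k++] = g1[i];
    }
    free(sorted);
    return k;
}
```

For example, removing {6, 1, 3} from {0, 1, ..., 7} with group_difference_sorted yields 0, 2, 4, 5, 7. Whether this closes the gap in practice for the 14,000-process case would still need to be measured.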
This is a request for a performance improvement of MPI_Group_difference(). It is unlikely that we'll take such an improvement back on the v4.1.x series -- that series is (slowly) being retired in favor of the v5.0.x series. I.e., we're still actively taking bug fixes, but not necessarily new features / overhauls of existing algorithms.