For high-resolution ACME runs at large MPI process counts, the time spent in CPL:RUN can be very large (on Mira, for example, larger than the cost of ATM). Investigation showed that the existing MPI algorithms in the rearrange_ routine can be inefficient. The swapm variant of the MPI_AlltoallV operator was ported from PIO1 and modified to work in the MCT environment, and an option to call it was added to rearrange_ for calls originating from sMatAvMult_SMPlus_. Experiments indicate that applying this change to all calls to rearrange_ is less efficient, though other call sites may also benefit from the new algorithm (TBD). In at least one high-resolution case, using the swapm algorithm reduces CPL:RUN cost by a factor of 5 at high process counts. This will be critical for near-term production runs.
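As a rough illustration of the general idea (replacing a collective all-to-all-v with explicitly scheduled pairwise exchanges), here is a small pure-Python simulation. It is not the code from the PR and makes no MPI calls. The XOR-based partner schedule is an assumption about how such a pairwise exchange can be ordered, and the function names `ceil_pow2`, `pairwise_partner`, and `simulated_pairwise_alltoallv` are hypothetical. Flow-control details such as handshaking or limits on outstanding requests are left out.

```python
def ceil_pow2(n):
    """Return the smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def pairwise_partner(nprocs, rank, step):
    """Partner of `rank` at `step` in an XOR schedule, or None if idle.

    XOR is symmetric, so if a is paired with b at a given step,
    then b is paired with a at that same step.
    """
    q = rank ^ step
    return q if q < nprocs else None


def simulated_pairwise_alltoallv(send):
    """Simulate an all-to-all-v as a sequence of pairwise exchanges.

    send[src][dst] is the (variable-length) list that src sends to dst.
    Returns recv such that recv[dst][src] == send[src][dst].
    """
    nprocs = len(send)
    recv = [[None] * nprocs for _ in range(nprocs)]
    # Step 0 pairs each rank with itself (a local copy); every later
    # step pairs each rank with at most one other rank.
    for step in range(ceil_pow2(nprocs)):
        for rank in range(nprocs):
            partner = pairwise_partner(nprocs, rank, step)
            if partner is None:
                continue  # no valid partner at this step
            # `rank` receives the block that `partner` addressed to it.
            recv[rank][partner] = list(send[partner][rank])
    return recv


if __name__ == "__main__":
    # Three simulated ranks with message sizes that differ per pair.
    send = [[[src * 10 + dst] * (dst + 1) for dst in range(3)]
            for src in range(3)]
    recv = simulated_pairwise_alltoallv(send)
    for dst, row in enumerate(recv):
        print(dst, row)
```

In this schedule each rank talks to at most one partner per step, rather than leaving the ordering of all messages to the collective's internal algorithm.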
In the long term, the routines that call rearrange_ should be modified to let the user specify an MPI algorithm and communication protocol, but this will require more extensive modifications to MCT.
The MCT PR is a bit rough, and MCT developers may need to rework it to conform to MCT coding style and conventions. Hopefully it is a high priority for @rljacob :-).
This issue documents MCT PR number 38.