scalability issue in MCT when called from CPL:RUN #1107

Closed
worleyph opened this issue Oct 20, 2016 · 2 comments

@worleyph
Contributor

worleyph commented Oct 20, 2016

This issue is to document MCT PR number 38.

For high-resolution ACME runs at large MPI process counts, the time spent in CPL:RUN can be very large (larger than the cost of ATM on Mira, for example). Investigation showed that the existing MPI algorithms in the rearrange_ routine can be inefficient. The swapm variant of the MPI_Alltoallv operator was ported from PIO1 and modified to work in the MCT environment, and an option to call it was added to rearrange_ for calls originating from sMatAvMult_SMPlus_. Experiments indicate that applying this change to all calls to rearrange_ is not as efficient, though other call sites may also benefit from the new algorithm (TBD). In at least one high-resolution case, using the swapm algorithm decreases CPL:RUN cost by a factor of 5 at high process counts. This will be critical for near-term production runs.
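For readers unfamiliar with the idea behind a swapm-style all-to-all, here is a rough sketch in plain Python (no MPI) of a pairwise exchange schedule. It assumes an XOR-based partner ordering, in which each rank talks to exactly one partner per step. The function names (`ceil2`, `pair`, `exchange_schedule`, `simulate_alltoallv`) are illustrative only. This is not the code in the MCT PR, and it omits the handshaking and the limits on outstanding requests that a real implementation would likely include.

```python
def ceil2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def pair(nprocs, rank, step):
    """Partner of `rank` at `step` under an XOR schedule, or -1 if idle."""
    partner = rank ^ step
    return partner if partner < nprocs else -1


def exchange_schedule(nprocs, rank):
    """Ordered list of partners for `rank`, skipping idle steps."""
    partners = []
    for step in range(1, ceil2(nprocs)):
        q = pair(nprocs, rank, step)
        if q != -1:
            partners.append(q)
    return partners


def simulate_alltoallv(send):
    """Simulate the exchange: send[p][q] is the data rank p sends to rank q.

    Returns recv, where recv[q][p] is the data rank q received from rank p.
    """
    nprocs = len(send)
    recv = [[None] * nprocs for _ in range(nprocs)]
    for p in range(nprocs):
        recv[p][p] = list(send[p][p])  # local copy, no message needed
        for q in exchange_schedule(nprocs, p):
            recv[q][p] = list(send[p][q])
    return recv
```

Because XOR is symmetric, whenever rank p is paired with q at some step, q is paired with p at that same step, so each step is a set of disjoint two-way swaps rather than many ranks targeting one destination at once.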

Long term, the routines calling rearrange_ should be modified to allow the user to specify an MPI algorithm and communication protocol, but this will require more extensive modifications to MCT.

The MCT PR is a bit rough, and MCT developers may need to rework it to conform to MCT coding style and conventions. Hopefully it is a high priority for @rljacob :-).

@bishtgautam
Contributor

Updated the comment to include the URL of the MCT PR.

@worleyph
Contributor Author

worleyph commented Jul 4, 2017

I believe that this optimization has been merged into ACME from MCT.
