EAMxx: avoid MPI tags in GridImportExport gather/scatter methods #7007
Conversation
…MPI tags
* Tags were 1-1 with grid GIDs, which at large resolutions can exceed the tag limit for certain MPI distributions
* Use a pack/unpack approach to send a single msg per PID
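To make the first bullet concrete, here is a minimal sketch of the arithmetic behind the tag overflow. The helper names and the column-count formula (6*ne^2*(np-1)^2 + 2 unique physics columns for an np=4 cubed-sphere grid) are assumptions for illustration, not code from this PR; the 32767 lower bound on `MPI_TAG_UB` is what the MPI standard guarantees, and actual limits vary by distribution.

```cpp
#include <cstdint>

// Hypothetical helper: approximate number of unique physics columns on a
// cubed-sphere grid with "ne" elements per cube edge and np=4 GLL points:
// 6*ne^2*(np-1)^2 + 2. Used here only to size the GID space.
constexpr std::int64_t num_global_cols (std::int64_t ne) {
  return 6 * ne * ne * 9 + 2;
}

// The MPI standard only guarantees that MPI_TAG_UB is at least 32767;
// implementations may (and usually do) allow larger tags, but the
// actual upper bound is implementation-specific.
constexpr std::int64_t mpi_min_guaranteed_tag_ub = 32767;

// Would GID-valued tags fit under the guaranteed minimum tag bound?
constexpr bool gid_tags_fit_min_bound (std::int64_t ne) {
  return num_global_cols(ne) <= mpi_min_guaranteed_tag_ub;
}
```

Note that even ne30 (~48k columns, under this formula) already exceeds the guaranteed minimum, so whether GID-valued tags work at a given resolution depends entirely on the particular MPI distribution's actual `MPI_TAG_UB`, which is why the failure only surfaced on some systems and at large resolutions.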
I ran an ne30 test with output remapped to ne4, compared against what I get with master, and the two are BFB. This PR is ready to integrate upon review.
I was finally able to verify that this fix allows a ne1024 case to run one day using …

Amazing work to find/fix this so quickly!
Only a minor comment about a comment. It's in a test so feel free to merge without addressing.
The only CI failures are unrelated and are currently being dealt with. Merging.
Fixes an issue encountered during testing of PR #6977 (see this comment), where the tags we used went above the max tag allowed by the MPI distribution.
[BFB]
Instead of sending one msg per grid point, we follow the common pack/unpack paradigm, and send/recv a single message per PID, allowing us to just always use 0 as a tag. It's worth noting that the gather/scatter routines are used only during initialization, to set up remapper weights (we read map files "evenly" across ranks, and then ship the CRS matrix triplets to the ranks that need them).
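The pack step of the paradigm described above can be sketched roughly as follows. This is not the PR's actual implementation: the `Triplet` struct, the `dest_rank` callback, and the function name are all placeholders standing in for the grid's ownership map and the CRS-matrix data that GridImportExport ships around.

```cpp
#include <map>
#include <vector>

// Hypothetical CRS-matrix triplet, standing in for the remapper weights
// exchanged during initialization.
struct Triplet {
  int row, col;
  double weight;
};

// Pack step: instead of one message per grid point (with the GID as the
// tag), group all triplets by destination rank, so that a single message
// per PID can be sent, always with tag 0. The 'dest_rank' callback stands
// in for the grid's GID->owner map.
template <typename OwnerFn>
std::map<int, std::vector<Triplet>>
pack_by_rank (const std::vector<Triplet>& triplets, OwnerFn dest_rank) {
  std::map<int, std::vector<Triplet>> buffers;
  for (const auto& t : triplets) {
    // One growing buffer per destination PID.
    buffers[dest_rank(t.row)].push_back(t);
  }
  // Each buffers[pid] would then go out in a single MPI_Isend with tag 0;
  // the receiver posts one recv per incoming PID and unpacks, so tags
  // never depend on GIDs (and never exceed MPI_TAG_UB).
  return buffers;
}
```

Since the message boundaries are now per-PID rather than per-GID, the (source rank, communicator, tag=0) triple is enough to match sends and recvs unambiguously.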
@trey-ornl This PR should fix the tags issue. The unit tests for this struct pass on my laptop, but I won't claim it works until I run some tests where coarsening remappers are used (our CI runs ne4 only, so no coarsened output). But if you want, you could integrate this into the branch with the Frontier software update and give it a try on Frontier...
@tcclevenger I don't know if you've ever taken a look at this part of the code. Let me know if you're not familiar with it, and you'd rather walk through it together.