Bugfix addresses issue with multiple MPI comm groups when dyn_npes<npes #1604
Addresses an issue in HOMME where multiple communication groups are generated in cases where dyn_npes < npes for the ATM model. This causes the code to crash for certain choices of dyn_npes with respect to npes.
* Change to initmp subroutine for the case of CAM defined
* Change to namelist read for the case of CAM defined
* Change to dynamics initialization (dyn_init1) to be compatible with changes to HOMME

[BFB] - Bit-For-Bit
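For readers unfamiliar with how one processor set can end up in several communication groups, here is a minimal pure-Python sketch. It only mimics the grouping semantics of `MPI_Comm_split` (one communicator per distinct color, with `MPI_UNDEFINED` ranks left out). Both coloring rules, and every function name, are hypothetical illustrations and are not taken from HOMME's `initmp`.

```python
from collections import defaultdict


def split_by_color(colors):
    """Group ranks by color, as MPI_Comm_split does.

    None stands in for MPI_UNDEFINED: such ranks join no group.
    """
    groups = defaultdict(list)
    for rank, color in enumerate(colors):
        if color is not None:
            groups[color].append(rank)
    return dict(groups)


def dynamics_only_colors(npes, dyn_npes):
    # Hypothetical rule: the first dyn_npes ranks share color 0,
    # every other rank opts out of the dynamics group.
    return [0 if rank < dyn_npes else None for rank in range(npes)]


def blockwise_colors(npes, dyn_npes):
    # Hypothetical faulty rule: each block of dyn_npes ranks gets its
    # own color, so several groups appear whenever dyn_npes < npes.
    return [rank // dyn_npes for rank in range(npes)]


if __name__ == "__main__":
    print(split_by_color(dynamics_only_colors(8, 3)))
    print(split_by_color(blockwise_colors(8, 3)))
```

With 8 ranks and `dyn_npes = 3`, the first rule yields a single group and the second yields three, the last of which is smaller than the others.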
@lazarusM3B can you add a real name to your GitHub profile? All the usernames are hard to keep track of.
@lazarusM3B (aka Aaron Donahue as indicated in the branch name),
and make the assignments to
This commit is less invasive by leaving the readnl subroutine alone and just having the ATM model skip calling the subroutine for processors not assigned to the dynamics solve.
* Reverted the homme file namelist_mod.F90 to original version
* Changed dyn_init1 in dyn_comp.F90 to skip the namelist read for cores not assigned to solve dynamics.

[BFB] - Bit-For-Bit
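The guard described in this commit can be sketched in pure Python as follows. This is a toy model, not the Fortran in dyn_comp.F90: the membership test `rank < dyn_npes`, the function names, and the namelist contents are all assumptions made for illustration.

```python
def read_dyn_namelist():
    # Hypothetical stand-in for the dynamics namelist read; the value is made up.
    return {"se_nsplit": 2}


def dyn_init_sketch(rank, dyn_npes, reader=read_dyn_namelist):
    """Return the namelist on dynamics ranks, None on ranks that skip the read."""
    if rank < dyn_npes:  # assumed test for "assigned to the dynamics solve"
        return reader()
    return None


def ranks_that_read(npes, dyn_npes):
    # List the ranks on which the namelist read actually happens.
    return [r for r in range(npes) if dyn_init_sketch(r, dyn_npes) is not None]


if __name__ == "__main__":
    print(ranks_that_read(8, 3))
```

The consequence is that any rank outside the dynamics group never sees the dynamics namelist values, so code running on those ranks must not rely on them.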
I pushed a new commit to the branch with the above changes made. I am about to leave on vacation for a few weeks, so I may not be able to respond to any new comments on this PR until I get back.
does this work?
All processors will compute tstep, but in this case they won't all have the same value of se_nsplit, since it's in the namelist. In this case it is harmless, because non-dycore PEs would never use 'tstep'. Sorry to waffle on this, but your other solution might be better. Or, how about making a fake "par": type (parallel_t) :: par_all, par=initmp(npes_se)
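To make the concern about tstep concrete, here is a small pure-Python sketch. It assumes, purely for illustration, that tstep is the physics step divided by se_nsplit and that ranks which skip the namelist read keep a default of 1. Neither the formula nor the default is taken from the HOMME source, and all names are hypothetical.

```python
DEFAULT_SE_NSPLIT = 1  # hypothetical default kept by ranks that skip the read


def tstep_on_rank(rank, dyn_npes, dtime, namelist_se_nsplit):
    """Compute tstep as a given rank would see it."""
    # Only dynamics ranks (assumed: rank < dyn_npes) pick up the namelist value.
    se_nsplit = namelist_se_nsplit if rank < dyn_npes else DEFAULT_SE_NSPLIT
    return dtime / se_nsplit  # assumed relation, for illustration only


def tstep_by_rank(npes, dyn_npes, dtime, namelist_se_nsplit):
    return [tstep_on_rank(r, dyn_npes, dtime, namelist_se_nsplit)
            for r in range(npes)]


if __name__ == "__main__":
    print(tstep_by_rank(4, 2, 1800.0, 3))
```

In this toy setup the dynamics ranks agree with each other while the remaining ranks hold a different value, which is harmless only as long as those ranks never use it.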
Old comment (moved from issue #1593), adding here for reference: So are you saying that there are dycore namelist variables that are needed by the physics procs? I would have hoped that anything in the namelist read by the dycore is not needed by the physics, but that might be wishful thinking. An alternative would be to call readnl from the dp coupling layer (since it won't be a big deal to call it twice). I can see your solution may be the best way, but my first reaction is that I don't like dycore code sometimes working with one communicator and other times working with a different communicator.
Fixed issue with GET_REFCASE not being set correctly that was introduced with PR #1561 and an issue with X compset mapping files that was introduced in #1586
Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes
User interface changes?:
Code review:
Where are we with this PR?
@rljacob @mt5555 @golaz, I've been working on this this morning. Sorry it took so long to get back, but I wasn't able to work on this until I got up and running on LC machines, which we just got working last week. To summarize what I've got so far: I took a look at implementing the suggestions made by @mt5555 above, but this led to another set of errors further down in the code. I was trying to hunt down exactly why. As it stands, the fix that is in this commit appears to work, but as @mt5555 pointed out it means that non-dynamics processors won't read in the dynamics namelist. So far this doesn't look like a problem, but it could be hard to test for. If the reviewers are OK with this then we can probably call it finished, but if there are still concerns I am happy to continue working on this issue.
@AaronDonahue: looks good. Your knowledge of this is deeper than mine, so I defer to your judgement as to the best approach.
@mt5555, excellent, thank you. @singhbalwinder, I would like to take one more look at this code before we go ahead and integrate it. It has been over a month since I started this PR and I want to make sure that it's ready to go.
@AaronDonahue please merge this to next today.
@rljacob: Is your comment above for this PR? This PR was created by @AaronDonahue and I am the integrator. @AaronDonahue indicated in his comment above that he would like to work on the suggestions made by reviewers before merging it. Is that still the case, @AaronDonahue?
@singhbalwinder, yes, I was planning on making my final check this morning before giving the green light.
Sounds good. Let me know when you think you are okay with this PR and I will merge it to next.
Sorry, I mean you should integrate it @singhbalwinder.
@singhbalwinder @rljacob, I've dug deeper into potential issues with handling communication groups in this way and found that the code can run into problems with a history write. Based on this I will need some more time to work on the issue.
@AaronDonahue in that case, please close the PR and you can reopen it when it's ready.
@rljacob, absolutely, will do.
Addresses issue #1593