Update partitions across cores for 4 MPAS-Ocean meshes #5568
Conversation
This merge points to new partition files for each of the following 4 MPAS-Ocean meshes. Each mesh has about 400 files that are expected to support nearly any conceivable core count.

Meshes with updated partitions:
* EC30to60E2r2
* ECwISC30to60E2r1
* SOwISC12to60E2r4
* WC14to60E2r3

See MPAS-Dev/compass#563 for more details on how the core counts were determined.
See E3SM-Ocean-Discussion#43 for a bit of related discussion.
@jonbob, are there other tags to be added?
@amametjanov, I'm asking you to review because we discussed this on Slack and want to make sure you're on board. It would be useful to know if you already see partition files missing from any of these 4 locations:
@darincomeau, I don't know if you want to do any further testing or review beyond what you already did for E3SM-Ocean-Discussion#43. If not, feel free to approve based on E3SM-Ocean-Discussion#43.
Looks good to me.
@xylar -- are these new partition files? And are they in any way different from the old ones?
Looks good. Just FYI, we ran into a missing-partitions case most recently while testing coupled model runs on Frontier for E3SM-MMF. With the ocean, we got away with metis, but Jon Wolfe pointed out the need for the workflow for creating new ice partitions for performance. Ocean and ice would run on CPUs on Frontier, and there are 56 available cores per node. So it would be desirable to have certain multiples of 56 in your available partitions. It's great if you already have those; otherwise it's something to keep in mind while generating the ice partitions.
@sarats, that's important to know. Thanks for letting me know. I will add multiples of 56. Some should be there already but probably not that many.
Can you be more specific about what "certain multiples" should be? All multiples between 1 and 10? Something much larger? It will be much easier to add these now than one-by-one later on.
Adding @philipwjones and @xyuan: For the coupled MMF runs at high node counts, what would you like to see? @PeterCaldwell and @mt5555: Just pinging you to see if you have any comments anticipating the needs of coupled SCREAM+Ocean in the future.
I've definitely struggled with mpas partition files in the past, but have no specific comments myself right now. @singhbalwinder is working on coupled SCREAM runs right now so may have more specific thoughts. I know @ndkeen has borne the brunt of our past partition struggles, so he might also have some thoughts. Thanks for asking.
@jonbob has been very kind in generating the needed partition files for me. He made it look simple but I think the process might be non-trivial. If there is a method to the process, it would be useful to get the recipe so that anyone can generate these as needed without bothering the ocean/ice teams.
@PeterCaldwell (and @ndkeen, @singhbalwinder, @philipwjones, @xyuan, @mt5555), what I specifically need to know about (preferably now, but in the future if problems arise) is "unusual" processor counts like @sarats's examples (multiples of 56 cores). It's not feasible to generate every possible core count in advance, but I've done about 400 that seem plausible (those with small prime factors). But it seems like there are some pretty odd core counts that E3SM likes to use on certain machines. Given that we don't anticipate having online partition generation anytime soon, we'd like to cover our bases.
The ocean partitions are pretty trivial to generate. I will be following up shortly with balanced sea-ice partitions, and those aren't straightforward to generate. The process is described here: The overall goal is to have sea-ice partitions that are better balanced than the defaults. If folks generate sea-ice partitions the old way, we will end up with a mix of balanced and non-balanced partitions and that will be confusing when we try to understand performance problems.
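For context, the "trivial" ocean case amounts to running METIS over the mesh graph file once per target core count. Here is a minimal sketch, assuming a hypothetical graph-file name, output directory, and core-count list (an illustration only, not the actual compass workflow):

```python
import subprocess
from pathlib import Path

# Hypothetical inputs: an MPAS-Ocean graph file and the core counts to cover.
graph_file = Path("mpas-o.graph.info")
core_counts = [128, 224, 256, 336, 448, 512]  # e.g. include multiples of 56 for Frontier
out_dir = Path("partitions")
out_dir.mkdir(exist_ok=True)

for n in core_counts:
    # gpmetis writes <graph_file>.part.<n> next to the input graph file
    subprocess.run(["gpmetis", str(graph_file), str(n)], check=True)
    part_file = graph_file.with_name(f"{graph_file.name}.part.{n}")
    part_file.rename(out_dir / part_file.name)
```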
Polling around the room, what are other special numbers out there? Summit uses 42/84 MPI ranks/node for CPU runs.
@sarats (and everyone), currently, the only small multiples of 42, 56, 84, etc. that wouldn't be available would be the prime multiples of 7 and above (so 7, 11, 13, 19, etc.). Adding 7 would be easy enough (so that multiples of 49 cores would be allowed). Do you anticipate running prime numbers of nodes in general (11, 13 or 19 nodes)? If so, there's almost no way to restrict the number of partition files I generate to a reasonable number.
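To make the "small prime factors" criterion concrete, here is a rough sketch (my own illustration, not the script actually used) of how candidate core counts could be enumerated by keeping only numbers whose prime factors stay below a chosen limit:

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (n >= 2)."""
    largest, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            largest, n = p, n // p
        p += 1
    return max(largest, n) if n > 1 else largest

# Keep core counts whose prime factors are all <= 7, up to some ceiling;
# raising the factor limit (e.g. to 11 or 13) grows the list very quickly.
max_cores = 20000
candidates = [n for n in range(2, max_cores + 1) if largest_prime_factor(n) <= 7]
print(len(candidates), candidates[:15])
```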
In the future we really need to make this an online process, either within the CIME workflow or in the executable itself. It's ridiculous to have to anticipate and pre-generate hundreds of partition files.
@rljacob, do you have staff time you can allocate to this process? The current process for creating load-balanced sea-ice partition files would slow down E3SM init too much to be feasible. It involves remapping a data file from one MPAS mesh to another, then doing a flood fill to expand, then breaking the MPAS mesh into polar and equatorial portions using a culling tool, then combining the partitions for each region into a single partition file. Rewriting this to work online would require a complete redesign of the algorithm. I certainly think that's a good goal but I'm trying to be realistic with the staff time we actually have. This process is a pain but it's a solved problem in some sense.
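To illustrate just the final combining step, under assumed plain-text, METIS-style partition lists and a hypothetical 0/1 polar-region mask per cell (this is not the actual culling/merge tooling, only a sketch of the idea):

```python
def combine_region_partitions(polar_part, equatorial_part, polar_mask, n_polar_parts):
    """Merge per-region partition ids into one global partition list.

    polar_part / equatorial_part: partition id per cell within each region,
    listed in the order the cells appear in the global mesh.
    polar_mask: 1 if a global cell belongs to the polar region, 0 otherwise.
    Equatorial ids are offset by n_polar_parts so the two regions don't collide.
    """
    polar_iter = iter(polar_part)
    equatorial_iter = iter(equatorial_part)
    return [
        next(polar_iter) if in_polar else next(equatorial_iter) + n_polar_parts
        for in_polar in polar_mask
    ]

# Toy usage: 6 cells, first 3 polar (2 parts), last 3 equatorial (2 parts).
print(combine_region_partitions([0, 1, 0], [0, 0, 1], [1, 1, 1, 0, 0, 0], 2))
# -> [0, 1, 0, 2, 2, 3]
```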
No, which is why it's a wish for "the future" :)
In that case, I totally, totally agree. I've dumped about as much time as I can stand into the process as it exists.
I do have an online capability that could be used for the ocean (it at least calls the metis library and can reproduce the off-line metis partitions). Just haven't had the chance to integrate it into the MPAS-O configuration. If we need it sooner, I can try to figure that out. And that wouldn't help the sea-ice process so we'd have to figure that out too.
Noting in case this gets drowned out by larger questions: He was also trying to get sea-ice partitions in coupled configs but had to settle for PE layouts with existing partitions.
@jonbob, I'm answering this here for others' benefit (since we chatted about this on the MPAS DevOps call already). These partition files are likely the same as the old ones whenever the core counts are the same, though I did remake them for all core counts.
Approved based on a visual inspection and developer communication
@xylar -- testing is showing non-BFB results, at least in globalStats files. I compared the old graph files with the new ones and they are different, so I suppose we should expect this. It does not change the ocean state at all, so the cpl files do not show any differences.
Update partitions across cores for 4 MPAS-Ocean meshes

This merge points to new partition files for each of the following 4 MPAS-Ocean meshes. Each mesh has about 400 files that are expected to support nearly any conceivable core count.

Meshes with updated partitions:
* EC30to60E2r2
* ECwISC30to60E2r1
* SOwISC12to60E2r4
* WC14to60E2r3

Partition files have been moved into a partitions subdirectory in each ocn/mpas-o/<mesh> directory. We decided this was better than putting partition files in share/meshes/mpas/ocean.

[NML]
[non-BFB] only for mpaso globalStats files for these meshes -- does not change the ocean state
passes:

as expected, because we did not change partition files for these ocean meshes.

NML DIFF and DIFF in mpaso.hist.am.globalStats for:

merged to next
@jonbob, just for clarity, I don't think the graph files have changed. When I run:
I don't see any differences. But the partition files would very likely have changed because of, e.g., differences in the version of
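In case it helps anyone repeat that kind of check, one simple way to confirm whether two graph files are identical is to compare checksums (placeholder paths below; not necessarily the exact command used above):

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

old = Path("old/mpas-o.graph.info")  # placeholder paths
new = Path("new/mpas-o.graph.info")
print("identical" if sha256sum(old) == sha256sum(new) else "different")
```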
Thanks very much for testing!
merged to master and expected DIFFs (and NML DIFFs) blessed. Two tests on Perlmutter failed due to missing graph files, which were not automatically downloaded during testing. I manually downloaded them and the tests should now pass.
Thanks so much @jonbob! Everyone, please let me know if you run into missing partition files in the LCRC
This merge updates the E3SM-Project submodule from [c292bec000](https://github.com/E3SM-Project/E3SM/tree/c292bec000) to [4b3e611fee](https://github.com/E3SM-Project/E3SM/tree/4b3e611fee).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ] (ocn) E3SM-Project/E3SM#5418
- [ ] (ocn) E3SM-Project/E3SM#5447
- [ ] (ocn) E3SM-Project/E3SM#5568
- [ ] (ocn) E3SM-Project/E3SM#5583
- [ ] (ocn) E3SM-Project/E3SM#5575
- [ ] (ocn) E3SM-Project/E3SM#5600