Update partitions across cores for 4 MPAS-Ocean meshes #5568

Merged: 1 commit, Apr 10, 2023

Conversation

@xylar (Contributor) commented Mar 29, 2023

This merge points to new partition files for each of the following 4 MPAS-Ocean meshes. Each mesh has about 400 files that are expected to support nearly any conceivable core count.

Meshes with updated partitions:

  • EC30to60E2r2
  • ECwISC30to60E2r1
  • SOwISC12to60E2r4
  • WC14to60E2r3

See MPAS-Dev/compass#563 for more details on how the core counts were determined.

Partition files have been moved into a partitions subdirectory in each ocn/mpas-o/<mesh> directory. We decided this was better than putting partition files in share/meshes/mpas/ocean.

[NML]
[non-BFB] only for mpaso globalStats files for these meshes -- does not change the ocean state

@xylar added the mpas-ocean and BFB (PR leaves answers BFB) labels Mar 29, 2023
@xylar (Contributor, Author) commented Mar 29, 2023

See E3SM-Ocean-Discussion#43 for a bit of related discussion.

@xylar (Contributor, Author) commented Mar 29, 2023

@jonbob, are there other tags to be added?

@xylar (Contributor, Author) commented Mar 29, 2023

@amametjanov, I'm asking you to review because we discussed this on Slack and want to make sure you're onboard. It would be useful to know if you already see partition files missing from any of these 4 locations:

/lcrc/group/e3sm/public_html/inputdata/ocn/mpas-o/EC30to60E2r2/partitions
/lcrc/group/e3sm/public_html/inputdata/ocn/mpas-o/ECwISC30to60E2r1/partitions
/lcrc/group/e3sm/public_html/inputdata/ocn/mpas-o/SOwISC12to60E2r4/partitions
/lcrc/group/e3sm/public_html/inputdata/ocn/mpas-o/WC14to60E2r3/partitions

@xylar (Contributor, Author) commented Mar 29, 2023

@darincomeau, I don't know if you want to do any further testing or review beyond what you already did for E3SM-Ocean-Discussion#43. If not, feel free to approve based on E3SM-Ocean-Discussion#43.

@darincomeau (Member) left a comment

Looks good to me.

@jonbob (Contributor) commented Mar 30, 2023

@xylar -- are these new partition files? And in any way different from the old ones?

@sarats (Member) commented Mar 30, 2023

Looks good.

Just FYI, we ran into the missing-partitions issue most recently while testing coupled model runs on Frontier for E3SM-MMF. For the ocean we got away with metis, but Jon Wolfe pointed out that the workflow for creating new ice partitions is needed for performance.

Ocean and ice would run on CPUs on Frontier, and there are 56 available cores per node. So it would be desirable to have certain multiples of 56 among your available partitions. It's great if you already have those; otherwise, it's something to keep in mind while generating the ice partitions.

@jonbob added the inputdata (Changes affecting inputdata collection on blues) label Mar 30, 2023
@xylar (Contributor, Author) commented Mar 31, 2023

@sarats, that's important to know. Thanks for letting me know. I will add multiples of 56. Some should be there already but probably not that many.

it would be desirable to have certain multiples of 56 in your available partitions

Can you be more specific about what "certain multiples" should be? All multiples between 1 and 10? Something much larger? It will be much easier to add these now than one-by-one later on.

@sarats (Member) commented Mar 31, 2023

Adding @philipwjones and @xyuan: For the coupled MMF runs at high-node counts, what would you like to see?

@PeterCaldwell and @mt5555: Just pinging you to see if you have any comments anticipating the needs of coupled SCREAM+Ocean in the future.

@PeterCaldwell (Contributor) commented:

PeterCaldwell and mt5555: Just pinging you to see if you have any comments anticipating the needs of coupled SCREAM+Ocean in the future.

I've definitely struggled with mpas partition files in the past, but have no specific comments myself right now. @singhbalwinder is working on coupled SCREAM runs right now so may have more specific thoughts. I know @ndkeen has borne the brunt of our past partition struggles, so he might also have some thoughts. Thanks for asking.

@singhbalwinder (Contributor) commented:

@jonbob has been very kind in generating the needed partition files for me. He made it look simple but I think the process might be non-trivial. If there is a method to the process, it would be useful to get the recipe so that anyone can generate these as needed without bothering the ocean/ice teams.

@xylar (Contributor, Author) commented Mar 31, 2023

@PeterCaldwell (and @ndkeen, @singhbalwinder, @philipwjones, @xyuan, @mt5555), what I specifically need to know about (preferably now, but in the future if problems arise) are "unusual" processor counts like @sarats' example (multiples of 56 cores). It's not feasible to generate every possible core count in advance, but I've generated about 400 that seem plausible (those with small prime factors). It seems, though, that there are some pretty odd core counts that E3SM likes to use on certain machines. Given that we don't anticipate having online partition generation anytime soon, we'd like to cover our bases.
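
For context, a minimal sketch of the kind of rule implied by "small prime factors". This is an illustration only, not the exact rule used for these meshes (that is documented in MPAS-Dev/compass#563); the upper bound below is a hypothetical cap.

```python
# Illustrative only: enumerate "plausible" core counts whose prime factors
# are all small (here <= 7). The actual selection rule may differ.

def largest_prime_factor(n):
    """Return the largest prime factor of n (n >= 2)."""
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest, n = factor, n // factor
        factor += 1
    return max(largest, n) if n > 1 else largest

max_cores = 100000  # hypothetical upper bound on core counts of interest
counts = [n for n in range(2, max_cores + 1) if largest_prime_factor(n) <= 7]
print(len(counts), counts[:20])
```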

@xylar (Contributor, Author) commented Mar 31, 2023

If there is a method to the process, it would be useful to get the recipe so that anyone can generate these as needed without bothering the ocean/ice teams.

The ocean partitions are pretty trivial to generate. I will be following up shortly with balanced sea-ice partitions and those aren't straightforward to generate. The process is described here:
https://acme-climate.atlassian.net/wiki/spaces/DOC/pages/48529761/Make+a+new+graph+partition+for+MPAS
and here:
http://mpas-dev.github.io/MPAS-Tools/stable/seaice/partition.html
But at least for now I would prefer that you all did bug me (or, if you can't get me, someone else on the ocean and sea-ice teams) to make the partition files. It would be better to understand which partition sizes are missing and why than to keep generating them in an ad hoc way.

The overall goal is to have sea-ice partitions that are better balanced than the defaults. If folks generate sea-ice partitions the old way, we will end up with a mix of balanced and non-balanced partitions and that will be confusing when we try to understand performance problems.
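
For reference, the "trivial" ocean step amounts to running METIS's gpmetis over the mesh's graph file once per target core count; gpmetis writes the result to `<graph>.part.<n>`. A minimal sketch (the file name and list of counts below are placeholders; the real workflow and any extra gpmetis options are described in the links above):

```python
# Illustrative sketch: generate ocean partition files with gpmetis.
# Assumes gpmetis is on PATH; "gpmetis <graph> <nparts>" writes
# "<graph>.part.<nparts>", which is the standard METIS behavior.
import subprocess

graph_file = "mpas-o.graph.info.230313"   # placeholder graph file name
core_counts = [128, 256, 512, 896, 1792]  # placeholder target core counts

for n in core_counts:
    # Produces, e.g., mpas-o.graph.info.230313.part.128
    subprocess.run(["gpmetis", graph_file, str(n)], check=True)
```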

@sarats (Member) commented Mar 31, 2023

Polling around the room, what are other special numbers out there? Summit uses 42/84 MPI ranks/node for CPU runs.

@xylar (Contributor, Author) commented Mar 31, 2023

@sarats (and everyone), currently, the only small multiples of 42, 56, 84, etc. that wouldn't be available would be the multiples by primes of 7 and above (so 7, 11, 13, 19, etc.). Adding 7 would be easy enough (so that multiples of 49 cores would be allowed). Do you anticipate running prime numbers of nodes in general (11, 13, or 19 nodes)? If so, there's almost no way to restrict the number of partition files I generate to a reasonable number.
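
To make the gap concrete, a small illustrative check of which node counts on a 56-core-per-node machine would definitely fall outside any small-prime-factor rule (those whose core count has a prime factor larger than 7):

```python
# Illustrative: node counts k (1..32) for which 56*k cores has a prime
# factor larger than 7, i.e. counts that would need a partition file
# generated specially regardless of the exact small-prime rule used.
def largest_prime_factor(n):
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest, n = factor, n // factor
        factor += 1
    return max(largest, n) if n > 1 else largest

missing = [k for k in range(1, 33) if largest_prime_factor(56 * k) > 7]
print(missing)  # e.g. prime node counts such as 11, 13, 17, 19, ...
```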

@rljacob (Member) commented Mar 31, 2023

In the future we really need to make this an online process, either within the CIME workflow or in the executable itself. It's ridiculous to have to anticipate and pre-generate hundreds of partition files.

@xylar (Contributor, Author) commented Mar 31, 2023

@rljacob, do you have staff time you can allocate to this process?

The current process for creating load-balanced sea-ice partition files would slow down E3SM init too much to be feasible. It involves remapping a data file from one MPAS mesh to another, then doing a flood fill to expand, then breaking the MPAS mesh into a polar and equatorial portion using a culling tool, then combining the partitions for each region into a single partition file. Rewriting this to work online would require a complete redesign of the algorithm. I certainly think that's a good goal but I'm trying to be realistic with the staff time we actually have. This process is a pain but it's a solved problem in some sense.
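
Purely to illustrate the last step in that chain, here is a toy sketch of combining two per-region partitions into a single partition file. It assumes METIS-style .part files (one owning-partition integer per line, one line per cell) and a precomputed 0/1 mask marking polar cells over the full mesh; all file names and the exact combination rule are assumptions for illustration, not the actual MPAS-Tools implementation.

```python
# Toy sketch only: merge an "equatorial" and a "polar" partition into one
# partition file over the full mesh. Hypothetical file names; assumes each
# region's .part file lists its cells in the same order they appear in the
# full-mesh polar mask.
def read_ints(path):
    with open(path) as f:
        return [int(line) for line in f]

polar_mask = read_ints("polar_mask.txt")      # 1 if cell is polar, else 0
equatorial = read_ints("equatorial.part.64")  # equatorial region partition
polar = read_ints("polar.part.64")            # polar region partition

eq_iter, po_iter = iter(equatorial), iter(polar)
n_equatorial = max(equatorial) + 1            # offset polar ranks past these

with open("combined.part.128", "w") as out:
    for is_polar in polar_mask:
        if is_polar:
            out.write(f"{next(po_iter) + n_equatorial}\n")
        else:
            out.write(f"{next(eq_iter)}\n")
```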

@rljacob (Member) commented Mar 31, 2023

No, which is why it's a wish for "the future" :)

@xylar (Contributor, Author) commented Mar 31, 2023

In that case, I totally, totally agree. I've dumped about as much time as I can stand into the process as it exists.

@philipwjones (Contributor) commented:

I do have an online capability that could be used for the ocean (it at least calls the metis library and can reproduce the off-line metis partitions). Just haven't had the chance to integrate it into the MPAS-O configuration. If we need it sooner, I can try to figure that out. And that wouldn't help the sea-ice process so we'd have to figure that out too.

@sarats (Member) commented Mar 31, 2023

Noting in case this gets drowned out by larger questions:
@xyuan was trying to run oRRS18to6v3 on large node counts on Frontier (512, 1k, 2k, etc.)
https://pace.ornl.gov/search/machine:frontier%20res:oRRS18to6v3

He was also trying to get sea-ice partitions in coupled configs but had to settle for PE layouts with existing partitions.
https://pace.ornl.gov/search/machine:frontier%20res:ne120pg2_r0125_oRRS18to6v3%20compset:WCYCL1950-MMF1

@xylar (Contributor, Author) commented Mar 31, 2023

@sarats and @xyuan, oRRS18to6v3 is not a mesh I'm addressing here but I could make new partition files (both ocean and sea-ice) for it as a follow-up step.

@xylar (Contributor, Author) commented Apr 5, 2023

are these new partition files? And in any way different from the old ones?

@jonbob, I'm answering this here for others' benefit (since we chatted about this on the MPAS DevOps call already). These partition files are likely the same as the old ones whenever the core counts are the same, though I did remake them for all core counts.

@jonbob (Contributor) left a comment

Approved based on a visual inspection and developer communication

@jonbob added the NML label Apr 6, 2023
@jonbob (Contributor) commented Apr 6, 2023

@xylar -- testing is showing non-BFB results, at least in globalStats files. I compared the old graph files with the new ones and they are different, so I suppose we should expect this. It does not change the ocean state at all, so the cpl files do not show any differences.

@jonbob added the non-BFB (PR makes roundoff changes to answers) label and removed the BFB (PR leaves answers BFB) label Apr 6, 2023
jonbob added a commit that referenced this pull request Apr 6, 2023
Update partitions across cores for 4 MPAS-Ocean meshes

@jonbob (Contributor) commented Apr 6, 2023

passes:

  • SMS_D_Ld3.T62_oQU120.CMPASO-IAF.chrysalis_intel
  • ERS.ne11_oQU240.WCYCL1850NS.chrysalis_intel

as expected, because we did not change partition files for these ocean meshes

NML DIFF and DIFF in mpaso.hist.am.globalStats for:

  • SMS_D_Ld1.ne30pg2_EC30to60E2r2.WCYCL1850.chrysalis_intel.allactive-wcprod

merged to next

@xylar (Contributor, Author) commented Apr 8, 2023

@jonbob, just for clarity, I don't think the graph files have changed. When I run:

$ diff mpas-o.graph.info.200904 partitions/mpas-o.graph.info.230313

I don't see any differences. But the partition files would very likely have changed because of, e.g., differences in the version of gpmetis being used. It hadn't occurred to me that that would change global sums but it makes sense.
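
A quick way to see why a different decomposition can change globalStats at roundoff level even when the ocean state is identical: the partition changes the order in which per-cell contributions are reduced, and floating-point addition is not associative. Illustrative only:

```python
# Illustrative: the same numbers summed in a different order (as happens when
# cells are grouped into different partitions) differ at roundoff level.
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100000)]

sum_order_a = sum(values)
sum_order_b = sum(sorted(values))  # a different (here, sorted) reduction order
print(sum_order_a - sum_order_b)   # typically nonzero at roundoff level
```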

@xylar (Contributor, Author) commented Apr 8, 2023

Thanks very much for testing!

@jonbob merged commit c7f67cb into E3SM-Project:master Apr 10, 2023
@jonbob (Contributor) commented Apr 10, 2023

Merged to master and expected DIFFs (and NML DIFFs) blessed.

Two tests on Perlmutter failed due to missing graph files, which were not automatically downloaded during testing. I manually downloaded them and the tests should now pass.

@xylar deleted the ocn/new-partition-files branch April 12, 2023 09:57
@xylar (Contributor, Author) commented Apr 12, 2023

Thanks so much @jonbob!

Everyone, please let me know if you run into missing partition files in the LCRC public_html/inputdata space. (I don't think there's much I can do about the missing downloads from PEM tests and the confusion with permissions outside of that space.)

xylar added a commit to xylar/compass that referenced this pull request Apr 18, 2023
This merge updates the E3SM-Project submodule from [c292bec000](https://github.com/E3SM-Project/E3SM/tree/c292bec000) to [4b3e611fee](https://github.com/E3SM-Project/E3SM/tree/4b3e611fee).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ]  (ocn) E3SM-Project/E3SM#5418
- [ ]  (ocn) E3SM-Project/E3SM#5447
- [ ]  (ocn) E3SM-Project/E3SM#5568
- [ ]  (ocn) E3SM-Project/E3SM#5583
- [ ]  (ocn) E3SM-Project/E3SM#5575
- [ ]  (ocn) E3SM-Project/E3SM#5600