Update sandiatoss3 pe-layouts for failing tri-grid tests #3295

Merged Nov 11, 2019 (3 commits)
74 changes: 74 additions & 0 deletions cime/config/e3sm/allactive/config_pesall.xml
@@ -1276,6 +1276,80 @@
</pes>
</mach>
</grid>
<grid name="a%ne30np4_l%r05_oi%oEC60to30v3_r%r05_g%null_w%null_z%null_m%oEC60to30v3">
<mach name="sandiatoss3">
<pes compset="any" pesize="any">
<comment>none</comment>
<ntasks>
<ntasks_atm>256</ntasks_atm>
<ntasks_lnd>256</ntasks_lnd>
<ntasks_rof>256</ntasks_rof>
<ntasks_ice>256</ntasks_ice>
<ntasks_ocn>256</ntasks_ocn>
<ntasks_glc>256</ntasks_glc>
<ntasks_wav>256</ntasks_wav>
<ntasks_cpl>256</ntasks_cpl>
</ntasks>
<nthrds>
<nthrds_atm>1</nthrds_atm>
<nthrds_lnd>1</nthrds_lnd>
<nthrds_rof>1</nthrds_rof>
<nthrds_ice>1</nthrds_ice>
<nthrds_ocn>1</nthrds_ocn>
<nthrds_glc>1</nthrds_glc>
<nthrds_wav>1</nthrds_wav>
<nthrds_cpl>1</nthrds_cpl>
</nthrds>
<rootpe>
<rootpe_atm>0</rootpe_atm>
<rootpe_lnd>0</rootpe_lnd>
<rootpe_rof>0</rootpe_rof>
<rootpe_ice>0</rootpe_ice>
<rootpe_ocn>0</rootpe_ocn>
<rootpe_glc>0</rootpe_glc>
<rootpe_wav>0</rootpe_wav>
<rootpe_cpl>0</rootpe_cpl>
</rootpe>
</pes>
</mach>
</grid>
<grid name="a%r05_l%r05_oi%null_r%r05_g%null_w%null_z%null_m%oEC60to30v3">
<mach name="sandiatoss3">
Member

I think it'd be good to set <mach name="any"> for both grids: tests with these grids are running into out-of-memory errors on multiple machines.

Contributor Author

Thanks for looking, @amametjanov. Do all the machines we test on have the resources? I can't remember how big Melvin is...

Member

@wadeburgess can you try

./cime/scripts/create_test SMS_P256.r05_r05.I1850CLM45CN SMS_D_Ld1_P256.ne30_r05_oECv3.A_WCYCL1850

on Melvin?

Contributor Author

@jonbob, Nov 7, 2019

SMS.r05_r05.I1850CLM45CN.melvin_gnu passes without changing the layout, so it's not having the memory issues.

Member

IIRC, Melvin is a single node with 48 cores and very large memory. Oversubscribing (256 tasks on 48 cores) is allowed.

Another option is to set

<mach name="sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop">
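As a rough sketch of this second option, the machine list would replace the single machine name on each of the two new grid entries. This assumes the `name` attribute on `<mach>` is matched as a pattern in which `|` separates alternatives, which is how the suggestion reads; it is not confirmed here. The PE values are the ones in the diff, and the per-component lists are abbreviated with XML comments to keep the fragment short.

```xml
<!-- Sketch only: assumes `|` in the mach name separates alternative machines. -->
<grid name="a%r05_l%r05_oi%null_r%r05_g%null_w%null_z%null_m%oEC60to30v3">
  <mach name="sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop">
    <pes compset="any" pesize="any">
      <comment>none</comment>
      <ntasks>
        <ntasks_atm>256</ntasks_atm>
        <!-- lnd, rof, ice, ocn, glc, wav, cpl: 256 tasks each, as in the diff -->
      </ntasks>
      <nthrds>
        <nthrds_atm>1</nthrds_atm>
        <!-- remaining components: 1 thread each, as in the diff -->
      </nthrds>
      <rootpe>
        <rootpe_atm>0</rootpe_atm>
        <!-- remaining components: root PE 0, as in the diff -->
      </rootpe>
    </pes>
  </mach>
</grid>
```

Compared with `<mach name="any">`, an explicit list leaves machines such as Melvin on their existing default layout.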

Member

Melvin has a ton of memory but not 256 cores (we oversubscribe up to 64 on Melvin).

Contributor Author

Thanks for all your help, @amametjanov and @jgfouca. I will change it to Az's second option, listing all the machine names...

<pes compset="any" pesize="any">
<comment>none</comment>
<ntasks>
<ntasks_atm>256</ntasks_atm>
<ntasks_lnd>256</ntasks_lnd>
<ntasks_rof>256</ntasks_rof>
<ntasks_ice>256</ntasks_ice>
<ntasks_ocn>256</ntasks_ocn>
<ntasks_glc>256</ntasks_glc>
<ntasks_wav>256</ntasks_wav>
<ntasks_cpl>256</ntasks_cpl>
</ntasks>
<nthrds>
<nthrds_atm>1</nthrds_atm>
<nthrds_lnd>1</nthrds_lnd>
<nthrds_rof>1</nthrds_rof>
<nthrds_ice>1</nthrds_ice>
<nthrds_ocn>1</nthrds_ocn>
<nthrds_glc>1</nthrds_glc>
<nthrds_wav>1</nthrds_wav>
<nthrds_cpl>1</nthrds_cpl>
</nthrds>
<rootpe>
<rootpe_atm>0</rootpe_atm>
<rootpe_lnd>0</rootpe_lnd>
<rootpe_rof>0</rootpe_rof>
<rootpe_ice>0</rootpe_ice>
<rootpe_ocn>0</rootpe_ocn>
<rootpe_glc>0</rootpe_glc>
<rootpe_wav>0</rootpe_wav>
<rootpe_cpl>0</rootpe_cpl>
</rootpe>
</pes>
</mach>
</grid>
<grid name="any">
<mach name="anlworkstation">
<pes compset="any" pesize="any">