Update sandiatoss3 pe-layouts for failing tri-grid tests #3295
I don't think I should review this. I am unfamiliar with pe-layouts.
<grid name="a%r05_l%r05_oi%null_r%r05_g%null_w%null_z%null_m%oEC60to30v3">
<mach name="sandiatoss3">
I think it'd be good to set <mach name="any"> for both grids: tests with these grids are running into out-of-memory errors on multiple machines.
Thanks for looking, @amametjanov. Do all the machines we test on have the resources? I can't remember how big melvin is...
@wadeburgess can you try
./cime/scripts/create_test SMS_P256.r05_r05.I1850CLM45CN SMS_D_Ld1_P256.ne30_r05_oECv3.A_WCYCL1850
on Melvin?
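For readers unfamiliar with the test names used in this thread, here is a minimal sketch of how they break down, assuming the usual `TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER]` layout and assuming a `P<n>` modifier requests `n` tasks. The function `parse_test_name` is hypothetical and is not part of CIME.

```python
import re


def parse_test_name(name):
    """Split a test name into its parts (illustrative helper, not CIME code)."""
    fields = name.split(".")
    case_field, grid, compset = fields[0], fields[1], fields[2]

    # The first field is the test type followed by optional modifiers.
    testtype, *modifiers = case_field.split("_")

    # Assumption: a modifier of the form P<n> requests n tasks.
    ntasks = None
    for mod in modifiers:
        match = re.fullmatch(r"P(\d+)", mod)
        if match:
            ntasks = int(match.group(1))

    # An optional fourth field carries machine and compiler.
    machine = compiler = None
    if len(fields) > 3:
        machine, compiler = fields[3].rsplit("_", 1)

    return {
        "testtype": testtype,
        "modifiers": modifiers,
        "ntasks": ntasks,
        "grid": grid,
        "compset": compset,
        "machine": machine,
        "compiler": compiler,
    }


if __name__ == "__main__":
    print(parse_test_name("SMS_D_Ld1_P256.ne30_r05_oECv3.A_WCYCL1850"))
    print(parse_test_name("SMS.r05_r05.I1850CLM45CN.melvin_gnu"))
```

Under these assumptions, the two tests requested above both ask for 256 tasks, while the `melvin_gnu` test reported later in the thread carries no `P` modifier.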
SMS.r05_r05.I1850CLM45CN.melvin_gnu passes without changing the layout, so it's not having the memory issues....
IIRC, Melvin is a single node with 48 cores and very large memory. Oversubscribing (256 tasks on 48 cores) is allowed.
Another option is to set
<mach name="sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop">
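To illustrate the difference between the two options, here is a rough sketch of how a `mach` name could be matched against the current machine. It assumes the name is either the literal `any` or a regular expression with `|` alternation that must match the whole machine name. The helper `mach_matches` is hypothetical, and the real matching logic in CIME may differ.

```python
import re

# The pattern proposed in this thread.
LISTED = "sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop"


def mach_matches(mach_name, machine):
    """Return True if a <mach name="..."> entry applies to `machine`.

    Assumption: "any" matches every machine; anything else is treated as
    a regular expression that must match the full machine name.
    """
    if mach_name == "any":
        return True
    return re.fullmatch(mach_name, machine) is not None


if __name__ == "__main__":
    for machine in ("sandiatoss3", "cori-knl", "melvin"):
        print(machine, mach_matches(LISTED, machine), mach_matches("any", machine))
```

With the listed pattern, melvin is not matched and would keep its existing layout, whereas `any` would apply the new layout there as well.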
Melvin has a ton of memory but not 256 cores (we oversubscribe up to 64 on melvin)
Thanks for all your help, @amametjanov and @jgfouca. I will change it to Az's second option, listing all the machine names...
Update sandiatoss3 pe-layouts for failing tri-grid tests
This PR adds two specific pe-layouts for sandiatoss3 and other platforms to run two failing tri-grid tests:
* SMS.r05_r05.I1850CLM45CN.sandiatoss3_intel
* SMS_D_Ld1.ne30_r05_oECv3.A_WCYCL1850.sandiatoss3_intel
These layouts were tested in SES-875 and now pass.
[BFB] for all previously tested configurations
merged to next
merged to master
These variables should get reasonable initial values if the namelist isn't read.
Test suite: scripts_regression_tests.py had testing of SMS_Ld5.f09_g17.B1850G.cheyenne_intel.allactive-cism-test_coupling
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #3294
User interface changes?:
Update gh-pages html (Y/N)?:
Code review:
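This PR adds two specific pe-layouts for sandiatoss3 to run two failing tri-grid tests:
* SMS.r05_r05.I1850CLM45CN.sandiatoss3_intel
* SMS_D_Ld1.ne30_r05_oECv3.A_WCYCL1850.sandiatoss3_intel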
These layouts were tested in SES-875 and now pass.
[BFB] for all previously tested configurations