Update sandiatoss3 pe-layouts for failing tri-grid tests #3295

Merged: jonbob merged 3 commits into master from jonbob/scripts/fix-trigrid-tests on Nov 11, 2019

@jonbob (Contributor) commented Nov 7, 2019

This PR adds two specific pe-layouts for sandiatoss3 to run two failing tri-grid tests:

  • SMS.r05_r05.I1850CLM45CN.sandiatoss3_intel
  • SMS_D_Ld1.ne30_r05_oECv3.A_WCYCL1850.sandiatoss3_intel

These layouts were tested in SES-875 and now pass.

[BFB] for all previously tested configurations

@jonbob added the BFB (PR leaves answers BFB) and Testing (anything related to unit/system tests) labels Nov 7, 2019
@jonbob requested a review from wadeburgess November 7, 2019 16:22
@jonbob self-assigned this Nov 7, 2019
@wadeburgess (Contributor)

I don't think I should review this. I am unfamiliar with pe-layouts.

@jonbob requested review from rljacob and removed request for wadeburgess November 7, 2019 17:14
Comment on lines 1316 to 1317
<grid name="a%r05_l%r05_oi%null_r%r05_g%null_w%null_z%null_m%oEC60to30v3">
<mach name="sandiatoss3">
Member

I think it'd be good to set <mach name="any"> for both grids: tests with these grids are running into out-of-memory errors on multiple machines.

Contributor Author

Thanks for looking, @amametjanov. Do all the machines we test on have the resources? I can't remember how big melvin is...

Member

@wadeburgess can you try

./cime/scripts/create_test SMS_P256.r05_r05.I1850CLM45CN SMS_D_Ld1_P256.ne30_r05_oECv3.A_WCYCL1850

on Melvin?
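
For readers unfamiliar with the test names above, here is a minimal sketch of how they break down into fields. It assumes the usual `TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER]` naming and that a `P<N>` (or `P<N>x<M>`) modifier requests N tasks. The function `parse_test_name` is hypothetical and written for illustration; it is not the parser used by `create_test`.

```python
import re


def parse_test_name(name):
    """Split a test name into its dot-separated fields (simplified sketch)."""
    fields = name.split(".")
    case_parts = fields[0].split("_")
    parsed = {
        "testtype": case_parts[0],        # e.g. SMS
        "modifiers": case_parts[1:],      # e.g. ["D", "Ld1", "P256"]
        "grid": fields[1],
        "compset": fields[2],
        # The machine_compiler field is optional on the command line.
        "machine_compiler": fields[3] if len(fields) > 3 else None,
        "ntasks": None,
    }
    for mod in parsed["modifiers"]:
        # Assumed form of the task-count modifier: P<tasks> or P<tasks>x<threads>.
        match = re.fullmatch(r"P(\d+)(?:x\d+)?", mod)
        if match:
            parsed["ntasks"] = int(match.group(1))
    return parsed


if __name__ == "__main__":
    for test in ("SMS_P256.r05_r05.I1850CLM45CN",
                 "SMS_D_Ld1_P256.ne30_r05_oECv3.A_WCYCL1850"):
        print(parse_test_name(test))
```

Under these assumptions, both tests in the command above request 256 tasks, overriding whatever default layout the machine would otherwise pick.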

Contributor Author (@jonbob, Nov 7, 2019)

SMS.r05_r05.I1850CLM45CN.melvin_gnu passes without changing the layout, so it's not having the memory issues...

Member

IIRC, Melvin is a single node with 48 cores and very large memory. Oversubscribing (256 tasks on 48 cores) is allowed.

Another option is to set

<mach name="sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop">

Member

Melvin has a ton of memory but not 256 cores (we oversubscribe up to 64 on Melvin).

Contributor Author

Thanks for all your help, @amametjanov and @jgfouca. I will change it to Az's second option, listing all the machine names...
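
The practical difference between `<mach name="any">` and a `|`-separated list can be sketched as follows. This assumes the `name` attribute is interpreted as a regular expression searched against the machine name, with the literal `any` acting as a wildcard; that is an assumption about the framework's behavior, and `mach_matches` is a hypothetical helper, not a function from the code base.

```python
import re

# Pattern taken from the second option suggested in this thread.
PATTERN = "sandiatoss3|cori-knl|cori-haswell|theta|anvil|bebop"


def mach_matches(name_attr, machine):
    """Return True if a <mach name="..."> entry would apply to `machine`.

    Assumption: "any" matches every machine; otherwise the attribute is
    treated as a regular expression and searched within the machine name.
    """
    if name_attr == "any":
        return True
    # re.search matches substrings, so a pattern such as "theta" would also
    # match a longer machine name that contains "theta".
    return re.search(name_attr, machine) is not None


if __name__ == "__main__":
    for machine in ("sandiatoss3", "anvil", "melvin"):
        print(machine, mach_matches(PATTERN, machine), mach_matches("any", machine))
```

With the listed pattern, melvin keeps its existing layout while the machines reporting out-of-memory errors pick up the new one; with `any`, melvin would receive the larger layout as well.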

jonbob added a commit that referenced this pull request Nov 8, 2019
Update sandiatoss3 pe-layouts for failing tri-grid tests

This PR adds two specific pe-layouts for sandiatoss3 and other platforms to run
two failing tri-grid tests:
* SMS.r05_r05.I1850CLM45CN.sandiatoss3_intel
* SMS_D_Ld1.ne30_r05_oECv3.A_WCYCL1850.sandiatoss3_intel
These layouts were tested in SES-875 and now pass.

[BFB] for all previously tested configurations

@jonbob (Contributor Author) commented Nov 8, 2019

merged to next

jonbob added a commit that referenced this pull request Nov 11, 2019
@jonbob merged commit 33d71e6 into master Nov 11, 2019

@jonbob (Contributor Author) commented Nov 11, 2019

merged to master

@jonbob deleted the jonbob/scripts/fix-trigrid-tests branch November 11, 2019 18:34
rljacob pushed a commit that referenced this pull request Apr 21, 2021
These variables should get reasonable initial values if the namelist isn't read.

Test suite: scripts_regression_tests.py had testing of SMS_Ld5.f09_g17.B1850G.cheyenne_intel.allactive-cism-test_coupling
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #3294

User interface changes?:

Update gh-pages html (Y/N)?:

Code review:
Labels: BFB (PR leaves answers BFB), Testing (anything related to unit/system tests)
5 participants