Update V2 benchmarking grids and compsets #4551

Merged (9 commits) on Oct 15, 2021

Member

@amametjanov amametjanov commented Sep 22, 2021

Update V2 benchmarking grids and compsets (in part by #4488 too):

  1. low-res F: ne30_ne30.FC5AV1C-L -> ne30pg2_EC30to60E2r2.F2010
  2. low-res G: T62_oEC60to30v3.GMPAS-IAF -> T62_EC30to60E2r2.GMPAS-IAF
  3. low-res WC: ne30_oECv3.A_WCYCL1850S_CMIP6 -> ne30pg2_EC30to60E2r2.WCYCL1850
  4. high-res F: ne120_ne120.FC5AV1C-H01A -> ne120pg2_r05_EC30to60E2r2.F2010
  5. high-res WC: ne120_oRRS18v3.A_WCYCL1950S_CMIP6_HR -> ne120pg2_r0125_oRRS18to6v3.WCYCL1950

[BFB]
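
For reference, the five substitutions listed above can be written down as data. This is a minimal sketch, not code from the E3SM repository: `BENCH_UPDATES` and `split_case` are hypothetical names, and the only facts encoded are the grid and compset names given in this description.

```python
# Old -> new benchmark cases, copied from the list in this PR description.
# BENCH_UPDATES and split_case are illustrative names, not E3SM/CIME APIs.
BENCH_UPDATES = {
    "low-res F": ("ne30_ne30.FC5AV1C-L", "ne30pg2_EC30to60E2r2.F2010"),
    "low-res G": ("T62_oEC60to30v3.GMPAS-IAF", "T62_EC30to60E2r2.GMPAS-IAF"),
    "low-res WC": ("ne30_oECv3.A_WCYCL1850S_CMIP6", "ne30pg2_EC30to60E2r2.WCYCL1850"),
    "high-res F": ("ne120_ne120.FC5AV1C-H01A", "ne120pg2_r05_EC30to60E2r2.F2010"),
    "high-res WC": (
        "ne120_oRRS18v3.A_WCYCL1950S_CMIP6_HR",
        "ne120pg2_r0125_oRRS18to6v3.WCYCL1950",
    ),
}


def split_case(case):
    """Split '<grid>.<compset>' at the first dot."""
    grid, compset = case.split(".", 1)
    return grid, compset


for label, (old, new) in BENCH_UPDATES.items():
    old_grid, old_compset = split_case(old)
    new_grid, new_compset = split_case(new)
    print(f"{label}: grid {old_grid} -> {new_grid}, compset {old_compset} -> {new_compset}")
```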

@amametjanov amametjanov added the BFB (PR leaves answers BFB) and Performance labels Sep 22, 2021
@amametjanov amametjanov requested review from wlin7 and jonbob September 22, 2021 17:10
@amametjanov amametjanov self-assigned this Sep 22, 2021
"PFS_P14400.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
"PFS_P21600.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
"PFS_P43200.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
"PFS_P86400.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
)
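
The test names in this list follow the pattern `PFS_P<pes>.<grid>.<compset>.<testmod>`, so the PE count can be read straight from the name. The following is a small sketch of that decomposition. `parse_bench_test` and `nodes_needed` are hypothetical helpers, the regular expression is written only for the names in this PR, and the default of 64 tasks per node is an assumption taken from the `22x64x2 (nodes x tasks x threads)` layout reported for cori-knl later in this conversation.

```python
import re

# Matches names such as "PFS_P14400.ne120pg2_r05_oECv3.F2010.eam-bench-noio".
# Illustration only: this is not CIME's own test-name parser.
TEST_RE = re.compile(
    r"^(?P<kind>[A-Z]+)_P(?P<pes>\d+)"
    r"\.(?P<grid>[^.]+)\.(?P<compset>[^.]+)\.(?P<testmod>[^.]+)$"
)


def parse_bench_test(name):
    """Return the pieces of a benchmark test name as a dict."""
    match = TEST_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized test name: {name}")
    parts = match.groupdict()
    parts["pes"] = int(parts["pes"])
    return parts


def nodes_needed(pes, tasks_per_node=64):
    """Smallest node count that holds `pes` tasks (assumed 64 tasks per node)."""
    return (pes + tasks_per_node - 1) // tasks_per_node


for test in (
    "PFS_P14400.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
    "PFS_P86400.ne120pg2_r05_oECv3.F2010.eam-bench-noio",
):
    info = parse_bench_test(test)
    print(info["grid"], info["compset"], info["pes"], nodes_needed(info["pes"]))
```

Under that assumption, 14400 PEs correspond to 225 nodes and 86400 PEs to 1350 nodes, which matches node counts in the high-resolution F-compset table posted later.
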
Contributor

In general, we're trying to move to EC30to60E2r2 for the low-res v2 grid. I'm guessing we don't have ne120pg2_r05_EC30to60E2r2 defined, but we can define it in order to update this test.

Member Author

I'm getting an error with SMS.ne120pg2_r05_EC30to60E2r2.F2010:

ERROR: no alias ne120pg2_r05_EC30to60E2r2 defined

Please feel free to push updates to this branch or create a PR and we'll make this PR dependent on that PR.

Also, do we need to update these too?

  1. high-res G-compset: --res T62_oRRS18to6v3 --compset GMPAS-IAF
  2. low-res G-compset: --res T62_oEC60to30v3 --compset GMPAS-IAF
  3. high-res WC-compset: --res ne120pg2_r0125_oRRS18to6v3 --compset WCYCL1950:
ERROR: grid alias ne120pg2_r0125_oRRS18to6v3 not valid for compset 1950SOI_EAM%CMIP6_ELM%SPBC_MPASSI_MPASO_MOSART_SGLC_SWAV_SIAC_SESP

Contributor

OK, we'll start a list. I'll check whether the ocn/ice team wants to use oRRS18to6v3 or WC14to60E2r3 for the high-res testing. I'm pretty sure we want to move to EC30to60E2r2 for low-res. I'll push a fix for ne120pg2_r05_EC30to60E2r2 in just a minute.

Contributor

For our official project metrics, the high-res F and G benchmarks need to be as consistent as possible so that we can measure progress across versions. So we should try to keep RRS18to6 for those (and ne120 for atm).

Contributor

OK, let's leave oRRS18to6v3 for high-res and move to EC30to60E2r2 for low-res. Does that work for you, @philipwjones?

Contributor

@amametjanov - the ne120pg2_r0125_oRRS18to6v3 alias had been set for:

compset="(DOCN|XOCN|SOCN|AQP1)"

I'll remove that limitation so that it works for WCYCL1950, and commit the change to this branch.
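
To see why the alias was rejected, the restriction can be tried against the compset long name from the error message. This sketch assumes the `compset` attribute is applied as a regular-expression search against the long name. That matching rule is an assumption here, and `alias_allows` is a hypothetical helper rather than a CIME function.

```python
import re

# Restriction quoted in this comment and the long name from the error message.
COMPSET_RESTRICTION = "(DOCN|XOCN|SOCN|AQP1)"
WCYCL1950_LONGNAME = (
    "1950SOI_EAM%CMIP6_ELM%SPBC_MPASSI_MPASO_MOSART_SGLC_SWAV_SIAC_SESP"
)


def alias_allows(restriction, compset_longname):
    """True if the alias may be used with this compset.

    Assumption: a missing restriction means "any compset", and a restriction
    is treated as a regex that must be found somewhere in the long name.
    """
    if restriction is None:
        return True
    return re.search(restriction, compset_longname) is not None


print(alias_allows(COMPSET_RESTRICTION, WCYCL1950_LONGNAME))  # False
print(alias_allows(None, WCYCL1950_LONGNAME))                 # True
```

The long name uses MPASO as the ocean component, so none of the four alternatives match, and dropping the restriction lets the alias through.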

Member

Instead of changing the grid, why not change the compset to F2010-CICE? Then you could keep ne120_ne120 and compare with older results.

Member Author

I'm getting an error while setting up SMS.ne120pg2_ne120pg2.F2010-CICE:

ERROR: Command: '/global/u2/a/azamat/cori/repos/E3SM-master/components/elm/bld/build-namelist -infile /global/cscratch1/sd/azamat/e3sm_scratch/cori-knl/tests/bench/20210917-sanity01/chk-05-f-cice/SMS.ne120pg2_ne120pg2.F2010-CICE.cori-knl_intel.20210923/Buildconf/elmconf/namelist  
-csmdata /global/cfs/cdirs/e3sm/inputdata 
-inputdata /global/cscratch1/sd/azamat/e3sm_scratch/cori-knl/tests/bench/20210917-sanity01/chk-05-f-cice/SMS.ne120pg2_ne120pg2.F2010-CICE.cori-knl_intel.20210923/Buildconf/elm.input_data_list 
-ignore_ic_year -namelist " &elm_inparm  start_ymd=00010101  /" 
-use_case 2010_CMIP6_control  -res ne120np4.pg2  -clm_start_type default 
-envxml_dir /global/cscratch1/sd/azamat/e3sm_scratch/cori-knl/tests/bench/20210917-sanity01/chk-05-f-cice/SMS.ne120pg2_ne120pg2.F2010-CICE.cori-knl_intel.20210923 
-l_ncpl 96 -lnd_frac /global/cfs/cdirs/e3sm/inputdata/share/domains/UNSET 
-glc_nec 0 -co2_ppmv 388.717 -co2_type diagnostic  -ncpl_base_period day  
-config /global/cscratch1/sd/azamat/e3sm_scratch/cori-knl/tests/bench/20210917-sanity01/chk-05-f-cice/SMS.ne120pg2_ne120pg2.F2010-CICE.cori-knl_intel.20210923/Buildconf/elmconf/config_cache.xml 
-bgc sp -mask gx1v6' 
failed with error 'File::Glob::glob() will disappear in perl 5.30. Use File::Glob::bsd_glob() instead. at /global/u2/a/azamat/cori/repos/E3SM-master/components/elm/bld/ELMBuildNamelist.pm line 3511.

ERROR : CLM build-namelist::ELMBuildNamelist::setup_cmdl_resolution() : res has a value ('ne120np4.pg2') that is NOT valid. 
Valid values are: '512x1024' '360x720cru' '128x256' '64x128' '48x96' '32x64' '8x16' '94x192' '0.23x0.31' '0.9x1.25' '1.9x2.5' '2.5x3.33' '4x5' '10x15' '5x5_amazon' '1x1_tropicAtl' '1x1_camdenNJ' '1x1_vancouverCAN' '1x1_mexicocityMEX' '1x1_asphaltjungleNJ' '1x1_brazil' '1x1_urbanc_alpha' '1x1_numaIA' '1x1_smallvilleIA' '0.1x0.1' '0.5x0.5' '3x3min' '5x5min' '10x10min' '0.33x0.33' 'ne4np4' 'ne4np4.pg2' 'ne11np4' 'ne16np4' 'ne30np4' 'ne30np4.pg2' 'ne60np4' 'ne120np4' 'ne240np4' 'ne1024np4.pg2' '1km-merge-10min' 'ne0np4_arm_x8v3_lowcon' 'ne0np4_conus_x4v1_lowcon' 'ne0np4_enax4v1' 'ne0np4_twpx4v1' 'r2' 'r05' 'r0125' 'NLDAS' 'ne0np4_northamericax4v1.pg2''
from dir '/global/cscratch1/sd/azamat/e3sm_scratch/cori-knl/tests/bench/20210917-sanity01/chk-05-f-cice/SMS.ne120pg2_ne120pg2.F2010-CICE.cori-knl_intel.20210923/Buildconf/elmconf'
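
The failure amounts to a membership check: `ne120np4.pg2` is not in ELM's list of valid resolutions, although `ne120np4` and `ne30np4.pg2` are. Here is a minimal sketch using a subset of the values printed in the error. The set is deliberately abbreviated, and `check_res` is a hypothetical Python stand-in for the Perl routine named in the log, not a translation of it.

```python
# Abbreviated subset of the "Valid values" printed in the error above.
VALID_RES_SUBSET = frozenset({
    "ne4np4", "ne4np4.pg2", "ne30np4", "ne30np4.pg2",
    "ne60np4", "ne120np4", "ne240np4", "ne1024np4.pg2",
    "r2", "r05", "r0125",
})


def check_res(res, valid=VALID_RES_SUBSET):
    """Return `res` if it is a known resolution, otherwise raise ValueError."""
    if res not in valid:
        raise ValueError(f"res has a value ('{res}') that is NOT valid")
    return res


for candidate in ("ne120np4", "ne30np4.pg2", "ne120np4.pg2"):
    try:
        check_res(candidate)
        print(candidate, "is valid")
    except ValueError as err:
        print(err)
```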

Member

Don't do pg2. Convert your existing tests to ne120_ne120.F2010-CICE. Then also add ne120pg2_r05_oECv3.F2010 and run both until CICE really goes away. Not sure how expensive that makes the whole thing.

Member Author

Getting an error setting up SMS.ne120_ne120.F2010-CICE too:

ERROR : CLM build-namelist::ELMBuildNamelist::add_default() : No default value found for fsurdat.

@amametjanov amametjanov assigned ndkeen and unassigned amametjanov Sep 22, 2021
@amametjanov amametjanov force-pushed the azamat/benching/update-v2-grids-compsets branch from 9735c1e to c76dcfe Compare September 22, 2021 20:17
@wlin7
Contributor

wlin7 commented Sep 23, 2021

Still need a fsurdat for ne120pg2_r0125_oRRS18to6v3.WCYCL1950. I am going to generate one and update elm's namelist default file.

@wlin7
Contributor

wlin7 commented Sep 24, 2021

The 1950 r0125 land surfdata is ready on the inputdata server and in the NERSC inputdata path, and is enabled in elm's namelist_defaults.xml. I also updated the grids for the high-res tests.

@amametjanov amametjanov force-pushed the azamat/benching/update-v2-grids-compsets branch from ac12140 to da9ef47 Compare September 24, 2021 23:35
@amametjanov amametjanov marked this pull request as ready for review September 24, 2021 23:41
@amametjanov
Member Author

amametjanov commented Sep 28, 2021

Data from some benchmarking runs is at https://gist.github.com/amametjanov/44dd05b3333a7bc286b1bc258720df38.
Briefly:

Low-resolution WC-compset
PEs     nodes sypd
Large   101   5.15 
Medium   51   3.67
Small    25   2.13 

Low-resolution G-compset
nodes sypd
 5    1.50
 8    2.02
10    2.50
15    3.11
20    3.77

Low-resolution F-compset
nodes sypd
22    4.42
43    6.50
85    8.37

High-resolution WC-compset
PEs    nodes sypd
Large  1025  1.14 
Medium  448  0.60 
Small   207  0.36 

High-resolution F-compset
nodes sypd
 113  0.90
 135  1.05
 169  1.28
 225  1.61
 338  2.17
 675  3.04
1350  3.37

High-resolution G-compset
nodes sypd
 100  0.47
 113  0.51
 128  0.62
 150  0.54
 175  0.64
 188  0.70
 200  0.60
 250  0.81
 256  0.69
 300  0.41
 338  0.67
 350  0.70
 375  0.40
 400  0.69
 407  0.79
 438  0.81
 450  0.84
 469  0.82
 500  0.78
 563  0.90
 750  0.85
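
One way to read these tables is to convert them into speedup and parallel efficiency relative to the smallest run. The sketch below does that for the low-resolution F-compset numbers above. `scaling_table` is a hypothetical helper, and efficiency is defined here as speedup divided by the node-count ratio, which is a common convention rather than an official project metric.

```python
# (nodes, sypd) pairs copied from the low-resolution F-compset table.
LOW_RES_F = [(22, 4.42), (43, 6.50), (85, 8.37)]


def scaling_table(runs):
    """Speedup and efficiency of each run relative to the first entry.

    Assumes `runs` is ordered by increasing node count.
    """
    base_nodes, base_sypd = runs[0]
    rows = []
    for nodes, sypd in runs:
        speedup = sypd / base_sypd
        efficiency = speedup / (nodes / base_nodes)
        rows.append((nodes, sypd, speedup, efficiency))
    return rows


for nodes, sypd, speedup, eff in scaling_table(LOW_RES_F):
    print(f"{nodes:5d} nodes  {sypd:5.2f} sypd  speedup {speedup:4.2f}  efficiency {eff:4.0%}")
```

For these numbers, efficiency relative to the 22-node run is roughly 75% at 43 nodes and roughly 49% at 85 nodes.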

@@ -7094,7 +7094,7 @@
</pes>
</mach>
</grid>
-<grid name="a%ne120np4_">
+<grid name="a%ne120np4">
Contributor

Why are you making this change? I added the underscores here to distinguish between np4 and pg2 grids.

Member Author

OK, I didn't see that the underscore was specifically added in #3204 to disable running pg2 cases with existing PE-layouts.
I was just trying to get some data from old PE-layouts. I opened GH issue #4568 to note the need for new PE-layouts.
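
The effect of the trailing underscore can be illustrated with a regular-expression search. This is a sketch under two assumptions that are not confirmed in this thread: that the `<grid name=...>` value is matched as a pattern against the grid long name, and that long names start with `a%ne120np4_` for np4 grids and `a%ne120np4.pg2_` for pg2 grids. The long-name strings below are shortened, illustrative values, and `layout_applies` is a hypothetical helper.

```python
import re

# Illustrative, truncated long names (assumed shape, not copied from CIME).
NP4_GRID = "a%ne120np4_l%ne120np4"
PG2_GRID = "a%ne120np4.pg2_l%ne120np4.pg2"


def layout_applies(pattern, grid_longname):
    """True if a PE-layout entry with this grid pattern would be selected."""
    return re.search(pattern, grid_longname) is not None


for pattern in ("a%ne120np4_", "a%ne120np4"):
    print(
        pattern,
        "np4:", layout_applies(pattern, NP4_GRID),
        "pg2:", layout_applies(pattern, PG2_GRID),
    )
```

With the underscore, only the np4 long name matches. Without it, the pattern is a prefix of both names, so the old PE-layouts would also be picked up for pg2 cases.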

@ndkeen
Contributor

ndkeen commented Sep 29, 2021

Note that the compset/grids using MPASSI will hit an error if using GNU v9 or higher. Would need to use the -CICE option.

@amametjanov
Member Author

I think we want to compare V1 (maint-1.0) with Intel compiler vs. V2 (current master) also with Intel. I haven't tried V1 runs with GNU compilers.

Due to compset changes, we can look at per-component (instead of Total) timings: e.g.
V2:

PFS_P1350.ne30pg2_EC30to60E2r2.F2010.cori-knl_intel.eam-bench-noio.20210924
1350: stacked 22x64x2 (nodes x tasks x threads)
    ATM Run Time:     850.705 seconds       42.535 seconds/mday         5.57 myears/wday

V1:

PFS_P1350.ne30_ne30.FC5AV1C-L.cori-knl_intel.cam-bench-noio.20210924
1350: stacked 22x64x2 (nodes x tasks x threads)
    ATM Run Time:    2210.522 seconds      110.526 seconds/mday         2.14 myears/wday
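
The numbers on each `ATM Run Time` line are related by simple arithmetic, which makes the V1-to-V2 comparison easy to check. The sketch below assumes a 365-day model year and 86400 seconds per wallclock day. The function name is hypothetical.

```python
SECONDS_PER_WALLCLOCK_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365.0  # assumption: no-leap calendar


def myears_per_wday(seconds_per_mday):
    """Convert seconds per model day into model years per wallclock day."""
    return SECONDS_PER_WALLCLOCK_DAY / (seconds_per_mday * DAYS_PER_MODEL_YEAR)


# seconds/mday and total ATM run time copied from the two timing lines above.
v2_rate = myears_per_wday(42.535)
v1_rate = myears_per_wday(110.526)
atm_speedup = 2210.522 / 850.705

print(f"V2: {v2_rate:.2f} myears/wday")
print(f"V1: {v1_rate:.2f} myears/wday")
print(f"ATM speedup, V1 to V2: {atm_speedup:.2f}x")
```

This reproduces the reported 5.57 and 2.14 myears/wday and gives an ATM speedup of about 2.6x between these two runs on the same 22-node layout.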

@wlin7
Contributor

wlin7 commented Sep 29, 2021

> Note that the compset/grids using MPASSI will hit an error if using GNU v9 or higher. Would need to use the -CICE option.

@ndkeen, we have a few test suites using gnu. Are any of them hitting this problem?

@rljacob
Member

rljacob commented Oct 11, 2021

Is this ready?

@amametjanov
Member Author

Yes, this is ready to merge.

ndkeen added a commit that referenced this pull request Oct 14, 2021
…4551)

Update V2 benchmarking grids and compsets (in part by #4488 too):

low-res F: ne30_ne30.FC5AV1C-L -> ne30pg2_EC30to60E2r2.F2010
low-res G: T62_oEC60to30v3.GMPAS-IAF -> T62_EC30to60E2r2.GMPAS-IAF
low-res WC: ne30_oECv3.A_WCYCL1850S_CMIP6 -> ne30pg2_EC30to60E2r2.WCYCL1850
high-res F: ne120_ne120.FC5AV1C-H01A -> ne120pg2_r05_EC30to60E2r2.F2010
high-res WC: ne120_oRRS18v3.A_WCYCL1950S_CMIP6_HR -> ne120pg2_r0125_oRRS18to6v3.WCYCL1950

[BFB]
@ndkeen
Contributor

ndkeen commented Oct 14, 2021

merged to next

@ndkeen ndkeen merged commit 55d0464 into master Oct 15, 2021
@amametjanov amametjanov deleted the azamat/benching/update-v2-grids-compsets branch November 1, 2021 22:14