
Add ufs_da to checkout, build, and link #529

Closed
RussTreadon-NOAA opened this issue Dec 15, 2021 · 7 comments

@RussTreadon-NOAA (Contributor)

This issue is opened to document the addition of UFS-DA from NOAA-EMC/JEDI-T2O into NOAA-EMC/global-workflow. This addition is in support of the goals stated in the JEDI Prototype 3DVar Cycling Capability project.

RussTreadon-NOAA self-assigned this on Dec 15, 2021
@RussTreadon-NOAA (Contributor, Author)

Create feature/ufsda as a copy of develop at dd03ed0.

@RussTreadon-NOAA (Contributor, Author)

Test b3e8156 on Orion and Hera as follows:

  • clone global-workflow repo
  • checkout feature/ufsda
  • cd sorc
  • ./checkout.sh
  • ./build_all.sh
  • ./link_workflow.sh emc [orion, hera]
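The steps above can be sketched as a shell session. This is a sketch under assumptions: the repository URL follows the NOAA-EMC/global-workflow naming mentioned in this issue, and `emc` plus a machine name are passed to `link_workflow.sh` as shown in the list.

```shell
# Clone the global-workflow repo and check out the feature branch (tested at b3e8156)
git clone https://github.com/NOAA-EMC/global-workflow.git
cd global-workflow
git checkout feature/ufsda

# Check out components, build everything, and link for the target machine
cd sorc
./checkout.sh
./build_all.sh
./link_workflow.sh emc orion   # or: ./link_workflow.sh emc hera
```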

sorc/ufs_da.fd was created and populated by checkout.sh. build_all.sh executed build_ufs_da.sh, which built applications in ufs_da.fd. Note that at present ufs_da.fd can only be built on Orion and Hera. UFS-DA modulefiles for other machines will be added at a later date. It is unlikely modulefiles will be created for WCOSS_C or WCOSS_D since these machines will be decommissioned in the summer of 2022.

link_workflow.sh linked UFS-DA executables to the exec directory. There may be other files (e.g., parm, fix, yaml, scripts, etc.) which should be linked or copied from ufs_da.fd to specific workflow directories. If so, let me know what needs to be added.

The fv3jedi executables in the exec directory were not tested. ctests may be run from the ufs_da.fd directory. This will be done and reported in a subsequent comment in this issue.

@RussTreadon-NOAA (Contributor, Author)

UFS-DA ctests have been run on Orion and Hera.

Some reminders regarding the user environment before running ctest:

  • load the machine specific UFSDA module from ufs_da.fd/UFSDA/modulefiles/
  • set SLURM_ACCOUNT, SALLOC_ACCOUNT, SBATCH_ACCOUNT, and SLURM_QOS. For example,
export SLURM_ACCOUNT=${SLURM_ACCOUNT:-"da-cpu"}
export SALLOC_ACCOUNT=${SALLOC_ACCOUNT:-$SLURM_ACCOUNT}
export SBATCH_ACCOUNT=${SBATCH_ACCOUNT:-$SLURM_ACCOUNT}
export SLURM_QOS=${SLURM_QOS:-"debug"}
  • set ulimit -s unlimited. Note: this operation is not permitted on Hera.
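Putting the reminders above together, a full pre-ctest session might look like the following. This is a sketch: the module name `ufsda/orion` and the `UFS-DA/build` test directory are illustrative (use the actual modulefile under ufs_da.fd/UFSDA/modulefiles/ and the build directory produced on your machine).

```shell
# Load the machine-specific UFSDA module (name is illustrative)
module load ufsda/orion

# Slurm account/QOS defaults; override by exporting before sourcing this
export SLURM_ACCOUNT=${SLURM_ACCOUNT:-"da-cpu"}
export SALLOC_ACCOUNT=${SALLOC_ACCOUNT:-$SLURM_ACCOUNT}
export SBATCH_ACCOUNT=${SBATCH_ACCOUNT:-$SLURM_ACCOUNT}
export SLURM_QOS=${SLURM_QOS:-"debug"}

# Raise the stack limit (not permitted on Hera)
ulimit -s unlimited

# Run the test suite from the build directory
cd ufs_da.fd/UFS-DA/build
ctest
```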

All ctests run on Orion passed

100% tests passed, 0 tests failed out of 1003

Label Time Summary:
HofX            =  37.77 sec*proc (9 tests)
QC              =  30.01 sec*proc (9 tests)
aircraft        =  13.66 sec*proc (2 tests)
crtm            =  69.13 sec*proc (81 tests)
executable      =  98.59 sec*proc (107 tests)
femps           =   8.70 sec*proc (1 test)
fortran         =   0.19 sec*proc (1 test)
fv3-jedi        = 914.91 sec*proc (83 tests)
fv3jedi         = 919.57 sec*proc (84 tests)
instrument      =  67.78 sec*proc (18 tests)
ioda            = 105.74 sec*proc (72 tests)
ioda-engines    =  10.82 sec*proc (119 tests)
mpi             = 2290.81 sec*proc (269 tests)
oops            =  45.92 sec*proc (37 tests)
openmp          = 765.98 sec*proc (80 tests)
ozone           =   0.45 sec*proc (2 tests)
radarVAD        =   7.00 sec*proc (2 tests)
rass            =   7.11 sec*proc (2 tests)
saber           = 775.95 sec*proc (188 tests)
satwinds        =   6.39 sec*proc (2 tests)
scatwinds       =   6.37 sec*proc (2 tests)
script          = 2726.99 sec*proc (778 tests)
sfcLand         =   8.37 sec*proc (2 tests)
sfcMarine       =   8.98 sec*proc (2 tests)
sonde           =   9.45 sec*proc (2 tests)
ufo             = 900.56 sec*proc (422 tests)

Total Test time (real) = 2838.86 sec

Three ctests failed on Hera; 1000 passed.

99% tests passed, 3 tests failed out of 1003

Label Time Summary:
HofX            =  27.75 sec*proc (9 tests)
QC              =  31.57 sec*proc (9 tests)
aircraft        =  13.39 sec*proc (2 tests)
crtm            = 221.39 sec*proc (81 tests)
executable      = 122.35 sec*proc (107 tests)
femps           =   8.76 sec*proc (1 test)
fortran         =   0.30 sec*proc (1 test)
fv3-jedi        = 1472.01 sec*proc (83 tests)
fv3jedi         = 1474.66 sec*proc (84 tests)
instrument      =  59.32 sec*proc (18 tests)
ioda            = 171.39 sec*proc (72 tests)
ioda-engines    =  15.00 sec*proc (119 tests)
mpi             = 1767.07 sec*proc (269 tests)
oops            =  58.01 sec*proc (37 tests)
openmp          = 477.20 sec*proc (80 tests)
ozone           =   0.85 sec*proc (2 tests)
radarVAD        =   5.73 sec*proc (2 tests)
rass            =   6.02 sec*proc (2 tests)
saber           = 491.26 sec*proc (188 tests)
satwinds        =   6.34 sec*proc (2 tests)
scatwinds       =   7.21 sec*proc (2 tests)
script          = 3132.93 sec*proc (778 tests)
sfcLand         =   5.66 sec*proc (2 tests)
sfcMarine       =   6.36 sec*proc (2 tests)
sonde           =   7.76 sec*proc (2 tests)
ufo             = 829.80 sec*proc (422 tests)

Total Test time (real) = 3272.42 sec

The following tests FAILED:
        736 - test_ufo_obserror_assign_unittests (Failed)
        802 - test_ufo_function_drawvaluefromfile (Failed)
        896 - test_interpolate_data_from_file_predictor (Failed)
Errors while running CTest
Output from these tests are in: /scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/ufsda/sorc/ufs_da.fd/UFS-DA/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
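As the CTest footer suggests, the failed cases can be re-run verbosely from the build directory. A sketch, using the build path reported in the log above:

```shell
cd /scratch1/NCEPDEV/da/Russ.Treadon/git/global-workflow/ufsda/sorc/ufs_da.fd/UFS-DA/build

# Re-run only the tests that failed in the last run, printing their output
ctest --rerun-failed --output-on-failure

# Or target the failing tests by name with a regex
ctest -R "test_ufo_obserror_assign_unittests|test_ufo_function_drawvaluefromfile" \
      --output-on-failure
```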

JEDI-T2O PR #5 reports similar findings from other developers on Orion and Hera. Interestingly, when ctests were previously run on Hera, the following three tests failed:

The following tests FAILED:
        730 - test_ufo_obserror_assign_unittests (Failed)
        795 - test_ufo_function_drawvaluefromfile (Failed)
        809 - test_ufo_function_satwind_indiv_errors (Failed)

The first two ctest failures are identical in both runs. The last ctest failure differs between today's run and the previous run (12/2/2021). It would be worthwhile to dig a bit deeper into these failures.

@RussTreadon-NOAA (Contributor, Author)

Cory noted that the 3 failures on Hera may be due to issues with the user environment, sample data, or the tests themselves. Since all tests pass on Orion, it's not the best investment of time at present to dig into the three Hera failures. A better use of time would be to understand the inputs and outputs needed to run FV3-JEDI 3DVar or hybrid-3DEnVar. Good advice! Doing so keeps this project moving forward.

@RussTreadon-NOAA (Contributor, Author)

ufsda branch cleanup
Merge feature/ufsda-rocoto-jobs and feature/ufsda-parm into feature/ufsda. After this, delete feature/ufsda-rocoto-jobs and feature/ufsda-parm. Details are provided below.
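The cleanup described above might be carried out as follows. A sketch only: whether the side branches were merged with plain `git merge` or via pull requests, and whether remote copies needed deleting, is not stated in this issue.

```shell
# Merge the two side branches into feature/ufsda
git checkout feature/ufsda
git merge feature/ufsda-rocoto-jobs
git merge feature/ufsda-parm

# Delete the merged branches locally...
git branch -d feature/ufsda-rocoto-jobs feature/ufsda-parm

# ...and, if they exist on the remote, there as well
git push origin --delete feature/ufsda-rocoto-jobs feature/ufsda-parm
```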

@WalterKolczynski-NOAA (Contributor)

Is this different from what was added in #871 as gdas.cd? Did that PR actually satisfy this as well as #521? Or am I missing something?

@RussTreadon-NOAA (Contributor, Author)

#521 is already closed. I'll close this issue as well. New issues can be opened as needed.
