Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub Issue NOAA-EMC/GSI#112. A refactor of CMake build framework. #327

Merged
merged 91 commits into from
May 18, 2022
Merged

GitHub Issue NOAA-EMC/GSI#112. A refactor of CMake build framework. #327

merged 91 commits into from
May 18, 2022

Conversation

aerorahul
Copy link
Contributor

@aerorahul aerorahul commented Mar 3, 2022

What does this PR do:

  1. Completely refactor the GSI CMake framework
    a. Leverage packaged dependencies e.g. NCEPLibs.
    b. Provide mechanism to import built and installed ncdiag, gsi, enkf in external packages along with their transitive dependencies.
    c. Remove hard-wired platform specifics in the build framework
  2. Support GNU compiler set
    a. Fix codes that fail to build with GNU Fortran compilers
    b. Fixes many non-standard practices that are uncovered when building with the GNU compiler set. E.g. use of .eqv. when comparing logicals instead of ==.
    c. Adds modulefile.ProdGSI.hera.gnu to build on Hera with GNU. All codes were successfully built. It is recommended that the GSI is now always build tested with GNU going forward. Several warnings are detected with GNU compilers due to the use of deprecated Fortran coding standards.
  3. Remove dependency on gitsubmodule GSI-libsrc (See PRs GitHub Issue NOAA-EMC/GSI#302. Removal of libsrc from the authoritative repository #329 and GitHub Issue NOAA-EMC/GSI#320. Replaced fv3gfs_ncio with hpc-stack built ncio. #340)
    a. Remove options to build third-party libraries e.g. NCEPlibs
    b. Modify codes that previously built NCEPLIBS-ncio internally in the GSI.
  4. Enables building the GSI, EnKF, Utilities as well as ncdiag and GSDCloud as separate projects. This functionality enables 1b. Monitoring utilities should be their own project, but that is not pursued in this PR.
  5. Replace README.cmake with INSTALL.md that outlines the dependencies, build and installation as well as regression testing guidelines and removing references to specific machines.
  6. An update to the build script. build_all_cmake.sh is replaced by build.sh. Minor updates to other build related scripts are provided.

Tagging @MichaelLueken-NOAA @christopherwharrop-noaa @christinaholtNOAA @arunchawla-NOAA @JacobCarley-NOAA

aerorahul and others added 30 commits February 9, 2021 14:43
…3gfs_ncio with ncio module, replace bufr_4_DA with bufr_d
@aerorahul
Copy link
Contributor Author

Actually, on second thought (and thinking further), we do load more modules that are not needed for compiling (e.g. python).
I am fine with loading prod_util in the modulefile.ProdGSI.platform.lua
This will maintain current capability and have no disruptions.

@aerorahul
Copy link
Contributor Author

@aerorahul As @RussTreadon-NOAA said, this is certainly something I can do when I send out the email following the merging of this work to the authoritative repository.

@MichaelLueken-NOAA @RussTreadon-NOAA
I have restored loading of prod_util in the modulefile.ProdGSI.platform.lua

@MichaelLueken
Copy link
Contributor

@aerorahul I am unable to build the GSI on WCOSS - the cmake module isn't being loaded. I was able to correct this issue by altering modulefiles/gsi_wcoss_dell_p3.lua. Since cmake/3.20.0 is only available through the stack, hpc, hpc-ips, and hpc-impi needs to be set before cmake can be set. My gsi_wcoss_dell_p3.lua file looks like:

prepend_path("MODULEPATH", "/usrx/local/nceplibs/dev/hpc-stack/libs/hpc-stack/modulefiles/stack")

...

load(pathJoin("hpc", hpc_ver))
load(pathJoin("hpc-ips", hpc_ips_ver))
load(pathJoin("hpc-impi", hpc_impi_ver))

load(pathJoin("python", python_ver))
load(pathJoin("lsf", lsf_ver))
load(pathJoin("cmake", cmake_ver))

After making this change, the code built and the regression tests successfully ran.

The code fails to compile on WCOSS2 as well. In ush/detect_machine.sh, I had to add:

clogin01.cactus.wcoss2.ncep.noaa.gov) MACHINE_ID=wcoss2 ;; ### cactus

through clogin09 and:

dlogin01.dogwodd.wcoss2.ncep.noaa.gov) MACHINE_ID=wcoss2 ;; ### dogwood

through dlogin09. Finally, in ush/module_setup.sh, I had to add a section for WCOSS2:

+elif [[ $MACHINE_ID = wcoss2* ]]; then
+    # We are on WCOSS2
+    module reset
+    export prod_util_ver=2.0.10
+

With these changes, I was able to build the executables on WCOSS2 and a test was submitted, which successfully ran.

There were no issues encountered while attempting to compile on Hera using both the GNU and Intel compilers. The global_T62 was submitted and passed without issue for the Intel compiler. A single global_T62 test was run using the GNU executable, and it ran through to completion without issue.

While building ncdiag and util/NMC_Bkerror, there were several warning messages that should be addressed before this work is submitted to the review committee and merged into the authoritative develop branch. For src/ndiag:

/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/src/ncdiag/ncdf_string_m.f90(31): warning #6717: This name has not been given an explicit type.   [NCDF_NCDF_STRING_CLEAR]
              ncdf_ncdf_string_clear 
--------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/src/ncdiag/ncdf_string_m.f90(179): warning #6717: This name has not been given an explicit type.   [I]
        do i = 1, length
-----------^

Looking in src/ncdiag/ncdf_string_m.f90, there is no ncdf_ncdf_string_clear routine. There is a ncdf_string_clear routine. I removed the first ncdf_ and this corrected the line 31 issue. Adding i to the integer declaration on line 175:

integer :: i,length

corrects the line 179 issue.

For util/NMC_Bkerror:

/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc_aero/mat.f90(823): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.E-60]
      PARAMETER(CRIT=1.E-60)
---------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc_aero/mat.f90(2963): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.D-60]
      PARAMETER(CRIT=1.D-60)
---------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc_aero/mat.f90(3198): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.E-60]
       IF(ABS(T).LE.1.E-60)THEN
--------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc_aero/mat.f90(3667): remark #8291: Recommended relationship between field width 'W' and the number of fractional digits 'D' in this edit descriptor is 'W>=D+7'.
        PRINT'(1X,I3,10(1X,E11.5))',I,(A(I,J),J=J1,J2)
----------------------------^

and:

/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc/mat.f90(823): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.E-60]
      PARAMETER(CRIT=1.E-60)
---------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc/mat.f90(2963): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.D-60]
      PARAMETER(CRIT=1.D-60)
---------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc/mat.f90(3198): warning #7919: The value was too small when converting to REAL(KIND=4); the result is zero.   [1.E-60]
       IF(ABS(T).LE.1.E-60)THEN
--------------------^
/lfs/h2/emc/da/noscrub/Michael.Lueken/WCOSS2/util/NMC_Bkerror/sorc/mat.f90(3667): remark #8291: Recommended relationship between field width 'W' and the number of fractional digits 'D' in this edit descriptor is 'W>=D+7'.
        PRINT'(1X,I3,10(1X,E11.5))',I,(A(I,J),J=J1,J2)
----------------------------^

Replacing 1.E-60 and 1.D-60 with 1.E-37 and 1.D-37, respectively, corrected the issues on lines 823, 2963, and 3198. Replacing E11.5 with E12.5 corrected the remark for line 3667.

aerorahul added 2 commits May 11, 2022 16:37
update detect_machine.sh and module-setup.sh for wcoss2.
be smart in auto-detection.
fix warnings in source codes.
@aerorahul
Copy link
Contributor Author

Thanks @MichaelLueken-NOAA for the review, noting the issues and solutions.
Thanks for identifying the cmake load order in wcoss_dell_p3.

I don't know where the ncdf warning crept in and why it wasn't flagged before.
The NMC_Bkerror was never compiled as part of the utilities before by default. It is not a utility routinely used, so its good to catch any warnings.
I have pushed changes that should fix these errors for WCOSS2.

I have tested the builds on WCOSS2, WCOSS Dell, Orion and Hera. On Hera, I built with Intel and GNU.
They were all successful.

Copy link
Contributor

@MichaelLueken MichaelLueken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerorahul Thank you very much for making the changes to ensure that the code builds on WCOSS, WCOSS2, Hera, and Orion. I have found one last minor issue that needs to be addressed, then this work will go out to the review committee. On line 45 of src/enkf/gridinfo_fv3reg.f90, please explicitly declare the components used from mpi. I have noted the necessary components in this review.

@@ -42,7 +42,8 @@ module gridinfo
!
!$$$

use mpisetup, only: nproc, mpi_integer, mpi_real4, mpi_comm_world
use mpi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All modules need to explicitly declare the components used. Please replace:

use mpi

with

use mpi, only: mpi_real4,mpi_comm_world

@MichaelLueken MichaelLueken linked an issue May 12, 2022 that may be closed by this pull request
@christopherwharrop-noaa
Copy link
Contributor

@MichaelLueken-NOAA / @aerorahul - The location of hpc-stack on Cheyenne has moved (replaced by Spack-based one managed by EPIC) and I'd like to get the Cheyenne modulefiles updated to reflect that because other parts of the UFS are already making those updates. Cheyenne is down all this week for annual maintenance, so I've been unable to test there. If this PR goes off to the review committee, would there still be an opportunity to update modulefiles/gsi.cheyenne.gnu.lua and modulefiles/gsi.cheyenne.intel.lua once build tests can be conducted on Cheyenne? Or would that need to be updated in a later PR. If possible, I'd really like to get it correct the first time, especially given the cadence at which PRs move through the review process here. Once Cheyenne is back online, it will only take me a few minutes to test, and the necessary change would probably be confined to the line that updates the MODULEPATH for loading hpc-stack.

@MichaelLueken
Copy link
Contributor

@christopherwharrop-noaa Generally, once changes are submitted to the review committee, no additional changes should be made. However, since the modification would be to the path of the modulefiles for Cheyenne, an exception could be made to allow changes to these two files after they have been submitted.

Would it be best to apply these changes now, before submitting the work to the review committee? It is easier to back out of changes rather than apply changes in the middle of the process.

So long as I can get these changes out to the review committee today, this work should be merged to the authoritative repository by next Wednesday, May 18.

@christopherwharrop-noaa
Copy link
Contributor

@MichaelLueken-NOAA - I do not want to delay the merging of this PR as it represents a important update that a lot of downstream projects are waiting for. And I certainly understand the rule you explained. I would have already proposed the update were it not for the fact that Cheyenne is down for annual maintenance (for a week) and I can't verify the change I'd propose. I think* that all that will be needed is to replace:

prepend_path("MODULEPATH", "/glade/p/ral/jntp/GMTB/tools/hpc-stack-v1.2.0/modulefiles/stack")

with

prepend_path("MODULEPATH", "/glade/work/epicufsrt/GMTB/tools/gnu/10.1.0/hpc-stack-v1.2.0/modulefiles/stack") # for GNU

prepend_path("MODULEPATH","/glade/work/epicufsrt/GMTB/tools/intel/2022.1/hpc-stack-v1.2.0_6eb6/modulefiles/stack") # For Intel

I'd prefer not to make the change without testing it first because the old hpc-stack is still in place (and still works until they remove it), and I don't want to risk merging broken settings. Cheyenne can sometimes yield surprising behavior.

@christopherwharrop-noaa
Copy link
Contributor

@MichaelLueken-NOAA - If I were to submit a new PR (after this one is merged) that ONLY touches the modulefiles for Cheyenne and nothing else, what would be the estimated turnaround time for getting that merged? Would that be something we could fast-track?

@MichaelLueken
Copy link
Contributor

@MichaelLueken-NOAA - If I were to submit a new PR (after this one is merged) that ONLY touches the modulefiles for Cheyenne and nothing else, what would be the estimated turnaround time for getting that merged? Would that be something we could fast-track?

@christopherwharrop-noaa Non-source code changes don't go out for review, so the modification to the modulefiles would be able to be merged to the authoritative repository immediately (so long as nothing else has been submitted to the review committee before hand).

@christopherwharrop-noaa
Copy link
Contributor

@MichaelLueken-NOAA - In that case, I don't want to perturb this PR with the Cheyenne module stuff. Let's leave this one as is. Once Cheyenne comes back online, I'll test and submit a new PR with those small Cheyenne build updates.

@MichaelLueken
Copy link
Contributor

@aerorahul At your earliest convenience, please add , only: mpi_real4,mpi_comm_world to use mpi (line 45) in src/enkf/gridinfo_fv3reg.f90. I will submit this work to the review committee and merge the work to the authoritative repository next Wednesday. I will then await the update to the Cheyenne modulefiles paths from @christopherwharrop-noaa, before submitting the next update to the review committee.

@aerorahul
Copy link
Contributor Author

, only: mpi_real4,mpi_comm_world

done.

@MichaelLueken MichaelLueken changed the title A refactor of CMake build framework GitHub Issue NOAA-EMC/GSI#112. A refactor of CMake build framework. May 18, 2022
@MichaelLueken
Copy link
Contributor

The due date for feedback from the review committee has passed without comment, so I will now give final approval to these changes and merge them to the authoritative repository.

@MichaelLueken MichaelLueken merged this pull request into NOAA-EMC:develop May 18, 2022
@MichaelLueken
Copy link
Contributor

@christopherwharrop-noaa This work has been merged to the authoritative repository. As discussed previously, should we move forward with changing the MODULEPATH for gsi_cheyenne.intel.lua and gsi_cheyenne.gnu.lua at this time?

@aerorahul aerorahul deleted the feature/cmake-update branch May 18, 2022 16:19
@christopherwharrop-noaa
Copy link
Contributor

@MichaelLueken-NOAA - Thank you! I have the Cheyenne update working for Intel, but it looks like the the newer hpc-stack on Cheyenne is missing an w3nco package (a dependency for nemsio) for Gnu. It's not a GSI issue, it's a hpc-stack issue. Most likely we just need to submit a request to have the appropriate w3nco installed. They are usually pretty fast with those requests.

@MichaelLueken
Copy link
Contributor

@MichaelLueken-NOAA - Thank you! I have the Cheyenne update working for Intel, but it looks like the the newer hpc-stack on Cheyenne is missing an w3nco package (a dependency for nemsio) for Gnu. It's not a GSI issue, it's a hpc-stack issue. Most likely we just need to submit a request to have the appropriate w3nco installed. They are usually pretty fast with those requests.

@christopherwharrop-noaa Okay, sounds good. Please let me know when everything is ready, create a new issue for the work, then follow the guidance here:

https://github.com/NOAA-EMC/GSI/wiki/GSI-GitHub-General-User-Information#preparing-a-pull-request-pr

to create a new PR to bring in the updated modulefiles for Cheyenne.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update DA cmake build
6 participants