Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix CMakeLists.txt of NCEP_bufr for gnu compiler #26

Merged

Conversation

gmao-cda
Copy link
Contributor

This PR fixes the compiling flags for NCEP_bufr when using GNU compilers (fix #25 ).
GNU compiler will use the definition -DDYNAMIC_ALLOCATION instead of -DDYNAMIC_ALLOCATION when processing the C files.

Add definition `-DDYNAMIC_ALLOCATION` for GNU C compiler
when compiling NCEP_bufr
@gmao-cda gmao-cda requested a review from a team as a code owner November 14, 2022 20:02
@gmao-cda
Copy link
Contributor Author

gmao-cda commented Nov 14, 2022

@mathomp4 Hi Matt, I wanted to submit a PR merging this to the version of NCEP_Shared with the tag v1.2.0, since components.yaml under GEOSadas develop branch uses NCEP_Shared v1.2.0. But It seems github does not allow me to do this. Instead, it only allows me to merge this PR to the main or other feature/ branches.

Is there a way to let me merge into v1.2.0? Thank you very much!

@mathomp4
Copy link
Member

@gmao-cda Well, v1.2.0 and main are roughly the same:

https://github.com/GEOS-ESM/NCEP_Shared/compare/main..v1.2.0

and we always make PRs against branches (so main).

Though now as I look at that, the CI is wildly out of date. So I might need to make a PR of my own because I'm not happy. Hold on to your PR. You might need to update your fork and do some merging...

@tclune
Copy link
Collaborator

tclune commented Nov 14, 2022

@gmao-cda v1.2.0 is a tag, not a branch. If this is a bug (see separate response), then you would want to do this PR as a hotfix onto main. (which is indeed what you did). Matt can then release v1.2.1.

(We don't use develop on this repo, but we do on many GEOS repos.)

@tclune
Copy link
Collaborator

tclune commented Nov 14, 2022

@mathomp4 I'm wary that this PR is showing as out-of-date given that main is the latest release. @gmao-cda which branch did you start from for this work?

Comment on lines +378 to +382
if (CMAKE_C_COMPILER_ID MATCHES Intel)
target_compile_definitions (${this} PRIVATE $<$<COMPILE_LANGUAGE:C>:STATIC_ALLOCATION>)
elseif (CMAKE_C_COMPILER_ID MATCHES "GNU")
target_compile_definitions (${this} PRIVATE $<$<COMPILE_LANGUAGE:C>:DYNAMIC_ALLOCATION>)
endif ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmao-cda Why do we not want both compilers to do the same thing?

The differences you reported elsewhere suggest that there was a change in the library that requires an update to DYNAMIC, but then it should be for all compilers. E..g, the logic above is problematic for say the NAG compiler. (Granted we only use NAG for MAPL at the moment, but ... we have ambitions.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tclune The main issue is that Intel at the moment uses STATIC_ALLOCATION. I did ask about this in #17 but I am wary of touching code that is only really tested in the ADAS. I think we should move to DYNAMIC_ALLOCATION for all compilers, but I don't know how to test this stuff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have Intel compilers installed on my old macbook air. When I return home after work, I can test if switching to "DYNAMIC_ALLOCATION" for Intel will generate workable bufr libs when decoding my benchmark observations stored in BUFR format.

At least we can get some clues from this test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cannot test with ADAS, then maybe we should instead introduce a CMAKE "option", with the default being STATIC. One could then specify DYNAMIC on the command line.

The utility of this suggestion depends on how many non-DAS people need this library, and on whether we can come up with a good name for the option.

Just thinking out loud.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mathomp4 To have more thorough tests on BUFRLIB, essentially those subs read_*.f90 under GSI will decode bufr and prepbufr files and write out their decoded datasets in binary. We can compare if those decoded binary datasets have same results using the original and modified bufrlibs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recompiled bufr libs with intel compilers (comp/intel/2021.2.0) on Discover,
with DYNAMIC_ALLOCATION applied tor both ifort and icc.

Results:

There is no obvious error using DYNAMIC_ALLOCATION for Intel compiler (comp/intel/2021.2.0):

  1. combfr.x can still link correctly to NCEP_bufr_r4i4,
  2. One experiment decoding my own PrepBUFR data shows the bufr data is correctly decoded.
cda@discover34:bufr_utils>./validate_build.sh
================================================================================
                  SYNOPSIS OF PREPBUFR DECODER using BUFRLIB
                              by TAKEMASA MIYOSHI
--------------------------------------------------------------------------------
 TOTAL NUMBER OF READ-IN RECORDS:    370686
 TOTAL NUMBER OF WRITTEN RECORDS:    301945 (levels/variables separately)
--------------------------------------------------------------------------------
  ADPUPA  AIRCAR  AIRCFT  SATWND  PROFLR  VADWND  SATEMP  ADPSFC  SFCSHP  SFCBOG
    1370       0    1468  117085     327    1580       0  108483   12804       0
   99576       0    1514   95648   12312   13432       0   49393   18264       0
--------------------------------------------------------------------------------
  SPSSMI  SYNDAT  ERS1DA  GOESND  QKSWND  MSONET  GPSIPW  RASSDA  WDSATR  ASCATW
       0       0       0       0       0       0     838     155    5903  120673
       0       0       0       0       0       0       0       0   11806       0
--------------------------------------------------------------------------------
  OTHERS
       0
       0
================================================================================
-------------------------------------------------------
check difference for time step: t-3
 fnin, nobs=t-3.dat       19050
 fnin2, nobs2=t-3.dat_test       19050
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t-2
 fnin, nobs=t-2.dat       20077
 fnin2, nobs2=t-2.dat_test       20077
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t-1
 fnin, nobs=t-1.dat      102671
 fnin2, nobs2=t-1.dat_test      102671
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t
 fnin, nobs=t.dat      116746
 fnin2, nobs2=t.dat_test      116746
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t+1
 fnin, nobs=t+1.dat       18250
 fnin2, nobs2=t+1.dat_test       18250
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t+2
 fnin, nobs=t+2.dat       18953
 fnin2, nobs2=t+2.dat_test       18953
 rdiff=  9.9999998E-03
 total error #=           0
-------------------------------------------------------
check difference for time step: t+3
 fnin, nobs=t+3.dat        6198
 fnin2, nobs2=t+3.dat_test        6198
 rdiff=  9.9999998E-03
 total error #=           0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to ping @gmao-msienkie and see if she thinks this looks good. If so, we might do a two-step process here. One, take in the code in this PR where it's still static on Intel and make a release. Then maybe make another release where we go full DYNAMIC.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was checking on this issue. It looks like there are a number of the "read_*" routines where the code has "call closbf" prior to any "call openbf" - which is what had caused a crash before with the version of BUFRLIB we are currently using when compiled with dynamic allocation. I had started a branch to fix this problem in 'read_iasi.f90' but killed my branch when I realized the extent of this "feature" in our code.
You noted in #17 that the newer version of the BUFRLIB would just print a warning message instead of exiting. That could work but I don't think that Ricardo would care to have numerous warning messages (one for each processor executing a problematic "read" routine) in the analysis log file. It looks like that we are still on 11.3.0 (see bvers.f), are there any plans to update the library.
I happened to get a question today regarding a newer version of 'read_cris.f90' someone copied from NCEP. When comparing with our current code I noticed the "call closbf" prior to the "call openbf" had been removed. Perhaps there has been some code cleaning as well as updates for data changes and science changes.
I think that NCEP is currently using a newer version of the BUFR library (11.4.0) in their operations. It looks like from the release notes that a number of features have been added to the library in the latest versions, as well as switching to compiling only with dynamic allocation. I am kind of thinking that when we bring in newer code for the GSI from NCEP we will want to also update the BUFR library at least to the version corresponding to what they are using with their code. At that point the people responsible for components that use the BUFRlib can carry out zero-diff tests to satisfy themselves that the code is working and not changing results.

@gmao-cda
Copy link
Contributor Author

I checked out tag v1.2.0 and create a branch from there

git clone -b v1.2.0 git@github.com:GEOS-ESM/NCEP_Shared.git
git checkout -b bugfix/cda/bufr_v1_2_0_cmakelists_gnu

@mathomp4
Copy link
Member

@mathomp4 I'm wary that this PR is showing as out-of-date given that main is the latest release. @gmao-cda which branch did you start from for this work?

@tclune My guess is he started from v1.2.0 and not from main. If he did that, it would be up to date. main is "ahead" of v1.2.0 only in minor ways (CI and boilerplate updates).

Now once I get in #27 , then main will be slightly more ahead as the CI gets up to modern day.

@mathomp4
Copy link
Member

@gmao-cda Can you re-sync your fork with main and then merge your branch on your fork with main then push? That should then get this to be in line with the main here.

@gmao-cda gmao-cda force-pushed the bugfix/cda/bufr_v1_2_0_cmakelists_gnu branch from f6f6d72 to 213016b Compare November 15, 2022 12:56
@gmao-cda
Copy link
Contributor Author

@mathomp4 Thank you very much Matt. I have synced my main fork with remote main, and merged the updated main into my branch.

git checkout main
git pull origin main
git checkout bugfix/cda/bufr_v1_2_0_cmakelists_gnu
git merge main
git push -f gmao-cda bugfix/cda/bufr_v1_2_0_cmakelists_gnu

Does this work?

@mathomp4
Copy link
Member

@gmao-cda I think that is right. I suppose if it isn't, you can always try again. I sometimes have to do that a few times...

@mathomp4
Copy link
Member

Actually, it does look good. You triggered the new CI!

@gmao-cda
Copy link
Contributor Author

Thank you Matt!

@mathomp4 mathomp4 added the 0 diff The changes in this pull request have verified to be zero-diff with the target branch. label Nov 17, 2022
@mathomp4 mathomp4 self-requested a review November 17, 2022 13:52
@mathomp4 mathomp4 merged commit f6902c5 into GEOS-ESM:main Nov 17, 2022
@gmao-cda
Copy link
Contributor Author

Thank you very much @mathomp4 ! This is my first PR for the codes related to GSI. I learned to write large Fortran programs from GSI, and I'm always grateful to GSI and its developers recorded in the codes. After seeing the contributors of this repo, I never thought one day I could also contribute to it. This is so exciting! Thank you again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0 diff The changes in this pull request have verified to be zero-diff with the target branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Static libraries generated by NCEP_bufr cannot be linked correctly with GNU compiler
4 participants