Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG/ISSUE] TOMAS simulations failing GCHP and GCClassic integration tests #1741

Closed
4 tasks done
msulprizio opened this issue Apr 7, 2023 · 49 comments · Fixed by #1959
Closed
4 tasks done

[BUG/ISSUE] TOMAS simulations failing GCHP and GCClassic integration tests #1741

msulprizio opened this issue Apr 7, 2023 · 49 comments · Fixed by #1959
Assignees
Labels
category: Bug Something isn't working never stale Never label this issue as stale topic: Benchmarking and Testing Related to CI, integration tests, or scientific benchmarking

Comments

@msulprizio
Copy link
Contributor

Name and Institution (Required)

Name: Melissa Sulprizio
Institution: Harvard/GCST

Confirm you have reviewed the following documentation

Description of your issue or question

Following the merge of #1378 and #1569 in dev/14.2.0, TOMAS simulations are failing GCHP and GCClassic integration tests. TOMAS passes compilation tests for both models but fails during the simulations. It appears to be out-of-bounds errors. Log files are attached below.

We will look into fixing this during the 14.2.0 benchmarking or for a subsequent version release.

Tagging @BettyCroft @yantosca

@msulprizio msulprizio added category: Bug Something isn't working Aerosols WG labels Apr 7, 2023
@BettyCroft
Copy link
Contributor

@msulprizio I think that TOMAS40 will not work with GCHP, only TOMAS15 - additional code developments are needed for TOMAS40. It is TOMAS15 that fails with GCHP or TOMAS40?

@BettyCroft
Copy link
Contributor

@msulprizio looking at those files - it does look to be an issue with TOMAS15

@yantosca
Copy link
Contributor

yantosca commented Apr 7, 2023

@msulprizio, @BettyCroft: There is an out-of-bounds error in state_diag%tomasmnfixcheck3mass. I think this may be because the diagnostic slot indexing may be off.

@BettyCroft
Copy link
Contributor

@msulprizio @yantosca - should it be ok if I checkout /dev/14.2.0 at this point - and try a few tests with GCHP-TOMAS in this version? I was trying to compile on WashU cluster just now and ran into a compilation error

[ 80%] Building Fortran object src/GCHP_GridComp/HEMCO_GridComp/HEMCO/src/Extensions/CMakeFiles/HCOX.dir/hcox_tomas_dustdead_mod.F.o
/storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1129): error #6632: Keyword arguments are invalid without an explicit interface. [RC]
call ESMF_InfoGetFromHost(grid,infoh,rc=status)
---------------------------------------------^

@msulprizio
Copy link
Contributor Author

@BettyCroft Could you checkout dev/tomas instead? It's at the current state of dev/14.2.0 but I will be reverting dev/14.2.0 in the next few minutes to just before the TOMAS merges. Our benchmark tests showed differences with respect to the reference run before the TOMAS updates, which we were not expecting. (If the updates were for TOMAS simulations they should not impact simulations with TOMAS off.) See #1742 and feel free to follow up there as well.

After checking out dev/tomas in the GCHP or GCClassic superproject, make sure to do git submodule update --init --recursive to ensure all submodules are at the proper commit for that version.

@BettyCroft
Copy link
Contributor

Thanks @msulprizio I'm trying from the main branch to type git checkout dev/tomas and it seems to be unknown - whereas I was able to chekcout dev/14.2.0 - should I wait before trying?

[bcroft@compute1-client-1 GCHP_apr7_2023]$ git checkout main
M src/GCHP_GridComp/GEOSChem_GridComp/geos-chem
M src/GCHP_GridComp/HEMCO_GridComp/HEMCO
M src/MAPL
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
Post-checkout: Fixing file modes... done.
[bcroft@compute1-client-1 GCHP_apr7_2023]$ git checkout dev/tomas
error: pathspec 'dev/tomas' did not match any file(s) known to git

@msulprizio
Copy link
Contributor Author

@BettyCroft Can you try git fetch -p and then git checkout dev/tomas?

@BettyCroft
Copy link
Contributor

@msulprizio thanks! That worked. Should I be able to use that version of the code for a few tests?

I'm also interested in working with v14 because I'm finding some memory leak issues with the v13 code that I have for GCHP-TOMAS.

@msulprizio
Copy link
Contributor Author

Should I be able to use that version of the code for a few tests?

Please do! If you are able to fix any issues we welcome you to create a new pull request so we can pull them in on our end.

@BettyCroft
Copy link
Contributor

BettyCroft commented Apr 7, 2023

@msulprizio thanks again! I'd be happy to pass along any fixes - the compilation (esm:intel-2021.1.2) in dev/tomas aborts with an error related to RC in some MAPL subroutines - would you understand and be able to advice on a fix for this compilation issue? Those are not subroutines that I usually interact with.

[ 80%] Building Fortran object src/GCHP_GridComp/HEMCO_GridComp/HEMCO/src/Core/CMakeFiles/HCO.dir/hco_driver_mod.F90.o
/storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1129): error #6632: Keyword arguments are invalid without an explicit interface.   [RC]
        call ESMF_InfoGetFromHost(grid,infoh,rc=status)
---------------------------------------------^
/storage1/fs1/rvmartin/Active/bcroft/GCHP_apr7_2023/src/MAPL/gridcomps/ExtData2G/ExtDataGridCompNG.F90(1131): error #6632: Keyword arguments are invalid without an explicit interface.   [RC]
        isPresent = ESMF_InfoIsPresent(infoh,'STRETCH_FACTOR',rc=status)
--------------------------------------------------------------^

@msulprizio
Copy link
Contributor Author

We use gcc 10.2.0 here and on AWS and do not get this error.

@lizziel @Jourdan-He Have you seen this error before? Has anyone attempted to build GCHP with intel-2021.1.2?

@yantosca
Copy link
Contributor

yantosca commented Apr 7, 2023

@BettyCroft: You may be still using ESMF 8.0.0 or 8.0.1. A recent update now requires at least ESMF 8.1.0. Updating the ESMF library should get past this error.

@yantosca
Copy link
Contributor

yantosca commented Apr 7, 2023

@BettyCroft: Also see our uploaded library instructions in the "latest" ReadTheDocs. We now instruct users to use GCC instead of Intel compilers.

@BettyCroft
Copy link
Contributor

@yantosca Thanks! I'll work on making the suggested updates.

@BettyCroft
Copy link
Contributor

@msulprizio - thanks again for your help with dev/tomas. I've been able to compile GCHP with TOMAS after checking out dev/tomas and updating submodules - and now am trying a few test simulations with GCHP-TOMAS - and have a quick question.

Is GET_TAUb in GeosUtil/time_mod.F90 becoming obsolete?

tomas_mod.F90 uses GET_TAUb() and the value is printing out as the year 1960 value from CAP.rc - as opposed to the year 2019 value that I have in cap_restart for the start time. This issue causes the simulation to abort due to unacceptable conditions outside of a spinup period for TOMAS . Would you have suggestions on on to pass the value in cap_restart to tomas_mod.F90. I have sourced setCommonRunSettings.sh. I can make the simulation run by overwriting GET_TAUb() in tomas_mod.F90 - but am looking for a longterm solution.

@yantosca
Copy link
Contributor

tomas_mod.F90 uses GET_TAUb() and the value is printing out as the year 1960 value from CAP.rc - as opposed to the year 2019 value that I have in cap_restart for the start time.

Tagging @lizziel who may have an idea about this.

@lizziel
Copy link
Contributor

lizziel commented Apr 28, 2023

Hi @BettyCroft, I am looking into why the time values are not correct. In general we prefer using elapsed time rather than differencing start and current time so I am also looking into whether elapsed time has a problem. I will report back soon.

@BettyCroft
Copy link
Contributor

Hi @lizziel - thanks!

@lizziel
Copy link
Contributor

lizziel commented Apr 28, 2023

Both begin date and elapsed time in time_mod.F90 reflect BEG_DATE in CAP.rc, which is counter-intuitively the ESMF clock start in MAPL. I do not think this is an issue in current GCHP simulations because we try to steer clear of date-dependent code, even relative in time. That being said, I am going to put in a fix and see if there is an impact. The easiest quick fix for you is to change BEG_DATE to be the same as what is in cap_restart. You can do this manually or edit setCommonRunSettings.sh to do it automatically along with all of the other config file updates.

Fyi, I will be taking over for Melissa in GCHP TOMAS validation and preparation for inclusion in the model. I will be in touch about this soon.

@BettyCroft
Copy link
Contributor

Thanks @lizziel, I made the suggested change to CAP.rc, as a quick fix. Look forward to hearing from you about the GCHP-TOMAS validation and preparation for inclusion in the model.

@stale
Copy link

stale bot commented Jun 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the stale No recent activity on this issue label Jun 9, 2023
@msulprizio msulprizio added never stale Never label this issue as stale and removed stale No recent activity on this issue labels Jun 10, 2023
@BettyCroft
Copy link
Contributor

Hi @lizziel, @msulprizio and @yantosca, As an update on the TOMAS developments, I've been able to successfully checkout the dev/tomas branch, compile and run with TOMAS15 for both the GCClassic and GCHP implementations when using dev/tomas. I found that 1) the HISTORY.rc.template needed an update in order to run the GCClassic and that 2) the dust tuning parameter was not properly defined for TOMAS for GCHP. I can send along these updates and would be great if we could move towards getting these TOMAS updates in the standard release.

@msulprizio
Copy link
Contributor Author

Hi @BettyCroft. Thanks for these updates. Please do send us your updates via a pull request off of the dev/tomas branch. I will incorporate that into dev/14.3.0 as early as next week.

@yantosca
Copy link
Contributor

yantosca commented Dec 8, 2023

@msulprizio @BettyCroft I'll take a look

@yantosca
Copy link
Contributor

yantosca commented Dec 8, 2023

@msulprizio @BettyCroft: I've found an out-of-bounds error in hco_interp_mod.F90 as well. This might be the root cause.

Fortran runtime error: Index '48' of dimension 3 of array 'regr_4d' above upper bound of 47

Error termination. Backtrace:
#0  0x14d3b75 in __hco_interp_mod_MOD_regrid_mapa2a
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_interp_mod.F90:398
#1  0x1473c34 in __hcoio_read_mod_MOD_hcoio_read
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hcoio_read_std_mod.F90:1489
#2  0x142bfb6 in __hcoio_dataread_mod_MOD_hcoio_dataread
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hcoio_dataread_mod.F90:206
#3  0x14133a6 in readlist_fill
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_readlist_mod.F90:502
#4  0x1414c98 in __hco_readlist_mod_MOD_readlist_read
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_readlist_mod.F90:371
#5  0x13bc491 in __hco_driver_mod_MOD_hco_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/HEMCO/src/Core/hco_driver_mod.F90:165
#6  0x58739b in __hco_interface_gc_mod_MOD_hcoi_gc_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/GeosCore/hco_interface_gc_mod.F90:998
#7  0x4ced0f in __emissions_mod_MOD_emissions_run
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/GeosCore/emissions_mod.F90:185
#8  0x4076d4 in geos_chem
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:701
#9  0x40cfa5 in main
        at /n/holyscratch01/jacob_lab/ryantosca/tests/tomas/test_tomas/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:32

This is happening in this loop:

          ! REGR_4D are the remapped fractions.
          DO T  = 1, NTIME
          DO L  = 1, HcoState%NZ
          DO J  = 1, HcoState%NY
          DO I2 = 1, HcoState%NX
             IF ( REGFRACS(I2,J,L,T) > MAXFRACS(I2,J,L,T) ) THEN
                MAXFRACS(I2,J,L,T) = REGR_4D(I2,J,L,T)
                INDECES (I2,J,L,T) = IVAL
             ENDIF
          ENDDO
          ENDDO
          ENDDO
          ENDDO

It is because the REGR_4D array is dimensioned with NLEV (which is 47) but the loop goes up to HcoState%NZ
(which is 72):

       ALLOCATE( REGR_4D(HcoState%NX,HcoState%NY,NLEV,NTIME), STAT=AS )

I think resizing REGR_4D would fix this.

@BettyCroft
Copy link
Contributor

Hi @msulprizio and @yantosca - GCHP-TOMAS in dev/14.3.0 appears to be compiling and executing successfully on my local cluster with the resizing of REGR_4D in hco_interp_mod.F90. I'll check the output today and plan to send a pull request by tomorrow for the updates to tomas_mod.F90 to address the TOMAS diagnostic collection issue and also to add TOMAS tracers to SpeciesConc diagnostic collection.

Copy link
Contributor

Thanks @BettyCroft. We'll await your PR.

@msulprizio
Copy link
Contributor Author

Thanks @BettyCroft. I suspect the GCHP TOMAS runs only fail with the additional debug flags utilized in the integration tests. Have you tried building with -DCMAKE_BUILD_TYPE=Debug?

@BettyCroft
Copy link
Contributor

Hi @msulprizio and @yantosca - thanks for clarifying. As you suspected, I did not have the debug flag turned on. I will try again with this flag on. Please let me know if you have any additional insights on what might be causing the issue.

@BettyCroft
Copy link
Contributor

BettyCroft commented Dec 12, 2023

Hi @msulprizio and @yantosca- I am now reproducing the error that Melissa had coming from hcox_tomas_jeagle_mod.F90. I printed out the values of HcoState%NY and HcoState%NX in this subroutine and they print out as 6 and 8 respectively for my C24 simulation - as a result model is not able to print out those hard-coded values of 50 and 10 as in the debug statement (copied below). My understanding of the GCHP grid setup is not perfect - are those values of HcoState%NY and HcoState%NX correct? Could we address the issue by reassigning the values of 50 and 10 to ones that will match any grid?

!### Debug
print*, 'Aerosol number AT 50, 10,: ', Inst%TC1(ii,jj,1,k)

@yantosca
Copy link
Contributor

Ah OK. I see the issue. I think you can I = 1..24, J=1..24, since C24 is the minimum resolution for GCHP. @lizziel, is that right?

@lizziel
Copy link
Contributor

lizziel commented Dec 12, 2023

Every processor will have a subset of the global grid and therefore I=50 and J=10 will only work if you are using few cores. Are 50 and 10 chosen as a random grid box or should they correspond to a specific lat/lon? If you need a value at any grid box then use I,J = 1,1. Note that it will be printed by ALL cores which is something we avoid.

@lizziel
Copy link
Contributor

lizziel commented Dec 12, 2023

@BettyCroft
Copy link
Contributor

BettyCroft commented Dec 12, 2023

Thanks @lizziel ! I think that the 50 and 10 were originally chosen as random points over the oceans. The second debug print that you identified as needing modification is the one that was causing the simulation to abort because those indices do not match with GCHP resolutions.

@BettyCroft
Copy link
Contributor

BettyCroft commented Dec 12, 2023

Hi @lizziel - I have also identified an issue at lines 6501-6510 in tomas_mod.F90 where there are hard-coded GC-Classic grid resolutions such as IF ( TRIM(State_Grid%GridRes) == '4.0x5.0' etc. Would you be able to provide me with some guidance on the syntax to use to modify this IF block to include GCHP grid resolutions?

@lizziel
Copy link
Contributor

lizziel commented Dec 13, 2023

Hi Betty, all of GEOS-Chem needs to be grid-independent to be compatible with GCHP and external models that use it. If there is any grid-dependence, i.e. our dust mass scaling factors, they should be input via configuration file and not hard-coded in the source code. The exception to this is vertical grids. Would it be possible to rewrite the tomas_mod.F90 code block for 4x5 to be independent of horizontal grid? The code is here:

IF ( TRIM(State_Grid%GridRes) == '4.0x5.0' ) THEN
if( (pres / 100.) .lt. cplevels(9) ) then
ionrate = cosmic_ions(i,j,9) * boxmass / boxvol
if(State_Met%FRCLND(I,J) .gt. 0.2) then
ionrate = ionrate + soil_ions(9)
endif
elseif((pres/100.) .gt. cplevels(1)) then
ionrate = cosmic_ions(i,j,1) * boxmass / boxvol
if( State_Met%FRCLND(I,J) .gt. 0.2 ) then
ionrate = ionrate + soil_ions(1)
endif
else
lev=2
do while (pres / 100. .lt. cplevels(lev))
lev=lev+1
enddo
weight=( cplevels( lev - 1 ) - pres / 100. ) / &
( cplevels( lev - 1 ) - cplevels(lev) )
ionrate=( cosmic_ions(i,j,lev ) * weight + &
cosmic_ions(i,j,lev-1) * (1.e+0_fp - weight) ) &
* boxmass / boxvol
if( State_Met%FRCLND(I,J) .gt. 0.2) then
ionrate=ionrate + ( soil_ions( lev ) * weight + &
soil_ions( lev-1 ) * (1.e+0_fp-weight) )
endif
endif
. If you are seeing it at lines 6501-6510 we must be using different code bases. Are you using dev/14.3.0?

If you need a debug print for a random point over the ocean I suggest using the ocean mask variable in State_Met to find a grid box. It's complicated, however, since some core regions may be over land only. I'll think more about the best way to do it.

@BettyCroft
Copy link
Contributor

BettyCroft commented Dec 13, 2023

Hi @lizziel - thanks for your helpful message! Sorry- I meant to indicate lines 6358-6367 in tomas_mod.F90 - yes I'm using dev/14.3.0 but forgot I had already added a few lines of code updates - that changed the numbering I was viewing.

Looks like we need to rewrite the lines at 529-540 and also 6358-6367 in tomas_mod.F90 in order to be fully grid independent. We could add something with the same approach as for dust - but we need to decide on the values for SGCTSCALE and also ionrate for the various horizontal resolutions of GCHP. I'll add @theloniuspunk here in case Jeff has suggestions on the settings to use across resolutions.

@msulprizio msulprizio removed this from the 14.3.0 milestone Jan 8, 2024
@BettyCroft
Copy link
Contributor

Hi @msulprizio, @lizziel and @yantosca - as an update - I have implemented the fixes that we discussed above and the model is now running with debug option on in dev/14.3.0 (on both on the WashU and Dal platforms). @theloniuspunk and I will review the model output in the next day or two - and thereafter I plan to make a pull request for these fixes. Thanks again for your advice!

@msulprizio
Copy link
Contributor Author

Thanks for the updates @BettyCroft!

@BettyCroft
Copy link
Contributor

Hi @msulprizio - is the timeline too tight for getting these updates into v14.3.0?

@msulprizio
Copy link
Contributor Author

@BettyCroft Yes, probably, but we can release Z versions as needed so the fixes could go into 14.3.1 or later version.

@BettyCroft
Copy link
Contributor

Thanks @msulprizio! I'll prioritize sending the pull request as soon as possible.

@BettyCroft
Copy link
Contributor

BettyCroft commented Jan 12, 2024

Hi @msulprizio @lizziel @yantosca, I have just made pull requests on geos-chem (2116) HEMCO (259) and GCHP (369). These updates are expected to enable TOMAS simulations to integrate successfully in debug mode in the version dev/14.3.0. I think that that have uploaded these code updates correctly - however it looks like I indicated that I wanted to merge them into main and I meant to indicate dev/14.3.0. Please let me know if you enable any issues with implementation. Thanks again for your help!

@BettyCroft
Copy link
Contributor

Hi @msulprizio - as a post script, please let me know if you'd like me to provide a restart file with TOMAS tracers for dev/14.3.0 - I have ones readily available.

@msulprizio msulprizio linked a pull request Jan 17, 2024 that will close this issue
@yantosca yantosca added the topic: Benchmarking and Testing Related to CI, integration tests, or scientific benchmarking label Mar 29, 2024
@yantosca
Copy link
Contributor

yantosca commented Jun 5, 2024

We can close out this issue as TOMAS integration tests for GCClassic and GCHP now run successfully in 14.4.0.

@yantosca yantosca closed this as completed Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Bug Something isn't working never stale Never label this issue as stale topic: Benchmarking and Testing Related to CI, integration tests, or scientific benchmarking
Projects
None yet
4 participants