Possible MAPL Bug with SCM Debug #1549

Closed
mathomp4 opened this issue Jun 7, 2022 · 4 comments · Fixed by #1551

mathomp4 commented Jun 7, 2022

@sanAkel discovered what I think is a bug in MAPL when you run the SCM with debugging options on. Maybe it's GOCART, but I think it's more of an uncovered case in MAPL.

To wit, when you run the arm_97jul case, you get a traceback:

forrtl: severe (408): fort: (2): Subscript #1 of the array COORDS has value 2 which is greater than the upper bound of 1

Image              PC                Routine            Line        Source
GEOSgcm.x          000000000F45324F  Unknown               Unknown  Unknown
libMAPL.base.so    00002B2401248782  mapl_base.base_im        3174  Base_Base_implementation.F90
libMAPL.base.so    00002B240124500C  mapl_base_mp_mapl        3103  Base_Base_implementation.F90
GEOSgcm.x          000000000115FE88  su2g_gridcompmod_         865  SU2G_GridCompMod.F90
GEOSgcm.x          000000000115A2CD  su2g_gridcompmod_         710  SU2G_GridCompMod.F90
libesmf.so         00002B23EDDA39E4  _ZN5ESMCI6FTable1     Unknown  Unknown
libesmf.so         00002B23EDDA752F  ESMCI_FTableCallE     Unknown  Unknown
...

The call tree from SU2G is this:

       call MAPL_GetHorzIJIndex(self%nVolc, iPointVolc, jPointVolc, &
                                grid = grid,               &
                                lon  = self%vLon/real(MAPL_RADIANS_TO_DEGREES), &
                                lat  = self%vLat/real(MAPL_RADIANS_TO_DEGREES), &
                                rc   = status)

which goes to:

call calc_edges_1d(elons,lons_1d,IM)

and then:

subroutine calc_edges_1d(ecoords,coords,idim)
   integer, intent(in) :: idim
   real(ESMF_KIND_R8), intent(in) :: coords(idim)
   real(ESMF_KIND_R8), intent(out) :: ecoords(idim+1)
   ecoords(1) = coords(1) - 0.5 * ( coords(2) - coords(1) )
   ecoords(2:idim) = 0.5 * ( coords(1:idim-1)+coords(2:idim) )
   ecoords(idim+1) = coords(idim) + 0.5 * (coords(idim) - coords(idim-1))
   return
end subroutine calc_edges_1d
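
For illustration only, here is a minimal sketch of what a single-column guard in the edge calculation could look like. This is not the fix that went into #1551. The module name, `calc_edges_1d_guarded`, and the `fallback_width` argument are all hypothetical, and `real64` stands in for `ESMF_KIND_R8` so the sketch compiles without ESMF. It assumes `idim >= 1` and that the caller can supply a sensible cell width when there is only one center.

```fortran
module edges_sketch
   use, intrinsic :: iso_fortran_env, only: real64
   implicit none
   private
   public :: calc_edges_1d_guarded

contains

   ! Hypothetical variant of calc_edges_1d, not the MAPL routine.
   ! fallback_width is an assumed extra argument: the cell width to use
   ! when only one center exists and no spacing can be inferred.
   subroutine calc_edges_1d_guarded(ecoords, coords, idim, fallback_width)
      integer, intent(in) :: idim
      real(real64), intent(in) :: coords(idim)
      real(real64), intent(out) :: ecoords(idim+1)
      real(real64), intent(in) :: fallback_width

      if (idim == 1) then
         ! Single column: coords(2) does not exist, so center the one
         ! cell on coords(1) using the supplied width.
         ecoords(1) = coords(1) - 0.5_real64 * fallback_width
         ecoords(2) = coords(1) + 0.5_real64 * fallback_width
         return
      end if

      ! General case: same arithmetic as the original routine.
      ecoords(1) = coords(1) - 0.5_real64 * (coords(2) - coords(1))
      ecoords(2:idim) = 0.5_real64 * (coords(1:idim-1) + coords(2:idim))
      ecoords(idim+1) = coords(idim) + 0.5_real64 * (coords(idim) - coords(idim-1))
   end subroutine calc_edges_1d_guarded

end module edges_sketch
```

With `idim = 1`, `coords = [10.0]` and `fallback_width = 2.0`, this returns edges at 9.0 and 11.0 instead of reading past the end of `coords`. Where the width would come from in a real SCM run is left open.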

My guess is that we need a sort of "short-circuit" for single column in MAPL_GetHorzIJIndex when it goes through the lat-lon code: with a 1x1 grid, idim is 1, so coords(2) in calc_edges_1d is out of bounds, which matches the traceback. It's not a "standard", but I think all SCM cases sort of "hardcode" the lat-lon of their point in the gridname, a la:

      AGCM.GRIDNAME: XY1x1-C_34N_100W_38N_95W

I'm guessing that when MAPL makes the 1x1 grid it doesn't care about those coordinates; I can't see any reason why it would. But maybe we could figure out a way to pull them out. Either make the SCM gridname a standard and extract the location from it, or have a rule like "if the grid is 1x1, then we require AGCM.LATS and AGCM.LONS", or something similar.
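
To make the first option concrete, here is a rough sketch of pulling coordinates out of a gridname shaped like the example above. Everything in it is an assumption: the module and routine names are hypothetical, and the layout (a prefix, then four underscore-separated tokens in lat, lon, lat, lon order, each ending in N, S, E or W) is only inferred from that one example, since no such standard exists today. It does not check that latitude tokens use N/S and longitude tokens use E/W.

```fortran
module scm_gridname_sketch
   implicit none
   private
   public :: parse_corner_token, parse_scm_gridname

contains

   ! Convert a token such as "34N" or "100W" into signed degrees.
   ! ok is .false. (and degrees is 0) if the token cannot be parsed.
   subroutine parse_corner_token(token, degrees, ok)
      character(len=*), intent(in) :: token
      real, intent(out) :: degrees
      logical, intent(out) :: ok
      integer :: n, ios

      degrees = 0.0
      ok = .false.
      n = len_trim(token)
      if (n < 2) return

      read(token(1:n-1), *, iostat=ios) degrees
      if (ios /= 0) then
         degrees = 0.0
         return
      end if

      select case (token(n:n))
      case ('N', 'E')
         ok = .true.
      case ('S', 'W')
         degrees = -degrees
         ok = .true.
      case default
         degrees = 0.0
      end select
   end subroutine parse_corner_token

   ! Assumed layout: <prefix>_<lat>_<lon>_<lat>_<lon>
   subroutine parse_scm_gridname(gridname, lats, lons, ok)
      character(len=*), intent(in) :: gridname
      real, intent(out) :: lats(2), lons(2)
      logical, intent(out) :: ok
      integer :: n, start, last, k, i
      real :: vals(4)
      logical :: tok_ok

      lats = 0.0
      lons = 0.0
      ok = .false.
      n = len_trim(gridname)

      ! Skip the prefix up to and including the first underscore.
      k = index(gridname(1:n), '_')
      if (k == 0) return
      start = k + 1

      do i = 1, 4
         if (start > n) return
         k = index(gridname(start:n), '_')
         if (k == 0) then
            last = n              ! final token runs to the end
         else
            last = start + k - 2  ! character just before the underscore
         end if
         call parse_corner_token(gridname(start:last), vals(i), tok_ok)
         if (.not. tok_ok) return
         start = last + 2
      end do

      lats = [vals(1), vals(3)]
      lons = [vals(2), vals(4)]
      ok = .true.
   end subroutine parse_scm_gridname

end module scm_gridname_sketch
```

For `XY1x1-C_34N_100W_38N_95W` this yields latitudes 34 and 38 and longitudes -100 and -95. How those four numbers map to the single SCM point is a separate decision that a real gridname standard would have to spell out.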

I defer to @tclune, @bena-nasa and @atrayano on the best way to support this.

mathomp4 added the 🪲 Bugfix label Jun 7, 2022

bena-nasa commented Jun 7, 2022

I'll fix this. Maybe the refactoring we did to horzij broke this. This is going through the non-cubed-sphere path, which we had no use case for other than this one, since we don't run lat-lon.

bena-nasa commented

Found the problem. I will commit a hotfix on main once I check that this doesn't affect the model (it shouldn't).


bena-nasa commented Jun 7, 2022

@sanAkel, don't jump for joy too much: once I fixed this, my SCM case crashed on the very first step in the turbulence code when using the debug build. I'll open an issue in gcm so we have a record and don't forget, but at first glance it's beyond me what could be wrong. The release build runs (according to @mathomp4 it always did), but something is not happy with the debug build, so I don't know whether this points to a lurking problem or to something benign.


sanAkel commented Jun 7, 2022

Thanks! @bena-nasa.

@mathomp4 suggested I test other test cases.

Would you both like me to try them? If so, debug or release?

Maybe for a few of them you could try both debug and release (wait for my MAPL bug fix). I guess the question is whether that crash is endemic to the SCM in general or is just down to the restarts for that particular case.
