Propagate errors in SetServices upwards #1002


@LiamBindle LiamBindle commented Aug 26, 2021

Description

I don't think errors in the GEOS-Chem gridded component's SetServices are getting caught properly. A failed _VERIFY in SetServices doesn't stop the simulation.

Should MAPL_AddChildFromMeta call both _VERIFY(STATUS) and _VERIFY(SS_STATUS), where SS_STATUS is the userRC returned by ESMF_GridCompSetServices?
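As a rough illustration of the pattern in question, here is a small Python model (not MAPL or ESMF code). It assumes a framework call that reports two codes: its own status and the status returned by the user's routine. A caller that verifies only the first one carries on past a failed SetServices. Every name below (`grid_comp_set_services`, `verify`, `add_child_unchecked`, `add_child_checked`) is a hypothetical stand-in.

```python
# Hypothetical model of a framework call that reports two return codes:
# `rc` for the framework call itself and `user_rc` for the user routine.
# None of these names are MAPL or ESMF APIs.

SUCCESS = 0


class SetServicesError(RuntimeError):
    """Raised when a verified return code is non-zero."""


def grid_comp_set_services(user_routine):
    """Call the user routine and return (rc, user_rc), like a two-status API."""
    user_rc = user_routine()
    rc = SUCCESS  # the framework call itself worked, even if the user routine failed
    return rc, user_rc


def verify(status, where):
    """Stand-in for a _VERIFY-style check: stop on any non-zero status."""
    if status != SUCCESS:
        raise SetServicesError(f"FAIL at {where} <status={status}>")


def add_child_unchecked(user_routine):
    """Checks only the framework status, so a user failure is silently dropped."""
    rc, _user_rc = grid_comp_set_services(user_routine)
    verify(rc, "add_child_unchecked")
    return "child added"


def add_child_checked(user_routine):
    """Checks both statuses, so a user failure propagates to the caller."""
    rc, user_rc = grid_comp_set_services(user_routine)
    verify(rc, "add_child_checked (framework status)")
    verify(user_rc, "add_child_checked (user status)")
    return "child added"


def failing_set_services():
    return 8  # illustrative non-zero status, matching the value seen in the logs


print(add_child_unchecked(failing_set_services))  # continues despite the failure
try:
    add_child_checked(failing_set_services)
except SetServicesError as err:
    print(err)
```

In this model the unchecked variant returns normally even though the user routine failed, which mirrors the "without this PR" behaviour where the run continues and only crashes later.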

Related Issue

Motivation and Context

I'm working on improving the error reporting in GCHP. Errors related to the config files often result in hard crashes and the error messages can be hard to decipher. I believe this is because MAPL_AddChildFromMeta isn't verifying the userRC (STATUS from the grid comp's SetServices).

How Has This Been Tested?

I ran a GCHP simulation with an intentional error in the root config file (GCHP.rc). The error reproduces the one reported in geoschem/GCHP#134.

Log without this PR
 Starting pFIO input server on Clients
 Starting pFIO output server on Clients
          SHMEM: INFO: NumCores per Node = 6
          SHMEM: INFO: NumNodes in use   = 1
          SHMEM: INFO: Total PEs         = 6
          SHMEM: INFO: NumNodes in use  = 1
 Integer*4 Resource Parameter: HEARTBEAT_DT:600
 NOT using buffer I/O for file: cap_restart
 Character Resource Parameter: ROOT_CF:GCHP.rc
 Character Resource Parameter: ROOT_NAME:GCHP
 Character Resource Parameter: HIST_CF:HISTORY.rc
 Character Resource Parameter: MAPL_ENABLE_TIMERS:YES
 Character Resource Parameter: MAPL_ENABLE_MEMUTILS:YES
 Continuing after adding AtmosChemSetServices:           0           2
 Continuing after adding AtmosChemSetServices:           0           2
 Character Resource Parameter: DYCORE:OFF
 Continuing after adding AtmosChemSetServices:           0           2
 Continuing after adding AtmosChemSetServices:           0           2
 Continuing after adding AtmosChemSetServices:           0           2
 Continuing after adding AtmosChemSetServices:           0           2
pe=00005 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00000 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00001 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00002 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00003 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00004 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
NOTE from PE     0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to    32768.
&MPP_IO_NML
 HEADER_BUFFER_VAL=16384      ,
 GLOBAL_FIELD_ON_ROOT_PE=T,
 IO_CLOCKS_ON=F,
 SHUFFLE=0          ,
 DEFLATE_LEVEL=-1         ,
 CF_COMPLIANCE=F,
 /
NOTE from PE     0: MPP_IO_SET_STACK_SIZE: stack size set to     131072.
NOTE from PE     0: MPP_DOMAINS_SET_STACK_SIZE: stack size set to 20000000.
 Integer*4 Resource Parameter: NX:1
 Integer*4 Resource Parameter: NY:6
 Integer*4 Resource Parameter: IM:24
 Integer*4 Resource Parameter: JM:144
 Integer*4 Resource Parameter: RUN_DT:600
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
 For k_split (remapping)=           1
n_split is set to 02 for resolution-dt=0025x0025x6-  600.000
Using n_zfilter : 000
Using n_sponge : 001
Using non_ortho :       T
 Starting PEs :            6
 Starting Threads :            1
 Cubic: cubed-sphere domain decomposition
whalo =    3, ehalo =    3, shalo =    3, nhalo =    3
  X-AXIS =   24
  Y-AXIS =   24
 Cubic: cubed-sphere domain decomposition
whalo =    1, ehalo =    1, shalo =    1, nhalo =    1
  X-AXIS =   24
  Y-AXIS =   24
NOTE from PE     0: tracer_manager_init : No tracers are available to be registered.
NOTE from PE     0: tracer_manager_init : No tracers are available to be registered.
NOTE from PE     0: tracer_manager_init : No tracers are available to be registered.
NOTE from PE     0: tracer_manager_init : No tracers are available to be registered.
NOTE from PE     0: tracer_manager_init : No tracers are available to be registered.
 ncnst=           0  num_prog=           0  pnats=           0  dnats=           0  num_family=           0
 
 Grid distance at face edge (km)=   326768.43532825337     
 Corner interpolation coefficient=   1.4728933761817289     
 Corner interpolation coefficient=   1.4728933761817309     
 Corner interpolation coefficient=   1.4728933761817260     
 Corner interpolation coefficient=   1.4728933761817327     
 Corner interpolation coefficient=   1.4728933761817156     
RADIUS (m): 0.63710000000000E+07
PI: 0.31415926535898E+01
MAX    AREA (m*m): 0.21280966055028E+12          MIN AREA (m*m): 0.95959708752603E+11
GLOBAL AREA (m*m): 0.51006447190982E+15 IDEAL GLOBAL AREA (m*m): 0.51006447190979E+15
 
  Cubed-Sphere Grid Stats :           25 x          25 x           6
      Grid Length               : min: 326768.44 max: 461715.98 avg: 386515.92 min/max:      0.71
      Deviation from Orthogonal : min:      0.00 max:     28.13 avg:      8.60
      Aspect Ratio              : min:      1.00 max:      1.08 avg:      1.03
 
 Grid_init          72           1   100.00000000000000     
 Hybrid Sigma-P: minimum allowable surface pressure (hpa)=   14.888888888889058     
 Corner interpolation coefficient=   1.4728933761817253     
 da_max/da_min=   2.2176980663721220     
 da_max_c/da_min_c=   2.2606506684909289     
  
 Divergence damping Coefficients
 For small dt=   300.00000000000000     
 External mode del-2 (m**2/s)=   6284027.9018087257     
 Internal mode del-2 SMAG dimensionless coeff=   0.0000000000000000     
 Internal mode del-2 background diff=   0.0000000000000000     
 Internal mode del-4 background diff=  0.16000000000000000     
 Vorticity del-4 (m**4/s)=   0.0000000000000000     
 tracer del-2 diff=   0.0000000000000000     
 Vorticity del-4 (m**4/s)=   0.0000000000000000     
 beta=   0.0000000000000000     
  

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
pe=00002 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00002 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00002 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00002 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00002 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00002 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00002 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00002 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00002 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00002 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00003 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00003 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00003 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00003 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00003 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00003 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00003 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00003 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00003 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00003 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer INITIALIZE needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer GenInitTot needs to be set first
 ERROR: Timer --GenInitMine needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer INITIALIZE needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer GenInitTot needs to be set first
 ERROR: Timer --GenInitMine needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer INITIALIZE needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer GenInitTot needs to be set first
 ERROR: Timer --GenInitMine needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer INITIALIZE needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer GenInitTot needs to be set first
 ERROR: Timer --GenInitMine needs to be set first
                                                       Mem/Swap Used (MB) at GCHPctmEnvMAPL_GenericInitialize=  5.941E+04  2.000E+00
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer INITIALIZE needs to be set first
 ERROR: Timer TOTAL needs to be set first
 ERROR: Timer GenInitTot needs to be set first
 ERROR: Timer --GenInitMine needs to be set first
pe=00005 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00005 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00005 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00005 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00005 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00005 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00005 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00005 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00005 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00005 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00000 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00000 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00000 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00000 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00000 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00000 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00000 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00000 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00000 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00000 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00001 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00001 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00001 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00001 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00001 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00001 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00001 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00001 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
pe=00001 FAIL at line=00096    MAPL_Profiler.F90                        <unknown error>
pe=00001 FAIL at line=05217    MAPL_Generic.F90                         <status=-1>
#0  0x7f2c727c0dfd in ???
#1  0x7f2c727c0013 in ???
#2  0x7f2c71fba3af in ???
#3  0x7f2c720eec20 in ???
#4  0x7f2c729d7f5a in ???
#5  0x131e3d2 in __mapl_profmod_MOD_mapl_profclockon
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/base/MAPL_Profiler.F90:88
#6  0x138777c in __mapl_genericmod_MOD_mapl_genericstateclockon
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/base/MAPL_Generic.F90:5216
#7  0x42b1ad in initialize_
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/GCHP_GridComp/GEOSChem_GridComp/geos-chem/Interfaces/GCHP/Chem_GridCompMod.F90:1598
#8  0x1fc3a85 in ???
#9  0x1fc3d29 in ???
#10  0x1d2cdd5 in ???
#11  0x1fe1a40 in ???
#12  0x1fc275b in ???
#13  0x1b845ca in ???
#14  0x1f815a5 in ???
#15  0x1397568 in mapl_genericwrapper
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/base/MAPL_Generic.F90:1838
#16  0x1fc3a85 in ???
#17  0x1fc3d29 in ???
#18  0x1d2cdd5 in ???
#19  0x1fe1a40 in ???
#20  0x1fc275b in ???
#21  0x1b845ca in ???
#22  0x1f815a5 in ???
#23  0x139e9f4 in __mapl_genericmod_MOD_mapl_genericinitialize
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/base/MAPL_Generic.F90:1505
#24  0x41f257 in initialize
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/GCHP_GridComp/GCHP_GridCompMod.F90:332
#25  0x1fc3a85 in ???
#26  0x1fc3d29 in ???
#27  0x1d2cdd5 in ???
#28  0x1fe1a40 in ???
#29  0x1fc275b in ???
#30  0x1b845ca in ???
#31  0x1f815a5 in ???
#32  0x1397568 in mapl_genericwrapper
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/base/MAPL_Generic.F90:1838
#33  0x1fc3a85 in ???
#34  0x1fc3d29 in ???
#35  0x1d2cdd5 in ???
#36  0x1fe1a40 in ???
#37  0x1fc275b in ???
#38  0x1b845ca in ???
#39  0x1f815a5 in ???
#40  0xd9a07d in initialize_gc
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_CapGridComp.F90:614
#41  0x1fc3a85 in ???
#42  0x1fc3d29 in ???
#43  0x1d2cdd5 in ???
#44  0x1fe1a40 in ???
#45  0x1fc275b in ???
#46  0x1b845ca in ???
#47  0x1f815a5 in ???
#48  0xd9631a in __mapl_capgridcompmod_MOD_initialize
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_CapGridComp.F90:907
#49  0xd90536 in __mapl_capmod_MOD_run_model
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_Cap.F90:244
#50  0xd8fe55 in __mapl_capmod_MOD_run_member
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_Cap.F90:211
#51  0xd9007e in __mapl_capmod_MOD_run_ensemble
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_Cap.F90:154
#52  0xd90161 in __mapl_capmod_MOD_run
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/MAPL/gridcomps/Cap/MAPL_Cap.F90:129
#53  0x41dafb in gchpctm_main
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/GCHPctm.F90:30
#54  0x41c248 in main
	at /scratch1/fs1/rvmartin/bs6/liam.bindle/temp/2021-08-26/GCHP/src/GCHPctm.F90:15
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 4 with PID 0 on node compute1-exec-4 exited on signal 11 (Segmentation fault).
Log after this PR
 Starting pFIO input server on Clients
 Starting pFIO output server on Clients
          SHMEM: INFO: NumCores per Node = 6
          SHMEM: INFO: NumNodes in use   = 1
          SHMEM: INFO: Total PEs         = 6
          SHMEM: INFO: NumNodes in use  = 1
 Integer*4 Resource Parameter: HEARTBEAT_DT:600
 NOT using buffer I/O for file: cap_restart
 Character Resource Parameter: ROOT_CF:GCHP.rc
 Character Resource Parameter: ROOT_NAME:GCHP
 Character Resource Parameter: HIST_CF:HISTORY.rc
 Character Resource Parameter: MAPL_ENABLE_TIMERS:YES
 Character Resource Parameter: MAPL_ENABLE_MEMUTILS:YES
pe=00003 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00003 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00003 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00003 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00003 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00003 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00003 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00001 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00001 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00001 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00005 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00005 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00005 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00005 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00003 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00003 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00003 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00003 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00003 FAIL at line=00030    GCHPctm.F90                              <status=8>
pe=00000 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00000 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00000 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00000 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00000 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00005 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00005 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00005 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00005 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00005 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00005 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00005 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00005 FAIL at line=00030    GCHPctm.F90                              <status=8>
pe=00002 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00002 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00002 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00002 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00002 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00002 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00004 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00004 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00004 FAIL at line=04714    MAPL_Generic.F90                         <status=8>
pe=00004 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00001 FAIL at line=00158    GCHP_GridCompMod.F90                     <status=8>
pe=00001 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00001 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00001 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00001 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00001 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00001 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00001 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00001 FAIL at line=00030    GCHPctm.F90                              <status=8>
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
pe=00000 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00000 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00000 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00000 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00000 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00000 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00000 FAIL at line=00030    GCHPctm.F90                              <status=8>
pe=00002 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00002 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00002 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00002 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00002 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00004 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
pe=00004 FAIL at line=00535    MAPL_CapGridComp.F90                     <status=8>
pe=00004 FAIL at line=00908    MAPL_CapGridComp.F90                     <status=8>
pe=00004 FAIL at line=00245    MAPL_Cap.F90                             <status=8>
pe=00004 FAIL at line=00211    MAPL_Cap.F90                             <status=8>
pe=00004 FAIL at line=00154    MAPL_Cap.F90                             <status=8>
pe=00004 FAIL at line=00129    MAPL_Cap.F90                             <status=8>
pe=00004 FAIL at line=00030    GCHPctm.F90                              <status=8>
pe=00002 FAIL at line=00030    GCHPctm.F90                              <status=8>
[compute1-exec-4.ris.wustl.edu:09224] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[compute1-exec-4.ris.wustl.edu:09224] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[compute1-exec-4.ris.wustl.edu:09224] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[compute1-exec-4.ris.wustl.edu:09224] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[compute1-exec-4.ris.wustl.edu:09224] PMIX ERROR: UNREACHABLE in file server/pmix_server.c at line 1741
[compute1-exec-4.ris.wustl.edu:09224] 5 more processes have sent help message help-mpi-api.txt / mpi-abort
[compute1-exec-4.ris.wustl.edu:09224] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
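The interleaved `FAIL` lines in the log above are easier to follow once grouped by PE. A minimal sketch of that, assuming the `pe=… FAIL at line=… <file> <status=…>` layout shown in these logs; the function name `group_failures` and the regular expression are illustrative and not part of MAPL.

```python
import re
from collections import defaultdict

# Matches lines such as:
# pe=00003 FAIL at line=04678    MAPL_Generic.F90    <status=8>
FAIL_RE = re.compile(
    r"^pe=(?P<pe>\d+)\s+FAIL at line=(?P<line>\d+)\s+(?P<file>\S+)\s+<(?P<status>[^>]*)>"
)


def group_failures(log_text):
    """Return {pe: [(file, line, status), ...]} in the order the lines appear."""
    chains = defaultdict(list)
    for raw in log_text.splitlines():
        match = FAIL_RE.match(raw.strip())
        if match:
            chains[int(match["pe"])].append(
                (match["file"], int(match["line"]), match["status"])
            )
    return dict(chains)


# A few lines copied from the log, used here as sample input.
sample = """\
pe=00003 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00001 FAIL at line=00574    Chem_GridCompMod.F90                     <status=8>
pe=00003 FAIL at line=04678    MAPL_Generic.F90                         <status=8>
"""

for pe, chain in sorted(group_failures(sample).items()):
    print(f"pe {pe}: " + " -> ".join(f"{name}:{num}" for name, num, _ in chain))
```

Reading each PE's chain from first entry to last follows the error from the failing SetServices up through its callers.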

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Trivial change (affects only documentation or cleanup)

Checklist:

  • I have tested this change with a run of GEOSgcm (if non-trivial)
  • I have added one of the required labels (0 diff, 0 diff trivial, 0 diff structural, non 0-diff)
  • I have updated the CHANGELOG.md accordingly following the style of Keep a Changelog

…Services return codes so that errors in a GridComp's SetServices are reported properly.
@LiamBindle LiamBindle requested a review from a team as a code owner August 26, 2021 21:22
@LiamBindle
Contributor Author

(Sorry, still working on getting that CLA signed. Just submitting this here and now because I'm also fixing it in GCHP right now.)

Collaborator

@tclune tclune left a comment

Good catch. I'm surprised this has not been found before - I guess SetServices tends to have few/easy bugs.

@mathomp4 mathomp4 added the "🚫 Contingent - DNA" label (Do Not Approve; these changes are contingent on other PRs) Aug 27, 2021
@mathomp4
Member

Blocking for CLA. Yay lawyers!

@LiamBindle
Contributor Author

I'll ping the admins again today and see if they've figured out who needs to sign it. Sorry again about the delay with that!

@tclune
Collaborator

tclune commented Aug 27, 2021

No need to apologize. Delays are expected for this sort of thing. Might take weeks of back and forth once the right lawyer is involved. They may want to contact the NASA lawyer, etc.

@mathomp4
Member

@LiamBindle A big MAPL Generic refactoring just went into develop. As GitHub is informing you, conflicts have happened, though hopefully it's pretty simple to figure out how to transition. If not, @Gvilla1000-nasa can help as he did the refactoring.

@LiamBindle
Contributor Author

@mathomp4 Thanks for letting me know. If the refactoring includes a fix so SetServices errors are propagated, we can close this.

@stale

stale bot commented Nov 14, 2021

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the "❄️ Stale" label (This issue has been marked stale) Nov 14, 2021
@tclune
Collaborator

tclune commented Nov 15, 2021

This issue should not go stale. It's an "easy" one that should be addressed.

@stale stale bot removed the "❄️ Stale" label (This issue has been marked stale) Nov 15, 2021
@weiyuan-jiang
Contributor

This issue was resolved. This PR is no longer needed.

@tclune
Collaborator

tclune commented Dec 2, 2021

@LiamBindle If you get a chance, please confirm that we've fixed this separately. (It came up as an issue independently.)

@stale

stale bot commented Jan 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

@stale stale bot added the "❄️ Stale" label (This issue has been marked stale) Jan 31, 2022
@mathomp4
Member

Making a comment to satisfy Stale Bot. @LiamBindle when you have a chance: is this taken care of already as @tclune and @weiyuan-jiang suggest?

@stale stale bot removed the "❄️ Stale" label (This issue has been marked stale) Jan 31, 2022
@mathomp4
Member

@LiamBindle What's the status of your CLA signing capability? Would be nice to pull these in

@LiamBindle
Contributor Author

Hey @mathomp4, sorry I missed this. I didn't see it.

I gave up on the CLA process. It got lost between lawyers, and it turned into a real headache. I'd suggest fixing it in the official code and not merging this request.

@LiamBindle LiamBindle deleted the bugfix/upstream/catch-errors-during-setservices branch September 7, 2022 18:19