SST crashes with LibFabric #1728

Closed
khuck opened this issue Sep 9, 2019 · 12 comments

khuck commented Sep 9, 2019

ADIOS2 was configured on a 36-core Linux workstation with the following cmake output:


Currently Loaded Modules:
  1) gcc/8.1   2) mpi/openmpi-4.0.1_gcc-8.1   3) cmake/3.15.1   4) python/3.6.8

++ which mpicc
++ which mpic++
++ which mpif90
+ cmake -DCMAKE_C_COMPILER=/packages/openmpi/4.0.1-gcc8.1/bin/mpicc -DCMAKE_CXX_COMPILER=/packages/openmpi/4.0.1-gcc8.1/bin/mpic++ -DCMAKE_Fortran_COMPILER=/packages/openmpi/4.0.1-gcc8.1/bin/mpif90 -DADIOS2_USE_Python=ON -DCMAKE_INSTALL_PREFIX=/home/users/khuck/src/ADIOS2/install_mpi -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
-- The C compiler identification is GNU 8.1.0
-- The CXX compiler identification is GNU 8.1.0
-- Check for working C compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpicc
-- Check for working C compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpicc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpic++
-- Check for working CXX compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpic++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find Blosc (missing: BLOSC_LIBRARY BLOSC_INCLUDE_DIR) 
-- Could NOT find BZip2 (missing: BZIP2_LIBRARIES BZIP2_INCLUDE_DIR) 
-- Could NOT find ZFP (missing: ZFP_LIBRARY ZFP_INCLUDE_DIR) 
-- Could NOT find SZ (missing: SZ_LIBRARY ZLIB_LIBRARY ZSTD_LIBRARY SZ_INCLUDE_DIR) 
-- Could NOT find MGARD (missing: MGARD_LIBRARY ZLIB_LIBRARY MGARD_INCLUDE_DIR) 
-- Found ZLIB: /usr/lib64/libz.so (found version "1.2.7") 
-- Could NOT find PNG: Found unsuitable version "1.5.13", but required is at least "1.6.0" (found /usr/lib64/libpng.so)
-- The Fortran compiler identification is GNU 8.1.0
-- Check for working Fortran compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpif90
-- Check for working Fortran compiler: /packages/openmpi/4.0.1-gcc8.1/bin/mpif90  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /packages/openmpi/4.0.1-gcc8.1/bin/mpif90 supports Fortran 90
-- Checking whether /packages/openmpi/4.0.1-gcc8.1/bin/mpif90 supports Fortran 90 -- yes
-- Found MPI_C: /packages/openmpi/4.0.1-gcc8.1/bin/mpicc (found version "3.1") 
-- Found MPI_CXX: /packages/openmpi/4.0.1-gcc8.1/bin/mpic++ (found version "3.1") 
-- Found MPI_Fortran: /packages/openmpi/4.0.1-gcc8.1/bin/mpif90 (found version "3.1") 
-- Found MPI: TRUE (found version "3.1") found components:  C Fortran CXX 
-- Found ZeroMQ: /usr/lib64/libzmq.so (found suitable version "4.1.4", minimum required is "4.1") 
-- Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS C) (found version "")
-- Found PythonInterp: /packages/python/3.6.8/bin/python3 (found version "3.6.8") 
-- Found PythonLibs: /packages/python/3.6.8/lib/libpython3.6m.so (found version "3.6.8") 
-- Found PythonModule_numpy: /packages/python/3.6.8/lib/python3.6/site-packages/numpy  
-- Found PythonModule_mpi4py: /home/users/khuck/.local/lib/python3.6/site-packages/mpi4py  
-- Found PythonFull: /packages/python/3.6.8/bin/python3  found components:  Interp Libs numpy mpi4py 
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.27.1") 
-- Checking for module 'libfabric'
--   Found libfabric, version 1.6.1
-- Found LIBFABRIC: /usr/lib64/libfabric.so (Required is at least version "1.6") 
-- Checking for module 'cray-drc'
--   No package 'cray-drc' found
-- Could NOT find CrayDRC (missing: CrayDRC_LIBRARIES) 
-- Looking for shmget
-- Looking for shmget - found
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  

-- ADIOS2 ThirdParty: Configuring KWSys
-- Checking whether header cstdio is available
-- Checking whether header cstdio is available - yes
-- Checking for Large File Support
-- Checking for Large File Support - yes
-- Checking whether C++ compiler has 'long long'
-- Checking whether C++ compiler has 'long long' - yes
-- Checking whether C++ compiler has '__int64'
-- Checking whether C++ compiler has '__int64' - no
-- Checking whether wstring is available
-- Checking whether wstring is available - yes
-- Checking whether C compiler has ptrdiff_t in stddef.h
-- Checking whether C compiler has ptrdiff_t in stddef.h - yes
-- Checking whether C compiler has ssize_t in unistd.h
-- Checking whether C compiler has ssize_t in unistd.h - yes
-- Checking whether CXX compiler has setenv
-- Checking whether CXX compiler has setenv - yes
-- Checking whether CXX compiler has unsetenv
-- Checking whether CXX compiler has unsetenv - yes
-- Checking whether CXX compiler has environ in stdlib.h
-- Checking whether CXX compiler has environ in stdlib.h - no
-- Checking whether CXX compiler has utimes
-- Checking whether CXX compiler has utimes - yes
-- Checking whether CXX compiler has utimensat
-- Checking whether CXX compiler has utimensat - yes
-- Checking whether CXX compiler struct stat has st_mtim member
-- Checking whether CXX compiler struct stat has st_mtim member - yes
-- Checking whether CXX compiler struct stat has st_mtimespec member
-- Checking whether CXX compiler struct stat has st_mtimespec member - no
-- Checking whether <ext/stdio_filebuf.h> is available
-- Checking whether <ext/stdio_filebuf.h> is available - yes

-- ADIOS2 ThirdParty: Configuring GTest

-- ADIOS2 ThirdParty: Configuring pybind11
-- Found PythonLibs: /packages/python/3.6.8/lib/libpython3.6m.so
-- pybind11 v2.2.4

-- ADIOS2 ThirdParty: Configuring pugixml

-- ADIOS2 ThirdParty: Configuring nlohmann_json

-- ADIOS2 ThirdParty: Configuring atl
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of double
-- Check size of double - done
-- Check size of float
-- Check size of float - done
-- Check size of int
-- Check size of int - done
-- Check size of short
-- Check size of short - done
-- Looking for include file malloc.h
-- Looking for include file malloc.h - found
-- Looking for include file unistd.h
-- Looking for include file unistd.h - found
-- Looking for include file stdlib.h
-- Looking for include file stdlib.h - found
-- Looking for include file string.h
-- Looking for include file string.h - found
-- Looking for include file sys/time.h
-- Looking for include file sys/time.h - found
-- Looking for include file windows.h
-- Looking for include file windows.h - not found
-- Looking for fork
-- Looking for fork - found
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1") 

-- ADIOS2 ThirdParty: Configuring dill
-- Check size of void*
-- Check size of void* - done
-- Check size of long
-- Check size of long - done
-- Check if the system is big endian
-- Searching 16 bit integer
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- Checking for module 'libffi'
--   Found libffi, version 3.0.13
-- Found LibFFI: -lffi  
-- Enabling emulation
-- Looking for include file stdarg.h
-- Looking for include file stdarg.h - found
-- Looking for include file memory.h
-- Looking for include file memory.h - found
-- Found dill: /home/users/khuck/src/ADIOS2/build/thirdparty/dill/dill/dill-config.cmake (found version "2.4.0") 

-- ADIOS2 ThirdParty: Configuring ffs
-- Check size of off_t
-- Check size of off_t - done
-- Check size of long double
-- Check size of long double - done
-- Check size of long long
-- Check size of long long - done
-- Check size of size_t
-- Check size of size_t - done
-- Looking for socket
-- Looking for socket - found
-- Found BISON: /usr/bin/bison (found version "3.0.4") 
-- Found FLEX: /usr/bin/flex (found version "2.5.37") 
-- Found dill: /home/users/khuck/src/ADIOS2/build/thirdparty/dill/dill/dill-config.cmake (found suitable version "2.4.0", minimum required is "2.3.1") 
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found suitable version "2.2.1", minimum required is "2.2.1") 
-- Looking for netdb.h
-- Looking for netdb.h - found
-- Looking for sockLib.h
-- Looking for sockLib.h - not found
-- Looking for sys/select.h
-- Looking for sys/select.h - found
-- Looking for sys/socket.h
-- Looking for sys/socket.h - found
-- Looking for sys/times.h
-- Looking for sys/times.h - found
-- Looking for sys/uio.h
-- Looking for sys/uio.h - found
-- Looking for sys/un.h
-- Looking for sys/un.h - found
-- Looking for winsock.h
-- Looking for winsock.h - not found
-- Looking for strtof
-- Looking for strtof - found
-- Looking for strtod
-- Looking for strtod - found
-- Looking for strtold
-- Looking for strtold - found
-- Looking for getdomainname
-- Looking for getdomainname - found
-- Check size of struct iovec
-- Check size of struct iovec - done
-- Performing Test HAS_IOV_BASE_IOVEC
-- Performing Test HAS_IOV_BASE_IOVEC - Success
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1") 
-- Found ffs: /home/users/khuck/src/ADIOS2/build/thirdparty/ffs/ffs/ffs-config.cmake (found version "1.6.0") 

-- ADIOS2 ThirdParty: Configuring enet
-- Looking for getaddrinfo
-- Looking for getaddrinfo - found
-- Looking for getnameinfo
-- Looking for getnameinfo - found
-- Looking for gethostbyaddr_r
-- Looking for gethostbyaddr_r - found
-- Looking for gethostbyname_r
-- Looking for gethostbyname_r - found
-- Looking for poll
-- Looking for poll - found
-- Looking for fcntl
-- Looking for fcntl - found
-- Looking for inet_pton
-- Looking for inet_pton - found
-- Looking for inet_ntop
-- Looking for inet_ntop - found
-- Performing Test HAS_MSGHDR_FLAGS
-- Performing Test HAS_MSGHDR_FLAGS - Success
-- Performing Test HAS_SOCKLEN_T
-- Performing Test HAS_SOCKLEN_T - Success
-- Found enet: /home/users/khuck/src/ADIOS2/build/thirdparty/enet/enet/enet-config.cmake (found version "1.3.14") 

-- ADIOS2 ThirdParty: Configuring EVPath
-- Performing Test HAVE_MATH
-- Performing Test HAVE_MATH - Failed
-- Performing Test HAVE_LIBM_MATH
-- Performing Test HAVE_LIBM_MATH - Success
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found suitable version "2.2.1", minimum required is "2.2.1") 
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1") 
-- Found ffs: /home/users/khuck/src/ADIOS2/build/thirdparty/ffs/ffs/ffs-config.cmake (found suitable version "1.6.0", minimum required is "1.5.1") 
-- Could NOT find nvml (missing: NVML_INCLUDE_DIR) 
-- Looking for clock_gettime
-- Looking for clock_gettime - found
-- Found enet: /home/users/khuck/src/ADIOS2/build/thirdparty/enet/enet/enet-config.cmake (found suitable version "1.3.14", minimum required is "1.3.13") 
--  - Udt4 library was not found.  This is not a fatal error, just that the Udt4 transport will not be built.
-- Found LIBFABRIC: /usr/lib64/libfabric.so  
-- Looking for ibv_create_qp
-- Looking for ibv_create_qp - not found
-- Looking for ibv_create_qp in ibverbs
-- Looking for ibv_create_qp in ibverbs - found
-- Found IBVERBS: ibverbs  
-- Could NOT find nnti (missing: NNTI_INCLUDE_DIR NNTI_trios_nnti_LIBRARY NNTI_trios_support_LIBRARY) 
-- Looking for hostlib.h
-- Looking for hostlib.h - not found
-- Looking for sys/sockio.h
-- Looking for sys/sockio.h - not found
-- Performing Test HAVE_FDS_BITS
-- Performing Test HAVE_FDS_BITS - Failed
-- Looking for writev
-- Looking for writev - found
-- Looking for uname
-- Looking for uname - found
-- Looking for getloadavg
-- Looking for getloadavg - found
-- Looking for gettimeofday
-- Looking for gettimeofday - found
-- Looking for getifaddrs
-- Looking for getifaddrs - found
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found suitable version "2.2.1", minimum required is "2.2.1") 
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1") 
-- Found ffs: /home/users/khuck/src/ADIOS2/build/thirdparty/ffs/ffs/ffs-config.cmake (found suitable version "1.6.0", minimum required is "1.6.0") 
-- Found EVPath: /home/users/khuck/src/ADIOS2/build/thirdparty/EVPath/EVPath/EVPathConfigCommon.cmake (found version "4.4.0") 
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found suitable version "2.2.1", minimum required is "2.2.1") 
-- Found atl: /home/users/khuck/src/ADIOS2/build/thirdparty/atl/atl/atl-config.cmake (found version "2.2.1") 

-- Looking for rdma/fi_ext_gni.h
-- Looking for rdma/fi_ext_gni.h - not found
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- LTO enabled
-- Detecting Fortran/C Interface
-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling
-- Verifying Fortran/CXX Compiler Compatibility
-- Verifying Fortran/CXX Compiler Compatibility - Success
-- Found MPI: TRUE (found version "3.1") found components:  C 

ADIOS2 build configuration:
  ADIOS Version: 2.4.0
  C++ Compiler : GNU 8.1.0 
    /packages/openmpi/4.0.1-gcc8.1/bin/mpic++

  Fortran Compiler : GNU 8.1.0 
    /packages/openmpi/4.0.1-gcc8.1/bin/mpif90

  Installation prefix: /home/users/khuck/src/ADIOS2/install_mpi
        bin: bin
        lib: lib64
    include: include
      cmake: lib64/cmake/adios2
     python: lib64/python3.6/site-packages

  Features:
    Library Type: shared
    Build Type:   RelWithDebInfo
    Testing: ON
    Build Options:
      Blosc    : OFF
      BZip2    : OFF
      ZFP      : OFF
      SZ       : OFF
      MGARD    : OFF
      PNG      : OFF
      MPI      : ON
      DataMan  : ON
      SSC      : ON
      SST      : ON
      ZeroMQ   : ON
      HDF5     : OFF
      Python   : ON
      Fortran  : ON
      SysVShMem: ON
      Profiling: ON
      Endian_Reverse: OFF
    RDMA Transport for Staging: Available

-- Configuring done
-- Generating done
-- Build files have been written to: /home/users/khuck/src/ADIOS2/build

Running the heatTransfer example on this workstation with SST produced the following crash (a similar or identical crash occurs without the --mca arguments):

mpirun --mca btl_openib_allow_ib true --mca btl_openib_warn_default_gid_prefix 0 -n 16 ./heatSimulation sim.bp 4 4 64 64 100 10 : -n 4 ./heatAnalysis sim.bp analysis.bp 2 2
Process decomposition  : 4 x 4
Array size per process : 64 x 64
Number of output steps : 100
Iterations per step    : 10
Using SST engine for input
Using BP4 engine for output
[delphi:53491] *** Process received signal ***
[delphi:53491] Signal: Segmentation fault (11)
[delphi:53491] Signal code: Address not mapped (1)
[delphi:53491] Failing at address: 0x10
[delphi:53491] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7f2fc755c5d0]
[delphi:53491] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7f2fc6d00080]
[delphi:53491] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7f2fc6d001ec]
[delphi:53491] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7f2fc6cf6ba6]
[delphi:53491] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7f2fc8a14827]
[delphi:53491] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7f2fc871b003]
[delphi:53491] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7f2fc8a7c3d3]
[delphi:53491] [ 7] ./heatSimulation[0x40fb36]
[delphi:53491] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53491] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f2fc71a23d5]
[delphi:53491] [10] ./heatSimulation[0x40b71f]
[delphi:53491] *** End of error message ***
[delphi:53489] *** Process received signal ***
[delphi:53489] Signal: Segmentation fault (11)
[delphi:53489] Signal code: Address not mapped (1)
[delphi:53489] Failing at address: 0x10
[delphi:53489] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7f115d8045d0]
[delphi:53489] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7f115cfa8080]
[delphi:53489] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7f115cfa81ec]
[delphi:53489] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7f115cf9eba6]
[delphi:53489] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7f115ecbc827]
[delphi:53489] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7f115e9c3003]
[delphi:53489] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7f115ed243d3]
[delphi:53489] [ 7] ./heatSimulation[0x40fb36]
[delphi:53489] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53489] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f115d44a3d5]
[delphi:53489] [10] ./heatSimulation[0x40b71f]
[delphi:53489] *** End of error message ***
[delphi:53482] *** Process received signal ***
[delphi:53482] Signal: Segmentation fault (11)
[delphi:53482] Signal code: Address not mapped (1)
[delphi:53482] Failing at address: 0x10
[delphi:53482] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7f63ceddf5d0]
[delphi:53482] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7f63ce583080]
[delphi:53482] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7f63ce5831ec]
[delphi:53482] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7f63ce579ba6]
[delphi:53482] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7f63d0297827]
[delphi:53482] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7f63cff9e003]
[delphi:53482] [ 6] [delphi:53486] *** Process received signal ***
[delphi:53486] Signal: Segmentation fault (11)
[delphi:53486] Signal code: Address not mapped (1)
[delphi:53486] Failing at address: 0x10
/storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7f63d02ff3d3]
[delphi:53482] [ 7] ./heatSimulation[0x40fb36]
[delphi:53482] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53482] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63cea253d5]
[delphi:53482] [10] ./heatSimulation[0x40b71f]
[delphi:53482] *** End of error message ***
[delphi:53484] *** Process received signal ***
[delphi:53484] Signal: Segmentation fault (11)
[delphi:53484] Signal code: Address not mapped (1)
[delphi:53484] Failing at address: 0x10
[delphi:53486] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fe2f7e8b5d0]
[delphi:53486] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7fe2f762f080]
[delphi:53486] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7fe2f762f1ec]
[delphi:53486] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7fe2f7625ba6]
[delphi:53486] [ 4] [delphi:53487] *** Process received signal ***
[delphi:53487] Signal: Segmentation fault (11)
[delphi:53487] Signal code: Address not mapped (1)
[delphi:53487] Failing at address: 0x10
[delphi:53487] [ 0] [delphi:53496] *** Process received signal ***
[delphi:53496] Signal: Segmentation fault (11)
[delphi:53496] Signal code: Address not mapped (1)
[delphi:53496] Failing at address: 0x10
[delphi:53496] [ 0] [delphi:53498] *** Process received signal ***
[delphi:53498] Signal: Segmentation fault (11)
[delphi:53498] Signal code: Address not mapped (1)
[delphi:53498] Failing at address: 0x10
[delphi:53484] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fa4407215d0]
[delphi:53484] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7fa43fec5080]
[delphi:53484] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7fa43fec51ec]
[delphi:53484] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7fa43febbba6]
[delphi:53484] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7fa441bd9827]
[delphi:53484] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7fa4418e0003]
[delphi:53484] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7fa441c413d3]
[delphi:53484] [ 7] ./heatSimulation[0x40fb36]
[delphi:53484] [ 8] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7fe2f9343827]
[delphi:53486] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7fe2f904a003]
[delphi:53486] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7fe2f93ab3d3]
[delphi:53486] [ 7] ./heatSimulation[0x40fb36]
[delphi:53486] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53486] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2f7ad13d5]
[delphi:53486] [10] ./heatSimulation[0x40b71f]
[delphi:53486] *** End of error message ***
/lib64/libpthread.so.0(+0xf5d0)[0x7efcb0ff15d0]
[delphi:53487] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7efcb0795080]
[delphi:53487] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7efcb07951ec]
[delphi:53487] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7efcb078bba6]
[delphi:53487] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7efcb24a9827]
[delphi:53487] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7efcb21b0003]
[delphi:53487] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7efcb25113d3]
[delphi:53487] [ 7] ./heatSimulation[0x40fb36]
[delphi:53487] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53487] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7efcb0c373d5]
[delphi:53487] [10] ./heatSimulation[0x40b71f]
[delphi:53487] *** End of error message ***
/lib64/libpthread.so.0(+0xf5d0)[0x7fa144f785d0]
[delphi:53496] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7fa14471c080]
[delphi:53496] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7fa14471c1ec]
[delphi:53496] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7fa144712ba6]
[delphi:53496] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7fa146430827]
[delphi:53496] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7fa146137003]
[delphi:53496] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7fa1464983d3]
[delphi:53496] [ 7] ./heatSimulation[0x40fb36]
[delphi:53496] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53496] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa144bbe3d5]
[delphi:53496] [10] [delphi:53498] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7f124180d5d0]
[delphi:53498] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7f1240fb1080]
[delphi:53498] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x172ff)[0x7f1240fb12ff]
[delphi:53498] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstReaderOpen+0xaa)[0x7f1240fa49fa]
[delphi:53498] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstReaderC1ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x161)[0x7f1242cb2be1]
[delphi:53498] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x71e)[0x7f12429cc12e]
[delphi:53498] [ 6] ./heatSimulation[0x40b71f]
[delphi:53496] *** End of error message ***
./heatSimulation[0x40b4ef]
[delphi:53484] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fa4403673d5]
[delphi:53484] [10] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7f1242d2d3d3]
[delphi:53498] [ 7] ./heatAnalysis[0x40a827]
[delphi:53498] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f12414533d5]
[delphi:53498] [ 9] ./heatAnalysis[0x40ba5f]
[delphi:53498] *** End of error message ***
./heatSimulation[0x40b71f]
[delphi:53484] *** End of error message ***
[delphi:53499] *** Process received signal ***
[delphi:53499] Signal: Segmentation fault (11)
[delphi:53499] Signal code: Address not mapped (1)
[delphi:53499] Failing at address: 0x10
[delphi:53499] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7f1b289265d0]
[delphi:53499] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7f1b280ca080]
[delphi:53499] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x172ff)[0x7f1b280ca2ff]
[delphi:53499] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstReaderOpen+0xaa)[0x7f1b280bd9fa]
[delphi:53499] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstReaderC1ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x161)[0x7f1b29dcbbe1]
[delphi:53499] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x71e)[0x7f1b29ae512e]
[delphi:53499] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7f1b29e463d3]
[delphi:53499] [ 7] ./heatAnalysis[0x40a827]
[delphi:53499] [ 8] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f1b2856c3d5]
[delphi:53499] [ 9] ./heatAnalysis[0x40ba5f]
[delphi:53499] *** End of error message ***
[delphi:53485] *** Process received signal ***
[delphi:53485] Signal: Segmentation fault (11)
[delphi:53485] Signal code: Address not mapped (1)
[delphi:53485] Failing at address: 0x10
[delphi:53485] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7fc7d7b995d0]
[delphi:53485] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17080)[0x7fc7d733d080]
[delphi:53485] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x171ec)[0x7fc7d733d1ec]
[delphi:53485] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7fc7d7333ba6]
[delphi:53485] [ 4] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x167)[0x7fc7d9051827]
[delphi:53485] [ 5] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x5f3)[0x7fc7d8d58003]
[delphi:53485] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7fc7d90b93d3]
[delphi:53485] [ 7] ./heatSimulation[0x40fb36]
[delphi:53485] [ 8] ./heatSimulation[0x40b4ef]
[delphi:53485] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fc7d77df3d5]
[delphi:53485] [10] ./heatSimulation[0x40b71f]
[delphi:53485] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 7 with PID 0 on node delphi exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

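Because all ranks write to the same terminal, the traces above are interleaved: some lines carry two `[host:pid]` tags, and some frames continue on a line with no tag at all. A minimal sketch for regrouping such output by PID, assuming the `[host:pid]` prefix format shown in the log. The function name `group_by_pid` and the regular expression are illustrative, not part of Open MPI or ADIOS2, and the sample lines are copied from the output above:

```python
import re
from collections import defaultdict

# Matches the "[host:pid]" tag printed in front of each trace line,
# e.g. "[delphi:53491]". Frame indices such as "[ 0]" and addresses such as
# "[0x40fb36]" contain no colon, so they do not match.
TAG = re.compile(r"\[([A-Za-z0-9_.-]+):(\d+)\]")


def group_by_pid(lines):
    """Split each line at every [host:pid] tag and collect fragments per PID.

    Fragments that appear before any tag on a line (continuations of a frame
    that was cut off by another process) cannot be attributed reliably, so
    they are stored under the key None.
    """
    groups = defaultdict(list)
    for line in lines:
        pos = 0
        pid = None
        for match in TAG.finditer(line):
            fragment = line[pos:match.start()].strip()
            if fragment:
                groups[pid].append(fragment)
            pid = int(match.group(2))
            pos = match.end()
        tail = line[pos:].strip()
        if tail:
            groups[pid].append(tail)
    return dict(groups)


# Sample lines taken verbatim from the interleaved output above.
sample = [
    "[delphi:53482] [ 6] [delphi:53486] *** Process received signal ***",
    "[delphi:53486] Signal: Segmentation fault (11)",
    "[delphi:53486] Signal code: Address not mapped (1)",
    "[delphi:53486] Failing at address: 0x10",
    "[delphi:53482] [ 7] ./heatSimulation[0x40fb36]",
    "./heatSimulation[0x40b4ef]",
]

for pid, fragments in group_by_pid(sample).items():
    print(pid)
    for fragment in fragments:
        print("   ", fragment)
```

This is only a heuristic: untagged continuation lines are collected separately rather than guessed at, so a frame split across processes still has to be matched up by hand.
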
And the backtrace from one of the ranks:

[khuck@delphi cpp]$ gdb ./heatSimulation core.56907
ImportError: No module named site
[khuck@delphi cpp]$ module unload python
[khuck@delphi cpp]$ gdb ./heatSimulation core.56907
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /storage/users/khuck/src/adiosvm/Tutorial/heat2d/cpp/heatSimulation...done.
[New LWP 56907]
[New LWP 56929]
[New LWP 57462]
[New LWP 56941]
[New LWP 57851]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
btCore was generated by `./heatSimulation sim.bp 4 4 64 64 100 10'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fef071c3080 in fi_ep_bind (flags=0, bfid=0x157b120, ep=0x0) at /usr/include/rdma/fi_endpoint.h:168
168		return ep->fid.ops->bind(&ep->fid, bfid, flags);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.172-2.el7.x86_64 elfutils-libs-0.172-2.el7.x86_64 infinipath-psm-3.3-26_g604758e_open.2.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-13.el7.x86_64 libfabric-1.6.1-2.el7.x86_64 libffi-3.0.13-18.el7.x86_64 libibverbs-17.2-3.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 libpciaccess-0.14-1.el7.x86_64 libpsm2-10.3.58-1.el7.x86_64 librdmacm-17.2-3.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libsodium13-1.0.5-1.el7.x86_64 libuuid-2.23.2-59.el7_6.1.x86_64 numactl-libs-2.0.9-7.el7.x86_64 openpgm-5.2.122-2.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7.x86_64 zeromq-4.1.4-5.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  0x00007fef071c3080 in fi_ep_bind (flags=0, bfid=0x157b120, ep=0x0) at /usr/include/rdma/fi_endpoint.h:168
#1  init_fabric (fabric=0x1578c40, Params=<optimized out>)
    at /home/users/khuck/src/ADIOS2/source/adios2/toolkit/sst/dp/rdma_dp.c:198
#2  0x00007fef071c31ec in RdmaInitWriter (Svcs=0x7fef073cbac0 <Svcs>, CP_Stream=0x155c420, Params=0x155c2f8)
    at /home/users/khuck/src/ADIOS2/source/adios2/toolkit/sst/dp/rdma_dp.c:558
#3  0x00007fef071b9ba6 in SstWriterOpen (Name=Name@entry=0x155c3e0 "sim.bp", Params=Params@entry=0x155c2f8, 
    comm=comm@entry=0x155a710) at /home/users/khuck/src/ADIOS2/source/adios2/toolkit/sst/cp/cp_writer.c:1124
#4  0x00007fef08ed7827 in adios2::core::engine::SstWriter::SstWriter(adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode, ompi_communicator_t*) ()
    at /home/users/khuck/src/ADIOS2/source/adios2/engine/sst/SstWriter.cpp:35
#5  0x00007fef08bde003 in construct<adios2::core::engine::SstWriter, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&>
    (this=<optimized out>, __p=0x155c230) at /storage/packages/gcc/8.1/include/c++/8.1.0/new:169
#6  construct<adios2::core::engine::SstWriter, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (__a=..., 
    __p=0x155c230) at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/alloc_traits.h:475
#7  _Sp_counted_ptr_inplace<adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (__a=..., this=0x155c220)
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr_base.h:549
#8  __shared_count<adios2::core::engine::SstWriter, std::allocator<adios2::core::engine::SstWriter>, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (__a=..., this=<optimized out>)
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr_base.h:662
#9  __shared_ptr<std::allocator<adios2::core::engine::SstWriter>, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (
    __a=..., __tag=..., this=<optimized out>)
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr_base.h:1328
#10 shared_ptr<std::allocator<adios2::core::engine::SstWriter>, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (
    __a=..., __tag=..., this=<optimized out>)
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr.h:360
#11 allocate_shared<adios2::core::engine::SstWriter, std::allocator<adios2::core::engine::SstWriter>, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> (__a=...)
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr.h:707
#12 make_shared<adios2::core::engine::SstWriter, adios2::core::IO&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode const&, ompi_communicator_t*&> ()
    at /storage/packages/gcc/8.1/include/c++/8.1.0/bits/shared_ptr.h:723
#13 adios2::core::IO::Open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode, ompi_communicator_t*) () at /home/users/khuck/src/ADIOS2/source/adios2/core/IO.cpp:567
#14 0x00007fef08f3f3d3 in adios2::IO::Open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, adios2::Mode, ompi_communicator_t*) ()
    at /home/users/khuck/src/ADIOS2/bindings/CXX11/adios2/cxx11/IO.cpp:112
#15 0x000000000040fb36 in IO::IO(Settings const&, ompi_communicator_t*) () at simulation/IO_adios2.cpp:79
#16 0x000000000040b4ef in main () at simulation/heatSimulation.cpp:78
#17 0x00007fef076653d5 in __libc_start_main (main=0x40b320 <main>, argc=8, argv=0x7ffe942a12e8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe942a12d8)
    at ../csu/libc-start.c:266
#18 0x000000000040b71f in _start () at simulation/IO_adios2.cpp:118
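
For anyone skimming the trace: frame #0 shows `fi_ep_bind` called with `ep=0x0`, and line 168 of `fi_endpoint.h` dereferences `ep`, so that is the faulting access. Below is a minimal Python sketch for flagging null pointer arguments in a saved `bt` dump. It is not part of ADIOS2 or gdb, and the name `null_pointer_args` is made up for illustration. It only handles frames whose argument list fits on one line and is preceded by a space, so the wrapped C++ template frames are skipped.

```python
import re

# Matches single-line gdb frame headers such as
#   "#0  0x00007fef071c3080 in fi_ep_bind (flags=0, ep=0x0) at file.h:168"
#   "#1  init_fabric (fabric=0x1578c40, Params=<optimized out>)"
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?([^\s(]+) \((.*?)\)(?: at |$)"
)

# An argument printed as exactly 0x0 (not 0x0a..., not a plain integer 0).
NULL_ARG_RE = re.compile(r"(\w+)=0x0(?![0-9a-f])")


def null_pointer_args(backtrace_text):
    """Return (frame number, function, argument name) for each null pointer argument."""
    hits = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # continuation lines and multi-line frames are ignored
        frame, func, args = match.groups()
        for name in NULL_ARG_RE.findall(args):
            hits.append((int(frame), func, name))
    return hits
```

Run against the trace above, this should report only the `ep` argument of `fi_ep_bind` in frame 0.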
@philip-davis
Collaborator

I'm not positive, but I think this might be the issue introduced by #1713 and fixed in #1715. Would it be possible to pull the most recent master and try again, with the following environment variables set:

export SstVerbose=1
export FI_LOG_LEVEL=debug
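
With `FI_LOG_LEVEL=debug` the output gets long, so it can help to reduce it to the providers that `fi_getinfo` rejected. Below is a minimal Python sketch, assuming log lines shaped like `libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)`. The function name `failed_providers` is hypothetical and not part of libfabric or ADIOS2. Lines that do not match the pattern, including ones garbled by interleaved output from several ranks, are skipped.

```python
import re
from collections import Counter

# libfabric:<provider>:<subsystem>:<function>():<line><level> message
LOG_RE = re.compile(
    r"^libfabric:(?P<prov>[^:]+):(?P<subsys>[^:]+):(?P<func>\w+)\(\):"
    r"(?P<line>\d+)<(?P<level>\w+)>\s?(?P<msg>.*)$"
)

# Message body of the fi_getinfo warnings, e.g. "provider psm returned -61".
RETURNED_RE = re.compile(r"provider (\S+) returned (-?\d+)")


def failed_providers(log_text):
    """Count <warn> lines of the form 'provider X returned N', keyed by (X, N)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_RE.match(line)
        if not match or match.group("level") != "warn":
            continue
        returned = RETURNED_RE.search(match.group("msg"))
        if returned:
            counts[(returned.group(1), int(returned.group(2)))] += 1
    return counts
```

Feeding it the captured stderr of a run (for example via `open("run.log").read()`, where `run.log` is a placeholder file name) gives a per-provider count of rejections.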

@khuck
Collaborator Author

khuck commented Sep 9, 2019

trying now...

@khuck
Collaborator Author

khuck commented Sep 9, 2019

@philip-davis nope, I get the same error.

@khuck
Collaborator Author

khuck commented Sep 9, 2019

Here's the output with the environment variables set. I cut the run down to 2 total ranks (one writer, one reader) to keep the noise to a minimum:

+ mpirun --mca btl_openib_allow_ib true --mca btl_openib_warn_default_gid_prefix 0 -n 1 tau_exec -T mpi,pthread ./heatSimulation sim.bp 1 1 64 64 100 10 : -n 1 tau_exec -T mpi,pthread ./heatAnalysis sim.bp analysis.bp 1 1
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():272<info> variable provider=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider
libfabric:core:core:fi_param_define_():223<info> registered var fork_unsafe
libfabric:core:core:fi_param_define_():223<info> registered var universe_size
libfabric:core:core:fi_param_get_():272<info> variable provider=<not set>
libfabric:core:core:fi_param_define_():223<info> registered var provider_path
libfabric:core:core:fi_param_get_():272<info> variable provider_path=<not set>
libfabric:psm2:core:fi_psm2_ini():485<info> build options: HAVE_PSM2_SRC=0, HAVE_PSM2_AM_REGISTER_HANDLERS_2=1, PSMX2_USE_REQ_CONTEXT=0
libfabric:psm2:core:fi_param_define_():223<info> registered var name_server
libfabric:psm2:core:fi_param_define_():223<info> registered var tagged_rma
libfabric:psm2:core:fi_param_define_():223<info> registered var uuid
libfabric:psm2:core:fi_param_define_():223<info> registered var delay
libfabric:psm2:core:fi_param_define_():223<info> registered var timeout
libfabric:psm2:core:fi_param_define_():223<info> registered var prog_interval
libfabric:psm2:core:fi_psm2_ini():485<info> build options: HAVE_PSM2_SRC=0, HAVE_PSM2_AM_REGISTER_HANDLERS_2=1, PSMX2_USE_REQ_CONTEXT=0
libfabric:psm2:core:fi_param_define_():223<info> registered var name_server
libfabric:psm2:core:fi_param_define_():223<info> registered var tagged_rma
libfabric:psm2:core:fi_param_define_():223<info> registered var uuid
libfabric:psm2:core:fi_param_define_():223<info> registered var delay
libfabric:psm2:core:fi_param_define_():223<info> registered var timeout
libfabric:psm2:core:fi_param_define_():223<info> registered var prog_interval
libfabric:psm2:core:fi_param_define_():223<info> registered var prog_affinity
libfabric:psm2:core:fi_param_define_():223<info> registered var inject_size
libfabric:psm2:core:fi_param_define_():223<info> registered var lock_level
libfabric:psm2:core:fi_param_define_():223<info> registered var lazy_conn
libfabric:psm2:core:fi_param_define_():223<info> registered var disconnect
libfabric:psm2:core:fi_param_define_():223<info> registered var tag_layout
libfabric:psm2:core:fi_param_get_():272<info> variable name_server=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable tagged_rma=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable uuid=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable delay=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable timeout=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable prog_interval=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable prog_affinity=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable inject_size=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable lock_level=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable lazy_conn=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable disconnect=<not set>
libfabric:psm2:core:fi_param_define_():223<info> registered var prog_affinity
libfabric:psm2:core:fi_param_define_():223<info> registered var inject_size
libfabric:psm2:core:fi_param_define_():223<info> registered var lock_level
libfabric:psm2:core:fi_param_define_():223<info> registered var lazy_conn
libfabric:psm2:core:fi_param_define_():223<info> registered var disconnect
libfabric:psm2:core:fi_param_define_():223<info> registered var tag_layout
libfabric:psm2:core:fi_param_get_():272<info> variable name_server=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable tagged_rma=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable uuid=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable delay=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable timeout=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable prog_interval=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable prog_affinity=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable inject_size=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable lock_level=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable lazy_conn=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable disconnect=<not set>
libfabric:psm2:core:fi_param_get_():272<info> variable tag_layout=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: psm2 (1.6)
libfabric:psm:core:fi_psm_ini():708<info> 
libfabric:psm:core:fi_param_define_():223<info> registered var name_server
libfabric:psm:core:fi_param_define_():223<info> registered var am_msg
libfabric:psm:core:fi_param_define_():223<info> registered var tagged_rma
libfabric:psm:core:fi_param_define_():223<info> registered var uuid
libfabric:psm:core:fi_param_define_():223<info> registered var delay
libfabric:psm:core:fi_param_define_():223<info> registered var timeout
libfabric:psm:core:fi_param_define_():223<info> registered var prog_thread
libfabric:psm:core:fi_param_define_():223<info> registered var prog_interval
libfabric:psm:core:fi_param_define_():223<info> registered var prog_affinity
libfabric:psm:core:fi_param_get_():272<info> variable name_server=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable am_msg=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable tagged_rma=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable uuid=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable delay=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable timeout=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_thread=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_interval=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_affinity=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: psm (1.6)
libfabric:core:core:ofi_register_provider():200<info> registering provider: usnic (1.0)
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():200<info> registering provider: shm (1.0)
libfabric:ofi_rxm:core:fi_param_define_():223<info> registered var buffer_size
libfabric:ofi_rxm:core:fi_param_define_():223<info> registered var comp_per_progress
libfabric:ofi_rxm:core:fi_param_get_():272<info> variable buffer_size=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: ofi_rxm (1.0)
libfabric:verbs:core:fi_param_define_():223<info> registered var tx_size
libfabric:verbs:core:fi_param_get_():272<info> variable tx_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rx_size
libfabric:verbs:core:fi_param_get_():272<info> variable rx_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered valibfabric:psm2:core:fi_param_get_():272<info> variable tag_layout=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: psm2 (1.6)
libfabric:psm:core:fi_psm_ini():708<info> 
libfabric:psm:core:fi_param_define_():223<info> registered var name_server
libfabric:psm:core:fi_param_define_():223<info> registered var am_msg
libfabric:psm:core:fi_param_define_():223<info> registered var tagged_rma
libfabric:psm:core:fi_param_define_():223<info> registered var uuid
libfabric:psm:core:fi_param_define_():223<info> registered var delay
libfabric:psm:core:fi_param_define_():223<info> registered var timeout
libfabric:psm:core:fi_param_define_():223<info> registered var prog_thread
libfabric:psm:core:fi_param_define_():223<info> registered var prog_interval
libfabric:psm:core:fi_param_define_():223<info> registered var prog_affinity
libfabric:psm:core:fi_param_get_():272<info> variable name_server=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable am_msg=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable tagged_rma=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable uuid=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable delay=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable timeout=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_thread=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_interval=<not set>
libfabric:psm:core:fi_param_get_():272<info> variable prog_affinity=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: psm (1.6)
libfabric:core:core:ofi_register_provider():200<info> registering provider: usnic (1.0)
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():193<warn> no provider structure or name
libfabric:core:core:ofi_register_provider():200<info> registering provider: shm (1.0)
libfabric:ofi_rxm:core:fi_param_define_():223<info> registered var buffer_size
libfabric:ofi_rxm:core:fi_param_define_():223<info> registered var comp_per_progress
libfabric:ofi_rxm:core:fi_param_get_():272<info> variable buffer_size=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: ofi_rxm (1.0)
libfabric:verbs:core:fi_param_define_():223<info> registered var tx_size
libfabric:verbs:core:fi_param_get_():272<info> variable tx_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rx_size
libfabric:verbs:core:fi_param_get_():272<info> variable rx_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var tx_iov_limit
libfabric:verbs:core:fi_param_get_():272<info> variable tx_iov_limit=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rx_iov_limit
libfabric:verbs:core:fi_param_get_():272<info> variable rx_iov_limit=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var inline_size
libfabric:verbs:core:fi_param_get_():272<info> variable inline_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var min_rnr_timer
libfabric:verbs:core:fi_param_get_():272<info> variable min_rnr_timer=<not set>
libfabric:core:core:fi_param_get_():272<info> variable fork_unsafe=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var use_odp
libfabric:verbs:core:fi_param_get_():272<info> variable use_odp=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var cqread_bunch_size
libfabric:verbs:core:fi_param_get_():272<info> variable cqread_bunch_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var iface
libfabric:verbs:core:fi_param_get_():272<info> variable iface=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_cache_enable
libfabric:verbs:core:fi_param_get_():272<info> variable mr_cache_enable=<not set>
libfabric:verbs:corer tx_iov_limit
libfabric:verbs:core:fi_param_get_():272<info> variable tx_iov_limit=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rx_iov_limit
libfabric:verbs:core:fi_param_get_():272<info> variable rx_iov_limit=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var inline_size
libfabric:verbs:core:fi_param_get_():272<info> variable inline_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var min_rnr_timer
libfabric:verbs:core:fi_param_get_():272<info> variable min_rnr_timer=<not set>
libfabric:core:core:fi_param_get_():272<info> variable fork_unsafe=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var use_odp
libfabric:verbs:core:fi_param_get_():272<info> variable use_odp=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var cqread_bunch_size
libfabric:verbs:core:fi_param_get_():272<info> variable cqread_bunch_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var iface
libfabric:verbs:core:fi_param_get_():272<info> variable iface=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_cache_enable
libfabric:verbs:core:fi_param_get_():272<info> variable mr_cache_enable=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_max_cached_cnt
libfabric:verbs:core:fi_param_get_():272<info> variable mr_max_cached_cnt=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_max_cached_size
libfabric:verbs:core:fi_param_get_():272<info> variable mr_max_cached_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_cache_merge_regions
libfabric:verbs:core:fi_param_get_():272<info> variable mr_cache_merge_regions=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_buffer_num
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_buffer_num=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_buffer_size
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_buffer_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_rndv_seg_size
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_rndv_seg_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_thread_timeout
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_thread_timeout=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_eager_send_opcode
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_eager_send_opcode=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_cm_thread_affinity
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_cm_thread_affinity=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var dgram_use_name_server
libfabric:verbs:core:fi_param_get_():272<info> variable dgram_use_name_server=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var dgram_name_server_port
libfabric:verbs:core:fi_param_get_():272<info> variable dgram_name_server_port=<not set>
libfabric:verbs:core:fi_ibv_init_info():1057<info> Enabling IB fork support
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
:fi_param_define_():223<info> registered var mr_max_cached_cnt
libfabric:verbs:core:fi_param_get_():272<info> variable mr_max_cached_cnt=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_max_cached_size
libfabric:verbs:core:fi_param_get_():272<info> variable mr_max_cached_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var mr_cache_merge_regions
libfabric:verbs:core:fi_param_get_():272<info> variable mr_cache_merge_regions=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_buffer_num
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_buffer_num=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_buffer_size
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_buffer_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_rndv_seg_size
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_rndv_seg_size=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_thread_timeout
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_thread_timeout=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_eager_send_opcode
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_eager_send_opcode=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var rdm_cm_thread_affinity
libfabric:verbs:core:fi_param_get_():272<info> variable rdm_cm_thread_affinity=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var dgram_use_name_server
libfabric:verbs:core:fi_param_get_():272<info> variable dgram_use_name_server=<not set>
libfabric:verbs:core:fi_param_define_():223<info> registered var dgram_name_server_port
libfabric:verbs:core:fi_param_get_():272<info> variable dgram_name_server_port=<not set>
libfabric:verbs:core:fi_ibv_init_info():1057<info> Enabling IB fork support
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
libfabric:core:core:ofi_register_provider():200<info> registering provider: verbs (1.0)
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: UDP (1.1)
libfabric:sockets:core:fi_param_define_():223<info> registered var pe_waittime
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
libfabric:verbs:fabric:fi_ibv_get_device_attrs():550<info> The first found active port is 1
libfabric:core:core:ofi_register_provider():200<info> registering provider: verbs (1.0)
libfabric:core:core:fi_param_define_():223<info> registered var rxd_enable
libfabric:core:core:fi_param_get_():272<info> variable rxd_enable=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: UDP (1.1)
libfabric:sockets:core:fi_param_define_():223<info> registered var pe_waittime
libfabric:sockets:core:fi_param_define_():223<info> registered var max_conn_retry
libfabric:sockets:core:fi_param_define_():223<info> registered var def_conn_map_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var def_av_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var def_cq_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var max_conn_retry
libfabric:sockets:core:fi_param_define_():223<info> registered var def_conn_map_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var def_av_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var def_eq_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var pe_affinity
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_enable
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_time
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_intvl
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_probes
libfabric:sockets:core:fi_param_define_():223<info> registered var interface_name
libfabric:sockets:core:fi_param_get_():272<info> variable interface_name=<not set>
libfabric:sockets:core:fi_param_define_():223<info> registered var def_cq_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var def_eq_sz
libfabric:sockets:core:fi_param_define_():223<info> registered var pe_affinity
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_enable
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_time
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_intvl
libfabric:sockets:core:fi_param_define_():223<info> registered var keepalive_probes
libfabric:sockets:core:fi_param_define_():223<info> registered var interface_name
libfabric:sockets:core:fi_param_get_():272<info> variable interface_name=<not set>
libfabric:core:core:ofi_register_provider():200<info> registering provider: sockets (2.0)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:core:core:ofi_register_provider():200<info> registering provider: sockets (2.0)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:ofi_rxm:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:ofi_rxm:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:ofi_rxm:core:ofi_check_ep_attr():635<info> Supported: FI_EP_RDM
libfabric:ofi_rxm:core:ofi_check_ep_attr():635<info> Requested: FI_EP_DGRAM
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:ofi_rxm:core:ofi_check_ep_attr():635<info> Supported: FI_EP_RDM
libfabric:ofi_rxm:core:ofi_check_ep_attr():635<info> Requested: FI_EP_DGRAM
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
Process decomposition  : 1 x 1
Array size per process : 64 x 64
Number of output steps : 100
Iterations per step    : 10
Using SST engine for input
Using BP4 engine for output
Reader 0 (0x1642d10): Sst set to use sockets as a Control Transport
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():268<info> TAG60 instance included
libfabric:psm2:core:psmx2_init_prov_info():285<info> TAG64 instance included
libfabric:psm2:core:psmx2_init_lib():217<info> PSM2 header version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():219<info> PSM2 library version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():222<info> PSM2 multi-ep feature enabled.
Writer 0 (0x1d8d010): Sst set to use sockets as a Control Transport
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/ports/1/state: 4
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():268<info> TAG60 instance included
libfabric:psm2:core:psmx2_init_prov_info():285<info> TAG64 instance included
libfabric:psm2:core:psmx2_init_lib():217<info> PSM2 header version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():219<info> PSM2 library version = (2, 1)
libfabric:psm2:core:psmx2_init_lib():222<info> PSM2 multi-ep feature enabled.
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nctxts: 36
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nfreectxts: 36
libfabric:psm2:core:psmx2_update_hfi_info():281<info> hfi1 units: total 1, active 1; hfi1 contexts: total 36, free 36
libfabric:psm2:core:psmx2_update_hfi_info():292<info> Tx/Rx contexts: 36 in total, 36 available.
libfabric:psm2:core:psmx2_alter_prov_info():407<info> 2 instances available, 1 with CQ data flag set
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_init_lib():97<info> PSM header version = (1, 16)
libfabric:psm:core:psmx_init_lib():99<info> PSM library version = (1, 16)
libfabric:psm:core:psmx_getinfo():289<info> no PSM device is found.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():201<info> Unsupported endpoint type
libfabric:psm2:core:psmx2_init_prov_info():203<info> Supported: FI_EP_RDM
libfabric:psm2:core:psmx2_init_prov_info():205<info> Supported: FI_EP_DGRAM
libfabric:psm2:core:psmx2_init_prov_info():207<info> Requested: FI_EP_MSG
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_getinfo():238<info> hints->ep_attr->type=1, supported=0,2,3.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():661<info> Need core provider, skipping util ofi_rxm
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_RDM
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_MSG
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_RMA
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/ports/1/state: 4
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nctxts: 36
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nfreectxts: 36
libfabric:psm2:core:psmx2_update_hfi_info():281<info> hfi1 units: total 1, active 1; hfi1 contexts: total 36, free 36
libfabric:psm2:core:psmx2_update_hfi_info():292<info> Tx/Rx contexts: 36 in total, 36 available.
libfabric:psm2:core:psmx2_alter_prov_info():407<info> 2 instances available, 1 with CQ data flag set
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_init_lib():97<info> PSM header version = (1, 16)
libfabric:psm:core:psmx_init_lib():99<info> PSM library version = (1, 16)
libfabric:psm:core:psmx_getinfo():289<info> no PSM device is found.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():201<info> Unsupported endpoint type
libfabric:psm2:core:psmx2_init_prov_info():203<info> Supported: FI_EP_RDM
libfabric:psm2:core:psmx2_init_prov_info():205<info> Supported: FI_EP_DGRAM
libfabric:psm2:core:psmx2_init_prov_info():207<info> Requested: FI_EP_MSG
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_getinfo():238<info> hints->ep_attr->type=1, supported=0,2,3.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():661<info> Need core provider, skipping util ofi_rxm
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_RDM
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_MSG
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_RMA
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_RMA
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():668<info> Skipping util;sockets layering
libfabric:shm:core:ofi_check_mr_mode():512<info> Invalid memory registration mode
libfabric:shm:core:ofi_check_mr_mode():513<info> Expected: FI_MR_SCALABLE
libfabric:shm:core:ofi_check_mr_mode():513<info> Given: FI_MR_BASIC
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_RMA
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():668<info> Skipping util;sockets layering
libfabric:shm:core:ofi_check_mr_mode():512<info> Invalid memory registration mode
libfabric:shm:core:ofi_check_mr_mode():513<info> Expected: FI_MR_SCALABLE
libfabric:shm:core:ofi_check_mr_mode():513<info> Given: FI_MR_BASIC
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_MSG
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_RDM
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_MSG
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_RDM
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:shm:core:ofi_check_domain_attr():546<info> Invalid data progress model
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
libfabric:shm:core:ofi_check_domain_attr():546<info> Invalid data progress model
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
DP Reader 0 (0x1642d10): RDMA Dataplane sees interface psm2, provider type psm2, which should work.
DP Reader 0 (0x1642d10): RDMA Dataplane sees interface psm2, provider type psm2, which should work.
DP Reader 0 (0x1642d10): RDMA Dataplane evaluating viability, returning priority 10
DP Reader 0 (0x1642d10): Considering DataPlane "evpath" for possible use, priority is 1
DP Reader 0 (0x1642d10): Considering DataPlane "rdma" for possible use, priority is 10
DP Reader 0 (0x1642d10): Selecting DataPlane "rdma", priority 10 for use
DP Writer 0 (0x1d8d010): RDMA Dataplane sees interface psm2, provider type psm2, which should work.
DP Writer 0 (0x1d8d010): RDMA Dataplane sees interface psm2, provider type psm2, which should work.
DP Writer 0 (0x1d8d010): RDMA Dataplane evaluating viability, returning priority 10
DP Writer 0 (0x1d8d010): Considering DataPlane "evpath" for possible use, priority is 1
DP Writer 0 (0x1d8d010): Considering DataPlane "rdma" for possible use, priority is 10
DP Writer 0 (0x1d8d010): Selecting DataPlane "rdma", priority 10 for use
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():268<info> TAG60 instance included
libfabric:psm2:core:psmx2_init_prov_info():285<info> TAG64 instance included
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/ports/1/state: 4
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nctxts: 36
libfabric:psm2:core:psmx2_read_sysfs_int():245<info> /sys/class/infiniband/hfi1_0/nfreectxts: 36
libfabric:psm2:core:psmx2_update_hfi_info():281<info> hfi1 units: total 1, active 1; hfi1 contexts: total 36, free 36
libfabric:psm2:core:psmx2_update_hfi_info():292<info> Tx/Rx contexts: 36 in total, 36 available.
libfabric:psm2:core:psmx2_alter_prov_info():407<info> 2 instances available, 1 with CQ data flag set
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_getinfo():289<info> no PSM device is found.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:psm2:core:psmx2_getinfo():316<info> 
libfabric:psm2:core:psmx2_init_prov_info():201<info> Unsupported endpoint type
libfabric:psm2:core:psmx2_init_prov_info():203<info> Supported: FI_EP_RDM
libfabric:psm2:core:psmx2_init_prov_info():205<info> Supported: FI_EP_DGRAM
libfabric:psm2:core:psmx2_init_prov_info():207<info> Requested: FI_EP_MSG
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm2 returned -61 (No data available)
libfabric:psm:core:psmx_getinfo():206<info> 
libfabric:psm:core:psmx_getinfo():238<info> hints->ep_attr->type=1, supported=0,2,3.
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:usnic:fabric:usdf_getinfo():863<trace> 
libfabric:usnic:fabric:usdf_getinfo():964<info> returning -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():661<info> Need core provider, skipping util ofi_rxm
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_RDM
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_MSG
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_RMA
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_RMA
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:core:core:ofi_layering_ok():668<info> Skipping util;sockets layering
libfabric:shm:core:ofi_check_mr_mode():512<info> Invalid memory registration mode
libfabric:shm:core:ofi_check_mr_mode():513<info> Expected: FI_MR_SCALABLE
libfabric:shm:core:ofi_check_mr_mode():513<info> Given: FI_MR_BASIC
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
libfabric:verbs:core:ofi_check_ep_attr():634<info> Unsupported endpoint type
libfabric:verbs:core:ofi_check_ep_attr():635<info> Supported: FI_EP_MSG
libfabric:verbs:core:ofi_check_ep_attr():635<info> Requested: FI_EP_RDM
libfabric:verbs:core:fi_ibv_check_hints():235<info> Unsupported capabilities
libfabric:verbs:core:fi_ibv_check_hints():236<info> Supported: FI_LOCAL_COMM, FI_REMOTE_COMM, FI_MSG, FI_RECV, FI_SEND
libfabric:verbs:core:fi_ibv_check_hints():236<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
Reader 0 (0x1642d10): Looking for writer contact in file sim.bp.sst, with timeout 10 secs
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: No such device(19)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:fabric:fi_ibv_create_ep():221<info> rdma_create_ep: Invalid argument(22)
libfabric:verbs:core:fi_ibv_get_match_infos():1444<info> Handling of the socket address fails - -61
libfabric:UDP:core:ofi_check_info():979<info> Unsupported capabilities
libfabric:UDP:core:ofi_check_info():980<info> Supported: FI_SOURCE, FI_MSG, FI_MULTICAST, FI_RECV, FI_SEND
libfabric:UDP:core:ofi_check_info():980<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_WRITE, FI_RECV, FI_SEND, FI_REMOTE_READ, FI_REMOTE_WRITE
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider UDP returned -61 (No data available)
libfabric:shm:core:ofi_check_domain_attr():546<info> Invalid data progress model
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider shm returned -61 (No data available)
libfabric:psm2:core:psmx2_fabric():89<info> 
libfabric:psm2:domain:psmx2_domain_open():281<info> 
libfabric:psm2:core:psmx2_domain_start_progress():148<info> progress thread started
libfabric:psm2:core:psmx2_init_tag_layout():150<info> use tag60: tag_mask: 0FFFFFFFFFFFFFFF, data_mask: FFFFFFFF
libfabric:psm2:core:psmx2_trx_ctxt_alloc():275<info> uuid: 00FF00FF-0000-0000-0000-00FF00FF00FF
libfabric:psm2:core:psmx2_trx_ctxt_alloc():280<info> ep_open_opts: unit=-1 port=0
libfabric:psm2:core:psmx2_trx_ctxt_alloc():293<warn> psm2_ep_open returns 12, errno=22
libfabric:psm2:av:psmx2_av_open():924<info> Multi-EP is enabled, force FI_AV_TABLE
libfabric:psm2:av:psmx2_av_open():993<info> type = FI_AV_TABLE
[delphi:06918] *** Process received signal ***
[delphi:06918] Signal: Segmentation fault (11)
[delphi:06918] Signal code: Address not mapped (1)
[delphi:06918] Failing at address: 0x10
[delphi:06918] [ 0] /lib64/libpthread.so.0(+0xf5d0)[0x7eff8fe225d0]
[delphi:06918] [ 1] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x1729f)[0x7eff8e8ec29f]
[delphi:06918] [ 2] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(+0x17591)[0x7eff8e8ec591]
[delphi:06918] [ 3] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/../lib64/libadios2_sst.so.2(SstWriterOpen+0xc6)[0x7eff8e8e2cf6]
[delphi:06918] [ 4] libfabric:psm2:core:psmx2_progress_func():110<info> 
/storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core6engine9SstWriterC2ERNS0_2IOERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeENS_6helper4CommE+0x165)[0x7eff9132a055]
[delphi:06918] [ 5] libfabric:psm2:core:psmx2_progress_set_affinity():60<info> progress thread affinity not set
/storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios24core2IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0x891)[0x7eff90ffccd1]
[delphi:06918] [ 6] /storage/users/khuck/src/ADIOS2/install_mpi/lib64/libadios2.so.2(_ZN6adios22IO4OpenERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_4ModeEP19ompi_communicator_t+0xe3)[0x7eff91392c53]
[delphi:06918] [ 7] ./heatSimulation[0x40fb36]
[delphi:06918] [ 8] ./heatSimulation[0x40b4ef]
[delphi:06918] [ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7eff8fa683d5]
[delphi:06918] [10] ./heatSimulation[0x40b71f]
[delphi:06918] *** End of error message ***
/home/users/khuck/src/tau2/x86_64/bin/tau_exec: line 1292:  6918 Segmentation fault      $dryrun "$@"
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[11122,1],0]
  Exit code:    139
--------------------------------------------------------------------------

@philip-davis
Collaborator

Does this machine have an attached omnipath fabric?

@khuck
Collaborator Author

khuck commented Sep 10, 2019

@philip-davis I just learned from our sysadmin that the omnipath isn't working on that machine, so this could very well be a wild goose chase. Although, would you be able to detect if omnipath was working during cmake configuration?

@philip-davis
Collaborator

If Omnipath is present but disabled, that might explain the issue. When libfabric builds, it detects which fabrics it can build against, but it doesn't do any runtime validation of them. ADIOS builds against libfabric and inherits that configuration. We do runtime checks to see if there's a "valid" fabric that we can run against, but we're at libfabric's mercy for those checks. In this case, it believes that the Omnipath fabric exists but crashes when it tries to use it. It looks like the crash is coming from inside a thread that libfabric is launching for progress management, so this probably comes down to a bug in libfabric's handling of unexpected psm2 states. In any case, since there's no working RDMA fabric in the machine (unless there's also a viable IB fabric besides the OPA?), you are better off sticking to the WAN/evpath dataplane.
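
For anyone triaging a similar log, here is a minimal sketch of a script that tallies which providers `fi_getinfo` rejected and flags a failed `psm2_ep_open`. The regular expressions are written against the log lines pasted in this issue, the `summarize` helper is a hypothetical name, and treating a nonzero `psm2_ep_open` return value as a failure is an assumption rather than something confirmed in this thread.

```python
import re
from collections import Counter

# Patterns are based on the libfabric log lines pasted in this issue.
GETINFO_RE = re.compile(r"fi_getinfo: provider (\S+) returned (-?\d+)")
EP_OPEN_RE = re.compile(r"psm2_ep_open returns (\d+), errno=(\d+)")


def summarize(log_text):
    """Return (rejection count per provider, list of psm2_ep_open failures)."""
    rejected = Counter()
    ep_open_failures = []
    for line in log_text.splitlines():
        match = GETINFO_RE.search(line)
        if match:
            rejected[match.group(1)] += 1
            continue
        match = EP_OPEN_RE.search(line)
        # Assumption: a nonzero return value means the endpoint did not open.
        if match and int(match.group(1)) != 0:
            ep_open_failures.append((int(match.group(1)), int(match.group(2))))
    return rejected, ep_open_failures


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:core:core:fi_getinfo_():751<warn> fi_getinfo: provider psm returned -61 (No data available)
libfabric:psm2:core:psmx2_trx_ctxt_alloc():293<warn> psm2_ep_open returns 12, errno=22
"""

rejected, ep_open_failures = summarize(SAMPLE)
print(dict(rejected))
print(ep_open_failures)
```

A non-empty list of `psm2_ep_open` failures alongside an "RDMA Dataplane ... which should work" message would be a hint that the provider was advertised but could not actually be opened, which matches the situation described here.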

@chuckatkins
Contributor

@khuck unrelated to the issue at hand, I noticed that you're building with the mpicc compiler wrappers. When building ADIOS, and really most CMake packages in general, it's best to use the actual compiler and let CMake's FindMPI module extract the necessary pieces.

@khuck
Collaborator Author

khuck commented Sep 28, 2019

@chuckatkins I wish that were true with all CMake configurations, but not every project sets up a proper configuration. :) So my habit is to try with the compiler and when that doesn't work (some file can't compile because #include "mpi.h" fails, and the config didn't tell CMake to use the mpi wrappers for that file/library), try with the mpi wrappers. I can only assume that's what I did here...

@eisenhauer
Member

@philip-davis Are there remaining things to sort out WRT this issue? Or can we close?

@khuck
Collaborator Author

khuck commented Apr 21, 2020

Close it. I haven't had problems with SST since, and other stability (i.e. crash) issues with ADIOS2 have been resolved.

@eisenhauer
Member

Thanks @khuck !
