
Memory corruption error #840

Closed
anshumang opened this issue Apr 29, 2015 · 12 comments

@anshumang

Has anyone seen this before?

Starting program: /home/agoswami/computationalRadiationPhysics/release-branch/build-temp/build_picongpu/picongpu -g 32 32 32 -d 1 1 1 -s 10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffecc1f700 (LWP 14837)]
[New Thread 0x7fffebbfe700 (LWP 14838)]
[New Thread 0x7fffea471700 (LWP 14842)]
PIConGPUVerbose PHYSICS(1) | Sliding Window is OFF

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff78fb7c4 in opal_memory_ptmalloc2_int_malloc () from /usr/lib/libmpi.so.1
(gdb) bt
#0  0x00007ffff78fb7c4 in opal_memory_ptmalloc2_int_malloc () from /usr/lib/libmpi.so.1
#1  0x00007ffff78fdaf5 in opal_memory_ptmalloc2_int_memalign () from /usr/lib/libmpi.so.1
#2  0x00007ffff78fdf3c in opal_memory_ptmalloc2_memalign () from /usr/lib/libmpi.so.1
#3  0x00007ffff6497f2d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff6498029 in operator new[](unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x000000000076a01f in PMacc::nvidia::memory::MemoryInfo::isSharedMemoryPool (this=0xdfdd98 <PMacc::nvidia::memory::MemoryInfo::getInstance()::instance>)
    at /home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/picongpu/../libPMacc/include/nvidia/memory/MemoryInfo.hpp:88
#6  0x0000000000788629 in picongpu::MySimulation::init (this=0xf1d3a0) at /home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/picongpu/include/simulationControl/MySimulation.hpp:276
#7  0x00000000007c32e2 in PMacc::SimulationHelper<3u>::startSimulation (this=0xf1d3a0)
    at /home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/picongpu/../libPMacc/include/simulationControl/SimulationHelper.hpp:180
#8  0x00000000007a814f in picongpu::SimulationStarter<picongpu::InitialiserController, picongpu::PluginController, picongpu::MySimulation>::start (this=0x7fffffffe320)
    at /home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/picongpu/include/simulationControl/SimulationStarter.hpp:86
#9  0x000000000075b0e4 in main (argc=11, argv=0x7fffffffe458) at /home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/picongpu/main.cu:56
@ax3l ax3l added the question label Apr 29, 2015
@ax3l
Member

ax3l commented Apr 29, 2015

Interesting, it seems to hang in the isSharedMemoryPool check, which allocates and frees some memory to find out if you are on an SoC-like device such as the Jetson TK1.

But since it fails on new, it sounds like heap corruption to me...

What host system (OS, compiler & RAM) and GPU are you using, and how much memory do both have?

@ax3l ax3l added this to the Open Beta milestone Apr 29, 2015
@ax3l
Member

ax3l commented Apr 29, 2015

Also, can you try to run valgrind on that?
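A minimal sketch of such a valgrind run, reusing the binary path and flags from the gdb session above (adjust paths to your build; memcheck often reports the first invalid write long before the allocator actually crashes):

```shell
# Run the simulation under valgrind's memcheck tool to locate the
# first invalid read/write that corrupts the malloc arena.
# Binary name and flags are taken from the session above; adjust as needed.
valgrind --tool=memcheck --track-origins=yes \
    ./picongpu -g 32 32 32 -d 1 1 1 -s 10
```

Note that valgrind slows the run down considerably, so a short run (`-s 10` as above) is usually enough.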

@anshumang
Author

Host =>
OS : Ubuntu 14.04.1 LTS
Compiler : g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
RAM :
MemTotal: 12292376 kB
MemFree: 6172092 kB

GPU (Using the K40c) =>
Wed Apr 29 12:24:08 2015
+------------------------------------------------------+
| NVIDIA-SMI 346.29 Driver Version: 346.29 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla C2050 Off | 0000:08:00.0 Off | Off |
| 30% 44C P0 N/A / N/A | 6MiB / 3071MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c Off | 0000:82:00.0 Off | 0 |
| 23% 29C P0 64W / 235W | 23MiB / 11519MiB | 99% Default |
+-------------------------------+----------------------+----------------------+

@ax3l
Member

ax3l commented Apr 29, 2015

Can you post the output of cmake -L . in the build dir, too?

@anshumang
Author

agoswami@shiva:~/computationalRadiationPhysics/release-branch/build-temp$ cmake -L
CMake Error: The source directory "/home/agoswami/computationalRadiationPhysics/release-branch/build-temp" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
-- Cache values
ADIOS_CONFIG:FILEPATH=ADIOS_CONFIG-NOTFOUND
CMAKE_BUILD_TYPE:STRING=
CMAKE_INSTALL_PREFIX:PATH=/home/agoswami/computationalRadiationPhysics/release-branch/param-sets-temp/KH
CUDA_ARCH:STRING=sm_20
CUDA_BUILD_CUBIN:BOOL=OFF
CUDA_BUILD_EMULATION:BOOL=OFF
CUDA_FTZ:STRING=--ftz=false
CUDA_HOST_COMPILER:FILEPATH=/usr/bin/cc
CUDA_KEEP_FILES:BOOL=OFF
CUDA_MATH:STRING=--use_fast_math
CUDA_MEMTEST_DIR:PATH=/home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/cuda_memtest
CUDA_MEMTEST_RELEASE:BOOL=ON
CUDA_SDK_ROOT_DIR:PATH=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION:BOOL=OFF
CUDA_SHOW_CODELINES:BOOL=OFF
CUDA_SHOW_REGISTER:BOOL=OFF
CUDA_TOOLKIT_ROOT_DIR:PATH=/usr/local/cuda
CUDA_VERBOSE_BUILD:BOOL=OFF
MPI_EXTRA_LIBRARY:STRING=/usr/lib/libmpi.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libhwloc.so
MPI_INFO_DIR:PATH=/home/agoswami/computationalRadiationPhysics/release-branch/picongpu/src/mpiInfo
MPI_LIBRARY:FILEPATH=/usr/lib/libmpi_cxx.so
PIC_COPY_ON_INSTALL:STRING=include/simulation_defines;submit
PIC_ENABLE_INSITU_VOLVIS:BOOL=OFF
PIC_EXTENSION_PATH:PATH=/home/agoswami/computationalRadiationPhysics/release-branch/param-sets-temp/KH
PIC_RELEASE:BOOL=OFF
PIC_VERBOSE:STRING=1
PMACC_BLOCKING_KERNEL:BOOL=OFF
PMACC_VERBOSE:STRING=0
PNGwriter_ROOT_DIR:PATH=PNGwriter_ROOT_DIR-NOTFOUND
SCOREP_ENABLE:BOOL=OFF
Splash_ROOT_DIR:PATH=Splash_ROOT_DIR-NOTFOUND
VAMPIR_ENABLE:BOOL=OFF
VT_INST_FILE_FILTER:STRING=stl,usr/include,libgpugrid,vector_types.h,Vector.hpp,DeviceBuffer.hpp,DeviceBufferIntern.hpp,Buffer.hpp,StrideMapping.hpp,StrideMappingMethods.hpp,MappingDescription.hpp,AreaMapping.hpp,AreaMappingMethods.hpp,ExchangeMapping.hpp,ExchangeMappingMethods.hpp,DataSpace.hpp,Manager.hpp,Manager.tpp,Transaction.hpp,Transaction.tpp,TransactionManager.hpp,TransactionManager.tpp,Vector.tpp,Mask.hpp,ITask.hpp,EventTask.hpp,EventTask.tpp,StandartAccessor.hpp,StandartNavigator.hpp,HostBuffer.hpp,HostBufferIntern.hpp
VT_INST_FUNC_FILTER:STRING=vector,Vector,dim3,GPUGrid,execute,allocator,Task,Manager,Transaction,Mask,operator,DataSpace,PitchedBox,Event,new,getGridDim,GetCurrentDataSpaces,MappingDescription,getOffset,getParticlesBuffer,getDataSpace,getInstance

@ax3l
Member

ax3l commented Apr 29, 2015

Are you running additional tasks on this machine?

The host only has 12 GB of memory and only 6 GB are free... the K40c alone has 12 GB of memory, but we usually assume the host has at least the same amount of RAM available that we can use for double-buffering.
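A quick way to compare the two numbers before a run (the nvidia-smi query flags are an assumption for this driver generation; check `nvidia-smi --help-query-gpu` on your system):

```shell
# Compare free host RAM against total GPU memory before launching.
# Free host RAM should be at least the GPU memory PIConGPU will use.
grep MemFree /proc/meminfo
command -v nvidia-smi >/dev/null && \
    nvidia-smi --query-gpu=memory.total --format=csv || true
```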

@ax3l
Member

ax3l commented Apr 29, 2015

Probably the easiest solution to your problem: double your host memory (or allocate something on the GPU so PIConGPU can only use half of the 12 GB, matching the free 6 GB on the host - which would be a pity!)

@anshumang
Author

Oh, that is the problem then... this is a shared machine and another student is running some large tasks... I could probably use the C2050 with 3 GB memory, no?

@ax3l
Member

ax3l commented Apr 29, 2015

Sure! Just change the environment variable CUDA_VISIBLE_DEVICES.
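For example (index 0 for the C2050 is an assumption based on the nvidia-smi listing above; CUDA's device enumeration can differ from nvidia-smi's ordering, so verify on your system):

```shell
# Make only one GPU (here: assumed index 0, the Tesla C2050) visible
# to CUDA applications started from this shell.
export CUDA_VISIBLE_DEVICES=0
echo "CUDA will see device(s): $CUDA_VISIBLE_DEVICES"
# then launch as before:
#   ./picongpu -g 32 32 32 -d 1 1 1 -s 10
```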

@anshumang
Author

thanks for your help in finding the problem 👍

@ax3l
Member

ax3l commented Apr 29, 2015

You are welcome :)

When designing systems from now on, try to add at least the same amount of memory in the host as you have on the device. Host memory is comparatively cheap, so adding twice the amount that a node has in its devices is a good idea and allows for neat tricks like time-averaging of large data sets (not yet implemented).

@ax3l ax3l closed this as completed Apr 29, 2015