This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

crash on exit with error "corrupted size vs. prev_size" #8450

Closed
matthewdarwin opened this issue Jan 16, 2020 · 11 comments

@matthewdarwin

To reproduce

  • compile 2.0.0
  • start nodeos with database-map-mode = locked
  • stop nodeos normally and let it shut down
  • observe the logs

Files are stored on ext4 SSD. Please advise if any additional environmental information is needed.

Note: using systemd here to manage nodeos

Jan 16 03:58:34 mar140 nodeos[5779]: info  2020-01-16T03:58:34.812 nodeos    net_plugin.cpp:3521           plugin_shutdown      ] exit shutdown
Jan 16 03:58:34 mar140 nodeos[5779]: CHAINBASE: Writing "reversible" database file, this could take a moment...
Jan 16 03:58:35 mar140 nodeos[5779]:               7% complete...
Jan 16 03:58:35 mar140 nodeos[5779]:            Syncing buffers...
Jan 16 03:58:35 mar140 nodeos[5779]:            Complete
Jan 16 03:58:36 mar140 nodeos[5779]: CHAINBASE: Writing "state" database file, this could take a moment...
Jan 16 03:58:37 mar140 nodeos[5779]:               0% complete...
Jan 16 03:58:38 mar140 nodeos[5779]:               3% complete...
Jan 16 03:58:39 mar140 nodeos[5779]:               6% complete...
Jan 16 03:58:40 mar140 nodeos[5779]:               9% complete...
Jan 16 03:58:41 mar140 nodeos[5779]:               12% complete...
Jan 16 03:58:42 mar140 nodeos[5779]:               14% complete...
Jan 16 03:58:43 mar140 nodeos[5779]:               17% complete...
Jan 16 03:58:44 mar140 nodeos[5779]:               20% complete...
Jan 16 03:58:45 mar140 nodeos[5779]:               23% complete...
Jan 16 03:58:46 mar140 nodeos[5779]:               26% complete...
Jan 16 03:58:47 mar140 nodeos[5779]:               29% complete...
Jan 16 03:58:48 mar140 nodeos[5779]:               32% complete...
Jan 16 03:58:49 mar140 nodeos[5779]:               37% complete...
Jan 16 03:58:50 mar140 nodeos[5779]:               48% complete...
Jan 16 03:58:51 mar140 nodeos[5779]:               58% complete...
Jan 16 03:58:52 mar140 nodeos[5779]:               68% complete...
Jan 16 03:58:53 mar140 nodeos[5779]:               79% complete...
Jan 16 03:58:54 mar140 nodeos[5779]:               89% complete...
Jan 16 03:58:54 mar140 nodeos[5779]:            Syncing buffers...
Jan 16 03:59:12 mar140 nodeos[5779]:            Complete
Jan 16 03:59:25 mar140 nodeos[5779]: corrupted size vs. prev_size
Jan 16 03:59:25 mar140 systemd[1]: nodeos.service: Main process exited, code=killed, status=6/ABRT
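For anyone scanning their own journal for the same symptom, the `status=6/ABRT` fragment in the last log line can be decoded mechanically. The following is a minimal sketch, assuming the systemd message keeps the `code=killed, status=<number>/<name>` shape shown above; the function name `killed_by` is made up for illustration and is not part of nodeos or systemd.

```python
import re
import signal

# Matches the systemd fragment "code=killed, status=6/ABRT" seen in the log above.
# Assumption: other systemd versions use the same wording.
_KILLED = re.compile(r"code=killed, status=(\d+)/(\w+)")


def killed_by(line):
    """Return the signal name if the journal line reports a signal kill, else None."""
    match = _KILLED.search(line)
    if match is None:
        return None
    # Map the numeric status to the platform's signal name.
    return signal.Signals(int(match.group(1))).name


line = ("Jan 16 03:59:25 mar140 systemd[1]: nodeos.service: "
        "Main process exited, code=killed, status=6/ABRT")
print(killed_by(line))
```

On Linux, signal 6 is SIGABRT, which is what glibc raises via `abort()` after printing a heap-consistency message such as the one in the log.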
@matthewdarwin
Author

Note: the database is not corrupted and works fine on next start, so this issue is of the "annoyance" type rather than a serious problem that is corrupting data.

@matthewdarwin
Author

The pre-built eosio_2.0.0-1-ubuntu-18.04_amd64.deb works fine and does not crash

Prebuilt:

$ ldd /usr/bin/nodeos
        linux-vdso.so.1 (0x00007fff10689000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fceb3bea000)
        libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007fceb3b58000)
        libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007fceb386f000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fceb3651000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007fceb35ce000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fceb35ad000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fceb35a1000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fceb341e000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007fceb3404000)
        libicuuc.so.60 => /lib/x86_64-linux-gnu/libicuuc.so.60 (0x00007fceb304d000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fceb3033000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fceb2e72000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fceb6f89000)
        libicudata.so.60 => /lib/x86_64-linux-gnu/libicudata.so.60 (0x00007fceb12c7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fceb1143000)

My build:

$ ldd /usr/bin/nodeos
        linux-vdso.so.1 (0x00007ffd8b3f8000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f84c12cf000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f84c12ae000)
        libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f84c1282000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f84c1064000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f84c0ee1000)
        libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f84c0e4f000)
        libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f84c0b64000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f84c0b5a000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f84c0ad7000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f84c0abd000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f84c08fc000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f84c12da000)

(quite different)

My build script:

./scripts/eosio_build.sh -s EOS -P -y -i xxxx

on

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ uname -a
Linux build 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 x86_64 x86_64 GNU/Linux

@matthewdarwin
Author

(The Ubuntu build server is running inside a Debian LXC container.)

@matthewdarwin
Author

After further testing, the problem does not happen consistently. Sometimes my custom compiled binary exits cleanly, so you probably need to start and stop nodeos repeatedly to reproduce the problem.
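Since the crash is intermittent, a small harness that repeats the start/stop cycle and records how each run ended may help. This is only a sketch under stated assumptions: it launches the process directly rather than through systemd, it assumes nodeos shuts down gracefully on SIGTERM, and the helper names (`describe_exit`, `run_once`, `repeat`) and the example command line are hypothetical.

```python
import signal
import subprocess
import time


def describe_exit(returncode):
    """Describe a subprocess return code; negative values mean death by signal."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_once(cmd, run_seconds, stop_timeout=600):
    """Start cmd, let it run, request shutdown with SIGTERM, and describe the exit."""
    proc = subprocess.Popen(cmd)
    time.sleep(run_seconds)
    proc.send_signal(signal.SIGTERM)
    # Writing the state database on shutdown can take a while, hence the long timeout.
    return describe_exit(proc.wait(timeout=stop_timeout))


def repeat(cmd, attempts, run_seconds):
    """Run the start/stop cycle several times and collect each outcome."""
    return [run_once(cmd, run_seconds) for _ in range(attempts)]


# Hypothetical usage; the paths are placeholders, adjust them to your setup:
# outcomes = repeat(["/usr/bin/nodeos", "--data-dir", "/path/to/data"], 20, 60)
# print(outcomes)
```

A run that hits this issue would show up as "killed by SIGABRT" (or "killed by SIGSEGV"), while a clean shutdown would show as "exited with status 0".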

@heifner
Contributor

heifner commented Jan 24, 2020

I have one report of someone getting this error when not running database-map-mode = locked

@xebb82

xebb82 commented Jan 24, 2020

> I have one report of someone getting this error when not running database-map-mode = locked

I saw it on three different machines: one in LXC and two directly on the host.

All three occurred when the nodeos instances were restarted because they were no longer syncing blocks.

@matthewdarwin
Author

My issues happened on a normal nodeos restart, i.e., I wasn't restarting nodeos to fix some problem like the one Eric is reporting, but rather to update the configuration or something similar.

@matthewdarwin
Author

I have also seen different signals, e.g. SEGV.

@n8d

n8d commented Jan 26, 2020

I have seen this error on 1.8 as well, and with the default mapped map mode. I should be able to get a core file.

@n8d

n8d commented Jan 26, 2020

Here is a backtrace from 1.8.9:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/nodeos --data-dir /data/eos/main --config-dir /etc/nodeos --config con'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 1 (Thread 0x7fa7d355d980 (LWP 24973)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fa7d11ee801 in __GI_abort () at abort.c:79
#2  0x00007fa7d1237897 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fa7d1364b9a "%s\n")
    at ../sysdeps/posix/libc_fatal.c:181
#3  0x00007fa7d123e90a in malloc_printerr (str=str@entry=0x7fa7d1362c9d "corrupted size vs. prev_size") at malloc.c:5350
#4  0x00007fa7d123eb0c in malloc_consolidate (av=av@entry=0x7f897c000020) at malloc.c:4456
#5  0x00007fa7d124603b in _int_free (have_lock=0, p=<optimized out>, av=0x7f897c000020) at malloc.c:4362
#6  __GI___libc_free (mem=0x7f897c3c4c50) at malloc.c:3124
#7  0x00000000008869f8 in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::executor_binder<void eosio::http_plugin_impl::handle_http_request<eosio::detail::asio_with_stub_log<websocketpp::transport::asio::basic_socket::endpoint> >(websocketpp::server<eosio::detail::asio_with_stub_log<websocketpp::transport::asio::basic_socket::endpoint> >::connection_ptr)::{lambda()#1}, appbase::execution_priority_queue::executor> >, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::__1::allocator<void>*, boost::system::error_code const&, unsigned long) ()
#8  0x000000000045a9ad in boost::asio::detail::scheduler::shutdown() ()
#9  0x000000000045dbc9 in std::__1::__shared_ptr_emplace<boost::asio::io_context, std::__1::allocator<boost::asio::io_context> >::__on_zero_shared() ()
#10 0x00000000004553b6 in appbase::application::exec() ()
#11 0x000000000044a1e1 in main ()
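For readers unfamiliar with the message in frame #3: glibc's allocator keeps a size field in each heap chunk and, for a free chunk, a copy of that size in the `prev_size` field at the start of the following chunk. Before merging free chunks it checks that the two agree and aborts with "corrupted size vs. prev_size" if they do not, which usually points to an earlier out-of-bounds write or a use after free somewhere else in the program. The following is a toy model of that check only, not glibc code; the `Chunk` class and `is_consistent` function are simplified stand-ins invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    prev_size: int  # size of the preceding chunk, as recorded by this chunk
    size: int       # size of this chunk


def is_consistent(heap, i):
    """Toy version of the check: chunk i's size must match the next chunk's prev_size."""
    return heap[i].size == heap[i + 1].prev_size


# Two adjacent chunks with matching bookkeeping.
heap = [Chunk(prev_size=0, size=32), Chunk(prev_size=32, size=64)]
print(is_consistent(heap, 0))

# Simulate a stray write that runs past the end of chunk 0 into chunk 1's header.
heap[1].prev_size = 0x41414141
print(is_consistent(heap, 0))
```

In this model the damage is done by the stray write, but it is only noticed later, when the allocator next inspects the chunk. That matches the backtrace, where the abort is reported from a `free` during `scheduler::shutdown()` rather than at the point of corruption.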

@spoonincode changed the title from 'v2.0.0 will crash on exit with error "corrupted size vs. prev_size" when database-map-mode = locked' to 'crash on exit with error "corrupted size vs. prev_size"' on Jan 26, 2020
@heifner self-assigned this on Jan 27, 2020
@matthewdarwin
Author

Should be fixed now. Closing.
