
Crash on exit with corrupted size vs. prev_size; possibly http_plugin related #646

Closed · Tracked by #1440
spoonincode opened this issue Jan 18, 2023 · 4 comments
Labels: actionable, bug (Something isn't working), 👍 lgtm, OCI (Work exclusive to OCI team)

Comments

@spoonincode
Member

This is a transfer of eosnetworkfoundation/mandel#796, originally authored by @heifner and possibly related to eosnetworkfoundation/mandel#789 (whose text I have not included here), authored by @matthewdarwin. It is being moved to Leap to increase visibility. I asked around, and there does not seem to be any existing fix that targeted this problem.

Version: main (3.2.x). Also reported in 3.1.x, 2.0.x, and 2.1.x, although this generic message may point to different underlying issues over time.

info  2022-08-09T14:00:57.388 net-1     net_plugin.cpp:1016           _close               ] ["xxx:9876 - e1a715e" - 2 1.1.1.1:9876] closing
info  2022-08-09T14:00:57.389 nodeos    net_plugin.cpp:3809           plugin_shutdown      ] exit shutdown
CHAINBASE: Writing "state" database file, this could take a moment...
              1% complete...
              5% complete...
              8% complete...
              12% complete...
              15% complete...
              18% complete...
              22% complete...
              26% complete...
              29% complete...
              32% complete...
              35% complete...
              39% complete...
              42% complete...
              46% complete...
              49% complete...
              53% complete...
              56% complete...
              59% complete...
              62% complete...
              65% complete...
              69% complete...
              72% complete...
              76% complete...
              80% complete...
              85% complete...
              89% complete...
              93% complete...
              97% complete...
           Syncing buffers...
           Complete
corrupted size vs. prev_size
[1]    545250 abort (core dumped)  /usr/bin/nodeos --config-dir /etc/nodeos -d /var/lib/nodeos

Thread dump:

Reading symbols from /usr/bin/nodeos...
(No debugging symbols found in /usr/bin/nodeos)
[New LWP 545250]
[New LWP 545251]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/nodeos --config-dir /etc/nodeos -d /var/lib/nodeos'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x7f7fd1401840 (LWP 545250))]
(gdb) thread apply all where
Thread 2 (Thread 0x7f7fd1400700 (LWP 545251)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x5555b3238b28) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5555b3238ac8, cond=0x5555b3238b00) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5555b3238b00, mutex=0x5555b3238ac8) at pthread_cond_wait.c:638
#3  0x00005555af6441d3 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
#4  0x00005555af643e11 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
#5  0x00005555af643bde in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, appbase::application_impl::application_impl()::{lambda()#1}> >(void*) ()
#6  0x00007f7fd173eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f7fd1503def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f7fd1401840 (LWP 545250)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f7fd142b537 in __GI_abort () at abort.c:79
#2  0x00007f7fd1484768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f7fd1592e2d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f7fd148ba5a in malloc_printerr (str=str@entry=0x7f7fd1591020 "corrupted size vs. prev_size") at malloc.c:5347
#4  0x00007f7fd148c7a6 in unlink_chunk (p=p@entry=0x7f7d44001170, av=0x7f7d44000020) at malloc.c:1454
#5  0x00007f7fd148c8f7 in malloc_consolidate (av=av@entry=0x7f7d44000020) at malloc.c:4502
#6  0x00007f7fd148d0c0 in _int_free (av=0x7f7d44000020, p=0x7f7d4400b5f0, have_lock=<optimized out>) at malloc.c:4400
#7  0x00005555afcb159c in eosio::http_plugin_impl::make_app_thread_url_handler(int, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)>, std::__1::shared_ptr<eosio::http_plugin_impl>)::{lambda(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)#1}::operator()(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>) const::{lambda()#1}::~shared_ptr() ()
#8  0x00005555afcb1d25 in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::executor_binder<eosio::http_plugin_impl::make_app_thread_url_handler(int, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)>, std::__1::shared_ptr<eosio::http_plugin_impl>)::{lambda(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)#1}::operator()(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>) const::{lambda()#1}, appbase::execution_priority_queue::executor> >, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
#9  0x00005555af646975 in boost::asio::detail::scheduler::shutdown() ()
#10 0x00005555af6492b9 in std::__1::__shared_ptr_emplace<boost::asio::io_context, std::__1::allocator<boost::asio::io_context> >::__on_zero_shared() ()
#11 0x00005555af63f8a4 in appbase::application::exec() ()
#12 0x00005555af632f40 in main ()
(gdb) quit

Maybe related: EOSIO/eos#8450

At a quick glance, it looks like an iterator into the http_plugin `url_handlers` is still in use after `url_handlers.clear()` is called in `http_plugin::plugin_shutdown`. See how `handle_http_request` uses an iterator into `url_handlers`.
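A minimal sketch of the suspected hazard, under a simplified model: `url_handlers` is modeled as a `std::map` from URL to `std::function`, and deferred work as a vector of queued closures. `plugin_model`, `pending`, `handle_http_request_unsafe`, and `handle_http_request_safe` are hypothetical names, not the actual http_plugin code. A closure that captures an iterator into the map dangles once `plugin_shutdown()` clears it; a closure that owns a copy of the handler does not.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of http_plugin's handler table; not the real nodeos types.
using url_handler = std::function<std::string(const std::string& body)>;

struct plugin_model {
   std::map<std::string, url_handler> url_handlers;
   // Deferred work (e.g. a response posted to another thread), modeled as queued closures.
   std::vector<std::function<std::string()>> pending;

   // UNSAFE: the closure captures an iterator into url_handlers. If plugin_shutdown()
   // clears the map before the closure runs, the iterator dangles and calling it is UB.
   void handle_http_request_unsafe(const std::string& url, std::string body) {
      auto it = url_handlers.find(url);
      if (it == url_handlers.end()) return;
      pending.push_back([it, body = std::move(body)]() { return it->second(body); });
   }

   // SAFER: the closure owns a copy of the handler, so clearing the map cannot invalidate it.
   void handle_http_request_safe(const std::string& url, std::string body) {
      auto it = url_handlers.find(url);
      if (it == url_handlers.end()) return;
      pending.push_back([handler = it->second, body = std::move(body)]() { return handler(body); });
   }

   // Mirrors the url_handlers.clear() call made during plugin shutdown.
   void plugin_shutdown() { url_handlers.clear(); }
};
```

Copying the handler only removes the dangling iterator; if the handler itself captures raw references to plugin state, those would still need to outlive any queued work.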

Note: an http rewrite is currently in progress (eosnetworkfoundation/mandel#675). We should verify that any fix is also applied there if appropriate.
I think it is also worth fixing in 3.1, which will not include #675.

@enf-ci-bot moved this to Todo in Team Backlog Jan 18, 2023
@heifner added the bug (Something isn't working) and actionable labels and removed the triage label Jan 19, 2023
@stephenpdeos added and removed the more-info (waiting for submitter to reply with more information) and actionable labels Jan 19, 2023
@heifner
Member

heifner commented Jan 26, 2023

See if we can reproduce this by spamming get_info, get_account, or get_block requests while running in heap mode and shutting down. I would also be interested in whether it is still reproducible after AntelopeIO/appbase#4 is merged.

@heifner
Member

heifner commented Feb 1, 2023

Reproduced in v3.1.0-rc1.

raise 0x00007f7f2a4462ab
eosio::chain::eosvmoc::segv_handler(int, siginfo_t *, void *) executor.cpp:71
__restore_rt 0x00007f7f2a446420
_int_free 0x00007f7f29f4ea2f
__gnu_cxx::new_allocator::deallocate(std::_Sp_counted_ptr_inplace<…> *, unsigned long) new_allocator.h:133
std::allocator_traits::deallocate(std::allocator<…> &, std::_Sp_counted_ptr_inplace<…> *, unsigned long) alloc_traits.h:492
std::__allocated_ptr::~__allocated_ptr() allocated_ptr.h:73
std::_Sp_counted_ptr_inplace::_M_destroy() shared_ptr_base.h:570
std::_Sp_counted_base::_M_release() shared_ptr_base.h:174
std::__shared_count::~__shared_count() shared_ptr_base.h:733
std::__shared_ptr::~__shared_ptr() shared_ptr_base.h:1183
std::shared_ptr::~shared_ptr() shared_ptr.h:121
eosio::http_plugin_impl::abstract_conn_impl::~abstract_conn_impl() http_plugin.cpp:366
__gnu_cxx::new_allocator::destroy<…>(eosio::http_plugin_impl::abstract_conn_impl<…> *) new_allocator.h:156
std::allocator_traits::destroy<…>(std::allocator<…> &, eosio::http_plugin_impl::abstract_conn_impl<…> *) alloc_traits.h:531
std::_Sp_counted_ptr_inplace::_M_dispose() shared_ptr_base.h:560
std::_Sp_counted_base::_M_release() shared_ptr_base.h:158
std::__shared_count::~__shared_count() shared_ptr_base.h:733
std::__shared_ptr::~__shared_ptr() shared_ptr_base.h:1183
std::shared_ptr::~shared_ptr() shared_ptr.h:121
eosio::http_plugin_impl::make_app_thread_url_handler(int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::function<void (int, fc::variant)>)>, std::shared_ptr<eosio::http_plugin_impl>)::'lambda'(std::shared_ptr<eosio::detail::abstract_conn>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::function<void (int, fc::variant)>)::operator()(std::shared_ptr<eosio::detail::abstract_conn>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::function<void (int, fc::variant)>) const::'lambda'()::~() http_plugin.cpp:493
boost::asio::detail::executor_binder_base::~executor_binder_base() bind_executor.hpp:182
boost::asio::executor_binder::~executor_binder() bind_executor.hpp:382
boost::asio::detail::work_dispatcher::~work_dispatcher() work_dispatcher.hpp:30
boost::asio::detail::executor_op::do_complete(void *, boost::asio::detail::scheduler_operation *, const boost::system::error_code &, unsigned long) executor_op.hpp:73
boost::asio::detail::scheduler_operation::destroy() scheduler_operation.hpp:45
boost::asio::detail::scheduler::shutdown() scheduler.ipp:166
boost::asio::detail::service_registry::shutdown_services() service_registry.ipp:44
boost::asio::execution_context::shutdown() execution_context.ipp:41
boost::asio::execution_context::~execution_context() execution_context.ipp:34
boost::asio::io_context::~io_context() io_context.ipp:58
__gnu_cxx::new_allocator::destroy<…>(boost::asio::io_context *) new_allocator.h:156
std::allocator_traits::destroy<…>(std::allocator<…> &, boost::asio::io_context *) alloc_traits.h:531
std::_Sp_counted_ptr_inplace::_M_dispose() shared_ptr_base.h:560
std::_Sp_counted_base::_M_release() shared_ptr_base.h:158
std::__shared_count::~__shared_count() shared_ptr_base.h:733
std::__shared_ptr::~__shared_ptr() shared_ptr_base.h:1183
std::__shared_ptr::reset() shared_ptr_base.h:1301
appbase::application::exec() application.cpp:455
main main.cpp:143
__wrap_main(int, char **) compile_monitor.cpp:303
__libc_start_main 0x00007f7f29edc083
_start 0x000000000047044e

Also reproduced without heap mode.
Loading the core file in the debugger shows that the segfault happens in the destructor of the websocketpp::connection.

Unable to reproduce with release/3.2, which has the new Beast HTTP implementation. Also unable to reproduce with #636, which I thought might make this more likely because it does not drain the io_context queue.
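Both stack traces show the crash while `appbase::application::exec()` releases the `io_context`: Boost.Asio's `scheduler::shutdown()` destroys handlers that were queued but never run, and destroying the http_plugin lambda drops the last `shared_ptr` to a connection, so the connection's destructor runs during teardown. The sketch below is a simplified model under that reading, with hypothetical names (`task_queue`, `plugin_state`, `connection`, `shutdown_in_safe_order`); it is not the fix from #22, and this thread does not say whether #22 takes this approach. It only illustrates the ordering rule: destroy pending work while the state it references is still alive.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model, not nodeos or Boost.Asio code: a queue that can destroy closures
// that were never run, as scheduler::shutdown() does when an io_context is destroyed.
class task_queue {
public:
   void post(std::function<void()> f) { tasks_.push_back(std::move(f)); }
   // Destroy queued closures without running them, releasing everything they captured.
   void clear() { tasks_.clear(); }
   std::size_t size() const { return tasks_.size(); }
private:
   std::deque<std::function<void()>> tasks_;
};

// Stand-in for plugin-owned state that a connection refers back to.
struct plugin_state {
   std::vector<std::string> log;
};

// Stand-in for a connection whose destructor touches plugin state via a non-owning pointer.
struct connection {
   plugin_state* state;
   explicit connection(plugin_state* s) : state(s) {}
   ~connection() { state->log.push_back("connection destroyed"); } // UB if *state is gone
};

// Tears down in a safe order and returns the recorded events.
std::vector<std::string> shutdown_in_safe_order() {
   auto state = std::make_unique<plugin_state>();
   task_queue queue;
   {
      auto conn = std::make_shared<connection>(state.get());
      // A queued response handler keeps the connection alive via shared_ptr.
      queue.post([conn]() { (void)conn; /* would write the HTTP response */ });
   } // the queued closure is now the only owner of the connection

   // Safe: destroy pending work while plugin state is still alive...
   queue.clear();
   state->log.push_back("queue cleared");

   // ...and only then release plugin state. Reversing these steps (state first, queue
   // later) would run ~connection() against freed memory.
   std::vector<std::string> events = state->log;
   state.reset();
   return events;
}
```

Calling `shutdown_in_safe_order()` records "connection destroyed" before "queue cleared", because clearing the queue destroys the last owner of the connection.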

@heifner
Member

heifner commented Feb 1, 2023

Appears to be fixed by #22.

@heifner
Member

heifner commented Jul 24, 2023

Cat: API

Projects
Archived in project
Development

No branches or pull requests

4 participants