Fix race condition in trace_api_plugin shutdown #590

Closed
heifner opened this issue Jul 2, 2022 · 0 comments
heifner commented Jul 2, 2022

The failure at https://github.com/eosnetworkfoundation/mandel/runs/7143297502?check_suite_focus=true points to an issue in the shutdown of the trace_api_plugin.

slice_directory::stop_maintenance_thread() sets the atomic bool _maintenance_shutdown to true and calls notify_one() on the _maintenance_condition condition variable. This is a race condition because it does not acquire the condition variable's mutex first. As a result, _maintenance_shutdown can be set to true and notify_one() called after the while check in slice_directory::start_maintenance_thread but before the wait(), so the notification is lost and the maintenance thread waits forever. That in turn blocks the join() in slice_directory::stop_maintenance_thread() on the main thread, which blocks the shutdown of all other plugins.
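
For illustration, here is a minimal sketch of one common way to close this kind of lost-wakeup window: set the flag while holding the same mutex the waiter uses. It is not the actual slice_directory code or the committed fix. The class maintenance_worker and its members are hypothetical stand-ins for slice_directory, _maintenance_shutdown, and _maintenance_condition.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for slice_directory; models only the shutdown handshake.
class maintenance_worker {
public:
   ~maintenance_worker() { stop(); }

   void start() {
      assert(!_thread.joinable()); // the sketch supports a single start
      _thread = std::thread([this] {
         std::unique_lock<std::mutex> lock(_mutex);
         // The flag check and wait() both happen with _mutex held. wait()
         // releases the mutex atomically as it blocks, so a stop() that takes
         // _mutex cannot slip in between the check and the wait.
         while (!_shutdown) {
            _condition.wait(lock);
         }
      });
   }

   void stop() {
      {
         // Assumption of this sketch: taking the waiter's mutex here is what
         // removes the race. Without it, the store and notify_one() could both
         // land after the waiter's check but before it blocks.
         std::lock_guard<std::mutex> lock(_mutex);
         _shutdown = true;
      }
      _condition.notify_one();
      if (_thread.joinable()) {
         _thread.join();
      }
   }

   // True once shutdown was requested and the thread has been joined.
   bool stopped() const { return _shutdown && !_thread.joinable(); }

private:
   std::mutex              _mutex;
   std::condition_variable _condition;
   std::atomic<bool>       _shutdown{false};
   std::thread             _thread;
};
```

With the lock held in stop(), the waiter either sees the flag already set when it checks, or it is already blocked in wait() when notify_one() is called. In both cases join() returns. Keeping the flag atomic is harmless here, but it is the mutex, not the atomic, that prevents the lost wakeup.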

debug 2022-07-01T03:23:35.304 nodeos    net_plugin.cpp:2705           update_chain_info    ] updating chain info lib 90, head 138, fork 138
info  2022-07-01T03:23:35.304 nodeos    resource_monitor_plugi:122    plugin_shutdown      ] shutdown...
debug 2022-07-01T03:23:35.304 net-0     net_plugin.cpp:3195           operator()           ] accepted signed_block : #138 23fb4fff5f2c0077...
debug 2022-07-01T03:23:35.304 net-1     net_plugin.cpp:2152           operator()           ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] bcast block 138
debug 2022-07-01T03:23:35.304 net-1     net_plugin.cpp:2173           recv_block           ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] canceling wait
debug 2022-07-01T03:23:35.304 net-1     net_plugin.cpp:1993           sync_recv_block      ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] got block 138
info  2022-07-01T03:23:35.304 net-1     net_plugin.cpp:1016           _close               ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] closing
info  2022-07-01T03:23:35.304 net-1     net_plugin.cpp:1227           operator()           ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] async write socket closed before callback
info  2022-07-01T03:23:35.304 net-1     net_plugin.cpp:1016           _close               ] ["localhost:9876 - 465c8f8" - 3 127.0.0.1:9876] closing
info  2022-07-01T03:23:35.305 nodeos    resource_monitor_plugi:129    plugin_shutdown      ] exit shutdown
info  2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3772           plugin_shutdown      ] shutdown..
info  2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3789           plugin_shutdown      ] close 5 connections
debug 2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3792           plugin_shutdown      ] close: 1
debug 2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3792           plugin_shutdown      ] close: 2
debug 2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3792           plugin_shutdown      ] close: 3
debug 2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3792           plugin_shutdown      ] close: 4
debug 2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3792           plugin_shutdown      ] close: 5
info  2022-07-01T03:23:35.305 net-1     net_plugin.cpp:1016           _close               ] ["localhost:9876 - 465c8f8" - 1 127.0.0.1:53368] closing
info  2022-07-01T03:23:35.305 nodeos    net_plugin.cpp:3809           plugin_shutdown      ] exit shutdown
info  2022-07-01T03:23:35.305 nodeos    test_control_plugin.cp:139    plugin_shutdown      ] test_control_plugin shutting down
debug 2022-07-01T03:23:35.305 trace-mx  trace_api_plugin.cpp:155      operator()           ] Waking up to handle lib: 90
@heifner heifner added the 3.1 RC2 label Jul 2, 2022
@heifner heifner moved this to Todo in ENF Engineering Jul 2, 2022
@heifner heifner added this to the Mandel 3.1.0 milestone Jul 2, 2022
@heifner heifner self-assigned this Jul 2, 2022
@heifner heifner added the OCI OCI working this issue... label Jul 2, 2022
heifner added a commit that referenced this issue Jul 2, 2022
heifner added a commit that referenced this issue Jul 2, 2022
[3.1] Fix race condition on trace_api_plugin shutdown
heifner added a commit that referenced this issue Jul 2, 2022
heifner added a commit that referenced this issue Jul 5, 2022
…e-3-1

[3.1] Fix race condition on trace_api_plugin shutdown
@heifner heifner closed this as completed Jul 5, 2022
Repository owner moved this from Todo to Done in ENF Engineering Jul 5, 2022
heifner added a commit that referenced this issue Jul 5, 2022
[3.1 -> main] Fix race condition on trace_api_plugin shutdown