Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.1] Fix race condition on trace_api_plugin shutdown #592

Merged
merged 1 commit into from
Jul 5, 2022

Conversation

heifner
Copy link
Member

@heifner heifner commented Jul 2, 2022

Failure https://github.com/eosnetworkfoundation/mandel/runs/7143297502?check_suite_focus=true pointed to an issue in shutdown of the trace_api_plugin.

slice_directory::stop_maintenance_thread() sets atomic bool _maintenance_shutdown = true and calls notify_one() on the _maintenance_condition condition variable. This is a race condition because it doesn't acquire the condition variable mutex setting up the possibility that _maintenance_shutdown can be set to true and notify_one called after the while check in slice_directory::start_maintenance_thread but before the wait() causing it to wait forever. This then blocks the slice_directory::stop_maintenance_thread() call of join() on the main thread, blocking the shutdown of all other plugins.

Changed the logic to correctly use a mutex and condition variable around _best_known_lib & _maintenance_shutdown. Also made these two variables non-atomic since they are now only accessed via the mutex. Also release the mutex before running the run_maintenance_tasks() which appeared to be the original intention with the use of the local variables.

Accidentally targeted main on #591 so this is a cherry-pick back to 3.1.

@heifner heifner added 3.1 RC2 OCI OCI working this issue... labels Jul 2, 2022
@linh2931
Copy link
Member

linh2931 commented Jul 2, 2022

Sorry I did not notice the wrong branch on #591 either. But the line under the title of this PR still says heifner wants to merge 1 commit into main from GH-590-shutdown-race-3-1, not into 3.1.

@heifner heifner changed the base branch from main to release/3.1.x July 2, 2022 21:31
@heifner
Copy link
Member Author

heifner commented Jul 2, 2022

Sorry I did not notice the wrong branch on #591 either. But the line under the title of this PR still says heifner wants to merge 1 commit into main from GH-590-shutdown-race-3-1, not into 3.1.

Can't believe I did it again. Thanks.

@spoonincode
Copy link
Member

I think may still want to make a merge PR for this?

@heifner
Copy link
Member Author

heifner commented Jul 5, 2022

I think may still want to make a merge PR for this?

I accidentally did that first: #591

@heifner heifner merged commit bfbc551 into release/3.1.x Jul 5, 2022
@heifner heifner deleted the GH-590-shutdown-race-3-1 branch July 5, 2022 17:21
@spoonincode
Copy link
Member

Right, if you don't do a (noop) merge won't it leave a "gap" where we have a PR that wasn't merged?

@heifner
Copy link
Member Author

heifner commented Jul 5, 2022

Right, if you don't do a (noop) merge won't it leave a "gap" where we have a PR that wasn't merged?

Created #604

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.1 RC2 OCI OCI working this issue...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants