module loader: Avoid deadlock #396

Merged
radeksimko merged 1 commit into main from b-fix-chan-buffer on Feb 5, 2021

Conversation

@radeksimko (Member) commented Feb 4, 2021

This fixes a number of bugs I found while testing this, both manually in VS Code and by running the tests with a high enough -count.

  1. A deadlock could occur when a dispatch retry was attempted: while run() was executing, another goroutine (e.g. one executing a different operation) could finish and trigger another tryDispatchingModuleOp, filling the channel, so the next tryDispatchingModuleOp triggered from run() as part of the retry would block, thereby blocking the main loop.
    • This was resolved by running the retry tryDispatchingModuleOp in an additional goroutine (sketched below).
  2. tryDispatchingModuleOp can be triggered by many goroutines at the same time, and its original implementation left room for a race condition: queue.Peek() relied on a prior queue.Len() check, but nothing guaranteed the queue had not changed between the two calls. Additionally, Peek could have "peeked" at the exact same operation more than once, which resulted in that duplicate operation being dispatched.
    • This was resolved by getting rid of the Peek method entirely and ensuring we never pop from an empty queue by utilizing the queue's internal mutex (sketched below). Retries should be rare enough (especially with the 100ms delay) that the repeated push/pop shouldn't cause concern wrt resource consumption.
  3. tryDispatchingModuleOp was triggered from inside executeModuleOp, but the loading counters were only decremented afterwards via defer, which could cause the dispatch attempt to fail when the queue was full, simply because the two operations happened in the wrong order.
    • This was resolved by getting rid of the defer and calling tryDispatchingModuleOp from the main loop in run(), where we can better control the order (sketched below).
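
For point 1, here is a minimal Go sketch of the retry pattern: a non-blocking send on the dispatch channel, with the retry deferred to its own goroutine so that a full channel can never block the loop that also drains it. The names (loader, moduleOp, opsToDispatch) are invented for the example and do not mirror the actual implementation in this PR.

```go
// Minimal sketch of the retry pattern; names are hypothetical, not the PR's code.
package main

import (
	"fmt"
	"time"
)

type moduleOp struct{ id int }

type loader struct {
	opsToDispatch chan moduleOp
}

// tryDispatchingModuleOp attempts a non-blocking send. If the channel is
// already full (e.g. another goroutine dispatched in the meantime), the
// retry runs in its own goroutine, so the caller - including run() - never blocks.
func (l *loader) tryDispatchingModuleOp(op moduleOp) {
	select {
	case l.opsToDispatch <- op:
	default:
		go func() {
			time.Sleep(100 * time.Millisecond) // same delay mentioned above
			l.tryDispatchingModuleOp(op)
		}()
	}
}

// run is the main loop; because tryDispatchingModuleOp never blocks,
// the loop always gets back to draining opsToDispatch.
func (l *loader) run(done <-chan struct{}) {
	for {
		select {
		case op := <-l.opsToDispatch:
			fmt.Printf("executing op %d\n", op.id)
		case <-done:
			return
		}
	}
}

func main() {
	l := &loader{opsToDispatch: make(chan moduleOp, 1)}
	done := make(chan struct{})
	go l.run(done)

	for i := 0; i < 5; i++ {
		l.tryDispatchingModuleOp(moduleOp{id: i})
	}

	time.Sleep(500 * time.Millisecond)
	close(done)
}
```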
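
For point 2, a minimal sketch of a queue that holds its mutex across both the emptiness check and the removal, so the Len()/Peek() race cannot occur and the same operation cannot be handed out twice. The opQueue type here is an illustrative assumption, not the real queue implementation.

```go
// Minimal sketch of popping under the queue's own mutex; the type is hypothetical.
package main

import (
	"fmt"
	"sync"
)

type moduleOp struct{ id int }

// opQueue guards its backing slice with a mutex so the emptiness check and
// the removal happen in one critical section. There is no Peek, so the same
// operation can never be observed (and dispatched) twice.
type opQueue struct {
	mu  sync.Mutex
	ops []moduleOp
}

func (q *opQueue) Push(op moduleOp) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.ops = append(q.ops, op)
}

// Pop removes and returns the next operation; ok is false if the queue was
// empty, so callers never pop from an empty queue.
func (q *opQueue) Pop() (op moduleOp, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.ops) == 0 {
		return moduleOp{}, false
	}
	op = q.ops[0]
	q.ops = q.ops[1:]
	return op, true
}

func main() {
	q := &opQueue{}
	q.Push(moduleOp{id: 1})

	if op, ok := q.Pop(); ok {
		fmt.Println("dispatching op", op.id)
	}
	if _, ok := q.Pop(); !ok {
		fmt.Println("queue empty, nothing to dispatch")
	}
}
```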
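
For point 3, a sketch of the ordering fix: the counter is decremented in the run() loop before the next dispatch attempt, rather than via a defer that fires only after a dispatch attempt made inside executeModuleOp. All names and the simplified single-goroutine loop are assumptions for illustration.

```go
// Minimal sketch of the ordering fix; names are hypothetical, not the PR's code.
package main

import "fmt"

type loader struct {
	loadingCount int         // number of modules currently loading
	maxParallel  int         // capacity used by the dispatch check
	pendingOps   []string    // queued operations
	finished     chan string // signals a completed operation
}

// tryDispatchingModuleOp dispatches the next queued op only if there is spare capacity.
func (l *loader) tryDispatchingModuleOp() {
	if l.loadingCount >= l.maxParallel || len(l.pendingOps) == 0 {
		return
	}
	op := l.pendingOps[0]
	l.pendingOps = l.pendingOps[1:]
	l.loadingCount++
	go func() {
		fmt.Println("loading", op)
		l.finished <- op
	}()
}

// run controls the order explicitly: decrement the counter first, then try to
// dispatch. In the pre-fix version the dispatch attempt ran while a deferred
// decrement was still pending, so the capacity check in tryDispatchingModuleOp
// could fail even though a slot had effectively been freed.
func (l *loader) run(total int) {
	for done := 0; done < total; done++ {
		op := <-l.finished
		fmt.Println("finished", op)
		l.loadingCount--
		l.tryDispatchingModuleOp()
	}
}

func main() {
	l := &loader{
		maxParallel: 1,
		pendingOps:  []string{"module-a", "module-b"},
		finished:    make(chan string, 1),
	}
	l.tryDispatchingModuleOp() // dispatch the first op
	l.run(2)                   // handle completions and follow-up dispatches
}
```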

@radeksimko radeksimko added the bug Something isn't working label Feb 4, 2021
@radeksimko radeksimko changed the title module loader: Fix channel buffer size module loader: Avoid deadlock Feb 4, 2021
@radeksimko radeksimko force-pushed the b-fix-chan-buffer branch 2 times, most recently from cc030d9 to 81e7395 on February 4, 2021 10:36
@radeksimko radeksimko marked this pull request as draft February 4, 2021 10:41
@radeksimko radeksimko force-pushed the b-fix-chan-buffer branch 3 times, most recently from e385ec8 to 892ce95 on February 4, 2021 20:25
@radeksimko radeksimko marked this pull request as ready for review February 4, 2021 20:46
@radeksimko radeksimko requested a review from a team February 4, 2021 21:01

@aeschright (Contributor) left a comment

👍

@radeksimko radeksimko merged commit fdfe23d into main Feb 5, 2021
@radeksimko radeksimko deleted the b-fix-chan-buffer branch February 5, 2021 05:43
@ghost commented Mar 7, 2021

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the context necessary to investigate further.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 7, 2021