Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notification handlers can deadlock with the dispatch loop #27

Closed
creachadair opened this issue Aug 13, 2020 · 0 comments · Fixed by #28
Closed

Notification handlers can deadlock with the dispatch loop #27

creachadair opened this issue Aug 13, 2020 · 0 comments · Fixed by #28
Assignees
Labels
bug Something isn't working

Comments

@creachadair
Copy link
Owner

creachadair commented Aug 13, 2020

This issue tracks the bug reported by @appilon in #26.

In hashicorp/terraform-ls#258 they observed a deadlock in the server, which was traced to a notification handler attempting to cancel another request in-flight.

Hypothesis: If a handler responding to a notification attempts to cancel another request, it could deadlock with the next batch waiting for that same notification to complete (during which time it holds the server lock).

I was able to build a repro for this hypothesis. Specifically, here's the problematic sequence:

  1. A notification (N) arrives and is dispatched to its handler.
  2. While N is busy doing other work, the dispatcher locks to wait for notifications to clear.
  3. N invokes jrpc2.CancelRequest.

Step (3) attempts to acquire the server lock, and deadlocks with the dispatcher. The key factor is a notification handler that attempts to cancel other requests in flight. This problem was made possible by #24. The solution is for the dispatcher to yield the lock while it waits for previous notifications to settle.

@creachadair creachadair added the bug Something isn't working label Aug 13, 2020
@creachadair creachadair self-assigned this Aug 13, 2020
creachadair added a commit that referenced this issue Aug 13, 2020
Fixes #27. Since #24, the server holds its lock during dispatch to wait for
previously-issued notifications to settle. This is necessary to ensure a
sensible order of operations; however, it interacts badly with a notification
handler that uses the jrpc2.CancelRequest helper: That function itself acquires
the server lock, and the two (may) deadlock.

To avert this problem, wait on the notification barrier outside the lock.
Add a regression test against the original bug.
creachadair added a commit that referenced this issue Aug 14, 2020
Fixes #27. Since #24, the server holds its lock during dispatch to wait for
previously-issued notifications to settle. This is necessary to ensure a
sensible order of operations; however, it interacts badly with a notification
handler that uses the jrpc2.CancelRequest helper: That function itself acquires
the server lock, and the two (may) deadlock.

To avert this problem, wait on the notification barrier outside the lock.
Add a regression test against the original bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant