This issue tracks the bug reported by @appilon in #26.
In hashicorp/terraform-ls#258 they observed a deadlock in the server, which was traced to a notification handler attempting to cancel another in-flight request.
Hypothesis: If a handler responding to a notification attempts to cancel another request, it could deadlock with the dispatcher for the next batch, which waits for that same notification to complete and holds the server lock while it waits.
I was able to build a repro for this hypothesis. Specifically, here's the problematic sequence:
1. A notification (N) arrives and is dispatched to its handler.
2. While N is busy doing other work, the dispatcher locks to wait for notifications to clear.
3. N invokes jrpc2.CancelRequest.
In step (3), CancelRequest attempts to acquire the server lock, but the dispatcher is holding that lock while it waits for N to finish, so the two deadlock. The key factor is a notification handler that attempts to cancel other requests in flight. This problem was made possible by #24. The solution is for the dispatcher to yield the lock while it waits for previous notifications to settle.
Fixes #27. Since #24, the server holds its lock during dispatch to wait for
previously-issued notifications to settle. This is necessary to ensure a
sensible order of operations; however, it interacts badly with a notification
handler that uses the jrpc2.CancelRequest helper: That function itself acquires
the server lock, and the two may deadlock.
To avert this problem, wait on the notification barrier outside the lock.
Add a regression test against the original bug.