Remove use of lock in cancelRequests #26
Conversation
CI is complaining about the data race; I will swap the map to be concurrent safe.
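For context, a minimal sketch of what that swap could look like, assuming the in-flight requests are tracked in a map keyed by request ID. The `server` type, its fields, and its methods here are hypothetical stand-ins, not jrpc2's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-in for the server: in-flight requests are tracked in a
// sync.Map instead of a plain map guarded by the server mutex.
type server struct {
	used sync.Map // request ID -> cancel function
}

// track registers a cancel function for an in-flight request.
func (s *server) track(id string, cancel func()) {
	s.used.Store(id, cancel)
}

// cancelRequest cancels a tracked request without taking any server lock.
func (s *server) cancelRequest(id string) bool {
	v, ok := s.used.LoadAndDelete(id)
	if ok {
		v.(func())()
	}
	return ok
}

func main() {
	s := &server{}
	s.track("1", func() { fmt.Println("request 1 cancelled") })
	fmt.Println(s.cancelRequest("1")) // true
	fmt.Println(s.cancelRequest("2")) // false
}
```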
I would prefer not to introduce …
Upon further reflection, you did say the race comes in via … If that's correct, then I suspect there's an unforeseen interaction revealed by #24. If a handler responding to a notification attempts to cancel another request, it could deadlock with the next batch waiting for that same notification to complete (during which time it holds the server lock). If that sounds related, I will see if I can put together a repro based on that suspicion.
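To make the suspected interaction concrete, here is a toy model of that sequence. It is not jrpc2 code, just the general shape: the dispatcher holds a mutex while waiting for a notification handler to finish, and that handler tries to take the same mutex (as a lock-acquiring cancellation path would).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Toy model of the suspected deadlock, not jrpc2 code.
func main() {
	var mu sync.Mutex          // stands in for the server lock
	var handler sync.WaitGroup // the in-flight notification handler
	handler.Add(1)

	mu.Lock() // dispatcher takes the lock for the next batch

	go func() { // notification handler still running from the previous batch
		defer handler.Done()
		mu.Lock() // blocks forever: the dispatcher already holds mu
		mu.Unlock()
	}()

	// Dispatcher waits for the previous notification to complete while
	// still holding mu -- the handler above can never get that far.
	done := make(chan struct{})
	go func() { handler.Wait(); close(done) }()
	select {
	case <-done:
		fmt.Println("handler finished (no deadlock)")
	case <-time.After(500 * time.Millisecond):
		fmt.Println("deadlocked: dispatcher holds the lock the handler needs")
	}
	mu.Unlock()
}
```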
@creachadair Yes, I thought I had traced the deadlock as we spoke, but I'm seeing our LSP is calling …
If that is the cause, I still agree that it's a bug in jrpc2. Even if we can find a workaround to get you unstuck, I'd like to find a more robust solution too, if it's practical.
I was able to build a repro for a deadlock around this. Specifically, here's the problematic sequence:
Step (3) attempts to acquire the server lock, and deadlocks with the dispatcher. The crux is a notification handler that attempts to cancel other requests in flight. This problem was made possible by #24. I believe the solution is for the dispatcher to yield the lock while it waits for previous notifications to settle. I will verify that this actually solves the problem and add some tests to defend against regression.
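A minimal sketch of that idea, using hypothetical names rather than the actual server internals: the dispatcher drops the lock before waiting on the previous batch's notifications and reacquires it afterwards.

```go
package main

import "sync"

// Hypothetical stand-in for the server; only the lock matters here.
type server struct {
	mu sync.Mutex // guards dispatch state
}

// waitForBarrier sketches the proposed fix: yield the server lock while
// waiting for the previous batch's notification handlers to settle, so a
// handler that needs the lock (e.g. to cancel a request) cannot deadlock
// against the dispatcher. Called with s.mu held; returns with s.mu held.
func (s *server) waitForBarrier(prev *sync.WaitGroup) {
	s.mu.Unlock() // let in-flight handlers acquire the lock if they need it
	prev.Wait()   // wait for the previous notifications to complete
	s.mu.Lock()   // reacquire before dispatching the next batch
}

func main() {
	s := &server{}
	var prev sync.WaitGroup // empty: nothing outstanding, Wait returns at once
	s.mu.Lock()
	s.waitForBarrier(&prev)
	s.mu.Unlock()
}
```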
I think I 80% follow, but I only started looking at all of this today (so I'm still getting up to speed on how this works). Any chance you could file a tracking issue with your fantastic contextual write-up? I will close this PR.
Done: #27, and thank you for reporting it. I will send a PR shortly with the fix and regression tests.
Found via ad hoc testing/repro. We are able to trigger a race where the lock acquired in the `cancelRequests` handler is deadlocked. The reason for the deadlock is not fully traced, but from what I can tell, `s.mu.Lock` is not needed when calling `s.cancel`.

~~This PR may need to create~~ I will make it a concurrent-safe lookup of the `s.used` map with https://golang.org/pkg/sync/#Map.

I am new to this codebase, so as I said I'm unsure why the code deadlocks (I wish I could offer concrete repro steps, but it involves the terraform language server/vscode plugin and spamming format-on-save requests). It seems like the lock is acquired in `nextRequest` but not freed by the time it hits `cancelRequests`. Perhaps something about these lines?

https://github.com/creachadair/jrpc2/blob/master/server.go#L218-L222

But I think it's possible calling the lock in a request handler (given a lock is acquired in `nextRequest`) is a mistake (which is why I removed it in this PR)?

Relates to hashicorp/terraform-ls#258