Proposed fix for deadlock in globalsign/mgo#120 #121
Thanks a load! Having a reproducible test case makes this 1000x easier. Would you mind retargeting to development? Then we'll merge this. Dom
Sorry, it's Monday - fix included! Amazing work - thanks very much - I thought I was going to spend today digging through the mgo code! I'll get this merged and tested. Again, thanks!
We've seen a deadlock happen occasionally where syncServers needs to acquire a socket to call isMaster, but the socket acquisition needs to know the server topology, which isn't known yet. See issue #120 for a detailed breakdown. This replicates the issue by setting up a mongo "server" which closes sockets as soon as they're opened; about 20% of the time, this will trigger the deadlock because the socket acquired for isMaster() dies and needs to be reacquired.
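For readers who want a feel for the kind of fake "server" described above, here is a minimal sketch, assuming nothing more than a TCP listener that accepts connections and closes them straight away. It is not the test from this PR; `startClosingServer` and `probe` are hypothetical names used only for illustration, and the sketch does not speak the MongoDB wire protocol.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// startClosingServer (hypothetical helper) listens on an ephemeral localhost
// port and closes every connection immediately after accepting it.
// It returns the listener address and a function that stops the server.
func startClosingServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed; stop accepting
			}
			conn.Close() // drop the connection before any command can run
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// probe (hypothetical helper) dials addr and returns the error from the first
// read, which is non-nil once the server has closed the connection.
func probe(addr string) error {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err
	}
	defer conn.Close()
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	_, err = conn.Read(make([]byte, 1))
	return err
}

func main() {
	addr, stop, err := startClosingServer()
	if err != nil {
		panic(err)
	}
	defer stop()
	fmt.Println("read error from closed connection:", probe(addr))
}
```

Pointing a driver at an address like this means every freshly dialed socket is dead by the time a command is sent, which is the condition the description above relies on.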
As discussed in issue #120, isMaster() can cause a deadlock with the topology scanner if the connection it makes dies before running the command; mgo automagically attempts to make another socket in acquireSocket, but this can't work without topology. This commit forces isMaster() to actually run on the intended socket.
Sorry, forgot about that. I've rebased - PTAL.
session.go
// RunOnSocket does the same as Run, but guarantees that your command will be run
// on the provided socket instance; if it's unhealthy, you will receive the error
// from it.
func (db *Database) RunOnSocket(socket *mongoSocket, cmd interface{}, result interface{}) error {
I don't think that accepting a package-level (unexported) type as a parameter in a public method is a good idea.
There is no way this can be used outside of the mgo package.
@KJTsanaktsidis maybe make a private function instead of a public method?
Yes, I can definitely do this - don’t think I’ll get to it for a couple of days though, sorry :/
@KJTsanaktsidis you could merge https://github.com/zendesk/mgo/pull/1 if you agree with the change
Thanks a lot @dvic - I merged your change into this PR. Sorry this took so long for me to get to :/
Make run on socket helper methods private
Proposed fix for deadlock in globalsign#120
As discussed in issue #120, isMaster() can cause a deadlock with the topology scanner if the connection it makes dies before running the command; mgo automagically attempts to make another socket in acquireSocket, but this can't work without topology.
This PR provides a test that fails ~20% of the time on my machine with the deadlock identified in the linked issue. It also proposes a fix, which is to force isMaster to run its commands on the socket it sets on the session.