use event bus for monitoring peer connections and protocol updates #536
Signed-off-by: gfanton <8671905+gfanton@users.noreply.github.com>
Some background, if it helps: we discovered this issue, and created a fix for it, while working on a simple example for our networking library, which is built on libp2p.
Thank you for this PR.
This is something we have needed to do for a while, so it is a step in the right direction.
At first glance it looks good, but I will do a second, more thorough pass to make sure we don't break anything.
pubsub.go
ps.val.Start(ps)

go ps.processLoop(ctx)

(*PubSubNotif)(ps).Initialize()
if err := (*PubSubNotif)(ps).startMonitoring(); err != nil {
Let's do this check before spawning the processLoop: if it fails, the system will not work at all.
Thanks for the thumbs up. Do we need to address any other issues with this PR?
Signed-off-by: Jeff Thompson <jeff@thefirst.org>
e4c5794 to 7aeb7da
This is a good start; I left some comments requesting changes.
func (p *PubSubNotif) Initialize() {
	isTransient := func(pid peer.ID) bool {
		for _, c := range p.host.Network().ConnsToPeer(pid) {
			if !c.Stat().Transient {
				return false
			}
We want to keep this code; please don't remove it. Run it right after initializing the bus and before starting the monitoring goroutine.
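The `isTransient` helper in the snippet treats a peer as transient only when none of its connections is a full (non-transient) connection. A self-contained sketch of that check follows; `conn` is a stand-in for a libp2p connection and its `Stat().Transient` flag, the map stands in for `Network().ConnsToPeer`, and the trailing `return true` is an assumption because the quoted snippet is cut off.

```go
package main

import "fmt"

// conn is a stand-in for a libp2p connection; only the transient flag matters here.
type conn struct{ transient bool }

// isTransient reports whether every connection to a peer is transient
// (for example, a limited relay connection). One direct connection makes it false.
// In this sketch a peer with no connections at all is also reported as transient.
func isTransient(conns []conn) bool {
	for _, c := range conns {
		if !c.transient {
			return false
		}
	}
	return true
}

func main() {
	connsToPeer := map[string][]conn{
		"relayed-only": {{transient: true}},
		"mixed":        {{transient: true}, {transient: false}},
	}
	for _, pid := range []string{"relayed-only", "mixed"} {
		fmt.Println(pid, isTransient(connsToPeer[pid]))
	}
}
```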
So the initialization code is still there (below), but I believe it is racy.
See comments below.
notify.go
if evt.Connectedness == network.Connected {
	go p.AddPeers(evt.Peer)
We should process disconnect events as well, and get rid of the nasty existing code that handles peer disconnections.
The latest commit handles the NotConnected event and calls RemovePeers. How does it look?
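Handling both directions of a connectedness event can be sketched with local stand-in types so the example runs without go-libp2p. `connectedness`, `peerEvent`, and `tracker` are hypothetical; in the real code the event carries a `peer.ID` and a `network.Connectedness` (which has more values than the two handled here), and the handlers are `AddPeers` and `RemovePeers`. The sketch calls the handlers synchronously, whereas the snippet above spawns `go p.AddPeers(...)`.

```go
package main

import "fmt"

// connectedness mirrors the two network.Connectedness values this sketch handles.
type connectedness int

const (
	notConnected connectedness = iota
	connected
)

// peerEvent is a stand-in for the bus event carrying a peer and its new state.
type peerEvent struct {
	peer          string
	connectedness connectedness
}

// tracker is a hypothetical stand-in for the set of peers pubsub knows about.
type tracker struct{ peers map[string]bool }

func (t *tracker) addPeers(ids ...string) {
	for _, id := range ids {
		t.peers[id] = true
	}
}

func (t *tracker) removePeers(ids ...string) {
	for _, id := range ids {
		delete(t.peers, id)
	}
}

// handle routes a connectedness event to add or remove, so disconnects flow
// through the same event stream as connects; other states are ignored.
func (t *tracker) handle(evt peerEvent) {
	switch evt.connectedness {
	case connected:
		t.addPeers(evt.peer)
	case notConnected:
		t.removePeers(evt.peer)
	}
}

func main() {
	t := &tracker{peers: map[string]bool{}}
	events := []peerEvent{{"A", connected}, {"B", connected}, {"A", notConnected}}
	for _, evt := range events {
		t.handle(evt)
	}
	fmt.Println(len(t.peers), t.peers["B"])
}
```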
ps.val.Start(ps)

go ps.processLoop(ctx)

(*PubSubNotif)(ps).Initialize()
// add current peers to notify system
notify.AddPeers(h.Network().Peers()...)
Ah, OK, it is still here.
This is racy; it needs to happen inside startMonitoring, after we have initialized the bus but before we have spawned the monitoring goroutine.
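The ordering described here can be sketched with a channel standing in for the event bus subscription: subscribe first, then seed from the current peer list, then start the goroutine. Subscribing first means a peer that connects during seeding is seen at least once (possibly twice, which is harmless because adding is idempotent), and seeding before the goroutine starts means snapshot additions cannot interleave with event processing. `network`, `monitor`, and `startMonitoring` below are hypothetical stand-ins, not the libp2p API.

```go
package main

import (
	"fmt"
	"sync"
)

// network stands in for host.Network() plus the event bus: it keeps the
// connected peers and publishes each new connection to subscribers.
type network struct {
	mu    sync.Mutex
	peers []string
	subs  []chan string
}

// subscribe returns a buffered channel that receives every later connection.
func (n *network) subscribe() <-chan string {
	n.mu.Lock()
	defer n.mu.Unlock()
	ch := make(chan string, 16)
	n.subs = append(n.subs, ch)
	return ch
}

// connect records a peer and publishes the event to all subscribers.
func (n *network) connect(id string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.peers = append(n.peers, id)
	for _, ch := range n.subs {
		ch <- id
	}
}

// snapshot returns a copy of the currently connected peers, like Peers().
func (n *network) snapshot() []string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return append([]string(nil), n.peers...)
}

// shutdown closes all subscriptions so monitoring goroutines can exit.
func (n *network) shutdown() {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, ch := range n.subs {
		close(ch)
	}
	n.subs = nil
}

// monitor tracks known peers; done is closed when the event loop exits.
type monitor struct {
	mu    sync.Mutex
	known map[string]bool
	done  chan struct{}
}

func (m *monitor) has(id string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.known[id]
}

// startMonitoring: 1) subscribe, 2) seed from the snapshot, 3) spawn the loop.
func startMonitoring(n *network) *monitor {
	m := &monitor{known: map[string]bool{}, done: make(chan struct{})}
	sub := n.subscribe()
	for _, id := range n.snapshot() {
		m.known[id] = true // no lock needed: the loop is not running yet
	}
	go func() {
		defer close(m.done)
		for id := range sub {
			m.mu.Lock()
			m.known[id] = true
			m.mu.Unlock()
		}
	}()
	return m
}

func main() {
	n := &network{}
	n.connect("early") // connected before pubsub starts, e.g. found by discovery
	m := startMonitoring(n)
	n.connect("late")
	n.shutdown()
	<-m.done
	fmt.Println(m.has("early"), m.has("late"))
}
```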
In the latest commit, the following test passes:
go test -v -run=TestSimpleDiscovery -count=1 .
As you suggest, we could move AddPeers into startMonitoring, before the goroutine, as p.AddPeers(p.host.Network().Peers()...). But with this change, TestSimpleDiscovery hangs and times out. What do you think?
Also set eventbus.Name to "libp2p/pubsub/notify".
Signed-off-by: Jeff Thompson <jeff@thefirst.org>
Hello @vyzo. You suggested moving
Uhm, fix the deadlock? Maybe add an extra goroutine for startup.
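One plausible reading of the TestSimpleDiscovery hang, and of the suggested extra goroutine, is sketched here. It assumes that AddPeers hands peers to processLoop over an unbuffered channel; the PR excerpt does not show AddPeers, so this is an assumption. If startMonitoring runs before processLoop is spawned, as requested earlier in the review, a synchronous AddPeers call blocks forever because nothing is receiving yet. Running the startup seeding in its own goroutine lets the constructor go on to start the consumer. `pubsub`, `addPeers`, `incoming`, and `shutdown` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// pubsub is a stand-in: processLoop consumes peers from an unbuffered channel.
type pubsub struct {
	incoming chan string
	seeding  sync.WaitGroup // tracks the startup goroutine
	stopped  chan struct{}  // closed when processLoop returns
	seen     []string       // written only by processLoop
}

// addPeers blocks on each send until processLoop receives it.
func (ps *pubsub) addPeers(ids ...string) {
	for _, id := range ids {
		ps.incoming <- id
	}
}

func (ps *pubsub) processLoop() {
	defer close(ps.stopped)
	for id := range ps.incoming {
		ps.seen = append(ps.seen, id)
	}
}

// newPubSub seeds the initial peers from an extra goroutine. Calling
// ps.addPeers(initial...) directly here, before processLoop is spawned,
// would block forever: nobody is receiving on incoming yet.
func newPubSub(initial []string) *pubsub {
	ps := &pubsub{incoming: make(chan string), stopped: make(chan struct{})}
	ps.seeding.Add(1)
	go func() {
		defer ps.seeding.Done()
		ps.addPeers(initial...)
	}()
	go ps.processLoop()
	return ps
}

// shutdown waits for seeding to finish, then stops processLoop.
func (ps *pubsub) shutdown() {
	ps.seeding.Wait()
	close(ps.incoming)
	<-ps.stopped
}

func main() {
	ps := newPubSub([]string{"peerA", "peerB"})
	ps.shutdown()
	fmt.Println(ps.seen) // [peerA peerB]
}
```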
This fixes a bug where another peer is not added to a topic after subscribing. This bug happens, for example, when a discovery system (like MDNS) is set up before pubsub initialization. Even though the peers have been discovered, they are not added to the new pubsub topics.
The bug can be seen by running TestNotifyPeerProtocolsUpdated in the master branch:
It prints "topic1 should at least have 1 peer". In our code, we had to work around this by disabling discovery (MDNS) and deferring discovery until after the pubsub protocols were registered. This pull request fixes the problem so that the workaround is not needed. It uses the event bus to monitor for peer connections and protocol updates. When hosts[1] joins topic1, the other peer is automatically added and the test passes.
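The protocol-update half of the fix can be sketched as follows: when the bus reports that a peer added protocols, check whether any of them is a pubsub protocol and, if so, hand the peer to pubsub. That is how a peer discovered earlier (for example by MDNS) can still be picked up once it registers pubsub. The `protocolsUpdated` type is a stand-in that mirrors the shape of libp2p's protocols-updated event (a peer plus added and removed protocol IDs); `supportsPubSub`, `onProtocolsUpdated`, and the protocol ID set are illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// protocolsUpdated is a stand-in for the bus event reporting protocol changes.
type protocolsUpdated struct {
	peer    string
	added   []string
	removed []string
}

// pubsubProtocols is an illustrative subset of pubsub protocol IDs (assumed set).
var pubsubProtocols = map[string]bool{
	"/meshsub/1.1.0":  true,
	"/meshsub/1.0.0":  true,
	"/floodsub/1.0.0": true,
}

// supportsPubSub reports whether any newly added protocol is a pubsub protocol.
func supportsPubSub(added []string) bool {
	for _, p := range added {
		if pubsubProtocols[p] {
			return true
		}
	}
	return false
}

// onProtocolsUpdated returns the peers that should be added to pubsub.
func onProtocolsUpdated(evts []protocolsUpdated) []string {
	var toAdd []string
	for _, evt := range evts {
		if supportsPubSub(evt.added) {
			toAdd = append(toAdd, evt.peer)
		}
	}
	return toAdd
}

func main() {
	evts := []protocolsUpdated{
		{peer: "peerA", added: []string{"/ipfs/id/1.0.0"}},
		{peer: "peerA", added: []string{"/meshsub/1.1.0"}},
		{peer: "peerB", added: []string{"/ipfs/ping/1.0.0"}},
	}
	fmt.Println(onProtocolsUpdated(evts)) // [peerA]
}
```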