-
Notifications
You must be signed in to change notification settings - Fork 37
clear out extra dial jobs after dial finishes #51
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we were leaking dial jobs...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't fix the bug we discussed but may help a bit (maybe).
@Stebalien how does it not solve the problem? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look closely enough. This does fix the dial issue (although I'd still like to simplify all this code and remove the numPeersDialing^2
operation).
limiter.go
Outdated
|
||
arr := dl.waitingOnFd[:0] | ||
for _, j := range dl.waitingOnFd { | ||
if j.peer != p { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are currently considered "active". We'll need to decrement the activePerPeer counter for each one we remove.
887a5d4
to
2c697ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this isn't buggy but still isn't perfect. However, it's better than nothing.
delete(dl.waitingOnPeerLimit, p) | ||
// NB: the waitingOnFd list doesnt need to be cleaned out here, we will | ||
// remove them as we encounter them because they are 'cancelled' at this | ||
// point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the waitingOnPeerLimit
list. The problem is that it could be a while before that happens if we're really backed up behind a bunch of slow dials. It won't be quite as bad because it's limited by dials*perPeerLimit
but it will still be large and can still grow in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien Yeah, the tradeoff we make is the O(n) operation under a held lock after completion vs this potential extra memory buffering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be solved with a doubly linked list (wait queues like this are really one of the only uses for them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, great point. We never really need to iterate over it, so that makes perfect sense.
So the final thing I need to do here is add metrics as @lgierth requested |
alright, so it turns out we don't have things set up yet to nicely add metrics from go-libp2p. I'm thinking that we should push to get this merged sooner, and work out the metrics later |
Agreed |
This really wouldnt be an issue if we didnt have tens of thousands of addresses hanging around for some reason