[RFC] Delay dials to relay addresses #57
so what happens if a peer only has a relay address?
shouldn't we dial that immediately instead of waiting for 2s?
I see the issue but I'd like to discuss this thoroughly first (in an issue). More generally, we need some way to prefer some transports over others. Note: We're using a channel because, technically, we'd like to dial and discover addresses in parallel. However, we don't currently do that. This does make things a bit annoying... but we can always try reading off what's on the channel into an array and then sorting there. We can even, e.g., introduce delays if we have different "preference" tiers.
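A minimal sketch of the "read the channel into an array and sort" idea, assuming addresses are plain strings rather than ma.Multiaddr values so that the example stays self-contained. The names tierOf and drainSorted and the peer ID QmRelay are placeholders for illustration, not anything from the PR:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tierOf is a hypothetical preference function: lower tiers are dialed first.
// Here any address mentioning /p2p-circuit is treated as a relay (tier 1).
func tierOf(addr string) int {
	if strings.Contains(addr, "/p2p-circuit") {
		return 1
	}
	return 0
}

// drainSorted reads addresses until the channel is closed, then orders them
// by tier. The stable sort keeps discovery order within a tier.
func drainSorted(addrs <-chan string) []string {
	var all []string
	for a := range addrs {
		all = append(all, a)
	}
	sort.SliceStable(all, func(i, j int) bool {
		return tierOf(all[i]) < tierOf(all[j])
	})
	return all
}

func main() {
	ch := make(chan string, 3)
	ch <- "/p2p-circuit/ipfs/QmRelay"
	ch <- "/ip4/127.0.0.1/tcp/4001"
	ch <- "/ip4/10.0.0.1/tcp/4001"
	close(ch)
	for _, a := range drainSorted(ch) {
		fmt.Println(tierOf(a), a)
	}
}
```

Note that this waits for the channel to close before dialing anything, which gives up the parallel discover-and-dial behaviour the channel was meant to allow.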
Yeah, ideally we are able to discover new addresses and inform the dial process of them during a dial. I did the refactor to move towards that, but the other side (the DHT address finder) hasn't caught up yet. This is a hard problem.
That's the hard part
Expanding on my previous comment, I think the better (although certainly not perfect) solution here is to:
454d440 to 2012f5d
Ok, I think I pushed something that makes sense, please review. todo:
2012f5d to 19ab72d
So, I don't think we really need sorting. Really, we just want to bucket. Also, preempting would be nice...
Here's a quick sketch of what I'm thinking (could probably use some optimizations (also, not tested at all)). Do you think this is too complicated? Probably overkill for what we need now but we'll want something like this eventually.
(also, mixing context stuff and probably buggy as hell)
out := make(chan ma.Multiaddr)
go func() {
	defer close(out)

	// We have a preset number of "tiers"
	var pending [NumTiers][]ma.Multiaddr
	lastTier := 0

	// put enqueues the multiaddr in the bucket for its tier
	put := func(addr ma.Multiaddr) {
		tier := getTier(addr)
		pending[tier] = append(pending[tier], addr)
	}

	// get pops the best (lowest tier) multiaddr available,
	// or returns (nil, -1) when nothing is pending.
	get := func() (ma.Multiaddr, int) {
		for i, tier := range pending[:] {
			if len(tier) > 0 {
				addr := tier[len(tier)-1]
				tier[len(tier)-1] = nil
				pending[i] = tier[:len(tier)-1]
				return addr, i
			}
		}
		return nil, -1
	}

	// Always delay 2 seconds between tiers.
	delay := time.NewTimer(time.Second * 2)
	defer delay.Stop()

outer:
	for {
	fill:
		for {
			select {
			case addr, ok := <-addrs:
				if !ok {
					break outer
				}
				put(addr)
			default:
				break fill
			}
		}

		next, tier := get()

		// Nothing? Block!
		if next == nil {
			addr, ok := <-addrs
			if !ok {
				break outer
			}
			put(addr)
			continue
		}

		// Jumping a tier?
		if tier > lastTier {
			// Wait the delay (preempt with new addresses)
			select {
			case addr, ok := <-addrs:
				// Preempted: requeue next so it isn't lost.
				put(next)
				if !ok {
					break outer
				}
				put(addr)
				continue
			case <-delay.C:
				// So we always get a zero delay, there
				// are better ways to deal with this...
				delay.Reset(0)
			}
		}
		lastTier = tier

		select {
		case addr, ok := <-addrs:
			put(next)
			if !ok {
				break outer
			}
			put(addr)
			continue
		case out <- next:
			// Always count the timeout since the last dial.
			delay.Reset(time.Second * 2)
		}
	}

	// finish sending
	for {
		next, tier := get()
		if next == nil {
			return
		}
		if tier > lastTier {
			<-delay.C
			delay.Reset(time.Second * 2)
		}
		lastTier = tier
		out <- next
	}
}()
cc @diasdavid. Something something relay dialing logic.
19ab72d to 3699629
Refactored to use buckets
this is crazy complicated...
dial_delay.go (outdated)
const p_circuit = 290

const numTiers = 2
const tierDelay = 2 * time.Second
can we make it just 1s? 2s might already be too much for a dial.
dial_delay.go (outdated)
		}
	}

	next, tier := get()
Note that with the current setup we don't ever reach this point in this loop as the channel is closed by now. All sending will happen in the loop in L127. I'll add some tests later today to test that it works.
With a test, LGTM. You even covered the "out of addresses" case. Nice! (I kind of expected you to tell me my idea was way too complicated and to take a hike...)
dial_delay.go (outdated)
	delay := time.NewTimer(tierDelay)
	triggerNext := make(chan struct{}, 1)

	go func() {
would be nice to have this broken out into a separate function
will do.
done.
dial_delay.go (outdated)
	}

	// get gets the best (lowest tier) multiaddr available
	get := func() (ma.Multiaddr, int) {
worth noting that we're using a stack within each tier. Might not be the best approach (though definitely simpler and cheaper to implement)
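To make the trade-off concrete, here is a small sketch contrasting the two pop orders on a single tier bucket. The helper names popLIFO and popFIFO are made up for illustration, and plain strings stand in for multiaddrs. Both helpers assume a non-empty bucket:

```go
package main

import "fmt"

// popLIFO removes and returns the most recently added element (stack order).
func popLIFO(bucket []string) (string, []string) {
	last := len(bucket) - 1
	return bucket[last], bucket[:last]
}

// popFIFO removes and returns the oldest element (queue order).
func popFIFO(bucket []string) (string, []string) {
	return bucket[0], bucket[1:]
}

func main() {
	bucket := []string{"first", "second", "third"}
	a, _ := popLIFO(bucket)
	b, _ := popFIFO(bucket)
	fmt.Println(a, b) // prints "third first"
}
```

With the stack, the address discovered last within a tier is dialed first. The queue preserves discovery order, but slicing from the front keeps the backing array alive longer, which is part of why the stack is the cheaper option.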
e2b10f6 to ef507ae
ef507ae to 35bbd29
I would also like @diasdavid to review this. This is a libp2p design decision I think he should have a say in.
dial_delay.go (outdated)
	var pending [numTiers][]ma.Multiaddr
	lastTier := -1

	// put enqueues the mutliaddr
spelling
?
s/mutliaddr/multiaddr/
dial_delay.go (outdated)
			}
			put(addr)
		default:
			break loop
Nit: Given that we have a function here, I'd just move the return true statement here and get rid of the named break.
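A sketch of what that nit amounts to, assuming the drain loop lives in its own helper. The name fill, its signature, and the meaning of the boolean (true while the channel is still open) are guesses for illustration, not the PR's actual code:

```go
package main

import "fmt"

// fill moves whatever is immediately available on addrs into put.
// It returns false once addrs is closed and true when it would block.
func fill(addrs <-chan string, put func(string)) bool {
	for {
		select {
		case addr, ok := <-addrs:
			if !ok {
				return false
			}
			put(addr)
		default:
			// Nothing ready right now: return directly, no label needed.
			return true
		}
	}
}

func main() {
	ch := make(chan string, 2)
	ch <- "a"
	ch <- "b"
	var got []string
	open := fill(ch, func(s string) { got = append(got, s) })
	fmt.Println(got, open) // prints "[a b] true"
}
```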
35bbd29 to 4757f91
2dafe20 to a945f49
// swarm and agrees on the        |
// muxer to use                 /---\
// The result is distributed -> |||||
// to callers of Dial           |||||
@Stebalien after the transport refactor only the part after 'dialConnSetup' changed, right?
Yes.
dial_delay.go (outdated)

var TierDelay = 1 * time.Second

var relay = mafmt.Or(mafmt.And(mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)), mafmt.And(mafmt.Base(ma.P_IPFS), mafmt.Base(p_circuit), mafmt.Base(ma.P_IPFS)))
I wonder if we could have the transports have some way to indicate that they should be delayed, possibly an optional interface like
type ExpensiveTransport interface {
	TransportCost() int // basically the 'tier' to use here
}
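A rough sketch of how such an optional interface could be consumed through a type assertion. The Transport interface, the tcp and relay types, and tierFor are stand-ins invented for this example. Only ExpensiveTransport and TransportCost come from the suggestion above:

```go
package main

import "fmt"

// Transport is a stand-in for the real transport interface (assumption).
type Transport interface {
	Name() string
}

// ExpensiveTransport is the optional interface suggested in the review.
type ExpensiveTransport interface {
	TransportCost() int // basically the 'tier' to use
}

type tcp struct{}

func (tcp) Name() string { return "tcp" }

type relay struct{}

func (relay) Name() string       { return "relay" }
func (relay) TransportCost() int { return 1 }

// tierFor returns the transport's declared cost, or 0 if the transport
// does not implement ExpensiveTransport.
func tierFor(t Transport) int {
	if e, ok := t.(ExpensiveTransport); ok {
		return e.TransportCost()
	}
	return 0
}

func main() {
	for _, t := range []Transport{tcp{}, relay{}} {
		fmt.Println(t.Name(), tierFor(t))
	}
}
```

Transports that don't care about tiers need no changes under this approach, since the default tier applies whenever the assertion fails.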
Sounds like a good idea.
We should be aware of potential problems with changes in the circuit addressing here -- cf libp2p/specs#72
This might break with the changes in libp2p/go-libp2p-circuit#48
We should drop the trailing /ipfs part from the filter and just rely on the presence of /p2p-circuit
Also problematic for explicit relay addresses.
Per @diasdavid's suggestion, should we close this temporarily and move to an RFC? Edit: I actually mean move it to a discussion and specification session in specs.
I really can't remember what this was blocked on other than "holy shit this is complicated".
Reopening this, as it's a high priority issue.
So where are we blocked here?
No, but seriously, IIRC the core logic here is done. One thing that would be nice is a method in the transport interface saying that the transport is lower tier, so we don't hardcode relay here - https://github.com/libp2p/go-libp2p-swarm/pull/57/files#diff-46c472072de3b864e2fabb74ab784758R202 - but I think this can be done separately from this PR.
dial_delay.go (outdated)
	mafmt "github.com/whyrusleeping/mafmt"
)

const p_circuit = 290
we should probably capitalize this to P_CIRCUIT -- that's what we do everywhere else.
it's still quite complicated, but it looks ok; will reread in the morning. @Stebalien care for a refresher review? Let's move it forward.
Just the presence of /p2p-circuit in the addr is enough to warrant delayed dialing. This also covers explicit relay addresses and is future-proof against the changes in go-libp2p-circuit#48 and specs#72.
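As a rough illustration of that rule, the check below works on the textual form of an address and looks for a p2p-circuit component anywhere in it. This is only a sketch: the function name isRelayAddr and the peer IDs are placeholders, and the actual change presumably inspects the multiaddr's protocols rather than splitting a string:

```go
package main

import (
	"fmt"
	"strings"
)

// isRelayAddr reports whether any component of the textual multiaddr
// is "p2p-circuit", regardless of what comes before or after it.
func isRelayAddr(addr string) bool {
	for _, part := range strings.Split(addr, "/") {
		if part == "p2p-circuit" {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{
		"/ip4/1.2.3.4/tcp/4001",
		"/p2p-circuit/ipfs/QmDest",
		"/ipfs/QmRelay/p2p-circuit",
	} {
		fmt.Println(a, isRelayAddr(a))
	}
}
```

Matching on the component alone means the filter no longer depends on a leading or trailing /ipfs part.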
This is a somewhat naive way to do this, but I can't think of an easy way to make this smarter without making this ugly in some way.
One way this could be improved would be to switch from chan to an array in dialAddrs. I just wasn't sure why it is a channel, so I decided not to touch that for now.