-
Notifications
You must be signed in to change notification settings - Fork 998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/src/transport: Add Transport::dial_as_listener
#2363
Conversation
Allows `NetworkBehaviour` implementations to dial a peer, but instruct the dialed connection to be upgraded as if it were the listening endpoint. This is needed when establishing direct connections through NATs and/or Firewalls (hole punching). When hole punching via TCP (QUIC is different but similar) both ends dial the other at the same time resulting in a simultaneously opened TCP connection. To disambiguate who is the dialer and who the listener there are two options: 1. Use the Simultaneous Open Extension of Multistream Select. See [sim-open] specification and [sim-open-rust] Rust implementation. 2. Disambiguate the role (dialer or listener) based on the role within the DCUtR [dcutr] protocol. More specifically the node initiating the DCUtR process will act as a listener and the other as a dialer. This commit enables (2), i.e. enables the DCUtR protocol to specify the role used once the connection is established. While on the positive side (2) requires one round trip less than (1), on the negative side (2) only works for coordinated simultaneous dials. I.e. when a simultaneous dial happens by chance, and not coordinated via DCUtR, the connection attempt fails when only (2) is in place. [sim-open]: https://github.com/libp2p/specs/blob/master/connections/simopen.md [sim-open-rust]: libp2p#2066 [dcutr]: https://github.com/libp2p/specs/blob/master/relay/DCUtR.md
core/src/connection.rs
Outdated
/// thus have one peer dial the other as a dialer and one peer dial the | ||
/// other _as a listener_. | ||
#[derive(Debug, Default, Copy, Clone, Hash, Eq, PartialEq)] | ||
pub struct DialAsListener(bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced this newtype instead of passing a bool
through the various layers.
I like the fact that it (a) increases type safety and (b) carries its documentation. Unfortunately it is not particularly intuitive, nor idiomatic (never seen a newtype around a bool
) nor convenient (e.g. see if as_listener == false
).
@thomaseizinger you usually have good ideas when it comes to idiomatic Rust. Any alternatives that come to your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of:
Dialer {
/// An optional override if the desired role for the upgrade process has already been negotiated out of band.
////
/// If this is set, the endpoint resulting from a successful dial should assume the given role instead of inferring it from the underlying transport.
role_override: Option<Endpoint>,
},
We could be even more descriptive and do:
enum UpgradeRole {
InferredFromTransport,
NegotiatedOutOfBand {
role: Endpoint
}
}
// -------------
Dialer {
/// The strategy, how the role within the upgrade process should be determined.
upgrade_role: UpgradeRole,
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dialer {
/// An optional override if the desired role for the upgrade process has already been negotiated out of band.
////
/// If this is set, the endpoint resulting from a successful dial should assume the given role instead of inferring it from the underlying transport.
role_override: Option<Endpoint>,
},
This is a good idea, especially as it does not introduce a new type. Though it does allow for duplication of information and thus potentially confusion? PendingPoint::Dialer { role_override: Some(Endpoint::Dialer) }
is the same as PendingPoint::Dialer { role_override: None }
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enum UpgradeRole {
InferredFromTransport,
NegotiatedOutOfBand {
role: Endpoint
}
}
Note that this whole out-of-band role negotiation does not only concern upgrades, but some transport protocols as well. The TCP transport implementation does not require this information as it will open a simultaneous open TCP connection either way. But QUIC does. In the case of QUIC the listener sends a random-garbage packet instead of a connection-initiation packet.
- For a QUIC address:
- Upon receiving the
Sync
,A
immediately dials the address toB
.- Upon expiry of the timer,
B
starts to send UDP packets filled with
random bytes toA
's address. Packets should be sent repeatedly in
random intervals between 10 and 200 ms.- This will result in a QUIC connection where
A
is the client andB
is
the server.
https://github.com/libp2p/specs/blob/master/relay/DCUtR.md#the-protocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to your suggestion of role_override
, how would the signatures of Network::dial
and Peer::dial
look like?
pub fn dial(
&mut self,
address: &Multiaddr,
handler: THandler,
role_override: Option<Endpoint>,
) -> Result<ConnectionId, DialError<THandler>>
I find this descriptive but again confusing due to the duplication of information where Some(Endpoint::Dialer)
is the same as None
.
- Use `Endpoint` instead of `DialAsListener` where `Endpoint::Dialer` is the default and `Endpoint::Listener` represents the override. - Use `override_role` instead of `as_listener`. - Extend `Transport` through new method `dial_with_role_override`.
@thomaseizinger I gave this another shot with your previous input. I did the following changes:
Could you take another look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply!
core/src/transport.rs
Outdated
/// This option is needed for NAT and firewall hole punching. | ||
/// | ||
/// See [`ConnectedPoint::Dialer`](crate::connection::ConnectedPoint::Dialer) for related option. | ||
fn dial_with_role_override( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think on this level, it might be more descriptive to use a terminology like dial_as_listener
? There is no overriding parameter here so not having been involved this discussion, I would be thinking "overridden with what?"
ConnectedPoint::Dialer { | ||
address: addr, | ||
role_override: Endpoint::Dialer, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a bit awkward. I am not sure I completely understand this:
Note that this whole out-of-band role negotiation does not only concern upgrades, but some transport protocols as well. The TCP transport implementation does not require this information as it will open a simultaneous open TCP connection either way.
From your PR description, what we are trying to achieve is to explicitly instruct the upgrade phase of a connection, which role to use instead of inferring it from somewhere. If that is true, perhaps we should rename role_override
to upgrade_role
. That naming would make the default case a bit less awkward as we are simply always stating the role explicitly.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the hope to resolve some confusion: The role_override
passed through Network::dial
is relevant in two places:
- in
AndThen
Transport
and laterupgrade::apply
instructing to either use theOutboundUpgrade::upgrade_outbound
(default case, no override) orInboundUpgrade::upgrade_inbound
(override). - In certain
Transport
implementations. E.g.libp2p-quic
, whenrole_override
is set and thusTransport::dial_with_role_override
(orTransport::dial_as_listener
) is called the local node sends a garbage UDP packet instead of a handshake packet, to punch a hole through its NAT / firewall only.
So would you keep role_override
on Network::dial
, but rename to upgrade_role
in ConnectedPoint::Dialer
and dial_as_listener
in Transport
? With 3 different names for related features, would that not be even more confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for giving me more context. I think I would indeed choose a name for each component that makes sense within that scope. For the upgrade phase, an explicit role parameter is pretty clear IMO.
For the transports, wouldn't it be nice if we could use some kind of "back-channel" to feed in this information? We could add a function like this to the transport interface (name subject to bike-shedding):
trait Transport {
fn register_hole_punching_attempt(&mut self, address: Multiaddr) { }
}
A transport that cares about this (QUIC), can override the function and upon dial
check whether or not this address need to be handled in a special way. A transport that doesn't care, like TCP, can just not override this function.
Perhaps the entire feature can implement in such a back-channel way? Sorry for going into so many different directions here, I am just brain storming for better solutions because now that I think I understand the overall problem, I don't really like my original suggestion that much :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the transports, wouldn't it be nice if we could use some kind of "back-channel" to feed in this information? We could add a function like this to the transport interface (name subject to bike-shedding):
Do I understand your suggestion correctly that protocols like libp2p-dcutr
would emit a NetworkBehaviourAction::Dial
with some hole-punching-marker set, this would propagate to ConcurrentDial
where one would then first call Transport::register_hole_punching_attempt
and then call Transport::dial
.
Why associate the hole-punching-marker with a Multiaddr
instead of a dial attempt (call to Swarm::dial
, Network::dial
, Transport::dial
). In other words, why force a Transport
to keep track of a Multiaddr
-to-hole-punching-marker mapping when we can as well just provide it with each call to Transport::dial
as a parameter, or Transport::dial_with_role_override
, or Transport::dial_as_listener
?
Sorry for going into so many different directions here, I am just brain storming for better solutions because now that I think I understand the overall problem, I don't really like my original suggestion that much :D
Very much appreciated. Exactly why I asked for your review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that Combinator
Transport
implementations like AndThen
would need to keep track of the Multiaddr
to hole-punching-marker as well, given that they need to forward the right ConnectedPoint
to the upgrade provided with Transport::and_then
, e.g. either a ConnectedPoint::Dialer { role_override: Endpoint::Dialer }
or a ConnectedPoint::Dialer { role_override: Endpoint::Listener }
.
See below for how this is solved currently in this pull request. Note the different role_override
values.
rust-libp2p/core/src/transport/and_then.rs
Lines 72 to 111 in 5d4d271
fn dial(self, addr: Multiaddr) -> Result<Self::Dial, TransportError<Self::Error>> { | |
let dialed_fut = self | |
.transport | |
.dial(addr.clone()) | |
.map_err(|err| err.map(EitherError::A))?; | |
let future = AndThenFuture { | |
inner: Either::Left(Box::pin(dialed_fut)), | |
args: Some(( | |
self.fun, | |
ConnectedPoint::Dialer { | |
address: addr, | |
role_override: Endpoint::Dialer, | |
}, | |
)), | |
_marker: PhantomPinned, | |
}; | |
Ok(future) | |
} | |
fn dial_with_role_override( | |
self, | |
addr: Multiaddr, | |
) -> Result<Self::Dial, TransportError<Self::Error>> { | |
let dialed_fut = self | |
.transport | |
.dial_with_role_override(addr.clone()) | |
.map_err(|err| err.map(EitherError::A))?; | |
let future = AndThenFuture { | |
inner: Either::Left(Box::pin(dialed_fut)), | |
args: Some(( | |
self.fun, | |
ConnectedPoint::Dialer { | |
address: addr, | |
role_override: Endpoint::Listener, | |
}, | |
)), | |
_marker: PhantomPinned, | |
}; | |
Ok(future) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to leave the interface that serves 90% of the usecases untouched and model the special case using a dedicated interface. That way, we are
What do you think of the current approach, implementing a middle ground? Instead of altering
Transport::dial
which "serves 90% of the usecases", it addsTransport::dial_with_role_override
.Transport
s that don't care can simply delegate toTransport::dial
.
Yeah that is fair!
I am not yet convinced that we need to follow the same naming on all levels. I think I'd prefer a name that is more descriptive for the individual functionality and composed together, they allow for hole punching then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the transport, what do you think of a name like: Transport::dial_for_nat_traversal
or Transport::dial_nat_traversing
.
The reason I'd advocate for such a naming is because the transport doesn't know what happens to the connection once it is established.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For NAT traversal both endpoints dial each other, but only one needs to call Transport::dial_for_nat_traversal
, in other words, only one needs to dial as a listener, in other words only one needs to dial with a role override. The name dial_for_nat_traversal
would thus be confusing, as one would expect both nodes to use dial_for_nat_traversal
for a NAT traversal, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair point!
I think I've come full circle on this and dial_as_listener
is my preferred option now 😅
Sorry about the long back and forth that ended up in the original idea!
What do you think:
- Use
dial_as_listener
inTransport
- Using the "override" notation within the declarative
PendingPoint
struct
This would result in a piece of code that is something like:
if role_override == Listener {
dial_as_listener
} else {
dial
}
Which is quite readable IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've come full circle on this and dial_as_listener is my preferred option now sweat_smile
Sorry about the long back and forth that ended up in the original idea!
🚀 maybe a strange loop 😇 ?
af10530 does the rename. Note that libp2p_swarm::DialOpts
and libp2p_core::DialOpts
use override_role
. Let me know in case you would like that to be changed to as_listener
as well.
See:
Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the dial concurrency factor per dial. This is especially relevant in the case of libp2p-autonat where one wants to probe addresses in sequence to reduce the amount of work a remote peer can force onto the local node. To enable the above, this commit also: - Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`. Passed as an argument to `Network::dial`. - Removes `Peer::dial` in favor of `Network::dial`. - Simplifies `Swarm::dial_with_handler`. The introduction of `libp2p_core::DialOpts` will be useful beyond this feature, e.g. for libp2p#2363. In the long run I would like to move and merge `libp2p_core::Network` and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating `libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`. Fixes libp2p#2385.
Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the dial concurrency factor per dial. This is especially relevant in the case of libp2p-autonat where one wants to probe addresses in sequence to reduce the amount of work a remote peer can force onto the local node. To enable the above, this commit also: - Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`. Passed as an argument to `Network::dial`. - Removes `Peer::dial` in favor of `Network::dial`. - Simplifies `Swarm::dial_with_handler`. The introduction of `libp2p_core::DialOpts` will be useful beyond this feature, e.g. for libp2p#2363. In the long run I would like to move and merge `libp2p_core::Network` and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating `libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`. Fixes libp2p#2385.
) Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the dial concurrency factor per dial. This is especially relevant in the case of libp2p-autonat where one wants to probe addresses in sequence to reduce the amount of work a remote peer can force onto the local node. To enable the above, this commit also: - Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`. Passed as an argument to `Network::dial`. - Removes `Peer::dial` in favor of `Network::dial`. - Simplifies `Swarm::dial_with_handler`. The introduction of `libp2p_core::DialOpts` will be useful beyond this feature, e.g. for #2363. In the long run I would like to move and merge `libp2p_core::Network` and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating `libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`. Fixes #2385.
as_listener
to Transport::dial
Transport::dial_with_role_override
Updated to use |
Transport::dial_with_role_override
Transport::dial_as_listener
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks @thomaseizinger for the help! |
Thank you for your patience! :) |
…404) Enable a `NetworkBehaviour` or a user via `Swarm::dial` to override the dial concurrency factor per dial. This is especially relevant in the case of libp2p-autonat where one wants to probe addresses in sequence to reduce the amount of work a remote peer can force onto the local node. To enable the above, this commit also: - Introduces `libp2p_core::DialOpts` mirroring `libp2p_swarm::DialOpts`. Passed as an argument to `Network::dial`. - Removes `Peer::dial` in favor of `Network::dial`. - Simplifies `Swarm::dial_with_handler`. The introduction of `libp2p_core::DialOpts` will be useful beyond this feature, e.g. for libp2p/rust-libp2p#2363. In the long run I would like to move and merge `libp2p_core::Network` and `libp2p_core::Pool` into `libp2p_swarm::Swarm` thus deduplicating `libp2p_core::DialOpts` and `libp2p_swarm::DialOpts`. Fixes #2385.
Allows `NetworkBehaviour` implementations to dial a peer, but instruct the dialed connection to be upgraded as if it were the listening endpoint. This is needed when establishing direct connections through NATs and/or Firewalls (hole punching). When hole punching via TCP (QUIC is different but similar) both ends dial the other at the same time resulting in a simultaneously opened TCP connection. To disambiguate who is the dialer and who the listener there are two options: 1. Use the Simultaneous Open Extension of Multistream Select. See [sim-open] specification and [sim-open-rust] Rust implementation. 2. Disambiguate the role (dialer or listener) based on the role within the DCUtR [dcutr] protocol. More specifically the node initiating the DCUtR process will act as a listener and the other as a dialer. This commit enables (2), i.e. enables the DCUtR protocol to specify the role used once the connection is established. While on the positive side (2) requires one round trip less than (1), on the negative side (2) only works for coordinated simultaneous dials. I.e. when a simultaneous dial happens by chance, and not coordinated via DCUtR, the connection attempt fails when only (2) is in place. [sim-open]: https://github.com/libp2p/specs/blob/master/connections/simopen.md [sim-open-rust]: libp2p#2066 [dcutr]: https://github.com/libp2p/specs/blob/master/relay/DCUtR.md
Allows
NetworkBehaviour
implementations to dial a peer, but instructthe dialed connection to be upgraded as if it were the listening
endpoint.
This is needed when establishing direct connections through NATs and/or
Firewalls (hole punching). When hole punching via TCP (QUIC is different
but similar) both ends dial the other at the same time resulting in a
simultaneously opened TCP connection. To disambiguate who is the dialer
and who the listener there are two options:
Use the Simultaneous Open Extension of Multistream Select. See
sim-open specification and sim-open-rust Rust implementation.
Disambiguate the role (dialer or listener) based on the role within
the DCUtR dcutr protocol. More specifically the node initiating the
DCUtR process will act as a listener and the other as a dialer.
This commit enables (2), i.e. enables the DCUtR protocol to specify the
role used once the connection is established.
While on the positive side (2) requires one round trip less than (1), on
the negative side (2) only works for coordinated simultaneous dials.
I.e. when a simultaneous dial happens by chance, and not coordinated via
DCUtR, the connection attempt fails when only (2) is in place.
Fixes #2250.