Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swarm/: Support generic connection management #2824

Closed
mxinden opened this issue Aug 17, 2022 · 26 comments · Fixed by #3254
Closed

swarm/: Support generic connection management #2824

mxinden opened this issue Aug 17, 2022 · 26 comments · Fixed by #3254
Assignees

Comments

@mxinden
Copy link
Member

mxinden commented Aug 17, 2022

Description

Today a user can:

  1. Set a limit for incoming/outgoing/pending connections, global and per peer.
  2. Close a connection via NetworkBehaviour.
  3. Keep a connection alive via ConnectionHandler::keep_alive.

A user is restricted in (1) as they can not do anything beyond setting upper bound limits. E.g. they can not decide whether to accept or deny an incoming connection based on the IP, the PeerId or on their current CPU or memory utilization.

(Taken from #2118 (comment).)

I see 3 ways how we can enable users to do advanced connection management:

  1. Make the pool::ConnectionCounter generic and allow users to provide their own.
    • Yet another moving piece.
    • When not Boxed requires an additional generic parameter.
    • Can only do connection management. Not as powerful as the NetworkBehaviour trait and thus limited in its decision making.
  2. Extend NetworkBehaviour with methods called to review a pending or established connection.
    • Yet another set of methods on NetworkBehaviour.
    • Has knowledge of the full state, thus allows powerful decision making.
    • Users are already familiar with it.
  3. Emit a SwarmEvent::ReviewConnection requiring the user to manually accept each pending/established connection via e.g. Swarm::accept().
    • Complicates the getting-started experience. User has to explicitly call a method to make progress. Not intuitive.

Motivation

Allows downstream users to do advanced connection management and can simplify existing implementations working around this today.

Are you planning to do it yourself in a pull request?

Yes

@mxinden
Copy link
Member Author

mxinden commented Aug 17, 2022

I am currently working on proposal (2), i.e. extending NetworkBehaviour. Hope to publish a first proof-of-concept this week.

@thomaseizinger
Copy link
Contributor

How is (2) going to work in terms of composing behaviours? One NetworkBehaviour declining a connection means it will be denied? i.e. Need full agreement across all NetworkBehaviours?

@mxinden
Copy link
Member Author

mxinden commented Aug 18, 2022

How is (2) going to work in terms of composing behaviours? One NetworkBehaviour declining a connection means it will be denied? i.e. Need full agreement across all NetworkBehaviours?

Correct.

@mxinden
Copy link
Member Author

mxinden commented Aug 19, 2022

Draft implementation of (2) is happening on #2828. Still work in progress, but might help understand the idea behind (2). Happy to expand on any of this in written form. No need to dig through the code in case you don't want to.

@thomaseizinger
Copy link
Contributor

Going to start on this once #3011 is merged to avoid the churn.

@mxinden
Copy link
Member Author

mxinden commented Nov 4, 2022

I think we should design this with a concrete use-case in mind. @divagant-martian would you volunteer overseeing
this from the lighthouse user perspective?

@thomaseizinger can you keep @divagant-martian in the loop?

@divagant-martian
Copy link
Contributor

@mxinden for sure! I'm also relatively familiar with substrate's peer management and can probably get in touch with the iroh guys to check what needs we have in common and where do those differ

@thomaseizinger
Copy link
Contributor

NetworkBehaviour already has quite a few callbacks, although we are getting rid of some of them in #3011. Thus, I am a bit hesitant to continue in the direction of #2828.

So here is an alternative idea:

Make IntoConnectionHandler fallible

NetworkBehaviour::new_handler is already invoked for every incoming connection. If a NetworkBehaviour would like to impose a policy on which connections should be established, it can pass the required data to a prototype implementation of IntoConnectionHandler and make the decision as soon as into_handler is called with the PeerId etc.

There are a few drawbacks with the current design but I think those are all fixable:

  1. If somehow possible, I'd like to get rid of IntoConnetionHandler and delay calling new_handler until we have established the connection. This would allows us to directly make new_handler fallible which would make this feature much easier to use. Additionally, it means we won't run into race conditions with simple limit connection managers as new_handler is called with &mut self and thus, will be called sequentially for each connection.
  2. NetworkBehaviourAction::Dial currently also emits an entire handler. To make sure connection limits are enforced in a single place, I'd replace this mechanism with an "OpenInfo" style where Dial can carry some "OpenInfo" struct that is passed back in new_handler.

I see the following benefits of this design:

  • Good use of the type system: We need a ConnectionHandler for each connection. If we fail to construct one, we can never actually establish the connection.
  • Synchronous decision making: Async decision making would require some kind of timeout so we would potentially hold every connection for some amount of time before we can consider it to be established.
  • Small API surface: Very little change to the existing API of NetworkBehaviour.

cc @divagant-martian @dignifiedquire @rkuhn @mxinden

@divagant-martian
Copy link
Contributor

The proposal would work very well for us and save a lot of time spent on disconnecting peers we were not interested in having in the first place 👍

@mxinden
Copy link
Member Author

mxinden commented Nov 8, 2022

  1. f somehow possible, I'd like to get rid of IntoConnetionHandler and delay calling new_handler until we have established the connection. This would allows us to directly make new_handler fallible which would make this feature much easier to use. Additionally, it means we won't run into race conditions with simple limit connection managers as new_handler is called with &mut self and thus, will be called sequentially for each connection.

I am in favor of getting rid of IntoConnectionHandler as in my eyes it adds a lot of complexity.

NetworkBehaviourAction::Dial currently also emits an entire handler. To make sure connection limits are enforced in a single place, I'd replace this mechanism with an "OpenInfo" style where Dial can carry some "OpenInfo" struct that is passed back in new_handler.

It would be wonderful to have a single mechanism only to create a new handler.

Synchronous decision making: Async decision making would require some kind of timeout so we would potentially hold every connection for some amount of time before we can consider it to be established.

In my eyes this is the way to go. We also achieved consensus on this in #2118 (comment).

@thomaseizinger
Copy link
Contributor

I've put up a draft PR here that aims to build the foundation for the above idea.

@thomaseizinger
Copy link
Contributor

The proposal would work very well for us and save a lot of time spent on disconnecting peers we were not interested in having in the first place +1

A note on this: From the other party's perspective, it is still a regular disconnect, it just happens automatically within the Swarm.

@mxinden
Copy link
Member Author

mxinden commented Nov 9, 2022

We do still need a mechanism for a NetworkBehaviour to allow or deny pending inbound or outbound connections.

(In #2828 that happens via new methods on NetworkBehaviour.)

@divagant-martian
Copy link
Contributor

@mxinden wouldn't it be enough making the new handler function fallible? I as I understand this is part of the proposal. Am I missing something?

@thomaseizinger
Copy link
Contributor

Unless I am missing something, the only thing not possible with my proposal is blocking a new incoming connection before the upgrade process is finished.

For any policy that purely operates on the number of connections, these upgrades would be wasted (plus they consume resources).

@thomaseizinger
Copy link
Contributor

We could retain a prototype-based system where creating a handler is a two-step process for inbound connections:

  • Replace new_inbound_handler with on_new_pending_connection that returns a Result.
  • The Ok of said result is a handler prototype.
  • The handler prototype has a fallible conversion into the actual handler.

The first failure point can be used for pending inbound connections before the upgrade.
The second failure point can be used for established connections and policies like banning by peer ID.

@divagant-martian
Copy link
Contributor

Right, that makes sense and would tackle both cases. Would that be reasonable to add to your current work @thomaseizinger or do you think it should be an effort for a second iteration?

@mxinden
Copy link
Member Author

mxinden commented Nov 9, 2022

The second failure point can be used for established connections and policies like banning by peer ID.

But that second failure point needs some synchronization mechanism back to a single point of truth, likely living in the NetworkBehaviour, i.e. some central place e.g. counting the number of inbound connections.

Removing the IntoConnectionHandler indirection, adding a NetworkBehaviour::on_pending_connection and returning a ConnectionHandler via NetworkBehaviour::new_handler would resolve the need for a synchronization mechanism.

@thomaseizinger
Copy link
Contributor

The second failure point can be used for established connections and policies like banning by peer ID.

But that second failure point needs some synchronization mechanism back to a single point of truth, likely living in the NetworkBehaviour, i.e. some central place e.g. counting the number of inbound connections.

Counting policies would use the first failure point, data driven policies like allow/ban lists would use the latter one.

List-based policies don't need synchronisation.

Removing the IntoConnectionHandler indirection, adding a NetworkBehaviour::on_pending_connection and returning a ConnectionHandler via NetworkBehaviour::new_handler would resolve the need for a synchronization mechanism.

The argument for synchronisation with NB stems from it having more knowledge right? But a pending connection doesn't provide any information apart from the incoming multiaddress. This makes me think that policies that need to take into account pending connections are likely going to be resource-based, i.e. RAM usage, number of connections etc.

Pushing these policies into NB doesn't feel right.

  1. NBs are composed but resource-based policies are likely global.
  2. The owner of Swarm likely knows more how constrained the resources are.

This makes me think that policing pending connections should either happen in the Swarm or some other, non-composed module that a Swarm can be configured with. Perhaps it should even happen on a Transport level as a Transport wrapper?

@thomaseizinger
Copy link
Contributor

I've put up a draft PR here that aims to build the foundation for the above idea.

Removing IntoConnectionHandler is proving difficult (see PR comments).

I am tempted to build a version of this that doesn't remove IntoConnectionHandler but makes the conversion fallible instead.

@divagant-martian Would that be sufficient for your usecase?

A connection management behaviour would have to create ConnectionHandler prototypes that have the necessary data embedded to make a decision about the incoming connection.

For example, if you have a list of banned peers, you'd copy this list into the prototype upon constructing it in new_handler.

@thomaseizinger
Copy link
Contributor

I put up a draft PR here for what this could look like: #3118

@thomaseizinger
Copy link
Contributor

I now have a fully working version of the current codebase without the IntoConnectionHandler abstraction: #3099

I think this can be merged as an independent improvement. We can then later decide how we want to manage pending connections.

mergify bot pushed a commit that referenced this issue Jan 26, 2023
Previously, we used the full reference to the `OutEvent` of the `ConnectionHandler` in all implementations of `NetworkBehaviour`. Not only is this very verbose, it is also more brittle to changes. With the current implementation plan for #2824, we will be removing the `IntoConnectionHandler` abstraction. Using a type-alias to refer to the `OutEvent` makes the migration much easier.
@mergify mergify bot closed this as completed in #3254 Feb 23, 2023
mergify bot pushed a commit that referenced this issue Feb 23, 2023
Previously, a `ConnectionHandler` was immediately requested from the `NetworkBehaviour` as soon as a new dial was initiated or a new incoming connection accepted.

With this patch, we delay the creation of the handler until the connection is actually established and fully upgraded, i.e authenticated and multiplexed.

As a consequence, `NetworkBehaviour::new_handler` is now deprecated in favor of a new set of callbacks:

- `NetworkBehaviour::handle_pending_inbound_connection`
- `NetworkBehaviour::handle_pending_outbound_connection`
- `NetworkBehaviour::handle_established_inbound_connection`
- `NetworkBehaviour::handle_established_outbound_connection`

All callbacks are fallible, allowing the `NetworkBehaviour` to abort the connection either immediately or after it is fully established. All callbacks also receive a `ConnectionId` parameter which uniquely identifies the connection. For example, in case a `NetworkBehaviour` issues a dial via `NetworkBehaviourAction::Dial`, it can unambiguously detect this dial in these lifecycle callbacks via the `ConnectionId`.

Finally, `NetworkBehaviour::handle_pending_outbound_connection` also replaces `NetworkBehaviour::addresses_of_peer` by allowing the behaviour to return more addresses to be used for the dial.

Resolves #2824.

Pull-Request: #3254.
@mxinden
Copy link
Member Author

mxinden commented Feb 28, 2023

We had multiple requests for rust-libp2p to be able to prevent dialing any private IP addresses. Thus far we have pointed them to the following Transport wrapper:

https://github.com/mxinden/kademlia-exporter/blob/master/src/exporter/client/global_only.rs

Somehow I was operating with the assumption that we can now use the generic NetworkBehaviour connection management system to implement the above. More specifically one would return an Error from NetworkBehaviour::handle_pending_outbound_connection.

Unfortunately this is not quite possible. Say that a NetworkBehaviour emits a dial request with one private and one global IP address. The connection-management NetworkBehaviour can either deny all or allow all, but not deny the private and all the global via NetworkBehaviour::handle_pending_outbound_connection.

Should we support the above use-case, e.g. by returning both a set of additional addresses and a set of blocked addresses from NetworkBehaviour::handle_pending_outbound_connection? Or should we just continue recommending the above Transport wrapper, thus blocking private IP addresses at the Transport layer instead.

I am leaning towards the latter, i.e. not extend the NetworkBehaviour::handle_pending_outbound_connection but instead provide the GlobalIpOnly Transport implementation from the rust-libp2p mono repository.

@thomaseizinger any thoughts on this?

@thomaseizinger
Copy link
Contributor

That is something that would be nice to support. It is a bit tricky though.

Even if we pass a mutable reference of the existing addresses into the behaviour, you can never guarantee that another behaviour doesn't add more non-global addresses.

How effective such a global-ip address management is depends on how you compose your behaviours which is quite the foot-gun.

In this case, refusing the dial on the Transport layer is much more effective.

@thomaseizinger
Copy link
Contributor

Maybe we've mixed too many concerns here by merging addresses_of_peer into handle_pending_outbound_connection. If we were to undo that, we could gather all addresses before and then pass a custom collection type to handle_pending_outbound_connection that only allows removal and not adding new addresses. (a &mut [Multiaddr] would be too permissive).

@mxinden
Copy link
Member Author

mxinden commented Mar 2, 2023

Agreed with the concerns above. I suggest continuing with the custom Transport implementation for now.

mergify bot pushed a commit that referenced this issue Mar 21, 2023
This patch deprecates the existing connection limits within `Swarm` and uses the new `NetworkBehaviour` APIs to implement it as a plugin instead.

Related #2824.

Pull-Request: #3386.
mergify bot pushed a commit that referenced this issue Mar 21, 2023
Currently, banning peers is a first-class feature of `Swarm`. With the new connection management capabilities of `NetworkBehaviour`, we can now implement allow and block lists as a separate module.

We introduce a new crate `libp2p-allow-block-list` and deprecate `Swarm::ban_peer_id` in favor of that.

Related #2824.

Pull-Request: #3590.
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this issue Mar 8, 2024
Previously, we used the full reference to the `OutEvent` of the `ConnectionHandler` in all implementations of `NetworkBehaviour`. Not only is this very verbose, it is also more brittle to changes. With the current implementation plan for libp2p#2824, we will be removing the `IntoConnectionHandler` abstraction. Using a type-alias to refer to the `OutEvent` makes the migration much easier.
umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this issue Mar 8, 2024
Previously, a `ConnectionHandler` was immediately requested from the `NetworkBehaviour` as soon as a new dial was initiated or a new incoming connection accepted.

With this patch, we delay the creation of the handler until the connection is actually established and fully upgraded, i.e authenticated and multiplexed.

As a consequence, `NetworkBehaviour::new_handler` is now deprecated in favor of a new set of callbacks:

- `NetworkBehaviour::handle_pending_inbound_connection`
- `NetworkBehaviour::handle_pending_outbound_connection`
- `NetworkBehaviour::handle_established_inbound_connection`
- `NetworkBehaviour::handle_established_outbound_connection`

All callbacks are fallible, allowing the `NetworkBehaviour` to abort the connection either immediately or after it is fully established. All callbacks also receive a `ConnectionId` parameter which uniquely identifies the connection. For example, in case a `NetworkBehaviour` issues a dial via `NetworkBehaviourAction::Dial`, it can unambiguously detect this dial in these lifecycle callbacks via the `ConnectionId`.

Finally, `NetworkBehaviour::handle_pending_outbound_connection` also replaces `NetworkBehaviour::addresses_of_peer` by allowing the behaviour to return more addresses to be used for the dial.

Resolves libp2p#2824.

Pull-Request: libp2p#3254.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants