Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable Peer Filtering #471

Merged
merged 18 commits into from
Apr 1, 2020
Merged

Configurable Peer Filtering #471

merged 18 commits into from
Apr 1, 2020

Conversation

aschmahmann
Copy link
Contributor

Required for libp2p/go-libp2p#803. Allows peers to be filtered out of queries and routing tables, comes with a couple suggested functions for public and private network peer filtering.

Bumps [github.com/ipfs/go-datastore](https://github.com/ipfs/go-datastore) from 0.4.2 to 0.4.4.
- [Release notes](https://github.com/ipfs/go-datastore/releases)
- [Commits](ipfs/go-datastore@v0.4.2...v0.4.4)

Signed-off-by: dependabot-preview[bot] <support@dependabot.com>
Copy link
Contributor Author

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a start and I'd really appreciate feedback. The two main things are:

  1. Are the filters sufficiently aggressive, overly aggressive, or not aggressive enough?
  2. Does the API seem reasonable?

dht.go Outdated Show resolved Hide resolved
query.go Outdated Show resolved Hide resolved
opts/filters.go Outdated Show resolved Hide resolved
opts/filters.go Outdated
Comment on lines 44 to 52
conns := h.Network().ConnsToPeer(ai.ID)
if len(conns) > 0 {
for _, c := range conns {
if manet.IsPrivateAddr(c.RemoteMultiaddr()) {
return true
}
}
return false
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be overkill in the sense that if we're already connected to the peer and we discovered them via a private DHT query then we may as well query them if we're already connected.

The approach here was to be highly defensive of our queries being polluted with non-private peers in order to keep the private DHT small and therefore make the cost of running 2 DHTs small.

Maybe we should remove this check?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this check. We may learn about a few undailable peers, but we won't waste too much time dialing them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I would short-circuit and, if we have a private connection, we should assume that the peer is local. I just wouldn't do the inverse "we're connected over a public IP" check.

opts/filters.go Outdated
for _, a := range ai.Addrs {
if manet.IsPublicAddr(a) {
if !isRelayAddr(a) {
return false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is perhaps overly aggressive because it's possible that a peer has both a private address and a public one. Perhaps for queries it's not worth doing this check and only being critical of routing table additions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the idea here? Why would we want to filter out peers that have public addresses? IIRC, peers on the public network should also be in the private DHT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern here isn't so much about undialable nodes, but about accidentally bridging the networks (e.g. starting a local DHT request and ending up with a global DHT request). However, I suspect that it's not really a big deal since if you get the data then you have it and this bridging shouldn't happen with behaving nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't filter out peers with public addresses. All peers on the local network, dialable or not, should be part of the local DHT.

Notes:

  1. We should have some way to pass an address filter function down into the dialer in the future. That's probably the right way to handle this.
  2. We should apply all the checks from https://github.com/libp2p/go-libp2p-kad-dht/pull/471/files#r387453724. E.g., we should filter out peers when (a) we know our public IPv4 address and (b) we see that it's not the same as theirs.

opts/options.go Outdated
Comment on lines 260 to 270
// QueryFilter sets a function that approves which peers may be dialed in a query
func QueryFilter(filter func(h host.Host, ai peer.AddrInfo) bool) Option {
return func(o *Options) error {
o.QueryPeerFilter = filter
return nil
}
}

// RoutingTableFilter sets a function that approves which peers may be added to the routing table. The host should
// already have at least one connection to the peer under consideration.
func RoutingTableFilter(filter func(conns []core.Conn) bool) Option {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would happily take better names for these. Is it obvious from the name the when true is returned that we keep that peer?

opts/options.go Outdated
Comment on lines 260 to 270
// QueryFilter sets a function that approves which peers may be dialed in a query
func QueryFilter(filter func(h host.Host, ai peer.AddrInfo) bool) Option {
return func(o *Options) error {
o.QueryPeerFilter = filter
return nil
}
}

// RoutingTableFilter sets a function that approves which peers may be added to the routing table. The host should
// already have at least one connection to the peer under consideration.
func RoutingTableFilter(filter func(conns []core.Conn) bool) Option {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of me wants to include a host.Host parameter in this function just in case we need it in the future. However, given that most users will probably just use our pre-built function breaking the API in the future may not be that big a deal.

query.go Outdated
for _, np := range newPeers {
closer := q.localPeers.Add(np.ID)
foundCloserPeer = foundCloserPeer || closer
if q.dht.queryPeerFilter(q.dht.host, *next) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How necessary is the query filter really, it seems like it's kind of "plan B" since hopefully the routing tables should be clean and only contain relevant peers. Is it ok for functions to overfilter peers here?

Do we also want to filter which addresses we add into the peerstore and/or which ones we eventually dial? If so is it important to do this now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How necessary is the query filter really, it seems like it's kind of "plan B" since hopefully the routing tables should be clean and only contain relevant peers. Is it ok for functions to overfilter peers here?

For the public DHT, I'd introduce two checks:

  1. I can dial the peer.
  2. They aren't advertising relay addresses. That's a bit over-zealous, but we don't want to accidentally dial it.

For the private DHT, I'm not sure if we want to filter at all.

Do we also want to filter which addresses we add into the peerstore and/or which ones we eventually dial? If so is it important to do this now?

I'd actually keep all addresses. If the peer is on the local network, we might as well use those addresses.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can dial the peer

What does this mean? If we don't check at all we will end up eventually dialing the peer (if they are useful). To me this sounds like the only query filter we want is for the public DHT to check for relay addresses, is that correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can dial the peer

Sorry, I meant to say "the peer has a public address". Really, just checking for "no relay address" should be sufficient.

opts/filters.go Outdated Show resolved Hide resolved
opts/filters.go Outdated
Comment on lines 34 to 36
addr := c.RemoteMultiaddr()
if !isRelayAddr(addr) && manet.IsPublicAddr(addr) {
return true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a bit more aggressive than necessary since AFAIK if two peers are publicly dialable via port forwarding and are also on the same local network they likely will be connected to each other over private multiaddrs instead of public ones. This means we are potentially missing peers from our routing tables that could be there.

In the future we could add more expensive tests like pinging any non-relay public addresses we have for the peer in the peerstore (requires that the function have access to the peerstore), to see if the peer we're locally connected to is also publicly dialable.

If desired there could also be a simple check to see if (as far as we know) the peer and us share the same public network address (e.g. are behind the same NAT).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just check "do we have a public addr for this peer in the peerstore"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but there's no guarantee that it means anything right? This is the backup for what happens if an undialable peer doesn't remove themselves from being a DHT server.

Copy link
Contributor

@aarshkshah1992 aarshkshah1992 Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann

Would I be correct in saying that the problem we want to solve is :

"Even for a public DHT, we should add a peer to the RT if that peer is on the same network as ours, even though we might not see a public address for it" ?

Would such a peer get added to our private DHT then ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aarshkshah1992 The problem we're trying to solve in this function (PublicRoutingTableFilter) is:

How can we help ourselves ensure (at minimal cost) that even if peers are not properly removing themselves from the public DHT (i.e. switching to client mode if they are undialable) that we do not add non-publicly dialable peers to our routing table?

@aschmahmann aschmahmann force-pushed the feat/peer-filtering branch from f4e42b8 to c276983 Compare March 3, 2020 08:55
@aschmahmann aschmahmann requested a review from Stebalien March 3, 2020 09:03
@aschmahmann aschmahmann self-assigned this Mar 3, 2020
opts/filters.go Outdated Show resolved Hide resolved
dht.go Outdated Show resolved Hide resolved
dht.go Outdated Show resolved Hide resolved
dht.go Outdated Show resolved Hide resolved
opts/filters.go Outdated Show resolved Hide resolved
opts/filters.go Outdated
return true
}
}
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check a few more things:

  • If the remote address is in the carrier grade NAT range, abort.
  • IPv4: If the remote address is our public IPv4 address, accept them (NAT hairpin).
  • IPv6: If the peer is advertising the a public address that matches ours, accept them (could be abused?).
  • IPv6: If the peer is advertising an address that looks like an EUI-64 Bit address, check to see if the first 64 bits are the same as ours.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: on linux, we can check the route if the remote address is IPv6:

import 	"github.com/vishvananda/netlink"
...
routes, err := netlink.RouteGet(myIP)
for _, route := range routes {
    if route.Gw == nil {
        // We have a route to this host that doesn't go through a gateway. They must be on the same lan.
    }
}

But we can file this away in the "to be improved" bucket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can file this away in the "to be improved" bucket.

SGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@willscott is looking into this.

opts/filters.go Outdated
Comment on lines 34 to 36
addr := c.RemoteMultiaddr()
if !isRelayAddr(addr) && manet.IsPublicAddr(addr) {
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just check "do we have a public addr for this peer in the peerstore"?

opts/filters.go Outdated
Comment on lines 44 to 52
conns := h.Network().ConnsToPeer(ai.ID)
if len(conns) > 0 {
for _, c := range conns {
if manet.IsPrivateAddr(c.RemoteMultiaddr()) {
return true
}
}
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this check. We may learn about a few undailable peers, but we won't waste too much time dialing them.

opts/filters.go Outdated
for _, a := range ai.Addrs {
if manet.IsPublicAddr(a) {
if !isRelayAddr(a) {
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the idea here? Why would we want to filter out peers that have public addresses? IIRC, peers on the public network should also be in the private DHT.

opts/filters.go Outdated Show resolved Hide resolved
query.go Outdated
for _, np := range newPeers {
closer := q.localPeers.Add(np.ID)
foundCloserPeer = foundCloserPeer || closer
if q.dht.queryPeerFilter(q.dht.host, *next) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How necessary is the query filter really, it seems like it's kind of "plan B" since hopefully the routing tables should be clean and only contain relevant peers. Is it ok for functions to overfilter peers here?

For the public DHT, I'd introduce two checks:

  1. I can dial the peer.
  2. They aren't advertising relay addresses. That's a bit over-zealous, but we don't want to accidentally dial it.

For the private DHT, I'm not sure if we want to filter at all.

Do we also want to filter which addresses we add into the peerstore and/or which ones we eventually dial? If so is it important to do this now?

I'd actually keep all addresses. If the peer is on the local network, we might as well use those addresses.

query.go Outdated Show resolved Hide resolved
Copy link
Contributor Author

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, it sounds like the requested changes are:

  1. Remove almost all of the query filtering logic from PublicQueryFilter and actually all of the logic from PrivateQueryFilter
  2. Cover more edge cases in IPv4 and IPv6 PrivateRoutingTableFilter
  3. Docs + Cleanup

I left a few clarifying questions

opts/filters.go Outdated
Comment on lines 34 to 36
addr := c.RemoteMultiaddr()
if !isRelayAddr(addr) && manet.IsPublicAddr(addr) {
return true
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can, but there's no guarantee that it means anything right? This is the backup for what happens if an undialable peer doesn't remove themselves from being a DHT server.

opts/filters.go Outdated
Comment on lines 44 to 52
conns := h.Network().ConnsToPeer(ai.ID)
if len(conns) > 0 {
for _, c := range conns {
if manet.IsPrivateAddr(c.RemoteMultiaddr()) {
return true
}
}
return false
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opts/filters.go Outdated
for _, a := range ai.Addrs {
if manet.IsPublicAddr(a) {
if !isRelayAddr(a) {
return false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concern here isn't so much about undialable nodes, but about accidentally bridging the networks (e.g. starting a local DHT request and ending up with a global DHT request). However, I suspect that it's not really a big deal since if you get the data then you have it and this bridging shouldn't happen with behaving nodes.

opts/filters.go Outdated
return true
}
}
return false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can file this away in the "to be improved" bucket.

SGTM

query.go Outdated
for _, np := range newPeers {
closer := q.localPeers.Add(np.ID)
foundCloserPeer = foundCloserPeer || closer
if q.dht.queryPeerFilter(q.dht.host, *next) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can dial the peer

What does this mean? If we don't check at all we will end up eventually dialing the peer (if they are useful). To me this sounds like the only query filter we want is for the public DHT to check for relay addresses, is that correct?

Copy link
Contributor

@aarshkshah1992 aarshkshah1992 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments to improve my own understanding of what we are trying to do.

opts/filters.go Outdated
)

// PublicQueryFilter returns true if the peer is suspected of being publicly accessible
func PublicQueryFilter(h host.Host, ai peer.AddrInfo) bool {
Copy link
Contributor

@aarshkshah1992 aarshkshah1992 Mar 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann

Why not read the addresses from the peer store ?

opts/filters.go Outdated Show resolved Hide resolved
@Stebalien Stebalien force-pushed the cypress branch 2 times, most recently from 22b3963 to 5d313b1 Compare March 5, 2020 06:26
@Stebalien Stebalien force-pushed the feat/peer-filtering branch from 537e8f9 to eeaa1d7 Compare March 5, 2020 06:29
@aschmahmann aschmahmann force-pushed the feat/peer-filtering branch from 5cc259c to afc98b9 Compare March 6, 2020 05:15
@aschmahmann aschmahmann assigned willscott and unassigned aschmahmann Mar 6, 2020
@aschmahmann aschmahmann force-pushed the feat/peer-filtering branch from afc98b9 to 67b9cef Compare March 6, 2020 20:16
@Stebalien Stebalien added this to the Working Kademlia milestone Mar 10, 2020
dht.go Outdated
@@ -259,6 +264,9 @@ func makeRoutingTable(h host.Host, cfg config) (*kb.RoutingTable, error) {
rtPvLogger.Errorf("failed to connect to peer %s for validation, err=%s", p, err)
return false
}
if !cfg.routingTable.peerFilter(h, h.Network().ConnsToPeer(p)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aschmahmann : is using the RT peerFilter within the peerValidationFunction of the routing table the right place to put it after the kbucket refactor?

@willscott willscott marked this pull request as ready for review March 16, 2020 17:03
dht_filters.go Outdated
}

// PrivateQueryFilter returns true if the peer is suspected of being accessible over a shared private network
func PrivateQueryFilter(h host.Host, ai peer.AddrInfo) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dht_filters.go Outdated
return true
}
}
return false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit too aggressive. If they share a public address with us, we're using NAT hairpin.

dht_filters.go Outdated
} else {
hasPrivateAddr = true
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dht_filters.go Outdated

// PrivateRoutingTableFilter allows a peer to be added to the routing table if the connections to that peer indicate
// that it is on a private network
func PrivateRoutingTableFilter(h host.Host, conns []network.Conn) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, wait, why aren't we using all this logic in the PrivateQueryFilter? The query filter should generally be less restrictive than the routing table filter.

@Stebalien
Copy link
Member

@willscott take a look at some of the unresolved discussions including:

(you'll probably have to go through the unresolved discussions one by one)

dht_filters.go Outdated Show resolved Hide resolved
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty much ready.

dht_filters.go Outdated Show resolved Hide resolved
dht_filters.go Outdated Show resolved Hide resolved
dht_filters.go Outdated Show resolved Hide resolved
dht_filters.go Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@willscott willscott requested a review from Stebalien March 26, 2020 16:55
Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's work through the filtering logic explaining what we're doing and why (in comments). That will help us figure out which choices appear to be arbitrary and/or inconsistent.

dht.go Outdated Show resolved Hide resolved
dht.go Outdated Show resolved Hide resolved
dht_filters.go Outdated

for _, c := range conns {
ra := c.RemoteMultiaddr()
if manet.IsPrivateAddr(ra) || isRelayAddr(ra) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will add all relayed peers to the private routing table.

dht_filters.go Show resolved Hide resolved
dht_filters.go Outdated
if !isRelayAddr(addr) && manet.IsPublicAddr(addr) {
return true
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further consideration, drop this entire check. We only want peers that announce public addresses. Direct connections form undialable peers will have public addresses.

The check below should (I think?) look at all the addresses advertised by the peer (assuming identify has finished).

dht_filters.go Show resolved Hide resolved
@Stebalien Stebalien merged commit 5cbef78 into cypress Apr 1, 2020
Stebalien added a commit that referenced this pull request Apr 3, 2020
Allows specifying peer filter functions on query and on adding peers to the routing table. This patch also includes some reasonable default functions for a public-only and private-only DHT.

Co-authored-by: Will Scott <will@cypherpunk.email>
Co-authored-by: Steven Allen <steven@stebalien.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants