Explicit public address for authority discovery DHT record and Identify protocol #3657

dmitry-markin · 2024-03-12T10:39:46Z

As reported in #3519 (comment), DHT is flooded by rogue automatically discovered addresses (both in regular DHT records and authority discovery DHT records) when a node is run on a spot instance.

This PR adds flag --public-addr-only that forces publishing of only explicitly set public addresses.

If --public-addr-only flag is passed:

The node publishes only explicitly set --public-addr addresses to the authority discovery DHT record instead af the discovered external addresses (applies to validators).
Only --public_addr addresses are advertised via Identify protocol instead of all listen addresses and discovered external addresses, so that remote nodes do not add them to the node addresses in DHT (applies to all nodes).

This PR also fixes a subtle bug where expired external addresses were never removed from the list in PeerInfo when handling FromSwarm::ExpiredExternalAddr.

altonen · 2024-03-12T10:50:58Z

substrate/client/network/src/peer_info.rs

-				self.identify.on_swarm_event(FromSwarm::NewListenAddr(e));
+				// Do not report listen addresses to `Identify`, because remote nodes should only
+				// know about our external addresses.


Not sure if this is a good idea. Probably needs a burn-in in a real network to see if it's safe to remove because AFAIU many of our nodes rely on external address discovery via Identify and the issues we've experienced are limited Rococo. If this is necessary to fix it, then it probably should be behind a CLI flag to enable this behavior explicitly.

Yes, let's do a burn-in. This should not affect the external address discovery via Identify, as discovery is handled separately via observed_addr field of Identify message.

Yeah I was mixing up listen and external addresses

The confusion is no surprise, as there is only one field listen_addrs in Identify message, where both listen and external addresses are added.

dmitry-markin · 2024-03-13T07:21:27Z

Putting more thought into it, I think this is not enough to clean DHT from temporary spot instance addresses, as they will be discovered via Identify and published as external addresses anyway. Unless connections from temporary addresses are blocked/forwarded by a firewall.

I think we need to explicitly filter all addresses except the explicitly set --public-addr from being added to DHT, even as external addresses. I would not introduce a new cli flag for this and enable this behavior always when --public-addr flag is passed, which seems reasonable as this is the intended dial address set by the operator.

@lexnv @altonen what do you think?

dmitry-markin · 2024-03-13T07:23:17Z

The same applies to authority discovery, as a spot instance temporary addresses will end up in the authority discovery DHT records as external addresses.

dmitry-markin · 2024-03-13T07:29:51Z

I would not introduce a new cli flag for this and enable this behavior always when --public-addr flag is passed, which seems reasonable as this is the intended dial address set by the operator.

Another option is to add one more flag --no-external-addr-discovery, but the operator would need to pass all three: --public-addr --hide-listen-addr --no-external-addr-discovery while this behavior seems intended when --public-addr is passed.

EDIT: another option is --public-addr xxx --public-addr-only 😄

I'm going to update the PR to cover both Identify and authority discovery once we decide on the flag usage.

altonen · 2024-03-13T11:31:02Z

Out of caution, I'd go with an extra flag instead of modifying behavior of the current flags, simply because this is trying to address an issue we've encountered in Rococo, not on all networks, and because we don't know if there is someone relying on the current behavior of the flags. Your call though

lexnv · 2024-03-13T15:58:02Z

I do like the --public-addr xxx --public-addr-only, this keeps things simple for us and node operations (as opposed of having 2 extra CLI options to hide the listen address and no external address).
Then, Rococo maintainers could add the extra flag to enable this behavior :D

…en-addresses

dmitry-markin · 2024-03-15T11:34:41Z

@lexnv could you have a look again, please? The logic with Identify was reworked + the action of the flag was extended on authority discovery DHT records.

substrate/client/network/src/peer_info.rs

substrate/client/authority-discovery/src/worker.rs

lexnv · 2024-03-15T12:59:27Z

substrate/client/authority-discovery/src/worker.rs


-		debug!(target: LOG_TARGET, "Authority DHT record peer_id='{:?}' addresses='{:?}'", peer_id, addresses.clone().collect::<Vec<_>>());
+		debug!(


dq:
I think moving the debug log here changes a bit the details we expose.
I was curious to know the addresses we publish in DHT prior to adding the /p2p/local_peer_id.

Having the log line here might hint us about /p2p/other_peer entries (because they will remain untouched).
Could you see any value in distinguishing between rogue_ip_address and rogue_ip_address/p2p/local_peer_id?

I moved the log here to see what addresses exactly are published to DHT. In case we want pretty addresses without /p2p/... part, we would also need to drop the /p2p/... part if it was provided by the operator.

substrate/client/authority-discovery/src/worker.rs

lexnv

LGTM! Nice one 👍

substrate/client/authority-discovery/src/worker.rs

…en-addresses

bkchr · 2024-03-18T09:59:54Z

I'm not convinced by this pull request. You are trying to add some duct tape. This change will require manual intervention of node operators. Aka this will not fix anything. We have almost all the data that we need. Other nodes report us out addresses. We just need to take the data and put it together in a better way.

This PR also fixes a subtle bug where expired external addresses were never removed from the list in PeerInfo when handling FromSwarm::ExpiredExternalAddr.

This is a correct fix. However, I'm not sure if you went down to libp2p checking when this is being generated. (I have done this and I think it is generated almost never because they have a address cache for 200 automatically).

As I already said in the issue or somewhere else, we need to start track the "lifeness" of these addresses manually in our networking code and prune them if they are too old.

Authority discovery should be changed to report the addresses in the following order:

Our own listen addresses/public addresses (This should already be good enough for proper validators as they are reachable from the outside)
The rest you fill up with addresses reported from other nodes.

dmitry-markin · 2024-03-18T11:30:42Z

As I already said in the issue or somewhere else, we need to start track the "lifeness" of these addresses manually in our networking code and prune them if they are too old.

For this, we'd need to introduce the address probing protocol, something like AutoNAT v2 proposed in libp2p. Otherwise we don't have any source of truth for what addresses are real and what are incorrectly reported by remote peers with faulty NAT / port forwarding configurations or incorrectly translated by libp2p (it assigns a local listen port to an observed external address and breaks things when the external address has a different port than the listen address).

Authority discovery should be changed to report the addresses in the following order:

* Our own listen addresses/public addresses (This should already be good enough for proper validators as they are reachable from the outside)

* The rest you fill up with addresses reported from other nodes.

Currently, upon fetching an authority-discovery DHT record, we just insert the addresses into Discovery behavior via different forms of add_known_address, which all boil down to calling Discovery::add_know_address, effectively just adding the addresses to the DHT routing table. When we dial a peer, we lookup addresses in libp2p Kad behavior's handle_pending_outbound_connection() where they are fetched from a k-bucket's entry in the order they were inserted there. As the validator have already participated in DHT, there are chances there are already addresses in the routing table reported via Identify protocol, including all the listen addresses and incorrectly discovered external addresses.

TLDR; we can order addresses to make the address provided by the operator go first, but we'd need to maintain the order in both (1) authority-discovery records and (2) DHT routing table (via providing the needed order in Identify).

@bkchr does this make sense?

bkchr · 2024-03-18T16:14:29Z

For this, we'd need to introduce the address probing protocol, something like AutoNAT v2 proposed in libp2p. Otherwise we don't have any source of truth for what addresses are real and what are incorrectly reported by remote peers with faulty NAT / port forwarding configurations or incorrectly translated by libp2p (it assigns a local listen port to an observed external address and breaks things when the external address has a different port than the listen address).

Generally we are not supporting authorities behind NATs. Authorities should be reachable directly. Otherwise we would need support for hole punching etc.

Currently, upon fetching an authority-discovery DHT record, we just insert the addresses into Discovery behavior via different forms of add_known_address, which all boil down to calling Discovery::add_know_address, effectively just adding the addresses to the DHT routing table.

Generally I was talking about the sending site. AKA what we put into the DHT for our node.

When we dial a peer, we lookup addresses in libp2p Kad behavior's handle_pending_outbound_connection() where they are fetched from a k-bucket's entry in the order they were inserted there. As the validator have already participated in DHT, there are chances there are already addresses in the routing table reported via Identify protocol, including all the listen addresses and incorrectly discovered external addresses.

But we call this code in Kademlia, from here. And before we already use this ephemeral list. So, we could give addresses in the ephemeral list a higher "weight" when they come from authority discovery.

dmitry-markin · 2024-03-18T16:23:13Z

Generally we are not supporting authorities behind NATs. Authorities should be reachable directly. Otherwise we would need support for hole punching etc.

Unfortunately, this is exactly what we are doing at Parity — running rococo validators on spot instances with hole punching.

Generally I was talking about the sending site. AKA what we put into the DHT for our node.

👍

But we call this code in Kademlia, from here. And before we already use this ephemeral list. So, we could give addresses in the ephemeral list a higher "weight" when they come from authority discovery.

Yes, makes sense, missed this. Should be already working as needed if we publish the authority-discovery DHT records with the right order of addresses.

bkchr · 2024-03-18T18:59:02Z

Unfortunately, this is exactly what we are doing at Parity — running rococo validators on spot instances with hole punching.

Then we need to change the infrastructure and not the code ;)

bkchr · 2024-03-18T19:01:13Z

Should be already working as needed if we publish the authority-discovery DHT records with the right order of addresses.

Yes, but we could also optimize the function to not return duplicated addresses.

dmitry-markin · 2024-03-20T12:59:14Z

Closing in favor of #3757.

) Make sure explicitly set by the operator public addresses go first in the authority discovery DHT records. Also update `Discovery` behavior to eliminate duplicates in the returned addresses. This PR should improve situation with #3519. Obsoletes #3657.

…ritytech#3757) Make sure explicitly set by the operator public addresses go first in the authority discovery DHT records. Also update `Discovery` behavior to eliminate duplicates in the returned addresses. This PR should improve situation with paritytech#3519. Obsoletes paritytech#3657.

Do not report listen addresses via Identify protocol

dad4a3a

dmitry-markin requested review from altonen and lexnv March 12, 2024 10:39

altonen reviewed Mar 12, 2024

View reviewed changes

dmitry-markin mentioned this pull request Mar 12, 2024

Make concurrent dialing more aggressive #3656

Closed

dmitry-markin added A0-needs_burnin Pull request needs to be tested on a live validator node before merge. DevOps is notified via matrix T0-node This PR/Issue is related to the topic “node”. labels Mar 12, 2024

dmitry-markin mentioned this pull request Mar 12, 2024

rococo asset-hub and others don't always advertise their collations #3519

Open

1 task

dmitry-markin added 2 commits March 12, 2024 15:32

Add --hide-listen-addr cli flag

4faed09

Add PRDoc

c52a10e

dmitry-markin requested a review from altonen March 12, 2024 13:52

minor: formatting

702749f

dmitry-markin marked this pull request as draft March 13, 2024 15:58

dmitry-markin changed the title ~~Do not report listen addresses via Identify protocol~~ Explicit public address for authority discovery DHT record and Identify protocol Mar 14, 2024

dmitry-markin added 5 commits March 14, 2024 17:07

Rework public addresses advertising via Identify

7172f67

minor: comments

19a0122

Merge remote-tracking branch 'origin/master' into dm-identify-no-list…

3648e85

…en-addresses

Extend --public-addr-only on authority discovery DHT records

714448f

Update PRDoc

5d3e398

dmitry-markin marked this pull request as ready for review March 15, 2024 11:34

dmitry-markin removed the request for review from altonen March 15, 2024 11:36

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/network/src/peer_info.rs Outdated Show resolved Hide resolved

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/network/src/peer_info.rs Show resolved Hide resolved

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/authority-discovery/src/worker.rs Outdated Show resolved Hide resolved

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/authority-discovery/src/worker.rs Outdated Show resolved Hide resolved

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/authority-discovery/src/worker.rs Outdated Show resolved Hide resolved

dmitry-markin added 2 commits March 15, 2024 15:10

Apply review suggestions

ed885b5

Use Display for peer_id when printing authority DHT record

d558633

lexnv approved these changes Mar 15, 2024

View reviewed changes

Apply review suggestions

8202565

lexnv reviewed Mar 15, 2024

View reviewed changes

substrate/client/authority-discovery/src/worker.rs Show resolved Hide resolved

dmitry-markin added 2 commits March 18, 2024 10:39

Merge remote-tracking branch 'origin/master' into dm-identify-no-list…

5daba3e

…en-addresses

Apply review suggestions

a8da478

dmitry-markin mentioned this pull request Mar 20, 2024

Make public addresses go first in authority discovery DHT records #3757

Merged

dmitry-markin closed this Mar 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explicit public address for authority discovery DHT record and Identify protocol #3657

Explicit public address for authority discovery DHT record and Identify protocol #3657

dmitry-markin commented Mar 12, 2024 •

edited

Loading

altonen Mar 12, 2024

dmitry-markin Mar 12, 2024 •

edited

Loading

altonen Mar 12, 2024

dmitry-markin Mar 12, 2024

dmitry-markin commented Mar 13, 2024 •

edited

Loading

dmitry-markin commented Mar 13, 2024

dmitry-markin commented Mar 13, 2024 •

edited

Loading

altonen commented Mar 13, 2024

lexnv commented Mar 13, 2024

dmitry-markin commented Mar 15, 2024

lexnv Mar 15, 2024

dmitry-markin Mar 15, 2024

lexnv left a comment

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 18, 2024 •

edited

Loading

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 18, 2024

bkchr commented Mar 18, 2024

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 20, 2024


		debug!(target: LOG_TARGET, "Authority DHT record peer_id='{:?}' addresses='{:?}'", peer_id, addresses.clone().collect::<Vec<_>>());
		debug!(

Explicit public address for authority discovery DHT record and Identify protocol #3657

Explicit public address for authority discovery DHT record and Identify protocol #3657

Conversation

dmitry-markin commented Mar 12, 2024 • edited Loading

altonen Mar 12, 2024

Choose a reason for hiding this comment

dmitry-markin Mar 12, 2024 • edited Loading

Choose a reason for hiding this comment

altonen Mar 12, 2024

Choose a reason for hiding this comment

dmitry-markin Mar 12, 2024

Choose a reason for hiding this comment

dmitry-markin commented Mar 13, 2024 • edited Loading

dmitry-markin commented Mar 13, 2024

dmitry-markin commented Mar 13, 2024 • edited Loading

altonen commented Mar 13, 2024

lexnv commented Mar 13, 2024

dmitry-markin commented Mar 15, 2024

lexnv Mar 15, 2024

Choose a reason for hiding this comment

dmitry-markin Mar 15, 2024

Choose a reason for hiding this comment

lexnv left a comment

Choose a reason for hiding this comment

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 18, 2024 • edited Loading

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 18, 2024

bkchr commented Mar 18, 2024

bkchr commented Mar 18, 2024

dmitry-markin commented Mar 20, 2024

dmitry-markin commented Mar 12, 2024 •

edited

Loading

dmitry-markin Mar 12, 2024 •

edited

Loading

dmitry-markin commented Mar 13, 2024 •

edited

Loading

dmitry-markin commented Mar 13, 2024 •

edited

Loading

dmitry-markin commented Mar 18, 2024 •

edited

Loading