Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addresses getting removed from Kademlia after a disconnection #498

Open
altonen opened this issue Jul 20, 2023 · 0 comments
Open

Addresses getting removed from Kademlia after a disconnection #498

altonen opened this issue Jul 20, 2023 · 0 comments
Labels
I2-bug The node fails to follow expected behavior. I3-annoyance The node behaves within expectations, however this “expected behaviour” itself is at issue.

Comments

@altonen
Copy link
Contributor

altonen commented Jul 20, 2023

There is this pattern in the logs:

sub-libp2p: Addresses of PeerId("12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN"): ["/ip4/43.130.126.82/tcp/30333", "/ip4/174.138.0.205/tcp/30333"]    
...
sub-libp2p: Libp2p => Connected(12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN, SetId(0), Dialer { address: "/ip4/43.130.126.82/tcp/30333/p2p/12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN", role_override: Dialer }, ConnectionId(218)): Not requested by PSM, disabling.    
...
sub-libp2p: Libp2p => Disconnected(PeerId("12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN"), Some(KeepAliveTimeout))    
...
peerset: Connecting to 12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN on SetId(0) (5/8 num_out/max_out).    
sub-libp2p: PSM => Connect(12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN, SetId(0)): Starting to connect    
sub-libp2p: Libp2p <= Dial 12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN    
sub-libp2p: Addresses of PeerId("12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN"): []    
sub-libp2p: Libp2p => Dial failure for PeerId("12D3KooWHy8Zwq8SQBfShPTySwhKwr39QAGMccter7dmC9eDsuSN")    

As the node disconnects, the address is removed from Kademlia and since the address it not stored anywhere else, the next time ProtocolController attempts to connect to the peer, the dial fails because there are no addresses available. The problem is that the addresses that get removed when a node disconnects (such as due to keep-alive timeout) are actually valid so there is no reason to delete them because after they're deleted, the node can no longer dial the peer.

I looked into the Kademlia code and there wasn't anything obviously incorrect so it could be that this is the expected behavior. This is relatively easily fixed on Substrate side by keeping an LRU cache of peers and their addresses and in case libp2p doesn't give any addresses, we could take the addresses from this cache.

Not sure if this is an actual bug or just annoyance but adding both tags.

@altonen altonen transferred this issue from paritytech/substrate Aug 24, 2023
@the-right-joyce the-right-joyce added I2-bug The node fails to follow expected behavior. I3-annoyance The node behaves within expectations, however this “expected behaviour” itself is at issue. and removed I3-bug labels Aug 25, 2023
bkchr pushed a commit that referenced this issue Apr 10, 2024
* removed unused file

* another unused file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. I3-annoyance The node behaves within expectations, however this “expected behaviour” itself is at issue.
Projects
Status: Backlog 🗒
Development

No branches or pull requests

2 participants