Make concurrent dialing more aggressive #3656

dmitry-markin · 2024-03-12T10:27:33Z

This PR raises the default dial concurrency factor from 8 to 16 and sets the TCP dial timeout to 20 secs. This should help when connecting to nodes that have a lot of stale addresses in the DHT.

This is a remediation for #3519 (collator side).
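
To illustrate what a dial concurrency factor controls, here is a minimal std-only sketch. It assumes a simplified batch model: the known addresses of one peer are tried `factor` at a time, and the first successful dial wins. The name `dial_concurrently` and the closure-based `dial` parameter are hypothetical, and this is not how libp2p implements concurrent dialing internally.

```rust
use std::num::NonZeroU8;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: try `addrs` in batches of `factor`, running `dial`
/// for every address of a batch in parallel. Returns the first successful
/// connection, or `None` once every address has failed.
pub fn dial_concurrently<A, C, F>(addrs: &[A], factor: NonZeroU8, dial: F) -> Option<C>
where
    A: Sync,
    C: Send,
    F: Fn(&A) -> Option<C> + Sync,
{
    for batch in addrs.chunks(usize::from(factor.get())) {
        let (tx, rx) = mpsc::channel();
        let winner = thread::scope(|scope| {
            for addr in batch {
                let tx = tx.clone();
                let dial = &dial;
                scope.spawn(move || {
                    // The receiver outlives the scope, so a send error is not expected.
                    let _ = tx.send(dial(addr));
                });
            }
            // Drop the original sender so the iterator below ends
            // once every dial in this batch has reported its result.
            drop(tx);
            rx.iter().flatten().next()
        });
        if winner.is_some() {
            return winner;
        }
    }
    None
}
```

With many stale addresses ahead of a live one, a larger factor means fewer batches have to fail before the live address is reached. Note that in this simplified version each batch still waits for its slowest dial to finish before the function returns.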

lexnv · 2024-03-12T10:41:01Z

substrate/client/network/src/transport.rs

@@ -31,6 +31,9 @@ use std::{sync::Arc, time::Duration};

 pub use libp2p::bandwidth::BandwidthSinks;

+/// Timeout after which a TCP connection attempt is considered failed.
+const TCP_DIAL_TIMEOUT: Duration = Duration::from_secs(20);


dq: The default timeout is in the order of minutes?

On my Linux box it's 2 min 10 secs.
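
As a rough illustration of replacing the OS default with an explicit dial timeout, here is a minimal sketch using only the Rust standard library. The helper name `dial_tcp` is hypothetical; the PR itself applies the timeout inside the libp2p transport in `transport.rs`, which this sketch does not reproduce.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Same value as the constant added in the diff above.
const TCP_DIAL_TIMEOUT: Duration = Duration::from_secs(20);

/// Hypothetical helper: open a TCP connection, giving up after
/// `TCP_DIAL_TIMEOUT` instead of waiting for the OS-level connect timeout.
pub fn dial_tcp(addr: &SocketAddr) -> io::Result<TcpStream> {
    TcpStream::connect_timeout(addr, TCP_DIAL_TIMEOUT)
}
```

A dial to an address that silently drops packets would then fail after 20 secs rather than after the OS default, which frees the dial slot for the next address sooner.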

lexnv · 2024-03-12T10:44:02Z

substrate/client/network/src/service.rs

+				.max_negotiating_inbound_streams(2048)
+				// Increase the default dial concurrency factor 8 to 16 to help with cases where DHT
+				// has plenty of stale peer addresses.
+				.dial_concurrency_factor(NonZeroU8::new(16).expect("0 < 16 < 256; qed"));


dq: I think this is fine, but do you think there is a chance we might run into issues with the total number of file descriptors opened for the concurrent requests? 🤔

Yes, this is theoretically possible, especially for nodes with a high out-peer count. All my systems show ulimit -n as 1024. That means a maximum of roughly 125 simultaneous connections to peers with bloated DHT records before the change, and roughly 60 after the change.

Don't think the practical probability of hitting that many peers at once with bloated DHT records is high, but at least something to keep in mind.
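
The estimate above can be reproduced with a small sketch. It assumes the worst case in which every peer being dialed uses all of its concurrent dial slots and each slot holds one file descriptor; the function name is hypothetical. The exact quotients are 128 and 64, so the figures of 125 and 60 quoted above presumably leave some room for descriptors the node already has open.

```rust
use std::num::NonZeroU8;

/// Hypothetical helper: upper bound on the number of peers that can be
/// dialed at once before the file descriptor limit is exhausted, assuming
/// each peer occupies `dial_concurrency_factor` descriptors.
pub fn max_peers_dialed_at_once(fd_limit: u64, dial_concurrency_factor: NonZeroU8) -> u64 {
    fd_limit / u64::from(dial_concurrency_factor.get())
}
```

For example, `max_peers_dialed_at_once(1024, NonZeroU8::new(16).unwrap())` evaluates to 64.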

altonen

LGTM but I think @bkchr's suggestion of pruning addresses more aggressively would be worth exploring as well. The number of known addresses shown in #3519 (comment) looks like a bug to me.

dmitry-markin · 2024-03-12T11:06:52Z

> LGTM but I think @bkchr's suggestion of pruning addresses more aggressively would be worth exploring as well. The number of known addresses shown in #3519 (comment) looks like a bug to me.

Yes, hopefully that one is fixed by #3657 you already reviewed.

dmitry-markin · 2024-03-14T09:47:14Z

As this doesn't help with reaching validators with stale DHT records, I don't think we should increase the default concurrency factor. As for the TCP dial timeout, I'm not that sure, but unless we see any benefits from setting it to 20 secs, I would keep everything as is.

So, I'm inclined to close this PR.

dmitry-markin added 2 commits March 12, 2024 11:41

Make TCP dials to timeout after 20 secs

452fada

Increase dial concurrency factor from 8 to 16

10d5047

dmitry-markin requested review from altonen and lexnv March 12, 2024 10:27

lexnv reviewed Mar 12, 2024

lexnv approved these changes Mar 12, 2024

altonen approved these changes Mar 12, 2024

dmitry-markin added the R0-silent (Changes should not be mentioned in any release notes) and T0-node (This PR/Issue is related to the topic “node”) labels Mar 12, 2024

dmitry-markin mentioned this pull request Mar 12, 2024

rococo asset-hub and others don't always advertise their collations #3519

dmitry-markin closed this Mar 18, 2024
