
Make concurrent dialing more aggressive #3656

Closed
dmitry-markin wants to merge 2 commits

Conversation

dmitry-markin
Contributor

This PR changes the default value for concurrent dials from 8 to 16 and sets the TCP dial timeout to 20 seconds. This should help connect to nodes that have a lot of stale addresses in the DHT.

This is a remediation for #3519 (collator side).

@dmitry-markin dmitry-markin requested review from altonen and lexnv March 12, 2024 10:27
@@ -31,6 +31,9 @@ use std::{sync::Arc, time::Duration};

pub use libp2p::bandwidth::BandwidthSinks;

/// Timeout after which a TCP connection attempt is considered failed.
const TCP_DIAL_TIMEOUT: Duration = Duration::from_secs(20);
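To illustrate what this constant changes, here is a minimal stdlib-only sketch of a bounded TCP dial — not the actual libp2p transport wiring of this PR; the `dial` helper is hypothetical:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Timeout after which a TCP connection attempt is considered failed.
const TCP_DIAL_TIMEOUT: Duration = Duration::from_secs(20);

/// Hypothetical helper: give up on a dial after 20 s instead of waiting
/// for the OS default, which can be on the order of minutes.
fn dial(addr: SocketAddr) -> std::io::Result<TcpStream> {
    TcpStream::connect_timeout(&addr, TCP_DIAL_TIMEOUT)
}
```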
Contributor
dq: The default timeout is in the order of minutes?

Contributor Author (dmitry-markin)
On my linux box it's 2 min 10 secs.
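That figure is consistent with Linux's exponential SYN retransmission backoff. A rough calculation, assuming an initial retransmission timeout of about 1 s that doubles on each retry and the default `net.ipv4.tcp_syn_retries = 6`:

```rust
/// Approximate seconds before a Linux TCP connect() gives up, assuming an
/// initial SYN retransmission timeout of ~1 s that doubles on each retry.
/// With the default `net.ipv4.tcp_syn_retries = 6` this is
/// 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 s, i.e. roughly 2 min 7 s.
fn approx_syn_timeout_secs(syn_retries: u32) -> u32 {
    (0..=syn_retries).map(|i| 1u32 << i).sum()
}
```

127 s is close to the ~2 min 10 s observed above; the exact value depends on the kernel's initial retransmission timeout.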

.max_negotiating_inbound_streams(2048)
// Increase the default dial concurrency factor from 8 to 16 to help with cases
// where the DHT has plenty of stale peer addresses.
.dial_concurrency_factor(NonZeroU8::new(16).expect("0 < 16 < 256; qed"));
Contributor
dq: I think this is fine, but do you think there is a chance we might run into issues with the total number of file descriptors opened for the concurrent requests? 🤔

Contributor Author (@dmitry-markin, Mar 12, 2024)
Yes, this is theoretically possible, especially for nodes with a high out-peer count. All my systems show `ulimit -n` as 1024, which means at most 125 simultaneous connections to peers with bloated DHT records before the change, and 60 connections after it.

I don't think the practical probability of hitting that many peers with bloated DHT records at once is high, but it's at least something to keep in mind.
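The arithmetic behind those figures can be sketched as follows, assuming each in-flight dial holds one file descriptor; the `reserved` parameter (descriptors the node already uses for other purposes) is a hypothetical quantity for illustration:

```rust
/// Rough upper bound on the number of peers that can be dialed simultaneously
/// before exhausting the process fd limit, assuming one fd per in-flight dial
/// and `reserved` descriptors already in use for other purposes.
fn max_dialed_peers(fd_limit: u32, reserved: u32, concurrency_factor: u32) -> u32 {
    (fd_limit - reserved) / concurrency_factor
}
```

With `fd_limit = 1024` and nothing reserved this gives 128 peers at a concurrency factor of 8 and 64 at a factor of 16; the 125 and 60 quoted above additionally account for descriptors the node already holds.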

@altonen (Contributor) left a comment
LGTM but I think @bkchr's suggestion of pruning addresses more aggressively would be worth exploring as well. The number of known addresses shown in #3519 (comment) looks like a bug to me.

@dmitry-markin
Contributor Author

> LGTM but I think @bkchr's suggestion of pruning addresses more aggressively would be worth exploring as well. The number of known addresses shown in #3519 (comment) looks like a bug to me.

Yes, hopefully that one is fixed by #3657 you already reviewed.

@dmitry-markin dmitry-markin added R0-silent Changes should not be mentioned in any release notes T0-node This PR/Issue is related to the topic “node”. labels Mar 12, 2024
@dmitry-markin
Contributor Author

As this doesn't help with reaching validators with stale DHT records, I don't think we should increase the default concurrency factor. As for the TCP dial timeout, I'm less sure, but unless we see clear benefits from setting it to 20 secs, I would keep everything as is.

So, I'm inclined to close this PR.

3 participants