Allow setting TCP_NODELAY on socket (to save 40ms on connection latency) #3043
Comments
Update:
23ms is a pretty reasonable number - there are probably other optimizations to do here but this seems to be the major issue.
I'd be fine setting |
@abonander thanks for looking - with internal buffering, do you think this should be configurable at all? I'm happy to prep a PR if we agree on what to do here: (a) always set it - very easy, though I'm not sure whether there are cases where you wouldn't want this; or (b) add it to PgConnectOptions/MySqlConnectOptions and set it via WithSocket.
Libpq hardcodes it, that's sufficient precedent for me: https://github.com/postgres/postgres/blob/e70abd67c3e6155fe8e853c4e29255578d9cf48d/src/backend/libpq/pqcomm.c#L743 There's no real benefit to Nagle's algorithm if the application does its own buffering so I don't see a need for a toggle. IMO, NODELAY should be the default nowadays.
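To illustrate the point about application-level buffering, here is a minimal sketch using only the Rust standard library. It is not sqlx code: `send_batched` is a hypothetical helper, and the loopback listener merely stands in for a database server. The idea is that small messages are coalesced in a `BufWriter` and flushed once, so Nagle's algorithm has nothing left to coalesce and can be disabled.

```rust
use std::io::{self, BufWriter, Read, Write};
use std::net::{TcpListener, TcpStream};

/// Hypothetical helper (not part of sqlx): buffer several small messages
/// in user space and hand them to the socket with a single flush.
fn send_batched(stream: TcpStream, parts: &[&str]) -> io::Result<()> {
    // Nagle's algorithm is redundant here because we batch writes ourselves.
    stream.set_nodelay(true)?;
    let mut writer = BufWriter::new(stream);
    for part in parts {
        writer.write_all(part.as_bytes())?;
    }
    // One flush pushes the whole batch to the socket.
    writer.flush()
    // The stream is closed when `writer` is dropped at the end of this function.
}

fn main() -> io::Result<()> {
    // Loopback listener standing in for a server (an assumption for the demo).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let client = TcpStream::connect(listener.local_addr()?)?;
    send_batched(client, &["startup;", "auth;", "query;"])?;

    // Read everything the client sent before it closed the connection.
    let (mut server_side, _) = listener.accept()?;
    let mut received = Vec::new();
    server_side.read_to_end(&mut received)?;
    println!("{}", String::from_utf8_lossy(&received));
    Ok(())
}
```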
Is your feature request related to a problem? Please describe.
We're using sqlx for a latency-sensitive use case, and while connection pooling mitigates the problem, connecting to the (Postgres) database is very slow, causing latency spikes in our service. Running a very basic
takes consistently 140ms. When we do the same with psycopg2 in Python (which uses libpq), it's consistently 16-20ms. Based on the TCP dump (see below), it's pretty clear that 40ms can be attributed to waiting for a delayed TCP ACK from the database: the client doesn't finish sending data because of Nagle's algorithm.
Describe the solution you'd like
I'd like an option to set TCP_NODELAY on the socket when it's created, i.e.
When we add the second line in https://github.com/launchbadge/sqlx/blob/v0.7.3/sqlx-core/src/net/socket/mod.rs#L198, we see a consistent improvement of 40ms. This brings connection latency to ~100ms, which is still bad, but slightly less bad. If people have ideas on how we could improve this further, I'm all ears.
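For readers unfamiliar with the call, here is a minimal sketch of disabling Nagle's algorithm on a freshly connected socket, using only the Rust standard library. This is not the sqlx patch itself: `connect_with_nodelay` is a hypothetical helper name, and the loopback listener only stands in for a database. With tokio, the equivalent `set_nodelay` method exists on `tokio::net::TcpStream`.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, TcpStream};

/// Hypothetical helper (not part of sqlx): connect, then disable Nagle's algorithm.
fn connect_with_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    // Send small writes immediately instead of holding them back
    // until previously sent data has been acknowledged.
    stream.set_nodelay(true)?;
    Ok(stream)
}

fn main() -> io::Result<()> {
    // Loopback listener standing in for a database server (an assumption for the demo).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let stream = connect_with_nodelay(listener.local_addr()?)?;
    // `nodelay()` reads the option back from the socket.
    println!("TCP_NODELAY enabled: {}", stream.nodelay()?);
    Ok(())
}
```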
I was able to confirm with strace that psycopg2 does set TCP_NODELAY by default and it's not even runtime-configurable (it's compile-time configurable in libpq).
I don't understand the code well enough to recommend where this configuration should live. ConnectOptions?
Describe alternatives you've considered
Turn it on without a way to opt out? Some frameworks do that, but it seems better to make it configurable.
Additional context
This is on sqlx 0.7.3 with tokio, running on debian-bullseye on AWS ECS connecting to Aurora Postgres in the same AWS region. We're seeing the problem with multiple services and multiple databases - the connection latency is consistently bad.
TCP dumps before and after (= with a patch that adds stream.set_nodelay(true)?):
tcpdump-before.txt
tcpdump-after.txt
Good context on TCP_NODELAY is here or here.