-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark net.Conn
failed writes as recoverable when 0 bytes were written.
#1013
Conversation
if err != nil { | ||
if n == 0 { | ||
err = &safeRetryError{Err: err} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation here is pgx
:
The underlying connection just invokes Write()
on a net.Conn
and a "broken pipe" comes backs (same as here). But then a safeToRetry
flag is set if exactly n == 0
bytes went through the write (https://github.com/jackc/pgconn/blob/cba610c245265ff50ea3c56a9961da218ed7d730/pgconn.go#L923).
Then in the high-level code used by the driver, if an error is detected but has safeToRetry
set, a driver.ErrBadConn
will be surfaced so the database/sql
package can handle evicting the connection and replacing it in the pool (https://github.com/jackc/pgx/blob/1df45d758d4f57b45d56fbce1e5bb8cf9aabaaf0/stdlib/sql.go#L336).
Merging this because it looks pretty good? Unsure if it'll break anything. |
Done. |
This reverts commit ac3fb2a. Switch back to the upstream lib/pq version. The issue that was the cause for switching to simplesurance/pq was fixed in lib/pq 1.8 (lib/pq#1013).
I'm seeing "broken pipe" errors when working with CRDB using sqlx. The issue seemed to be the tcp connections were diconnected while the conns in db driver (pq) still has stale connection. It happens more often when the DB is behind a proxy. In our cases, the pods were proxied by the envoy sidecar. There were other instances on the community reporting similar issues, and took different workaround by sebding perodic dummy queries in app mimicing keepalive, enlenghthen proxy idle timeout, or shortening the lifetime of db conn. This has been reported and fixed by the lib/pq upstream in v1.9+ lib/pq#1013 lib/pq#723 lib/pq#897 lib/pq#870 grafana/grafana#29957
Refer to upstream lib#1013
Fixes #723, #870, #897.