Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: always enable TCP keepalives with OS defaults #6834

Merged
merged 4 commits into from
Dec 7, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion benchmark/benchmain/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ import (
"google.golang.org/grpc/credentials/insecure"
"google.golang.org/grpc/experimental"
"google.golang.org/grpc/grpclog"
"google.golang.org/grpc/internal"
"google.golang.org/grpc/internal/channelz"
"google.golang.org/grpc/keepalive"
"google.golang.org/grpc/metadata"
Expand Down Expand Up @@ -373,7 +374,7 @@ func makeClients(bf stats.Features) ([]testgrpc.BenchmarkServiceClient, func())
logger.Fatalf("Failed to listen: %v", err)
}
opts = append(opts, grpc.WithContextDialer(func(ctx context.Context, address string) (net.Conn, error) {
return nw.ContextDialer((&net.Dialer{KeepAlive: time.Duration(-1)}).DialContext)(ctx, "tcp", lis.Addr().String())
return nw.ContextDialer((internal.NetDialerWithTCPKeepalive().DialContext))(ctx, "tcp", lis.Addr().String())
}))
}
lis = nw.Listener(lis)
Expand Down
10 changes: 6 additions & 4 deletions dialoptions.go
Original file line number Diff line number Diff line change
Expand Up @@ -401,10 +401,12 @@ func WithTimeout(d time.Duration) DialOption {
// returned by f, gRPC checks the error's Temporary() method to decide if it
// should try to reconnect to the network address.
//
// Note: As of Go 1.21, the standard library overrides the OS defaults for
// TCP keepalive time and interval to 15s.
// To retain OS defaults, use a net.Dialer with the KeepAlive field set to a
// negative value.
// Note: All supported releases of Go (as of December 2023) override the OS
// defaults for TCP keepalive time and interval to 15s. To enable TCP keepalive
// with OS defaults for keepalive time and interval, use a net.Dialer that sets
// the KeepAlive field to a negative value, and sets the SO_KEEPALIVE socket
// option to true from the Control field. For a concrete example of how to do
// this, see internal.NetDialerWithTCPKeepalive().
//
// For more information, please see [issue 23459] in the Go github repo.
//
Expand Down
50 changes: 50 additions & 0 deletions internal/tcp_keepalive.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Copyright 2023 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package internal

import (
"net"
"syscall"
"time"
)

// NetDialerWithTCPKeepalive returns a net.Dialer that enables TCP keepalives on
// the underlying connection with OS default values for keepalive parameters.
//
// TODO: Once https://github.com/golang/go/issues/62254 lands, and the
// appropriate Go version becomes less than our least supported Go version, we
// should look into using the new API to make things more straightforward.
func NetDialerWithTCPKeepalive() *net.Dialer {
return &net.Dialer{
// Setting a negative value here prevents the Go stdlib from overriding
// the values of TCP keepalive time and interval. It also prevents the
// Go stdlib from enabling TCP keepalives by default.
KeepAlive: time.Duration(-1),
// This method is called after the underlying network socket is created,
// but before dialing the socket (or calling its connect() method). The
// combination of unconditionally enabling TCP keepalives here, and
// disabling the overriding of TCP keepalive parameters by setting the
// KeepAlive field to a negative value above, results in OS defaults for
// the TCP keealive interval and time parameters.
Control: func(_, _ string, c syscall.RawConn) error {
return c.Control(func(fd uintptr) {
syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1)
})
},
}
}
9 changes: 4 additions & 5 deletions internal/transport/http2_client.go
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,15 @@ import (
"golang.org/x/net/http2/hpack"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/credentials"
"google.golang.org/grpc/internal"
"google.golang.org/grpc/internal/channelz"
icredentials "google.golang.org/grpc/internal/credentials"
"google.golang.org/grpc/internal/grpclog"
"google.golang.org/grpc/internal/grpcsync"
"google.golang.org/grpc/internal/grpcutil"
imetadata "google.golang.org/grpc/internal/metadata"
istatus "google.golang.org/grpc/internal/status"
"google.golang.org/grpc/internal/syscall"
isyscall "google.golang.org/grpc/internal/syscall"
"google.golang.org/grpc/internal/transport/networktype"
"google.golang.org/grpc/keepalive"
"google.golang.org/grpc/metadata"
Expand Down Expand Up @@ -176,9 +177,7 @@ func dial(ctx context.Context, fn func(context.Context, string) (net.Conn, error
if networkType == "tcp" && useProxy {
return proxyDial(ctx, address, grpcUA)
}
// KeepAlive is set to a negative value to prevent Go's override of the TCP
// keepalive time and interval; retain the OS default.
return (&net.Dialer{KeepAlive: time.Duration(-1)}).DialContext(ctx, networkType, address)
return internal.NetDialerWithTCPKeepalive().DialContext(ctx, networkType, address)
}

func isTemporary(err error) bool {
Expand Down Expand Up @@ -264,7 +263,7 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr resolver.Address, opts
}
keepaliveEnabled := false
if kp.Time != infinity {
if err = syscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil {
if err = isyscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this function sets the TCP user timeout but does not enable keepalives.. should(n't) we update it to do that, too, in case they are disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TCP_USER_TIMEOUT specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection and return ETIMEDOUT to the application. If the option value is specified as 0, TCP will to use the system default.

I think it is totally a reasonable thing to use it without TCP keepalives.

As per A18, the reason we set TCP_USER_TIMEOUT when gRPC keepalives are configured is to ensure that if things are stuck at the TCP layer, setting the TCP_USER_TIMEOUT will ensure that the connection is closed anyways (even if gRPC keepalives are not able to do the same).

Are you concerned about the case where a user specifies a custom net.Dialer to disable TCP keepalives, and configures gRPC keepalives, and therefore, we end up setting TCP_USER_TIMEOUT, but TCP keepalives are disabled? Even in this case, I feel it should be fine, since TCP_USER_TIMEOUT can work irrespective of whether TCP keepalives are configured on the connection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, sorry that makes sense, I'm not sure what I was thinking yesterday.. :)

return nil, connectionErrorf(false, err, "transport: failed to set TCP_USER_TIMEOUT: %v", err)
}
keepaliveEnabled = true
Expand Down
15 changes: 8 additions & 7 deletions internal/transport/proxy.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,8 @@ import (
"net/http"
"net/http/httputil"
"net/url"
"time"

"google.golang.org/grpc/internal"
)

const proxyAuthHeaderKey = "Proxy-Authorization"
Expand Down Expand Up @@ -113,7 +114,7 @@ func doHTTPConnectHandshake(ctx context.Context, conn net.Conn, backendAddr stri
// proxyDial dials, connecting to a proxy first if necessary. Checks if a proxy
// is necessary, dials, does the HTTP CONNECT handshake, and returns the
// connection.
func proxyDial(ctx context.Context, addr string, grpcUA string) (conn net.Conn, err error) {
func proxyDial(ctx context.Context, addr string, grpcUA string) (net.Conn, error) {
newAddr := addr
proxyURL, err := mapAddress(addr)
if err != nil {
Expand All @@ -123,15 +124,15 @@ func proxyDial(ctx context.Context, addr string, grpcUA string) (conn net.Conn,
newAddr = proxyURL.Host
}

conn, err = (&net.Dialer{KeepAlive: time.Duration(-1)}).DialContext(ctx, "tcp", newAddr)
conn, err := internal.NetDialerWithTCPKeepalive().DialContext(ctx, "tcp", newAddr)
if err != nil {
return
return nil, err
}
if proxyURL != nil {
if proxyURL == nil {
// proxy is disabled if proxyURL is nil.
conn, err = doHTTPConnectHandshake(ctx, conn, addr, proxyURL, grpcUA)
return conn, err
}
return
return doHTTPConnectHandshake(ctx, conn, addr, proxyURL, grpcUA)
}

func sendHTTPRequest(ctx context.Context, req *http.Request, conn net.Conn) error {
Expand Down
19 changes: 11 additions & 8 deletions server.go
Original file line number Diff line number Diff line change
Expand Up @@ -814,14 +814,17 @@ func (l *listenSocket) Close() error {
// this method returns.
// Serve will return a non-nil error unless Stop or GracefulStop is called.
//
// Note: As of Go 1.21, the standard library overrides the OS defaults for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I will be sending a follow-up PR to fix things on the server-side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this with the latest commit, based on our discussion on the issue.

// TCP keepalive time and interval to 15s.
// To retain OS defaults, pass a net.Listener created by calling the Listen method
// on a net.ListenConfig with the `KeepAlive` field set to a negative value.
//
// For more information, please see [issue 23459] in the Go github repo.
//
// [issue 23459]: https://github.com/golang/go/issues/23459
// Note: All supported releases of Go (as of December 2023) override the OS
// defaults for TCP keepalive time and interval to 15s. To enable TCP keepalive
// with OS defaults for keepalive time and interval, callers need to do the
// following two things:
// - pass a net.Listener created by calling the Listen method on a
// net.ListenConfig with the `KeepAlive` field set to a negative value. This
// will result in the Go standard library not overriding OS defaults for TCP
// keepalive interval and time. But this will also result in the Go standard
// library not enabling TCP keepalives by default.
// - override the Accept method on the passed in net.Listener and set the
// SO_KEEPALIVE socket option to enable TCP keepalives, with OS defaults.
func (s *Server) Serve(lis net.Listener) error {
s.mu.Lock()
s.printf("serving")
Expand Down
Loading