-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: always enable TCP keepalives with OS defaults #6834
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -401,10 +401,12 @@ func WithTimeout(d time.Duration) DialOption { | |
// returned by f, gRPC checks the error's Temporary() method to decide if it | ||
// should try to reconnect to the network address. | ||
// | ||
// Note: As of Go 1.21, the standard library overrides the OS defaults for | ||
// TCP keepalive time and interval to 15s. | ||
// To retain OS defaults, use a net.Dialer with the KeepAlive field set to a | ||
// negative value. | ||
// Note: As of Go 1.21, the standard library overrides the OS defaults for TCP | ||
// keepalive time and interval to 15s. To enable TCP keepalive with OS defaults | ||
// for keepalive time and interval, use a net.Dialer that sets the KeepAlive | ||
// field set to a negative value, and sets the SO_KEEPALIVE socket option to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "sets (the KeepAlive field) set to a negative". Remove the 2nd "set". There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
// true from the Control or ControlContext field. For a concrete example of how | ||
// to do this, see internal.NetDialerWithTCPKeepalive(). | ||
// | ||
// For more information, please see [issue 23459] in the Go github repo. | ||
// | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
/* | ||
* Copyright 2023 gRPC authors. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* | ||
*/ | ||
|
||
package internal | ||
|
||
import ( | ||
"context" | ||
"net" | ||
"syscall" | ||
"time" | ||
) | ||
|
||
// NetDialerWithTCPKeepalive returns a net.Dialer that enables TCP keepalives on | ||
// the underlying connection with OS default values for keepalive parameters. | ||
// | ||
// TODO: Once https://github.com/golang/go/issues/62254 lands, and the | ||
// appropriate Go version becomes less than our least supported Go version, we | ||
// should look into using the new API to make things more straightforward. | ||
func NetDialerWithTCPKeepalive() *net.Dialer { | ||
return &net.Dialer{ | ||
// Setting a negative value here prevents the Go stdlib from overriding | ||
// the values of TCP keepalive time and interval. It also prevents the | ||
// Go stdlib from enabling TCP keepalives by default. | ||
KeepAlive: time.Duration(-1), | ||
// This method is called after the underlying network socket is created, | ||
// but before dialing the socket (or calling its connect() method). The | ||
// combination of unconditionally enabling TCP keepalives here, and | ||
// disabling the overriding of TCP keepalive parameters by setting the | ||
// KeepAlive field to a negative value above, results in OS defaults for | ||
// the TCP keealive interval and time parameters. | ||
ControlContext: func(_ context.Context, _, _ string, c syscall.RawConn) error { | ||
Check failure on line 45 in internal/tcp_keepalive.go GitHub Actions / tests (tests, 1.19)
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. feel free to ignore: you can use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds like a minor improvement to me, but I don't feel strongly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I went with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like the |
||
return c.Control(func(fd uintptr) { | ||
syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, 1) | ||
}) | ||
}, | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,14 +36,15 @@ import ( | |
"golang.org/x/net/http2/hpack" | ||
"google.golang.org/grpc/codes" | ||
"google.golang.org/grpc/credentials" | ||
"google.golang.org/grpc/internal" | ||
"google.golang.org/grpc/internal/channelz" | ||
icredentials "google.golang.org/grpc/internal/credentials" | ||
"google.golang.org/grpc/internal/grpclog" | ||
"google.golang.org/grpc/internal/grpcsync" | ||
"google.golang.org/grpc/internal/grpcutil" | ||
imetadata "google.golang.org/grpc/internal/metadata" | ||
istatus "google.golang.org/grpc/internal/status" | ||
"google.golang.org/grpc/internal/syscall" | ||
isyscall "google.golang.org/grpc/internal/syscall" | ||
"google.golang.org/grpc/internal/transport/networktype" | ||
"google.golang.org/grpc/keepalive" | ||
"google.golang.org/grpc/metadata" | ||
|
@@ -176,9 +177,7 @@ func dial(ctx context.Context, fn func(context.Context, string) (net.Conn, error | |
if networkType == "tcp" && useProxy { | ||
return proxyDial(ctx, address, grpcUA) | ||
} | ||
// KeepAlive is set to a negative value to prevent Go's override of the TCP | ||
// keepalive time and interval; retain the OS default. | ||
return (&net.Dialer{KeepAlive: time.Duration(-1)}).DialContext(ctx, networkType, address) | ||
return internal.NetDialerWithTCPKeepalive().DialContext(ctx, networkType, address) | ||
} | ||
|
||
func isTemporary(err error) bool { | ||
|
@@ -264,7 +263,7 @@ func newHTTP2Client(connectCtx, ctx context.Context, addr resolver.Address, opts | |
} | ||
keepaliveEnabled := false | ||
if kp.Time != infinity { | ||
if err = syscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil { | ||
if err = isyscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this function sets the TCP user timeout but does not enable keepalives.. should(n't) we update it to do that, too, in case they are disabled? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. TCP_USER_TIMEOUT specifies the maximum amount of time that transmitted data may remain unacknowledged before TCP will forcibly close the corresponding connection and return ETIMEDOUT to the application. If the option value is specified as 0, TCP will to use the system default. I think it is totally a reasonable thing to use it without TCP keepalives. As per A18, the reason we set TCP_USER_TIMEOUT when gRPC keepalives are configured is to ensure that if things are stuck at the TCP layer, setting the TCP_USER_TIMEOUT will ensure that the connection is closed anyways (even if gRPC keepalives are not able to do the same). Are you concerned about the case where a user specifies a custom There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah no, sorry that makes sense, I'm not sure what I was thinking yesterday.. :) |
||
return nil, connectionErrorf(false, err, "transport: failed to set TCP_USER_TIMEOUT: %v", err) | ||
} | ||
keepaliveEnabled = true | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -813,15 +813,6 @@ func (l *listenSocket) Close() error { | |
// Serve returns when lis.Accept fails with fatal errors. lis will be closed when | ||
// this method returns. | ||
// Serve will return a non-nil error unless Stop or GracefulStop is called. | ||
// | ||
// Note: As of Go 1.21, the standard library overrides the OS defaults for | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this deleted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because I will be sending a follow-up PR to fix things on the server-side. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated this with the latest commit, based on our discussion on the issue. |
||
// TCP keepalive time and interval to 15s. | ||
// To retain OS defaults, pass a net.Listener created by calling the Listen method | ||
// on a net.ListenConfig with the `KeepAlive` field set to a negative value. | ||
// | ||
// For more information, please see [issue 23459] in the Go github repo. | ||
// | ||
// [issue 23459]: https://github.com/golang/go/issues/23459 | ||
func (s *Server) Serve(lis net.Listener) error { | ||
s.mu.Lock() | ||
s.printf("serving") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this wording is confusing, and have been trying to think of something better... Maybe:
Note: All supported releases of Go (as of December 2023) override the OS defaults for...
Or "as of the Go 1.21 release" or "up to and including at least Go 1.21" for the parenthetical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks.