Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google_grpc: add a runtime flag to disable TLSv1.3 #32315

Merged
merged 8 commits into from
Feb 21, 2024

Conversation

kyessenov
Copy link
Contributor

@kyessenov kyessenov commented Feb 10, 2024

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4
Signed-off-by: Kuat Yessenov <kuat@google.com>
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #32315 was opened by kyessenov.

see: more, trace.

Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #32315 was opened by kyessenov.

see: more, trace.

…rsion

Change-Id: Ie1801869d4470a701dd9e32a2ece3417489d323a
@kyessenov kyessenov marked this pull request as ready for review February 15, 2024 00:46
@kyessenov kyessenov requested a review from yanavlasov February 15, 2024 00:46
@stevenzzzz
Copy link
Contributor

pin myself here @stevenzzzz

@yanavlasov yanavlasov self-assigned this Feb 15, 2024
Copy link
Contributor

@matthewstevenson88 matthewstevenson88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd like to get @gtcooke94 to also take a quick look.

Change-Id: Ic71b4df34e21a4939b4d0cca01812564e5e1c91a
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov
Copy link
Contributor Author

@yanavlasov This is now complete. Added a test to verify that the connection is closed before the request is cancelled. There is no direct hook to check for a handshake failure:

  1. Downstream TLS transport socket does not signal a handshake failure in any way except closing.
  2. Google gRPC has infinite retries somewhere and does not respect the request timeout because it is translated to gRPC deadlines which only start when a request starts.

Both are valid issues, but cannot be fixed in this PR with a limited scope.

@kyessenov
Copy link
Contributor Author

The test failure is real - it appears google_grpc calls do return failures sometimes, but I'm not sure why that doesn't happen locally to me.

@yanavlasov
Copy link
Contributor

/wait

@gtcooke94
Copy link

The grpc TLS API usage LGTM

cc @matthewstevenson88

Change-Id: I3110ab4960f1e966ca4d3b51d12d206822b57a2b
Signed-off-by: Kuat Yessenov <kuat@google.com>
Change-Id: I53773a70c20a0d48adc624d896751574166e88b5
Signed-off-by: Kuat Yessenov <kuat@google.com>
@kyessenov
Copy link
Contributor Author

I think I deflaked it, since it passes bazel test --config=clang test/common/grpc:grpc_client_integration_test --test_arg="-l trace" --test_arg="--gtest_filter=SslIpVersionsClientType/GrpcSslClientIntegrationTest.BasicSslRequestHandshakeFailure/IPv4_GoogleGrpc" --runs_per_test=1000

Change-Id: Ic005a7072051c038635a115f693f44530d6a32c5
Signed-off-by: Kuat Yessenov <kuat@google.com>
@yanavlasov
Copy link
Contributor

Build failure seems legit: https://source.cloud.google.com/results/invocations/9c6a93e0-adaf-4ba6-8faa-8bbcd7a40c99

Note this is compile_time_options build.

@yanavlasov
Copy link
Contributor

/wait

Change-Id: I329279ab04766d92bb6c243e093b74822132db51
Signed-off-by: Kuat Yessenov <kuat@google.com>
…rsion

Change-Id: I42b65771d214a3e791fbaf4e1e65eb8e13548e92
@kyessenov
Copy link
Contributor Author

@RyanTheOptimist Yan suggested to assign this to you since he's out this week.

if (Runtime::runtimeFeatureEnabled("envoy.reloadable_features.google_grpc_disable_tls_13")) {
options.set_max_tls_version(grpc_tls_version::TLS1_2);
}
return grpc::experimental::TlsCredentials(options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code all looks reasonable, but I notice that we are now returning experimental::TlsCredentials instead of SslCredentials. Is there any behavior change associated with this new class? (I'm guessing not, but thought I should confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked @matthewstevenson88 from gRPC team to confirm the API usage - and they did per #32315 (review). We only need to set a root cert, a private key, and a cert chain in Envoy, and the API seems to be compatible with that and offer more knobs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Sounds good.

kyessenov added a commit to kyessenov/envoy that referenced this pull request Feb 23, 2024
Change-Id: I5b5ec42d5b3138060505c74d313d759199083e8d
Signed-off-by: Kuat Yessenov <kuat@google.com>
phlax pushed a commit that referenced this pull request Feb 23, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov why can't this be a simple configuration in GoogleGrpcService? I think for our internal needs we would want something that can be set via xDS.

@kyessenov
Copy link
Contributor Author

@htuch We should follow-up with that. We need this for backporting - we have 5 versions of Envoy in use and cannot possibly backport xDS additions. Also, GoogleGrpcService is part of Wasm ABI - that complicates things further.

kyessenov added a commit to kyessenov/envoy that referenced this pull request Feb 23, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (envoyproxy#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
kyessenov added a commit to kyessenov/envoy that referenced this pull request Feb 23, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (envoyproxy#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
kyessenov added a commit to kyessenov/envoy that referenced this pull request Feb 23, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (envoyproxy#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
ohadvano pushed a commit to ohadvano/envoy that referenced this pull request Feb 25, 2024
Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Signed-off-by: Kuat Yessenov <kuat@google.com>
briansonnenberg pushed a commit to briansonnenberg/envoy that referenced this pull request Mar 4, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (envoyproxy#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
phlax pushed a commit that referenced this pull request Mar 4, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
phlax pushed a commit that referenced this pull request Mar 4, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
phlax pushed a commit that referenced this pull request Mar 5, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
SeanKilleen pushed a commit to SeanKilleen/envoy that referenced this pull request Apr 3, 2024
* google_grpc: add a runtime flag to disable TLSv1.3 (envoyproxy#32315)

Change-Id: Id88723a81d4b1586bf12be6f4dc7a81ae7b0d9c4

Commit Message: Adds a temporary runtime flag to disable TLSv1.3 by gRPC SDK until a proper xDS extension can be added.
Additional Description:
Risk Level: low, default false
Testing: regression

Change-Id: I34daae55ede7c8093b0dac1fa6ff5a5dc8df677d
Signed-off-by: Kuat Yessenov <kuat@google.com>
Signed-off-by: Sean Killeen <SeanKilleen@gmail.com>
@alyssawilk
Copy link
Contributor

What's the game plan for flipping this? It has no TODO / tracking issue

@kyessenov
Copy link
Contributor Author

@alyssawilk Good question. I believe the game plan is to extend/refresh google_grpc xDS definition to capture the latest changes to gRPC SDK, such as ability to control TLS version, certificate providers, and what not. We did not do it at the time because it is difficult to pick up xDS changes.

@kyessenov
Copy link
Contributor Author

@markdroth - since you are both xDS and gRPC shepherd.

@alyssawilk
Copy link
Contributor

Can you find someone willing to take this and add a TODO, or take it on yourself? Or make it a non-feature-flag runtime opttion
As documented at the top of runtime_guards these are for transient features and it sounds like we're not sure of this one

@kyessenov
Copy link
Contributor Author

@markdroth Can you give some guidance from xDS angle - do we need to extend google_grpc xDS stub to support full gRPC TLS configuration? How are certificate frameworks supposed to interop between gRPC and Envoy?

@markdroth
Copy link
Contributor

I discussed this with @kyessenov on slack last week. Fleshing out what we discussed there a bit, I think we have the following options:

  1. Configure gRPC to use xDS itself, in which case it will establish its own connection to the control plane and request its own config, just as it would if it were running outside of Envoy. To do this, we'd need to configure the gRPC xDS bootstrap in Envoy's environment so that the gRPC xDS client can function. The down-side of this approach is the independent communication with the control plane from the rest of Envoy.
  2. Configure gRPC using TlsCredentials with the FileWatcherCertificateProvider. The main thing that would be needed here would be to to represent the cert provider configuration in the GoogleGrpc config somehow. The one significant down-side of directly configuring the cert provider here is that this nullifies the main benefit of the cert provider API, which is that details like the local paths to read the certs from are configured in the bootstrap instead of being sent by the control plane. (In gRPC's xDS code, the cert provider configuration is in the bootstrap file under a particular instance name, and the control plane just tells the client which cert provider instance to use, which means that control plane doesn't need to know deployment-specific details like how the cert provider is configured. But in this option, we would be encoding the cert provider config in the GoogleGrpc message that is sent from the control plane.)
  3. Add a mechanism in Envoy GoogleGrpc that uses Envoy code to get the certs and then passes them to gRPC. There are two main ways we could inject the certs into gRPC: we could write them to the local filesystem and then configure TlsCredentials to read those files, or we could inject them into TlsCredentials via a StaticDataCertificateProvider. (The latter doesn't currently have the capability to update the cert dynamically after the initial creation of the gRPC channel, but we could provide an API to do that on the gRPC side if needed.) The main down-side of this approach is that it would be tricky for this configuration to be something that gRPC itself could use if/when we need to support GoogleGrpc messages in the future.
  4. As a variant of the option (3), we could add support for cert provider instances to Envoy's bootstrap, and then add configuration in GoogleGrpc to refer to those cert provider instances. This would be more consistent with how cert providers are really expected to be configured in gRPC (i.e., the control plane would not need to send local paths), and it would also give us a nice path forward to eventually adding cert provider support more broadly in Envoy (i.e., supporting it in the TLS transport socket, just like gRPC does). It would also be completely consistent with how gRPC itself would want to use the GoogleGrpc message, if/when we need to do that in the future.

I think option 4 is the most appealling of these options.

@alyssawilk
Copy link
Contributor

for (1) I don't know why we'd want to use the google-grpc xds fetch - it feel like as with any other envoy extension point, we could have Envoy config to do an xDS fetch of "arbitrary" (in this case bootstrap) config which would be handed to the gRPC client to do what it wills with.

For any of 1-4, do we have anyone planning to work on it or plausible line items in some team's 2025 roadmap? If not I'd like to understand where this knob is used (I assume google3?) and suggest we replace the existing runtime guard (designed for transient features only) to a boolean runtime field (which can hang around indefinitely until the feature is handled)

@kyessenov
Copy link
Contributor Author

@alyssawilk I will take the task to move it to the boolean field. This is used everywhere except google3 for Istio/CSM in FIPS mode, since boringssl cannot use TLSv1.3 in FIPS mode for the certified builds.

@alyssawilk
Copy link
Contributor

excellent thanks - I was a bit wary to move it myself as I didn't want to break whoever depended on it :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants