Enable TLS on gRPCIngress if RAY_USE_TLS is on #34403

Merged
merged 1 commit into from
May 3, 2023

Conversation

Contributor

@ashahab ashahab commented Apr 14, 2023

Enable TLS for the Serve gRPC endpoints.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ashahab ashahab force-pushed the abin-tls branch 4 times, most recently from 3883d4c to af1e460 Compare April 15, 2023 04:20
@ashahab ashahab changed the title [WIP] Enable TLS on gRPCIngress if RAY_USE_TLS is on Enable TLS on gRPCIngress if RAY_USE_TLS is on Apr 15, 2023
)
address = "[::]:{}".format(self.port)
try:
    self.grpc_port = add_port_to_grpc_server(self.server, address)

Contributor

At a high level, we should make secure vs. insecure configurable.

Contributor Author

I used the add_port_to_grpc_server(self.server, address) function, which internally depends on the RAY_USE_TLS environment variable. Does that make it configurable? I can add a comment in my code to indicate this.
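
For readers following along, here is a minimal sketch of how an environment-driven helper of this kind could pick between a secure and an insecure port. It is not the actual Ray implementation: `tls_enabled`, `add_port`, the `make_credentials` callback, and the accepted truthy values are assumptions made for illustration. Only `add_secure_port` and `add_insecure_port` are real `grpc.Server` methods.

```python
import os
from typing import Any, Callable, Mapping, Optional


def tls_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if RAY_USE_TLS is "1" or "true" (case-insensitive).

    The accepted values are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("RAY_USE_TLS", "0").strip().lower() in ("1", "true")


def add_port(
    server: Any,
    address: str,
    make_credentials: Callable[[], Any],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Bind `address` on a grpc.Server-like object and return the bound port.

    Hypothetical stand-in for add_port_to_grpc_server: the caller never
    chooses secure vs. insecure; the environment does.
    """
    if tls_enabled(env):
        # Build credentials lazily so plaintext setups need no certificates.
        return server.add_secure_port(address, make_credentials())
    return server.add_insecure_port(address)
```

With a real server, `make_credentials` would typically wrap `grpc.ssl_server_credentials(...)`. The point of the sketch is that the choice is configurable only through the process environment, not per call site.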

Contributor

Hi @ashahab, RAY_USE_TLS makes all internal communication use TLS. Is that what you are expecting, or do you only want the ingress port to be TLS-secured?

Contributor

I suggest having a separate variable to control it.

Contributor Author

@sihanwang41 Thank you for your review.
If ingress TLS is controlled by a separate variable (e.g. RAY_USE_TLS_INGRESS), that may confuse the meaning of RAY_USE_TLS and allow insecure communication even when it's on.
IMHO it is better to err on the side of caution and ensure all endpoints (ingress, head node, and worker) encrypt on the wire when RAY_USE_TLS is on. This also allows reusing the add_port_to_grpc_server function, which already has built-in support. Happy to hear more of your thoughts.

Contributor

Hi @ashahab, if you set RAY_USE_TLS, all internal communication will use TLS (potentially hurting performance). I think you only want the ingress port to use TLS, right?
If so: when either RAY_SERVE_GRPC_TLS_INGRESS = True or RAY_USE_TLS = True, we set up the secure port; otherwise we use the insecure port. What do you think?
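
The rule proposed in this comment can be sketched roughly as below. This is illustrative only: RAY_SERVE_GRPC_TLS_INGRESS is just the name suggested in the comment (nothing in this thread shows it was adopted), and the helper names and accepted truthy values are assumptions.

```python
from typing import Mapping

# Values treated as "on"; an assumption made for this sketch.
_TRUTHY = ("1", "true")


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Read a boolean-like environment flag, defaulting to off."""
    return env.get(name, "0").strip().lower() in _TRUTHY


def ingress_uses_tls(env: Mapping[str, str]) -> bool:
    """Secure the ingress port if either the ingress flag or the global flag is on."""
    return _flag(env, "RAY_SERVE_GRPC_TLS_INGRESS") or _flag(env, "RAY_USE_TLS")


def internal_uses_tls(env: Mapping[str, str]) -> bool:
    """Internal (head/worker) traffic follows only the global flag."""
    return _flag(env, "RAY_USE_TLS")
```

Under this rule, turning on only the ingress flag secures the client-facing port while leaving internal traffic in plaintext, which is exactly the trade-off debated in the replies.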

Contributor Author

@sihanwang41 Thank you for the prompt response.

Yes, I agree with you on the potential performance hit with RAY_USE_TLS.
My intent and proposal is to encrypt all the endpoints, not just the ingress port. From a security perspective, this is the safer approach: if the data consumed by the endpoint needs to be encrypted between the client and Ray Serve, it most likely also needs to be encrypted when it is routed from head to worker.

As for performance, I plan to follow this up with some benchmarks and potential improvements to the TLS communication:

  1. Ensuring TLS 1.3 is used at each endpoint, which reduces handshake time: the server sends its hello, certificate, and certificate verify in a single flight, so a full handshake takes one round trip instead of two.
  2. Session resumption.
  3. Dynamic record sizes.

Contributor Author

@sihanwang41 I'd like your feedback on this. Thank you!

@sihanwang41
Contributor

Lint is still failing.

Signed-off-by: Abin Shahab <ashahab@linkedin.com>
@ashahab
Contributor Author

ashahab commented May 1, 2023

@sihanwang41 Thank you for the approval. Please let me know when and how it can be merged. We plan to use this capability soon. Thank you!

@sihanwang41
Contributor

Hi @edoakes ^^ can you take a look?

@edoakes edoakes merged commit ecc41df into ray-project:master May 3, 2023
architkulkarni pushed a commit to architkulkarni/ray that referenced this pull request May 16, 2023
Signed-off-by: Abin Shahab <ashahab@linkedin.com>
5 participants