Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set HTTP2 keepalive timeout and interval via env in proxy container #12269

Closed
UsingCoding opened this issue Mar 15, 2024 · 5 comments
Closed

Comments

@UsingCoding
Copy link
Contributor

What problem are you trying to solve?

When linkerd2 is used for proxying HTTP/2 (gRPC) traffic there is maybe issue with tcp sockets dangling. In this case HTTP/2 ping frames or keepalive messages can help with it.
gRPC documentation for keepalive messages

In linkerd2-proxy setting keepalive timeout and interval already implemented here: issue and pull-request. It's implemened in 2020, but now in proxy source code there is no way to set up keepalive via env.

I found that keepalive can be setup in h2::Settings but the code that configures the h2::Settings from environment variables (linkerd/app/src/env.rs) does not set the parameter keepalive_timeout.
Which means - there is no ping frames sent by linkerd-proxy intentionally and there is no way to configure it

How should the problem be solved?

Linkerd2-proxy should have env variables to configure keepalive timeout and interval.
Something like this:
LINKERD2_PROXY_INBOUND_HTTP2_KEEPALIVE_INTERVAL
LINKERD2_PROXY_OUTBOUND_HTTP2_KEEPALIVE_INTERVAL
LINKERD2_PROXY_INBOUND_HTTP2_KEEPALIVE_TIMEOUT
LINKERD2_PROXY_OUTBOUND_HTTP2_KEEPALIVE_TIMEOUT

Any alternatives you've considered?

I don't have

How would users interact with this feature?

Manual set up env variable in proxy container or in inject chart config

Would you like to work on this feature?

yes

@olix0r
Copy link
Member

olix0r commented Mar 18, 2024

@UsingCoding Thanks for pointing this out.

The proxy already supports LINKERD2_PROXY_INBOUND_ACCEPT_KEEPALIVE, LINKERD2_PROXY_INBOUND_CONNECT_KEEPALIVE, LINKERD2_PROXY_OUTBOUND_ACCEPT_KEEPALIVE, LINKERD2_PROXY_OUTBOUND_CONNECT_KEEPALIVE. That is, we support configuring independent keeplalives for the inbound server, inbound client, outbound server, and outbound clients.

In the original PR you reference, these values were used to configure the h2 keepalives. It appears that, at some point, there was a regression that lost that configuration.

Would it be suitable to restore that behavior without introducing new configurations?

@UsingCoding
Copy link
Contributor Author

@olix0r there is several points why i added new configurations:

  1. ACCEPT_KEEPALIVE & CONNECT_KEEPALIVE is more specific for TCP and for proxying common TCP traffic, for HTTP/2 make sense to configure pings and use different timeout nether than for TCP ACCEPT_KEEPALIVE & CONNECT_KEEPALIVE, more smaller to make ping more frequently
  2. PR in addition to timeout introduces configuration for interval (how frequently sent ping frames), in original PR it was interval = timeout / 4 which is also not suitable for most cases. It's more reasonable to setup interval smaller than timeout. More frequent HTTP/2 ping will allow proxy to detect dangling connections freezes faster.

If it's necessary to restore previous behavior(I don't have enough information to make a judgment) for configuration we can make some tricks:

  • Use ACCEPT_KEEPALIVE & CONNECT_KEEPALIVE to configure HTTP/2 ping by only when LINKERD2_PROXY_{INBOUND,OUTBOUND}_HTTP2_KEEPALIVE_TIMEOUT is not set.
  • So when LINKERD2_PROXY_{INBOUND,OUTBOUND}_HTTP2_KEEPALIVE_TIMEOUT is defined is it for HTTP/2 ping configuration

In my opinion, this is not so important - after all, the behavior has been lost for a long time and there were no recovery requests, perhaps such behavior was not expected for the end user of the proxy.

@olix0r it's up to you to decide(restore or not) - on your decision I'll make edits to PR

@olix0r olix0r self-assigned this Mar 21, 2024
@UsingCoding
Copy link
Contributor Author

@olix0r I supported the behavior of putting variables as in PR, leaving support for new variables(LINKERD2_PROXY_{INBOUND,OUTBOUND}_HTTP2_KEEPALIVE_{TIMEOUT,INTERVAL}), as suggested in the comment.

So I also removed the keepalive timeout and intervals to the control plane(which added by myself). Since the same timeouts as between proxy containers are not suitable for connections to the control plane

Copy link

stale bot commented Jun 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 26, 2024
@UsingCoding
Copy link
Contributor Author

After additional exploration about gRPC keepalive i came to conclusion: gRPC keepalive are not suitable for linkerd as service mesh and may be potentially dangerous.

Why it's dangerous:
Linkerd has no rollout strategy(and should not have to) that application server will be rolled out earlier than application client when applying gRPC Keepalive policy, so during rollout application may encount closed gRPC connections due to violated KEEPALIVE_TIME and KEEPALIVE_TIMEOUT docs on application server.

So, having gRPC Keepalive in linkerd when it used as Service Mesh is dangerous and should not be implemented.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.