http: Teardown serverside connections on error #747

Merged: 8 commits into main from ver/connfail on Nov 18, 2020
Conversation

olix0r (Member) commented Nov 17, 2020

When the proxy encounters an error, we usually want to gracefully tear
down the server-side connection so that the application's client has an
opportunity to re-resolve the endpoint before reconnecting.

This change configures HTTP servers so that the error-handling layer
initiates server shutdown when an error is not a request timeout. Socket
errors, failfast errors, etc., are still met with a 502 Bad Gateway, as
before; but once in-flight requests have been responded to, the proxy's
server closes its connection with the application client.

Addresses linkerd/linkerd2#5209
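For context, the teardown decision hinges on classifying the error. Here is a minimal, illustrative sketch of such a predicate, mirroring the `should_teardown_connection` helper visible in the diff below; the `RequestTimeout` type is a hypothetical stand-in, not the proxy's actual error type:

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for the proxy's request-timeout error type.
#[derive(Debug)]
struct RequestTimeout;

impl fmt::Display for RequestTimeout {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "request timed out")
    }
}

impl Error for RequestTimeout {}

// Tear down the connection for any error other than a request timeout,
// walking the source chain so wrapped errors are classified correctly.
fn should_teardown_connection(error: &(dyn Error + 'static)) -> bool {
    if error.is::<RequestTimeout>() {
        false
    } else if let Some(source) = error.source() {
        should_teardown_connection(source)
    } else {
        true
    }
}

fn main() {
    let socket_err =
        std::io::Error::new(std::io::ErrorKind::ConnectionReset, "socket closed");
    assert!(should_teardown_connection(&socket_err));
    assert!(!should_teardown_connection(&RequestTimeout));
}
```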

hawkw and others added 4 commits November 18, 2020 15:46
This change modifies the ClientAddr extension to be ClientHandle, which
holds a handle that can be used to signal shutdown to HTTP server tasks.
It's not useful in its current place in the stack.
@olix0r changed the title from Ver/connfail to http: Teardown serverside connections on error Nov 18, 2020
@olix0r olix0r marked this pull request as ready for review November 18, 2020 16:22
@olix0r olix0r requested a review from a team November 18, 2020 16:22
kleimkuhler (Contributor) left a comment
This looks great! Adding the close handle to the request extension makes sense, and it works as expected when used with linkerd/linkerd2#5227

```rust
        self.0.notified().await
    }

    pub fn ignore(self) {}
```
Contributor:
Keeping this for future use?

olix0r (Member, Author):
Yeah, since `Closed` has a `must_use`, this seemed like a nice convenience.
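A quick illustration of the convenience being described, using a stripped-down `Closed` as a stand-in for the real handle:

```rust
#[must_use = "Closed handle must be used"]
#[derive(Debug)]
struct Closed;

impl Closed {
    /// Consume the handle when the caller genuinely doesn't care about
    /// connection-close notifications.
    fn ignore(self) {}
}

fn subscribe() -> Closed {
    Closed
}

fn main() {
    // `subscribe();` on its own would trigger the `must_use` warning;
    // calling `.ignore()` makes dropping the handle explicit.
    subscribe().ignore();
}
```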

hawkw (Contributor) left a comment
looks great to me!

we should probably add tests on the inbound side as well, i suppose? but we can do that later.

Comment on lines +228 to +229
```rust
    } else if let Some(e) = error.source() {
        should_teardown_connection(e)
```
Contributor:
it occurs to me that it might be more efficient to do source-chain traversal with a loop rather than recursively in these functions. but, i don't think we ever have a lot of errors that are more than one or two sources deep, so i don't think this actually matters.
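For illustration, the loop-based traversal hawkw mentions might look like the sketch below; this is a generic helper under stated assumptions, not the proxy's actual code:

```rust
use std::error::Error;

/// Walk an error's source chain iteratively rather than recursively,
/// returning true if any error in the chain matches the predicate.
fn chain_matches(
    mut error: &(dyn Error + 'static),
    pred: fn(&(dyn Error + 'static)) -> bool,
) -> bool {
    loop {
        if pred(error) {
            return true;
        }
        match error.source() {
            Some(source) => error = source,
            None => return false,
        }
    }
}

fn is_timeout(error: &(dyn Error + 'static)) -> bool {
    error
        .downcast_ref::<std::io::Error>()
        .map(|io| io.kind() == std::io::ErrorKind::TimedOut)
        .unwrap_or(false)
}

fn main() {
    let err = std::io::Error::new(std::io::ErrorKind::TimedOut, "timed out");
    assert!(chain_matches(&err, is_timeout));
}
```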

Comment on lines 18 to 25
```rust
/// A handle that signals the client connection to close.
#[derive(Clone, Debug)]
pub struct Close(Arc<Notify>);

/// A handle to be notified when the client connection is complete.
#[must_use = "Closed handle must be used"]
#[derive(Debug)]
pub struct Closed(Arc<Notify>);
```
Contributor:
It's nice how simple this is, seems good to me. I was a little surprised we didn't tie this into the existing drain code rather than adding a separate channel for signalling that a connection should be closed, but after looking at it, I realize that the drain channel is significantly more code than this case needs...

olix0r (Member, Author):
Also, it's kind of inverted. With drain, we have multiple observers and a single notifier. With this, we have a single observer and many notifiers.
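A minimal sketch of that shape — many cloneable notifiers, one observer — in the spirit of the `Close`/`Closed` pair above, assuming tokio's `Notify`:

```rust
use std::sync::Arc;
use tokio::sync::Notify;

/// Cloneable handle held by request handlers to signal shutdown.
#[derive(Clone, Debug)]
struct Close(Arc<Notify>);

/// Handle held by the single server task awaiting the close signal.
#[derive(Debug)]
struct Closed(Arc<Notify>);

fn channel() -> (Close, Closed) {
    let notify = Arc::new(Notify::new());
    (Close(notify.clone()), Closed(notify))
}

#[tokio::main]
async fn main() {
    let (close, closed) = channel();

    // Many notifiers: each handler can hold a clone of `Close` and fire
    // it when a fatal error occurs. `notify_one` stores a permit, so the
    // signal isn't lost even if the observer isn't awaiting yet.
    let handler = close.clone();
    tokio::spawn(async move { handler.0.notify_one() });

    // Single observer: the server task waits for the first close signal,
    // then begins graceful shutdown of its connection.
    closed.0.notified().await;
    println!("tearing down the connection");
}
```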

Comment on lines 176 to 191
```rust
tokio::select! {
    res = &mut conn => {
        debug!("The client is shutting down the connection");
        res?
    }
    shutdown = drain.signal() => {
        debug!("The process is shutting down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        shutdown.release_after(conn).await?;
    }
    () = closed.closed() => {
        debug!("The stack is tearing down the connection");
        Pin::new(&mut conn).graceful_shutdown();
        conn.await?;
    }
}
```
Contributor:
👍 the new debug logs indicating why the connection is being shut down are really nice, good call. do we know that this future will be wrapped in a span that clearly indicates which task is being shut down?

olix0r (Member, Author):
the server sets an `info_span` that should be available here
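For reference, a sketch of how a server might attach such a span so these debug lines carry per-connection context; the span name and field are made up, and this assumes the tracing and tracing-subscriber crates:

```rust
use tracing::{debug, info_span, Instrument};

async fn serve_connection() {
    // Inherits the span attached below, so this line is logged with
    // the connection's context.
    debug!("The stack is tearing down the connection");
}

#[tokio::main]
async fn main() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::DEBUG)
        .init();

    // Instrument the connection future with an info_span so every event
    // it emits identifies which server task is shutting down.
    let task = serve_connection()
        .instrument(info_span!("server", client.addr = "127.0.0.1:4143"));
    tokio::spawn(task).await.unwrap();
}
```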

@olix0r olix0r merged commit f7a8ee9 into main Nov 18, 2020
@olix0r olix0r deleted the ver/connfail branch November 18, 2020 18:39
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Nov 18, 2020
This release changes error handling to tear down the server-side
connection when an unexpected error is encountered.

Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.

Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.

---

* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Nov 18, 2020 (same release notes as above)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Nov 19, 2020 (same release notes as above)
alpeb pushed a commit to linkerd/linkerd2 that referenced this pull request Nov 30, 2020 (same release notes as above)
alpeb pushed a commit to alpeb/linkerd2 that referenced this pull request Dec 1, 2020 (same release notes as above)
alpeb added a commit to alpeb/linkerd2 that referenced this pull request Dec 1, 2020 (its commit messages are duplicated in full in the Dec 4, 2020 commit below)
alpeb pushed a commit to linkerd/linkerd2 that referenced this pull request Dec 1, 2020 (same release notes as above)
GMarkfjard pushed a commit to GMarkfjard/linkerd2 that referenced this pull request Dec 2, 2020 (same release notes as above)
alpeb added a commit to linkerd/linkerd2 that referenced this pull request Dec 4, 2020
* proxy: v2.119.0 (#5200)

This release modifies the default idle timeout to 5s for outbound
clients and 20s for inbound clients. This prevents idle clients from
consuming memory at the cost of performing more discovery resolutions
for periodic but infrequent traffic. This is intended to reduce the
proxy's memory footprint, especially on Prometheus instances.

The proxy's *ring* and rustls dependencies have also been updated.

---

* Update *ring* and rustls dependencies (linkerd/linkerd2-proxy#735)
* http: Configure client connection pools (linkerd/linkerd2-proxy#734)

* cli: Remove get cmd and relevant tests (#5202)

Fixes #5190

`linkerd get` is not currently used and works only for pods, so per the
issue it can simply be removed. This branch removes the command along
with the associated unit and integration tests.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* cli: remove logs subcommand and tests (#5203)

Fixes #5191

The logs command adds an external dependency that we forked to make it
work, but it does not fit within linkerd's core set of responsibilities.
Hence, it is being removed.

For capabilities like this, the Kubernetes plugin ecosystem has better,
well-maintained tools that can be used.

Signed-off-by: Tarun Pothulapati <tarunpothulapati@outlook.com>

* Remove logs comparisons in integration tests (#5223)

The rare cases where these tests were useful don't make up for the
burden of maintaining them: different k8s versions change the messages,
and unexpected warnings come up that don't affect the final convergence
of the system.

With this we also revert the indirection added back in #4538 that
fetched unmatched warnings after a test had failed.

* Add endpoint to GetProfile response (#5227)

Context: #5209

This updates the destination service to set the `Endpoint` field in `GetProfile`
responses.

The `Endpoint` field is only set if the IP maps to a Pod--not a Service.

Additionally in this scenario, the default Service Profile is used as the base
profile so no other significant fields are set.

### Examples

```
# GetProfile for an IP that maps to a Service
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.43.222.0:9090
INFO[0000] fully_qualified_name:"linkerd-prometheus.linkerd.svc.cluster.local"  retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  dst_overrides:{authority:"linkerd-prometheus.linkerd.svc.cluster.local.:9090"  weight:10000}
```

Before:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2 min_retries_per_second:10 ttl:{seconds:10}}
```

After:

```
# GetProfile for an IP that maps to a Pod
❯ go run controller/script/destination-client/main.go -method getProfile -path 10.42.0.20
INFO[0000] retry_budget:{retry_ratio:0.2  min_retries_per_second:10  ttl:{seconds:10}}  endpoint:{addr:{ip:{ipv4:170524692}}  weight:10000  metric_labels:{key:"control_plane_ns"  value:"linkerd"}  metric_labels:{key:"deployment"  value:"fast-1"}  metric_labels:{key:"pod"  value:"fast-1-5cc87f64bc-9hx7h"}  metric_labels:{key:"pod_template_hash"  value:"5cc87f64bc"}  metric_labels:{key:"serviceaccount"  value:"default"}  tls_identity:{dns_like_identity:{name:"default.default.serviceaccount.identity.linkerd.cluster.local"}}  protocol_hint:{h2:{}}}
```

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>

* cli: Fix custom namespace installation (#5241)

The `--linkerd-namespace` flag was not honored by the `install`
command. This change updates the install templating to use the
value of this flag.

* cli: Don't check for SAN in root and intermediate certs (#5237)

As discussed in #5228, it is not correct for root and intermediate
certs to have a SAN. This PR updates the check so it no longer verifies
the intermediate issuer cert against the identity DNS name (that check
uses the SAN, not the CN, since the `verify` func is meant to verify
leaf certs, not root and intermediate certs). This PR also avoids
setting a SAN field when generating certs in the `install` command.

Fixes #5228

* proxy: v2.121.0 (#5253)

This release changes error handling to tear down the server-side
connection when an unexpected error is encountered.

Additionally, the outbound TCP routing stack can now skip redundant
service discovery lookups when profile responses include endpoint
information.

Finally, the cache implementation has been updated to reduce latency by
removing unnecessary buffers.

---

* h2: enable HTTP/2 keepalive PING frames (linkerd/linkerd2-proxy#737)
* actions: Add timeouts to GitHub actions (linkerd/linkerd2-proxy#738)
* outbound: Skip endpoint resolution on profile hint (linkerd/linkerd2-proxy#736)
* Add a FromStr for dns::Name (linkerd/linkerd2-proxy#746)
* outbound: Avoid redundant TCP endpoint resolution (linkerd/linkerd2-proxy#742)
* cache: Make the cache cloneable with RwLock (linkerd/linkerd2-proxy#743)
* http: Teardown serverside connections on error (linkerd/linkerd2-proxy#747)

* Check correct label value when setting protocol hint (#5267)

This fixes an issue where the protocol hint was always set on endpoint responses.
We now check the correct value to determine whether the pod has the required label.

A test for this has been added to #5266.

Signed-off-by: Kevin Leimkuhler <kevin@kleimkuhler.com>

* Add safe accessor for Global in linkerd-config (#5269)

CLI crashes if linkerd-config contains unexpected values.

Add a safe accessor that initializes an empty Global on the first
access. Refactor all accesses to use the newly introduced accessor using
gopls.

Add test for linkerd-config data without Global.

Fixes #5215

Co-authored-by: Itai Schwartz <yitai27@gmail.com>
Signed-off-by: hodbn <hodbn@users.noreply.github.com>

* proxy: v2.122.0 (#5279)

This release addresses some issues reported around clients seeing
max-concurrency errors by increasing the default in-flight request limit
to 100K pending requests.

Additionally, the proxy now sets an appropriate content-type when
synthesizing gRPC error responses.

---

* style: fix some random clippy lints (linkerd/linkerd2-proxy#749)
* errors: Set `content-type` for synthesized grpc errors (linkerd/linkerd2-proxy#750)
* concurrency-limit: Drop permit on readiness (linkerd/linkerd2-proxy#751)
* Increase the default buffer capacity to 100K (linkerd/linkerd2-proxy#752)
* Change default max-in-flight and buffer-capacity (linkerd/linkerd2-proxy#753)

* proxy: v2.123.0 (#5301)

This release removes a potential panic: it was assumed that looking up a
socket's peer address was infallible, but in practice this call can
fail when a host is under high load. Now these failures only impact the
connection-level task and not the whole proxy process.

Also, the `process_cpu_seconds_total` metric is now exposed as a float
so that its value may include fractional seconds with 10ms granularity.

---

* io: Make peer_addr fallible (linkerd/linkerd2-proxy#755)
* metrics: Expose process_cpu_seconds_total as a float (linkerd/linkerd2-proxy#754)

* Release notes for stable-2.9.1

## stable-2.9.1

This stable release contains a number of proxy enhancements: better support for
high-traffic workloads, improved performance by eliminating unnecessary endpoint
resolutions for TCP traffic and properly tearing down serverside connections
when errors occur, and reduced memory consumption on proxies which maintain many
idle connections (such as Prometheus' proxy).

On the CLI and control plane sides, it relaxes checks on root and intermediate
certificates (following X509 best practices), and fixes two issues: one that
prevented installation of the control plane into a custom namespace and one
which failed to update endpoint information when a headless service was
modified.

* Proxy:
  * Addressed some issues reported around clients seeing max-concurrency errors
    by increasing the default in-flight request limit to 100K pending requests
  * Reduced the default idle connection timeout to 5s for outbound clients and
    20s for inbound clients to reduce the proxy's memory footprint, especially
    on Prometheus instances
  * Fixed an issue where the proxy did not receive updated endpoint information
    when a headless service was modified
  * Added HTTP/2 keepalive PING frames
  * Added logic to avoid redundant TCP endpoint resolutions
  * Fixed an issue where serverside connections were not torn down when an error
    occurred

* CLI / Control Plane:
  * Fixed a CLI issue where the `linkerd-namespace` flag was not honored when
    passed to the `install` and `upgrade` commands
  * Updated `linkerd check` so that it doesn't attempt to validate the subject
    alternative name (SAN) on root and intermediate certificates. SANs for leaf
    certificates will continue to be validated
  * Fixed an issue in the destination service where endpoints always included a
    protocol hint, regardless of the controller label being present or not
  * Removed the `get` and `logs` command from the CLI
  * No longer panic in rare cases when `linkerd-config` doesn't have an entry
    for `Global` configs (thanks @hodbn!)

* proxy: v2.124.0 (#5323)

This release updates the proxy's `*ring*` dependency to pick up the
latest changes from BoringSSL.

Additionally, we've audited uses of non-cryptographic random number
generators in the proxy to ensure that each balancer/router initializes
its own RNG state.

---

* Audit uses of SmallRng (linkerd/linkerd2-proxy#757)
* Update *ring* to 0.6.19 (linkerd/linkerd2-proxy#758)
* metrics: Support the Summary metric type (linkerd/linkerd2-proxy#756)

Co-authored-by: Oliver Gould <ver@buoyant.io>
Co-authored-by: Tarun Pothulapati <tarunpothulapati@outlook.com>
Co-authored-by: Kevin Leimkuhler <kevin@kleimkuhler.com>
Co-authored-by: hodbn <hodbn@users.noreply.github.com>
Co-authored-by: Itai Schwartz <yitai27@gmail.com>