Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: refuse incoming connections unless dialback succeeds #84289

Closed
knz opened this issue Jul 12, 2022 · 11 comments · Fixed by #94778
Closed

rpc: refuse incoming connections unless dialback succeeds #84289

knz opened this issue Jul 12, 2022 · 11 comments · Fixed by #94778
Assignees
Labels
A-kv-server Relating to the KV-level RPC server C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team

Comments

@knz
Copy link
Contributor

knz commented Jul 12, 2022

Is your feature request related to a problem? Please describe.

Followup to #49220 / cockroachlabs/support#1690

We know of multiple situations where a cluster can find itself into an asymmetric partition, whtiich causes all kinds of symptoms (#49220), these include at least:

  • low-level network partitions, when e.g IP packets get dropped in one direction but not the other
  • firewall partitions, when e.g. existing TCP sessions stay alive but new TCP sessions cannot be established
  • TLS misconfigurations, when two nodes use different client/server certs, so that the TLS handshake succeeds in one direction but fails in the other
  • DNS misconfigurations, where the advertised hostname of a node is not resolvable from (some) other nodes

It would be good if we had some automation to exclude nodes which appear to be partially partitioned away. (and require operator attention)

Describe the solution you'd like

We could have a couple relatively simple mechanisms to protect a cluster:

Two point-to-point mechanisms, to protect against pairwise partions:

  1. When receiving an incoming connection from another node, refuse to respond to heartbeats until we get a successful outgoing dial to the same node. In other words: when n2 connects to n1, have n1 refuse the conn until it can dial back to n2 successfully.

  2. When an outgoing dial fails, or a heartbeat fails, keep a timestamp of the failure for the remote node ID, and when receiving a heartbeat from that ID, refuse to respond to the heartbeat if there's a recent failure (and possibly actively close the connection). In other words, start refusing a heartbeat from n2 to n1, if n1 has failed to connect/heartbeat to n2 recently.

Then a cluster-wisde mechanism, to protect against global partitions (e.g. n1-n2 and n2-n3 can connect, but not n1-n3)

  1. if one of the two mechanisms above decides to exclude an ID, they would also register a gossip entry (with suitable TTL) that says the node is potentially blocked, containing the IDs of both the blocked node and the decider node. Then, on every node, use these gossip entries as follows: if a node X connects to node Y, and node Y is in a "blocked node" entry created by node Z, and node X has a valid connection to the node Z (i.e. X and Z are in the same side of the partition), then have X proactively start blocking Y too.

Jira issue: CRDB-17572

gz#13169

Epic CRDB-2488

@knz knz added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 12, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Jul 12, 2022
@knz knz added A-kv-server Relating to the KV-level RPC server T-kv KV Team and removed T-kv KV Team labels Jul 12, 2022
@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jul 12, 2022

This is related to #70111, in that we don't currently attempt to form a full mesh network -- a node will only connect to a different node if some component needs to reach it. I don't know exactly if or how this impacts the proposal here, but it's possible that we may find ourself in a situation where a node connects to a peer but the peer does not attempt to connect back (unless we explicitly make it so).

@knz
Copy link
Contributor Author

knz commented Jul 12, 2022

we may find ourself in a situation where a node connects to a peer but the peer does not attempt to connect back (unless we explicitly make it so).

100% agree, hence the proposal above to:

When receiving an incoming connection from another node, refuse to respond to heartbeats until we get a successful outgoing dial to the same node. In other words: when n2 connects to n1, have n1 refuse the conn until it can dial back to n2 successfully.

@zhoufang-joe
Copy link
Contributor

Do we have ETA for this one?
We recently experienced the very similar issue: when a host DNS was unavailable, but the one direction connection/dial still worked, the cluster rejected almost all the incoming client connections to the cluster. Thus, it caused a production outage.
Thx

@erikgrinaker
Copy link
Contributor

As seen in https://github.com/cockroachlabs/support/issues/1875, 22.2 includes two additional protections that may help here, by preventing lease transfers to nodes that aren't able to apply the lease:

These are particularly relevant for the case where a lease is transferred to a node which is able to establish outbound connections (and thus can successfully heartbeat), but can't receive inbound connections (in which case it won't catch up on the Raft log and can't apply the lease). Previously, this would cause the range to become unavailable. However, this doesn't help when leases are already present on the semi-partitioned node, so the proposal here is still relevant (in particular by preventing the node from heartbeating, assuming the liveness leaseholder is also one-way partitioned from the node).

Do we have ETA for this one?

No concrete ETA I'm afraid, but we're very aware of this class of problem.

@ajwerner
Copy link
Contributor

A simpler thing than described here which feels worth doing is finding a way to at least observe and identify when one-way partitions do happen. Today we can't tell. The latency tracking code is happy to use just one direction of connection to update its view.

@knz
Copy link
Contributor Author

knz commented Nov 16, 2022

The latency tracking code is happy to use just one direction of connection to update its view.

Technically both sides of the connection maintain the latency metric for "their side" and we can see both in the UI (across the diagonal). We'd lose information if we merged them together.

@erikgrinaker
Copy link
Contributor

In addition to this, we should require that all connection classes can be established. We have seen cases where nodes were able to establish one but not the other connection class due to a bug in the RPC circuit breakers.

@andrewbaptist
Copy link
Collaborator

Looking at solutions for this (both the initial and ongoing partitions), there are two general ways it could be done:

  1. Don't send heartbeat PingResponse unless a backward connection is set up. If the reverse connection cannot be established or broken, stop sending the PingResponse messages. (what is described in the original proposal)
  2. Add a field to PingResponse called badReverseConnections which specifies the list of reverse connections that are in an error state. A receiver of a PingResponse would treat a message with this field set as an error to the caller.

The second since it allows for differentiating why the heartbeats are not making it back and has more information such as the status of each connection class. It is not exactly clear what we should do if some connection classes are able to connect backward and others can't but at least we would have the information for logging/debugging purposes.

Regarding the gossip of the data and its usage for blocking. I agree it is useful to put the data in gossip, but I'm less clear about exactly how it should be used. If we are blocking certain connections based on "hearsay" then it seems possible that a bad node can cause a larger outage than it should (in the example, what if X is connected to both Y and Z and both report the other through gossip)? I'm not exactly sure about the conditions where this would happen, but it seems possible. Unless anyone feels strongly I wasn't going to use this "proxy" information initially, but it will be available in the logs / gossip dump.

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jan 5, 2023

I think we can start simple with 1 and not send ping responses. We can log an error on the rejecting node, which I think should be sufficient and wouldn't need any protocol changes.

As for the gossip aspect, let's defer that. This is related to global failure detection, and there are multiple conversations going on about this, so let's not implement anything until we have a clear picture of how failure detection should work in general (in particular wrt. leases).

@andrewbaptist
Copy link
Collaborator

The implementation of this was a little more complicated than I originally thought due to the behavior of the
The original thought was that it would look something like this:

Time 0 - Node A connects to Node B
Time 1 - Node A calls `heartbeatClient.Ping` to Node B
Time 2 - Node B receives PingRequest from A - does not send PingResponse since no backward connection
Time 3 - Node B connects to Node A
Time 4 - Node B calls `heartbeatClient.Ping` to Node A
Time 5 - Node A receives PingRequest from B - notices that a connection attempt is in progress and waits for it 

But this doesn't work because they are deadlocked. I was hoping to examine the state of the connection that is being used since there are really three "important" states

  1. TCP connection set up
  2. PingRequest sent
  3. PingResponse received

However, there does not appear to be a reliable way to tell after either 1 or 2 is complete with the way our connections are set up today.

The approach I am looking at now is the following:

Time 0 - Node A connects to Node B
Time 1 - Node A calls `heartbeatClient.Ping` to Node B
Time 2 - Node B receives PingRequest from A - does not send PingResponse since no fully established backward connection
Time 3 - Node B connects to Node A on a new connection set up with `grpc.WithBlock()`
Time 4 - Node B waits until TCP connection is established
Time 5 - Node B sends a PingResponse to node A (on the original connection)
Time 6 - Node B closes the TCP connection it just established

The other alternative is to send an error back immediately on the first PingRequest but that has the unfortunate impact of also failing the first request sent and also not sending another PingResponse for a while.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 24, 2023
Previously the latency to remote nodes was tracked by address rather
than the node's id. This could result in a few problems. First, the
remote address could be reused across nodes. This could result in
incorrect information. Additionally, places that used this information
(such as the allocator) needed to unnecessarily map the node id to
address just to do a lookup.

Finally in preparation for dialback on heartbeat cockroachdb#84289 the use of the
OriginAddr field in the PingRequest will change to be the actual address
that a node should use to dial back. Currently this field is not set
correctly.

Epic: none
Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 25, 2023
Previously the latency to remote nodes was tracked by address rather
than the node's id. This could result in a few problems. First, the
remote address could be reused across nodes. This could result in
incorrect information. Additionally, places that used this information
(such as the allocator) needed to unnecessarily map the node id to
address just to do a lookup.

Finally in preparation for dialback on heartbeat cockroachdb#84289 the use of the
OriginAddr field in the PingRequest is deprecated. That PR will add back
another field that can be used to lookup the Locality correct field to
use.

Epic: none
Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Jan 25, 2023
Previously the latency to remote nodes was tracked by address rather
than the node's id. This could result in a few problems. First, the
remote address could be reused across nodes. This could result in
incorrect information. Additionally, places that used this information
(such as the allocator) needed to unnecessarily map the node id to
address just to do a lookup.

Finally in preparation for dialback on heartbeat cockroachdb#84289 the use of the
OriginAddr field in the PingRequest is deprecated. That PR will add back
another field that can be used to lookup the Locality correct field to
use.

Epic: none
Release note: None
craig bot pushed a commit that referenced this issue Jan 25, 2023
94438: sql,storage: add support for COL_BATCH_RESPONSE scan format r=yuzefovich a=yuzefovich

This commit introduces a new `COL_BATCH_RESPONSE` scan format for Scans
and ReverseScans which results only in needed columns to be returned
from the KV server. In other words, this commit introduces the ability
to perform the KV projection pushdown.

The main idea of this feature is to use the injected decoding logic from
SQL in order to process each KV and keep only the needed parts (i.e.
necessary SQL columns). Those needed parts are then propagated back to
the KV client as coldata.Batch'es (serialized in the Apache Arrow format).

Here is the outline of all components involved:
```
     ┌────────────────────────────────────────────────┐
     │                       SQL                      │
     │________________________________________________│
     │          colfetcher.ColBatchDirectScan         │
     │                        │                       │
     │                        ▼                       │
     │                 row.txnKVFetcher               │
     │    (behind the row.KVBatchFetcher interface)   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Client                   │
     └────────────────────────────────────────────────┘
                              │
                              ▼
     ┌────────────────────────────────────────────────┐
     │                    KV Server                   │
     │________________________________________________│
     │           colfetcher.cFetcherWrapper           │
     │ (behind the storage.CFetcherWrapper interface) │
     │                        │                       │
     │                        ▼                       │
     │              colfetcher.cFetcher               │
     │                        │                       │
     │                        ▼                       │
     │          storage.mvccScanFetchAdapter ────────┐│
     │    (behind the storage.NextKVer interface)    ││
     │                        │                      ││
     │                        ▼                      ││
     │           storage.pebbleMVCCScanner           ││
     │ (which put's KVs into storage.singleResults) <┘│
     └────────────────────────────────────────────────┘
```
On the KV client side, `row.txnKVFetcher` issues Scans and ReverseScans with
the `COL_BATCH_RESPONSE` format and returns the response (which contains the
columnar data) to the `colfetcher.ColBatchDirectScan`.

On the KV server side, we create a `storage.CFetcherWrapper` that asks the
`colfetcher.cFetcher` for the next `coldata.Batch`. The `cFetcher`, in turn,
fetches the next KV, decodes it, and keeps only values for the needed SQL
columns, discarding the rest of the KV. The KV is emitted by the
`mvccScanFetchAdapter` which - via the `singleResults` struct - exposes
access to the current KV that the `pebbleMVCCScanner` is pointing at.

Note that there is an additional "implicit synchronization" between
components that is not shown on this diagram. In particular,
`storage.singleResults.maybeTrimPartialLastRow` must be in sync with the
`colfetcher.cFetcher` which is achieved by
- the `cFetcher` exposing access to the first key of the last incomplete SQL
  row via the `FirstKeyOfRowGetter`,
- the `singleResults` using that key as the resume key for the response,
- and the `cFetcher` removing that last partial SQL row when `NextKV()`
  returns `partialRow=true`.
This "upstream" link (although breaking the layering a bit) allows us to
avoid a performance penalty for handling the case with multiple column
families. (This case is handled by the `storage.pebbleResults` via tracking
offsets into the `pebbleResults.repr`.)

This code structure deserves some elaboration. First, there is a mismatch
between the "push" mode in which the `pebbleMVCCScanner` operates and the
"pull" mode that the `NextKVer` exposes. The adaption between two different
modes is achieved via the `mvccScanFetcherAdapter` grabbing (when the control
returns to it) the current unstable KV pair from the `singleResults` struct
which serves as a one KV pair buffer that the `pebbleMVCCScanner` `put`s into.
Second, in order be able to use the unstable KV pair without performing a
copy, the `pebbleMVCCScanner` stops at the current KV pair and returns the
control flow (which is exactly what `pebbleMVCCScanner.getOne` does) back to
the `mvccScanFetcherAdapter`, with the adapter advancing the scanner only when
the next KV pair is needed.

There are multiple scenarios which are currently not supported:
- SQL cannot issue Get requests (likely will support in 23.1)
- `TraceKV` option is not supported (likely will support in 23.1)
- user-defined types other than enums are not supported (will _not_
support in 23.1)
- non-default key locking strength as well as SKIP LOCKED wait policy
are not supported (will _not_ support in 23.1).

The usage of this feature is currently disabled by default, but I intend
to enable it by default for multi-tenant setups. The rationale is that
currently there is a large performance hit when enabling it for
single-tenant deployments whereas it offers significant speed up in the
multi-tenant world.

The microbenchmarks [show](https://gist.github.com/yuzefovich/669c295a8a4fdffa6490532284c5a719)
the expected improvement in multi-tenant setups when the tenant runs in
a separate process whenever we don't need to decode all of the columns
from the table.

The TPCH numbers, though, don't show the expected speedup:
```
Q1:	before: 11.47s	after: 8.84s	 -22.89%
Q2:	before: 0.41s	after: 0.29s	 -27.71%
Q3:	before: 7.89s	after: 9.68s	 22.63%
Q4:	before: 4.48s	after: 4.52s	 0.86%
Q5:	before: 10.39s	after: 10.35s	 -0.29%
Q6:	before: 33.57s	after: 33.41s	 -0.48%
Q7:	before: 23.82s	after: 23.81s	 -0.02%
Q8:	before: 3.78s	after: 3.76s	 -0.68%
Q9:	before: 28.15s	after: 28.03s	 -0.42%
Q10:	before: 5.00s	after: 4.98s	 -0.42%
Q11:	before: 2.44s	after: 2.44s	 0.22%
Q12:	before: 34.78s	after: 34.65s	 -0.37%
Q13:	before: 3.20s	after: 2.94s	 -8.28%
Q14:	before: 3.13s	after: 3.21s	 2.43%
Q15:	before: 16.80s	after: 16.73s	 -0.38%
Q16:	before: 1.60s	after: 1.65s	 2.96%
Q17:	before: 0.85s	after: 0.96s	 13.04%
Q18:	before: 16.39s	after: 15.47s	 -5.61%
Q19:	before: 13.76s	after: 13.01s	 -5.45%
Q20:	before: 55.33s	after: 55.12s	 -0.38%
Q21:	before: 24.31s	after: 24.31s	 -0.00%
Q22:	before: 1.28s	after: 1.41s	 10.26%
```

At the moment, `coldata.Batch` that is included into the response is
always serialized into the Arrow format, but I intend to introduce the
local fastpath to avoid that serialization. That work will be done in
a follow-up and should be able to reduce the perf hit for single-tenant
deployments.

A quick note on the TODOs sprinkled in this commit:
- `TODO(yuzefovich)` means that this will be left for 23.2 or later.
- `TODO(yuzefovich, 23.1)` means that it should be addressed in 23.1.

A quick note on testing: this commit randomizes the fact whether the new
infrastructure is used in almost all test builds. Introducing some unit
testing (say, in `storage` package) seems rather annoying since we must
create keys that are valid SQL keys (i.e. have TableID / Index ID
prefix) and need to come with the corresponding
`fetchpb.IndexFetchSpec`. Not having unit tests in the `storage` seems
ok to me given that the "meat" of the work there is still done by the
`pebbleMVCCScanner` which is exercised using the regular Scans.
End-to-end testing is well covered by all of our existing tests which
now runs randomly. I did run the CI multiple times with the new feature
enabled by default with no failure, so I hope that it shouldn't become
flaky.

Addresses: #82323.
Informs: #87610.

Epic: CRDB-14837

Release note: None

95701: gossip: Track latency by nodeID rather than addr r=kvoli,erikgrinaker a=andrewbaptist

Previously the latency to remote nodes was tracked by address rather than the node's id. This could result in a few problems. First, the remote address could be reused across nodes. This could result in incorrect information. Additionally, places that used this information (such as the allocator) needed to unnecessarily map the node id to address just to do a lookup.

Finally in preparation for dialback on heartbeat #84289 the use of the OriginAddr field in the PingRequest will change to be the actual address that a node should use to dial back. Currently this field is not set correctly.

Epic: none
Release note: None

95796: ui: add CPU Time chart do statement details r=maryliag a=maryliag

This commit adds a new chart for CPU time
on Statement Details page.

Part Of #87213

<img width="1508" alt="Screen Shot 2023-01-24 at 6 01 07 PM" src="https://user-images.githubusercontent.com/1017486/214440274-c48d3bb6-ecbe-47a2-861a-0a8407d219c4.png">


Release note (ui change): Add CPU Time chart to Statement Details page.

95832: cdc: remove 'nonsensitive' tag from changefeed description in telemetry logs r=jayshrivastava a=jayshrivastava

Previously, the description field in changefeed telemetry logs was marked as `nonsensitive`. This is incorrect because the description field may contain an SQL statement which is not safe to report. This change removes the `nonsensitive` tag so the field is redacted by default.

Fixes: #95823
Epic: none
Release note: none

95838: logictest: remove smallEngineBlocks randomization r=yuzefovich a=yuzefovich

This metamorphic randomization has caused some flakiness (due to a subset of tests taking very long time) so is now removed. This feature should be tested in a more targeted fashion.

Fixes: #95799.
Fixes: #95829

Release note: None

95840: opt: replace make with dev in test instructions r=mgartner a=mgartner

Epic: None

Release note: None

95842: roachtest: fix parameters passed to require.NoError r=yuzefovich,srosenberg,herkolategan a=renatolabs

When context is passed to an assertion, the parameters *must* be a string format, followed by arguments (as you would in a call to `fmt.Sprintf`). The previous code would panic trying to cast int to string.

Informs #95416

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
@blathers-crl
Copy link

blathers-crl bot commented Mar 8, 2023

Hi @andrewbaptist, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 8, 2023
Fixes: cockroachdb#84289

Previously one-way partitions where a node could initiate a successful
TCP connection in one direction, but the reverse connection fails which
causes problems. The node that initiates outgoing connections can
acquire leases and cause failures for reads and writes to those ranges.
This is particularly a problem if it acquires the liveness range leases,
but is a problem even for other ranges.

This commit adds an additional check during server-to-server
communication where the recipient of a new PingRequest first validates
that it is able to open a reverse connection to the initiator before
responding. Additionally, it will monitor whether it has a successful
reverse connection over time and asynchronously validate reverse
connections to the sender. The ongoing validation is asynchronous to
avoid adding delays to PingResponses as they are used for measuring
clock offsets.

Also the onlyOnceDialer prevents retrying after a dial error, however
this can get into a state where it continually retries for certain
network connections. This is not easy to reproduce in a unit tests as
it requires killing the connection using iptables (normal closes don't
cauuse this).

After this change the onlyOnceDialer will no longer repeatedly retry to
reconnect after a broken connection during setup.

Release note (bug fix): RPC connections between nodes now require RPC
connections to be established in both directions, otherwise the
connection will be closed. This is done to prevent asymmetric network
partitions where nodes are able to send outbound messages but not
receive inbound messages, which could result in persistent
unavailability. This behavior can be disabled by the cluster setting
rpc.dialback.enabled.

Epic: CRDB-2488
@craig craig bot closed this as completed in 5800482 Mar 9, 2023
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 29, 2023
Fixes: cockroachdb#99104
Informs cockroachdb#84289

As part of the previous fix for partition handling, we tracked the state
of a previous attempt and use that result on the next attempt. However
if there were multiple connections, we may only block system traffic
connections and not default class connections.  This change addresses
that by ensuring a failed dialback attempt is remembered until we are
able to successfully connect back to the pinging node.

Epic: none
Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Mar 30, 2023
Fixes: cockroachdb#99104
Informs cockroachdb#84289

As part of the previous fix for partition handling, we tracked the state
of a previous attempt and use that result on the next attempt. However
if there were multiple connections, we may only block system traffic
connections and not default class connections.  This change addresses
that by ensuring a failed dialback attempt is remembered until we are
able to successfully connect back to the pinging node.

Epic: none
Release note: None
craig bot pushed a commit that referenced this issue Mar 30, 2023
99429: ccl/sqlproxyccl: add PROXY protocol support via CLI flag to sqlproxy r=pjtatlow a=jaylim-crl

This commits adds a new `require-proxy-protocol` flag to `mt start-proxy`, and
that changes the sqlproxy's behavior to support the PROXY protocol. When the
flag is set, the protocol will be enforced on the SQL listener, and supported
on a best-effort basis on the HTTP listener. If the PROXY protocol isn't used,
but is enforced, the connection will be rejected. The rationale behind doing
best-effort basis on the HTTP listener is that some healthcheck systems don't
support the protocol.

This work is needed for the AWS PrivateLink work in CockroachCloud, which
requires the use of the PROXY protocol.

Release note: None

Epic: none

Release justification: SQL Proxy change only. Changes are needed for the AWS
PrivateLink work in CockroachCloud.

99840: rpc: Fix blackhole recv r=erikgrinaker a=andrewbaptist

Fixes: #99104
Informs #84289

As part of the previous fix for partition handling, we tracked the state of a previous attempt and use that result on the next attempt. However if there were multiple connections, we may only block system traffic connections and not default class connections.  This change addresses that by ensuring a failed dialback attempt is remembered until we are able to successfully connect back to the pinging node.

Epic: none
Release note: None

Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Andrew Baptist <baptist@cockroachlabs.com>
andrewbaptist added a commit that referenced this issue Mar 31, 2023
Fixes: #99104
Informs #84289

As part of the previous fix for partition handling, we tracked the state
of a previous attempt and use that result on the next attempt. However
if there were multiple connections, we may only block system traffic
connections and not default class connections.  This change addresses
that by ensuring a failed dialback attempt is remembered until we are
able to successfully connect back to the pinging node.

Epic: none
Release note: None
andrewbaptist added a commit to andrewbaptist/cockroach that referenced this issue Dec 21, 2023
Previously the latency to remote nodes was tracked by address rather
than the node's id. This could result in a few problems. First, the
remote address could be reused across nodes. This could result in
incorrect information. Additionally, places that used this information
(such as the allocator) needed to unnecessarily map the node id to
address just to do a lookup.

Finally in preparation for dialback on heartbeat cockroachdb#84289 the use of the
OriginAddr field in the PingRequest is deprecated. That PR will add back
another field that can be used to lookup the Locality correct field to
use.

Epic: none
Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-server Relating to the KV-level RPC server C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants