Disconnect from outbound peers with bad headers chains #11490

sdaftuar · 2017-10-12T21:06:07Z

The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer.

We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

sipa · 2017-10-13T00:40:18Z

src/net_processing.cpp

@@ -502,7 +517,7 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
    NodeId nodeid = pnode->GetId();
    {
        LOCK(cs_main);
-        mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));
+        auto it = mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));


Unused variable

sipa · 2017-10-13T00:48:50Z

src/net_processing.cpp

+            // we should sync to it, unless it's invalid, in which case we
+            // should find that out and disconnect from them elsewhere).
+            if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork)
+            {


Nit: same line

sipa · 2017-10-13T01:55:09Z

I've spent some time verifying the logic, and it looks very good.

I briefly misunderstood it to mean that if a peer makes progress during the 20 minute window, it would be extended. That would have been a problem (an incompatible chain that has less work but does grow continuously), but it's not what happens (it requires reaching the same amount of work as we had at the beginning of the window).

A test for the second commit would be great.

gmaxwell · 2017-10-13T09:17:39Z

src/net_processing.cpp

+                // whitelisted).
+                if (!(pfrom->fInbound || pfrom->fWhitelisted || pfrom->fAddnode)) {
+                    pfrom->fDisconnect = true;
+                }


I think this needs to be backed off a bit: Consider-- imagine we start and are fully synced up, but we are still in IsInitialBlockDownload because the network has been slow and so the tip's timestamp is a ways back. Then we have a peer we get headers from who is a single block behind us. This would disconnect them, and I think that is probably undesirable.

It's probably a good idea to review all code as if IsInitialBlockDownload() were just return true; and if there would be misbehavior in that case, then the usage is perhaps suboptimal.
An obvious conservative thing to do would be to just use the nMinimumChainWork ... though perhaps too conservative. It might also be reasonable to use a GetBlockProofEquivalentTime like test, or just the work several blocks back from chainactive tip. I think if it was conservative enough it could apply at all times, not just IsIninitialBlockDownload().

Maybe the existence of the second commit means that it would be okay for this one to be less aggressive?

I guess with the second commit, we will eventually disconnect some peers if they stay on chains with less work than our tip. So I think we could just leave this as nMinimumChainWork, to protect against the case that all our peers only have incomplete/bogus chains (which wouldn't be prevented by the second commit).

Does it make sense to avoid the push above:

bitcoin/src/net_processing.cpp

Line 2332 in e9d5802

connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::GETHEADERS, chainActive.GetLocator(pindexLast), uint256()));

now that the node will disconnect?

Edit: nevermind, since this is protected with nCount != MAX_HEADERS_RESULTS.

sdaftuar · 2017-10-13T13:16:01Z

Thanks for the review. I've pushed up a couple quick fixes for the nits. Will work on a test, as well as trying to implement a suggestion @gmaxwell gave me offline for improving the behavior in the situation where none of our peers are giving us any block announcements.

TheBlueMatt

Looks pretty good. This is begging for a clearer node-sync-state model, but its fine for 0.15.0.2. Does want a test.

TheBlueMatt · 2017-10-18T20:23:32Z

src/net_processing.cpp

+        // If we're in IBD, we want outbound peers that will serve us a useful
+        // chain.  Disconnect peers that are behind us or on chains with
+        // insufficient work.
+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {


I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

Eventually I really think we need to have peers in a "mode" which describes were we are 1. connect-handshake (version/verack/sendcmpct/sendheaders/feefilter/etc) 2. header-chain-sync (slightly complicated by the fact that we want to block all peers but one during ibd until we've finished one) 3. normal operation and then we can simplify and document strange things like this significantly instead of making everything yet more of a mess.

I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.

Swap conditions as the second is faster?

I don't think this really matters? In the steady state (ie after we've caught up from initial sync), the second condition will generally be true, while the first condition will be false.

if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

I'm confused, then. If we randomly decide to request the header at height 1 in the middle of syncing our peer's header chain, we'd do fine in the rest of the ::HEADERS handling here, just (correctly) not requesting any further headers built on top of it. However, we'd immediately disconnect them if we're not in IBD by this new check. Previously this wouldn't break anything, as the next HEADERS they send us will result in us requesting further headers.

Ah, you're right about that case -- I was thinking of the case where we send a random getheaders after we've already synced the chain, and therefore updated pindexBestKnownBlock to be some recent-ish thing.

I guess it is true that if you just added a random getheaders message in the middle of initial sync, you might not break the sync logic, but this new logic would cause a disconnect... I can add a comment indicating that we shouldn't do anything funky until after we've synced a peer's headers chain?

TheBlueMatt · 2017-10-18T22:13:07Z

src/net_processing.cpp

@@ -200,6 +203,14 @@ struct CNodeState {
     * otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
     */
    bool fSupportsDesiredCmpctVersion;
+    //! Whether this peer is protected from disconnection due to a bad/slow chain
+    bool fProtectFromDisconnect;


new code should have the m_ prefix here (and other places), and probably drop the hungarian type prefix.

TheBlueMatt · 2017-10-18T23:21:17Z

src/net_processing.cpp

+                    pto->fDisconnect = true;
+                } else {
+                    const CBlockIndex *pindexStart = state.header_with_required_work;
+                    assert (pindexStart != nullptr);


Ugh, just for a print?

agreed, chucking it

sdaftuar

Thanks for the review -- pushed up some fixup commits, which I'll squash.

I switched to using GetTime() everywhere, and was then able to write up a unit test. This is currently next-to-impossible to test in the regtest environment, I believe, since we don't support non-manual connections in regtest.

sdaftuar · 2017-10-19T15:54:48Z

src/net_processing.cpp

+        // If we're in IBD, we want outbound peers that will serve us a useful
+        // chain.  Disconnect peers that are behind us or on chains with
+        // insufficient work.
+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {


I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.

sdaftuar · 2017-10-19T16:59:09Z

src/net_processing.cpp

+                    pto->fDisconnect = true;
+                } else {
+                    const CBlockIndex *pindexStart = state.header_with_required_work;
+                    assert (pindexStart != nullptr);


agreed, chucking it

sdaftuar · 2017-10-19T17:23:24Z

Squashed (https://github.com/sdaftuar/bitcoin/commits/11490.1 -> 032dc531b911a0c0ea570aa74531508f54a36073)

jonasschnelli · 2017-10-19T19:03:51Z

Travis found:

test/DoS_tests.cpp(80): error in "outbound_slow_chain_eviction": check dummyNode1.fDisconnect == true failed

sdaftuar · 2017-10-19T21:02:45Z

Unit test issue should be fixed now.

promag · 2017-10-19T23:07:59Z

src/net_processing.cpp

@@ -534,6 +549,8 @@ void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTim
    nPreferredDownload -= state->fPreferredDownload;
    nPeersWithValidatedDownloads -= (state->nBlocksInFlightValidHeaders != 0);
    assert(nPeersWithValidatedDownloads >= 0);
+    g_outbound_peers_with_protect_from_disconnect -= state->m_protect_from_disconnect;


int -= bool?

More verbose but IMO more clear:

if (state->m_protect_from_disconnect) { assert(g_outbound_peers_with_protect_from_disconnect > 0); --g_outbound_peers_with_protect_from_disconnect; }

I was following the pattern already in use in the lines above (nPreferredDownload -= state->fPreferredDownload). But I can change to something like what you suggest if this is frowned upon for some reason.

promag · 2017-10-19T23:10:16Z

src/net_processing.cpp

@@ -124,6 +124,9 @@ namespace {
    /** Number of peers from which we're downloading blocks. */
    int nPeersWithValidatedDownloads = 0;

+    /** Number of outbound peers with m_protect_from_disconnect. */
+    int g_outbound_peers_with_protect_from_disconnect = 0;


Remove since this is easily calculated by counting nodes with m_protect_from_disconnect?

By looping over all the CNodeState objects when this value is needed? I don't think that makes sense. Note that tracking the value here is analogous to what we do with nPeersWithValidatedDownloads and nPreferredDownload.

promag · 2017-10-19T23:12:44Z

src/net_processing.h

+/** Protect at least this many outbound peers from disconnection due to slow/
+ * behind headers chain.
+ */
+static constexpr int32_t MAX_OUTBOUND_PEERS_TO_PROTECT = 4;


_FROM_DISCONNECT?

promag · 2017-10-19T23:28:11Z

src/net_processing.cpp

+        // If we're in IBD, we want outbound peers that will serve us a useful
+        // chain.  Disconnect peers that are behind us or on chains with
+        // insufficient work.
+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {


Swap conditions as the second is faster?

promag · 2017-10-19T23:28:26Z

src/net_processing.cpp

@@ -2383,6 +2401,31 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
                }
            }
        }
+        // If we're in IBD, we want outbound peers that will serve us a useful
+        // chain.  Disconnect peers that are behind us or on chains with


Remove extra space.

promag · 2017-10-19T23:34:18Z

src/net_processing.cpp

+                state.m_sent_getheaders_to_check_chain_sync = false;
+            } else if (state.m_headers_chain_timeout > 0 && time_in_seconds > state.m_headers_chain_timeout) {
+                // No evidence yet that our peer has synced to a chain with work equal to that
+                // of our tip, when we first detected it was behind.  Send a single getheaders


Remove extra space.

sdaftuar · 2017-10-20T00:32:33Z

Addressed some of @promag's nits at https://github.com/sdaftuar/bitcoin/commits/11490.2, squashed -> 39e0c8e

ryanofsky

utACK 39e0c8e

ryanofsky · 2017-10-23T19:58:25Z

src/net_processing.cpp

+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
+            // When nCount < MAX_HEADERS_RESULTS, we know we have no more
+            // headers to fetch from this peer.
+            if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {


In commit "Disconnecting from bad outbound peers in IBD"

Can you update comment to say why this is checking against nMinimumChainWork (and not chainActive.Tip()->nChainWork)?

Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.

ryanofsky · 2017-10-23T20:06:59Z

test/functional/minchainwork.py

@@ -27,7 +27,7 @@ class MinimumChainWorkTest(BitcoinTestFramework):
    def set_test_params(self):
        self.setup_clean_chain = True
        self.num_nodes = 3
-        self.extra_args = [[], ["-minimumchainwork=0x65"], ["-minimumchainwork=0x65"]]
+        self.extra_args = [['-whitelist=127.0.0.1'], ["-minimumchainwork=0x65", '-whitelist=127.0.0.1'], ["-minimumchainwork=0x65"]]


In commit "Disconnecting from bad outbound peers in IBD"

A comment saying why -whitelist is needed would be helfpul I think

ryanofsky · 2017-10-23T20:19:14Z

src/net_processing.cpp

@@ -200,6 +203,14 @@ struct CNodeState {
     * otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
     */
    bool fSupportsDesiredCmpctVersion;
+    //! Whether this peer is protected from disconnection due to a bad/slow chain


In commit "Permit disconnection of outbound peers on bad/slow chains"

I think you should do something to more clearly group these variables together. You could use doxygen grouping syntax:

//! State used to enforce CHAIN_SYNC_TIMEOUT. //! @{ ... variable declarations here ... //! @}

You could also give them a common prefix like m_sync_timeout_enable / m_sync_timeout_deadline / m_sync_timeout_min_work / m_sync_timeout_sent_getheaders or move them to a nested struct, though that's probably asking a lot this far into the review.

ryanofsky · 2017-10-23T20:40:31Z

src/net_processing.cpp

+        // Check that outbound peers have reasonable chains
+        // GetTime() is used by this anti-DoS logic so we can test this using mocktime
+        int64_t time_in_seconds = GetTime();
+        if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {


In commit "Permit disconnection of outbound peers on bad/slow chains"

It seems like it could be good to have an pto->isOutbound() helper returning !pto->fInbound && !pto->m_manual_connection && !pto->fFeeler && !pto->fOneShot so this long string of checks doesn't need to be repeated here and above in getheaders.

ryanofsky · 2017-10-23T20:48:48Z

src/net_processing.cpp

+        int64_t time_in_seconds = GetTime();
+        if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
+            // This is an outbound peer subject to disconnection if their chain
+            // lags behind ours (note: if their chain has more work than ours,


In commit "Permit disconnection of outbound peers on bad/slow chains"

Instead of "if their chain lags behind ours" maybe be more specific and say "if they don't announce a block with as much work as the current tip within CHAIN_SYNC_TIMEOUT+HEADERS_RESPONSE_TIME seconds."

ryanofsky · 2017-10-23T21:08:56Z

src/net_processing.cpp

+            // lags behind ours (note: if their chain has more work than ours,
+            // we should sync to it, unless it's invalid, in which case we
+            // should find that out and disconnect from them elsewhere).
+            if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork) {


In commit "Permit disconnection of outbound peers on bad/slow chains":

I don't see why the first and second branches of this if/else are distinct. It seems like you could unite them and drop unneeded checks with:

if (!state.m_header_with_required_work || (state.pindexBestKnownBlock && state.pindexBestKnownBlock->nChainWork >= state.m_header_with_required_work->nChainWork) { // Reset timeout and min required work if peer caught up to previous requirement. state.m_headers_chain_timeout = time_in_seconds + CHAIN_SYNC_TIMEOUT; state.m_header_with_required_work = chainActive.Tip(); state.m_sent_getheaders_to_check_chain_sync = false; } else if (time_in_seconds > state.m_headers_chain_timeout) { ...

Hm, I'll think about this. I haven't quite convinced myself yet that there are no edge-case differences between your simplification and my original patch but you may be right.

ryanofsky · 2017-10-23T21:34:58Z

src/test/DoS_tests.cpp

+    int64_t nStartTime = GetTime();
+    // Wait 21 minutes, to
+    SetMockTime(nStartTime+21*60);
+    peerLogic->SendMessages(&dummyNode1, interruptDummy); // should result in getheaders


In commit "Add unit test for outbound peer eviction"

Maybe check dummyNode1.vSendMsg size or contents to see if there actually was a getheaders.

sdaftuar · 2017-10-24T13:43:48Z

@ryanofsky Thanks for the review; I've addressed all but one of your comments, which I'm still thinking about.

sdaftuar · 2017-10-24T16:20:34Z

Squashed https://github.com/sdaftuar/bitcoin/commits/11490.3 -> 9dafb45

ryanofsky

Thanks for the updates! utACK 9dafb45

ryanofsky · 2017-10-24T16:17:11Z

src/net_processing.cpp

+    /** State used to enforce CHAIN_SYNC_TIMEOUT
+      * Only in effect for outbound, non-manual connections, with
+      * m_protect == false
+      * Algorithm: if a peer's best known block has less work than our tip,


Could maybe be more specific about the timeouts in the description. Instead of "T in the future", could write "CHAIN_SYNC_TIMEOUT seconds in the future", instead of "bump the timeout" could write "reset the timeout", instead of "new shorter timeout" could refer to HEADERS_RESPONSE_TIME.

ryanofsky · 2017-10-24T16:20:22Z

src/net_processing.cpp

+        if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
+            // When nCount < MAX_HEADERS_RESULTS, we know we have no more
+            // headers to fetch from this peer.
+            if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {


Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.

theuni · 2017-10-24T17:27:23Z

src/net_processing.cpp

+            // won't start block download until we have a headers chain that
+            // has at least nMinimumChainWork, even if a peer has a chain past
+            // our tip, as an anti-DoS measure.
+            if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {


I'm unsure if it would be possible to get here without setting a pindexBestKnownBlock, but please double-check to be safe: if (nodestate->pindexBestKnownBlock && ...

ryanofsky

utACK d9a0506 (just adds two suggested comment updates).

theuni · 2017-10-24T18:36:19Z

src/net_processing.cpp

+        if (!pfrom->fDisconnect && IsOutboundPeer(pfrom)) {
+            // If this is an outbound peer, check to see if we should protect
+            // it from the bad/lagging chain logic.
+            if (g_outbound_peers_with_protect_from_disconnect < MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT && nodestate->pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork && !nodestate->m_chain_sync.m_protect) {


nodestate->pindexBestKnownBlock needs a nullptr check here too. Though after a quick look, it doesn't look like we can get here without a pindexBestKnownBlock. If that's the case (I haven't traced it fully) an assert may make more sense.

Yeah it should be impossible due to the call to UpdateBlockAvailability() above, but I'll add the nullptr check anyway.

theuni · 2017-10-24T18:57:09Z

src/net_processing.cpp

@@ -3265,6 +3315,54 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM
            }
        }

+        // Check that outbound peers have reasonable chains
+        // GetTime() is used by this anti-DoS logic so we can test this using mocktime
+        int64_t time_in_seconds = GetTime();


Can this be factored out into a function? It seems we really only need to call it when we get new headers, or when a timer expires. Moving it out will make it easier to break up the SendMessages monster down the road.

sdaftuar · 2017-10-24T20:43:12Z

Updated to address latest comments from @theuni.

achow101 · 2017-10-25T04:09:19Z

utACK 0114230 Needs squash

sdaftuar · 2017-10-25T15:55:15Z

Squashed https://github.com/sdaftuar/bitcoin/commits/11490.4 -> d6c4ad7

gmaxwell · 2017-10-25T19:22:41Z

ACK. Looks good and I've been running it on a node for a couple days, seen some getheaders but no disconnections yet, though I've started disconnecting healthy outbound peers with the hope of finding some that fail naturally. :)

gmaxwell · 2017-10-26T03:22:14Z

After a kicking all my outbounds every couple hours, I eventually hit on this disconnecting a peer... logs show that it was obviously broken. Good job patch.

When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work.

Currently we have no rotation of outbound peers. If an outbound peer stops serving us blocks, or is on a consensus-incompatible chain with less work than our tip (but otherwise valid headers), then we will never disconnect that peer, even though that peer is using one of our 8 outbound connection slots. Because we rely on our outbound peers to find an honest node in order to reach consensus, allowing an incompatible peer to occupy one of those slots is undesirable, particularly if it is possible for all such slots to be occupied by such peers. Protect against this by always checking to see if a peer's best known block has less work than our tip, and if so, set a 20 minute timeout -- if the peer is still not known to have caught up to a chain with as much work as ours after 20 minutes, then send a single getheaders message, wait 2 more minutes, and if a better header hasn't been received by then, disconnect that peer. Note: - we do not require that our peer sync to the same tip as ours, just an equal or greater work tip. (Doing otherwise would risk partitioning the network in the event of a chain split, and is also unnecessary.) - we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology.

jnewbery · 2017-10-26T17:48:50Z

utACK 453545f

sdaftuar · 2017-10-26T17:53:17Z

@jnewbery pointed out to that the functional test didn't actually need a change, due to protections given to manual connections (and every connection in regtest is a manual connection). So I reverted the change to minchainwork and just added a comment that explains the situation. Also fixed a comment typo that he pointed out in the unit test.

Squashed https://github.com/sdaftuar/bitcoin/commits/11490.6 -> e065249

jnewbery · 2017-10-26T18:25:46Z

utACK e065249

e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656

When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work. Github-Pull: bitcoin#11490 Rebased-From: c60fd71

Currently we have no rotation of outbound peers. If an outbound peer stops serving us blocks, or is on a consensus-incompatible chain with less work than our tip (but otherwise valid headers), then we will never disconnect that peer, even though that peer is using one of our 8 outbound connection slots. Because we rely on our outbound peers to find an honest node in order to reach consensus, allowing an incompatible peer to occupy one of those slots is undesirable, particularly if it is possible for all such slots to be occupied by such peers. Protect against this by always checking to see if a peer's best known block has less work than our tip, and if so, set a 20 minute timeout -- if the peer is still not known to have caught up to a chain with as much work as ours after 20 minutes, then send a single getheaders message, wait 2 more minutes, and if a better header hasn't been received by then, disconnect that peer. Note: - we do not require that our peer sync to the same tip as ours, just an equal or greater work tip. (Doing otherwise would risk partitioning the network in the event of a chain split, and is also unnecessary.) - we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology. Github-Pull: bitcoin#11490 Rebased-From: 5a6d00c

Github-Pull: bitcoin#11490 Rebased-From: e065249

…chains e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656

fanquake added the P2P label Oct 12, 2017

fanquake added this to the 0.15.0.2 milestone Oct 12, 2017

sipa reviewed Oct 13, 2017

View reviewed changes

gmaxwell reviewed Oct 13, 2017

View reviewed changes

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 12d7bcd to 4b8a4bd Compare October 14, 2017 15:40

TheBlueMatt reviewed Oct 18, 2017

View reviewed changes

laanwj added the Needs backport label Oct 19, 2017

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from d9b0eee to 1725fb0 Compare October 19, 2017 17:01

sdaftuar commented Oct 19, 2017

View reviewed changes

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 1725fb0 to 032dc53 Compare October 19, 2017 17:21

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch 2 times, most recently from c1e129b to 0a3d957 Compare October 19, 2017 20:49

promag reviewed Oct 19, 2017

View reviewed changes

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from ea03103 to 39e0c8e Compare October 20, 2017 00:31

sdaftuar mentioned this pull request Oct 20, 2017

Evict outbound peers if tip is stale #11534

Closed

ryanofsky reviewed Oct 23, 2017

View reviewed changes

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from eafe053 to 9dafb45 Compare October 24, 2017 16:19

ryanofsky reviewed Oct 24, 2017

View reviewed changes

theuni reviewed Oct 24, 2017

View reviewed changes

ryanofsky reviewed Oct 24, 2017

View reviewed changes

theuni reviewed Oct 24, 2017

View reviewed changes

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 0114230 to d6c4ad7 Compare October 25, 2017 15:54

sdaftuar and others added 2 commits October 26, 2017 13:43

Disconnecting from bad outbound peers in IBD

c60fd71

When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work.

Add unit test for outbound peer eviction

e065249

sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from d7f1222 to e065249 Compare October 26, 2017 17:53

laanwj merged commit e065249 into bitcoin:master Oct 26, 2017

sdaftuar mentioned this pull request Oct 26, 2017

Connect to a new outbound peer if our tip is stale #11560

Merged

maflcko removed the Needs backport label Nov 1, 2017

morcos mentioned this pull request Nov 2, 2017

0.15: Backports #11592

Merged

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Nov 2, 2017

Add unit test for outbound peer eviction

e327224

Github-Pull: bitcoin#11490 Rebased-From: e065249

underdarkskies mentioned this pull request Jul 11, 2018

Correct Ravencoin's Commit History RavenProject/Ravencoin#119

Closed

sdaftuar mentioned this pull request Jun 11, 2019

p2p: Add 2 outbound block-relay-only connections #15759

Merged

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Disconnect from outbound peers with bad headers chains #11490

Disconnect from outbound peers with bad headers chains #11490

Conversation

sdaftuar commented Oct 12, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sipa commented Oct 13, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

promag Oct 24, 2017 • edited Loading

Choose a reason for hiding this comment

sdaftuar commented Oct 13, 2017

TheBlueMatt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar Oct 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar Oct 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar commented Oct 19, 2017

jonasschnelli commented Oct 19, 2017

sdaftuar commented Oct 19, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar commented Oct 20, 2017

ryanofsky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar commented Oct 24, 2017

sdaftuar commented Oct 24, 2017

ryanofsky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ryanofsky left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sdaftuar commented Oct 24, 2017

achow101 commented Oct 25, 2017

sdaftuar commented Oct 25, 2017

gmaxwell commented Oct 25, 2017

gmaxwell commented Oct 26, 2017

jnewbery commented Oct 26, 2017

sdaftuar commented Oct 26, 2017 • edited Loading

jnewbery commented Oct 26, 2017

promag Oct 24, 2017 •

edited

Loading

sdaftuar Oct 20, 2017 •

edited

Loading

sdaftuar Oct 20, 2017 •

edited

Loading

sdaftuar commented Oct 26, 2017 •

edited

Loading