Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disconnect from outbound peers with bad headers chains #11490

Merged
merged 3 commits into from
Oct 26, 2017

Conversation

sdaftuar
Copy link
Member

The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer.

We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

@fanquake fanquake added the P2P label Oct 12, 2017
@fanquake fanquake added this to the 0.15.0.2 milestone Oct 12, 2017
@@ -502,7 +517,7 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
NodeId nodeid = pnode->GetId();
{
LOCK(cs_main);
mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));
auto it = mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable

// we should sync to it, unless it's invalid, in which case we
// should find that out and disconnect from them elsewhere).
if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: same line

@sipa
Copy link
Member

sipa commented Oct 13, 2017

I've spent some time verifying the logic, and it looks very good.

I briefly misunderstood it to mean that if a peer makes progress during the 20 minute window, it would be extended. That would have been a problem (an incompatible chain that has less work but does grow continuously), but it's not what happens (it requires reaching the same amount of work as we had at the beginning of the window).

A test for the second commit would be great.

// whitelisted).
if (!(pfrom->fInbound || pfrom->fWhitelisted || pfrom->fAddnode)) {
pfrom->fDisconnect = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be backed off a bit: Consider-- imagine we start and are fully synced up, but we are still in IsInitialBlockDownload because the network has been slow and so the tip's timestamp is a ways back. Then we have a peer we get headers from who is a single block behind us. This would disconnect them, and I think that is probably undesirable.

It's probably a good idea to review all code as if IsInitialBlockDownload() were just return true; and if there would be misbehavior in that case, then the usage is perhaps suboptimal.
An obvious conservative thing to do would be to just use the nMinimumChainWork ... though perhaps too conservative. It might also be reasonable to use a GetBlockProofEquivalentTime like test, or just the work several blocks back from chainactive tip. I think if it was conservative enough it could apply at all times, not just IsIninitialBlockDownload().

Maybe the existence of the second commit means that it would be okay for this one to be less aggressive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess with the second commit, we will eventually disconnect some peers if they stay on chains with less work than our tip. So I think we could just leave this as nMinimumChainWork, to protect against the case that all our peers only have incomplete/bogus chains (which wouldn't be prevented by the second commit).

Copy link
Contributor

@promag promag Oct 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to avoid the push above:

connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::GETHEADERS, chainActive.GetLocator(pindexLast), uint256()));

now that the node will disconnect?

Edit: nevermind, since this is protected with nCount != MAX_HEADERS_RESULTS.

@sdaftuar
Copy link
Member Author

Thanks for the review. I've pushed up a couple quick fixes for the nits. Will work on a test, as well as trying to implement a suggestion @gmaxwell gave me offline for improving the behavior in the situation where none of our peers are giving us any block announcements.

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 12d7bcd to 4b8a4bd Compare October 14, 2017 15:40
Copy link
Contributor

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. This is begging for a clearer node-sync-state model, but its fine for 0.15.0.2. Does want a test.

// If we're in IBD, we want outbound peers that will serve us a useful
// chain. Disconnect peers that are behind us or on chains with
// insufficient work.
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

Eventually I really think we need to have peers in a "mode" which describes were we are 1. connect-handshake (version/verack/sendcmpct/sendheaders/feefilter/etc) 2. header-chain-sync (slightly complicated by the fact that we want to block all peers but one during ibd until we've finished one) 3. normal operation and then we can simplify and document strange things like this significantly instead of making everything yet more of a mess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap conditions as the second is faster?

Copy link
Member Author

@sdaftuar sdaftuar Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this really matters? In the steady state (ie after we've caught up from initial sync), the second condition will generally be true, while the first condition will be false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

I'm confused, then. If we randomly decide to request the header at height 1 in the middle of syncing our peer's header chain, we'd do fine in the rest of the ::HEADERS handling here, just (correctly) not requesting any further headers built on top of it. However, we'd immediately disconnect them if we're not in IBD by this new check. Previously this wouldn't break anything, as the next HEADERS they send us will result in us requesting further headers.

Copy link
Member Author

@sdaftuar sdaftuar Oct 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right about that case -- I was thinking of the case where we send a random getheaders after we've already synced the chain, and therefore updated pindexBestKnownBlock to be some recent-ish thing.

I guess it is true that if you just added a random getheaders message in the middle of initial sync, you might not break the sync logic, but this new logic would cause a disconnect... I can add a comment indicating that we shouldn't do anything funky until after we've synced a peer's headers chain?

@@ -200,6 +203,14 @@ struct CNodeState {
* otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
*/
bool fSupportsDesiredCmpctVersion;
//! Whether this peer is protected from disconnection due to a bad/slow chain
bool fProtectFromDisconnect;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new code should have the m_ prefix here (and other places), and probably drop the hungarian type prefix.

pto->fDisconnect = true;
} else {
const CBlockIndex *pindexStart = state.header_with_required_work;
assert (pindexStart != nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, just for a print?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, chucking it

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from d9b0eee to 1725fb0 Compare October 19, 2017 17:01
Copy link
Member Author

@sdaftuar sdaftuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review -- pushed up some fixup commits, which I'll squash.

I switched to using GetTime() everywhere, and was then able to write up a unit test. This is currently next-to-impossible to test in the regtest environment, I believe, since we don't support non-manual connections in regtest.

// If we're in IBD, we want outbound peers that will serve us a useful
// chain. Disconnect peers that are behind us or on chains with
// insufficient work.
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.

I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.

In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.

pto->fDisconnect = true;
} else {
const CBlockIndex *pindexStart = state.header_with_required_work;
assert (pindexStart != nullptr);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, chucking it

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 1725fb0 to 032dc53 Compare October 19, 2017 17:21
@sdaftuar
Copy link
Member Author

Squashed (https://github.com/sdaftuar/bitcoin/commits/11490.1 -> 032dc531b911a0c0ea570aa74531508f54a36073)

@jonasschnelli
Copy link
Contributor

Travis found:

test/DoS_tests.cpp(80): error in "outbound_slow_chain_eviction": check dummyNode1.fDisconnect == true failed

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch 2 times, most recently from c1e129b to 0a3d957 Compare October 19, 2017 20:49
@sdaftuar
Copy link
Member Author

Unit test issue should be fixed now.

@@ -534,6 +549,8 @@ void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTim
nPreferredDownload -= state->fPreferredDownload;
nPeersWithValidatedDownloads -= (state->nBlocksInFlightValidHeaders != 0);
assert(nPeersWithValidatedDownloads >= 0);
g_outbound_peers_with_protect_from_disconnect -= state->m_protect_from_disconnect;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int -= bool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More verbose but IMO more clear:

if (state->m_protect_from_disconnect) {
    assert(g_outbound_peers_with_protect_from_disconnect > 0);
    --g_outbound_peers_with_protect_from_disconnect;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the pattern already in use in the lines above (nPreferredDownload -= state->fPreferredDownload). But I can change to something like what you suggest if this is frowned upon for some reason.

@@ -124,6 +124,9 @@ namespace {
/** Number of peers from which we're downloading blocks. */
int nPeersWithValidatedDownloads = 0;

/** Number of outbound peers with m_protect_from_disconnect. */
int g_outbound_peers_with_protect_from_disconnect = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove since this is easily calculated by counting nodes with m_protect_from_disconnect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looping over all the CNodeState objects when this value is needed? I don't think that makes sense. Note that tracking the value here is analogous to what we do with nPeersWithValidatedDownloads and nPreferredDownload.

/** Protect at least this many outbound peers from disconnection due to slow/
* behind headers chain.
*/
static constexpr int32_t MAX_OUTBOUND_PEERS_TO_PROTECT = 4;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_FROM_DISCONNECT?

// If we're in IBD, we want outbound peers that will serve us a useful
// chain. Disconnect peers that are behind us or on chains with
// insufficient work.
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap conditions as the second is faster?

@@ -2383,6 +2401,31 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr
}
}
}
// If we're in IBD, we want outbound peers that will serve us a useful
// chain. Disconnect peers that are behind us or on chains with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space.

state.m_sent_getheaders_to_check_chain_sync = false;
} else if (state.m_headers_chain_timeout > 0 && time_in_seconds > state.m_headers_chain_timeout) {
// No evidence yet that our peer has synced to a chain with work equal to that
// of our tip, when we first detected it was behind. Send a single getheaders
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove extra space.

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from ea03103 to 39e0c8e Compare October 20, 2017 00:31
@sdaftuar
Copy link
Member Author

Addressed some of @promag's nits at https://github.com/sdaftuar/bitcoin/commits/11490.2, squashed -> 39e0c8e

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 39e0c8e

if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
// When nCount < MAX_HEADERS_RESULTS, we know we have no more
// headers to fetch from this peer.
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Disconnecting from bad outbound peers in IBD"

Can you update comment to say why this is checking against nMinimumChainWork (and not chainActive.Tip()->nChainWork)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.

@@ -27,7 +27,7 @@ class MinimumChainWorkTest(BitcoinTestFramework):
def set_test_params(self):
self.setup_clean_chain = True
self.num_nodes = 3
self.extra_args = [[], ["-minimumchainwork=0x65"], ["-minimumchainwork=0x65"]]
self.extra_args = [['-whitelist=127.0.0.1'], ["-minimumchainwork=0x65", '-whitelist=127.0.0.1'], ["-minimumchainwork=0x65"]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Disconnecting from bad outbound peers in IBD"

A comment saying why -whitelist is needed would be helfpul I think

@@ -200,6 +203,14 @@ struct CNodeState {
* otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns.
*/
bool fSupportsDesiredCmpctVersion;
//! Whether this peer is protected from disconnection due to a bad/slow chain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Permit disconnection of outbound peers on bad/slow chains"

I think you should do something to more clearly group these variables together. You could use doxygen grouping syntax:

//! State used to enforce CHAIN_SYNC_TIMEOUT.
//! @{
... variable declarations here ...
//! @}

You could also give them a common prefix like m_sync_timeout_enable / m_sync_timeout_deadline / m_sync_timeout_min_work / m_sync_timeout_sent_getheaders or move them to a nested struct, though that's probably asking a lot this far into the review.

// Check that outbound peers have reasonable chains
// GetTime() is used by this anti-DoS logic so we can test this using mocktime
int64_t time_in_seconds = GetTime();
if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Permit disconnection of outbound peers on bad/slow chains"

It seems like it could be good to have an pto->isOutbound() helper returning !pto->fInbound && !pto->m_manual_connection && !pto->fFeeler && !pto->fOneShot so this long string of checks doesn't need to be repeated here and above in getheaders.

int64_t time_in_seconds = GetTime();
if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) {
// This is an outbound peer subject to disconnection if their chain
// lags behind ours (note: if their chain has more work than ours,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Permit disconnection of outbound peers on bad/slow chains"

Instead of "if their chain lags behind ours" maybe be more specific and say "if they don't announce a block with as much work as the current tip within CHAIN_SYNC_TIMEOUT+HEADERS_RESPONSE_TIME seconds."

// lags behind ours (note: if their chain has more work than ours,
// we should sync to it, unless it's invalid, in which case we
// should find that out and disconnect from them elsewhere).
if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Permit disconnection of outbound peers on bad/slow chains":

I don't see why the first and second branches of this if/else are distinct. It seems like you could unite them and drop unneeded checks with:

if (!state.m_header_with_required_work || (state.pindexBestKnownBlock && state.pindexBestKnownBlock->nChainWork >= state.m_header_with_required_work->nChainWork) {
    // Reset timeout and min required work if peer caught up to previous requirement.
    state.m_headers_chain_timeout = time_in_seconds + CHAIN_SYNC_TIMEOUT;
    state.m_header_with_required_work = chainActive.Tip();
    state.m_sent_getheaders_to_check_chain_sync = false;
} else if (time_in_seconds > state.m_headers_chain_timeout) {
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'll think about this. I haven't quite convinced myself yet that there are no edge-case differences between your simplification and my original patch but you may be right.

int64_t nStartTime = GetTime();
// Wait 21 minutes, to
SetMockTime(nStartTime+21*60);
peerLogic->SendMessages(&dummyNode1, interruptDummy); // should result in getheaders
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "Add unit test for outbound peer eviction"

Maybe check dummyNode1.vSendMsg size or contents to see if there actually was a getheaders.

@sdaftuar
Copy link
Member Author

@ryanofsky Thanks for the review; I've addressed all but one of your comments, which I'm still thinking about.

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from eafe053 to 9dafb45 Compare October 24, 2017 16:19
@sdaftuar
Copy link
Member Author

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! utACK 9dafb45

/** State used to enforce CHAIN_SYNC_TIMEOUT
* Only in effect for outbound, non-manual connections, with
* m_protect == false
* Algorithm: if a peer's best known block has less work than our tip,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe be more specific about the timeouts in the description. Instead of "T in the future", could write "CHAIN_SYNC_TIMEOUT seconds in the future", instead of "bump the timeout" could write "reset the timeout", instead of "new shorter timeout" could refer to HEADERS_RESPONSE_TIME.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) {
// When nCount < MAX_HEADERS_RESULTS, we know we have no more
// headers to fetch from this peer.
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.

// won't start block download until we have a headers chain that
// has at least nMinimumChainWork, even if a peer has a chain past
// our tip, as an anti-DoS measure.
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if it would be possible to get here without setting a pindexBestKnownBlock, but please double-check to be safe: if (nodestate->pindexBestKnownBlock && ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK d9a0506 (just adds two suggested comment updates).

if (!pfrom->fDisconnect && IsOutboundPeer(pfrom)) {
// If this is an outbound peer, check to see if we should protect
// it from the bad/lagging chain logic.
if (g_outbound_peers_with_protect_from_disconnect < MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT && nodestate->pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork && !nodestate->m_chain_sync.m_protect) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodestate->pindexBestKnownBlock needs a nullptr check here too. Though after a quick look, it doesn't look like we can get here without a pindexBestKnownBlock. If that's the case (I haven't traced it fully) an assert may make more sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it should be impossible due to the call to UpdateBlockAvailability() above, but I'll add the nullptr check anyway.

@@ -3265,6 +3315,54 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM
}
}

// Check that outbound peers have reasonable chains
// GetTime() is used by this anti-DoS logic so we can test this using mocktime
int64_t time_in_seconds = GetTime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be factored out into a function? It seems we really only need to call it when we get new headers, or when a timer expires. Moving it out will make it easier to break up the SendMessages monster down the road.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@sdaftuar
Copy link
Member Author

Updated to address latest comments from @theuni.

@achow101
Copy link
Member

utACK 0114230 Needs squash

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from 0114230 to d6c4ad7 Compare October 25, 2017 15:54
@sdaftuar
Copy link
Member Author

@gmaxwell
Copy link
Contributor

ACK. Looks good and I've been running it on a node for a couple days, seen some getheaders but no disconnections yet, though I've started disconnecting healthy outbound peers with the hope of finding some that fail naturally. :)

@gmaxwell
Copy link
Contributor

After a kicking all my outbounds every couple hours, I eventually hit on this disconnecting a peer... logs show that it was obviously broken. Good job patch.

sdaftuar and others added 2 commits October 26, 2017 13:43
When in IBD, we'd like to use all our outbound peers to help us
sync the chain.  Disconnect any outbound peers whose headers have
insufficient work.
Currently we have no rotation of outbound peers.  If an outbound peer
stops serving us blocks, or is on a consensus-incompatible chain with
less work than our tip (but otherwise valid headers), then we will never
disconnect that peer, even though that peer is using one of our 8
outbound connection slots.  Because we rely on our outbound peers to
find an honest node in order to reach consensus, allowing an
incompatible peer to occupy one of those slots is undesirable,
particularly if it is possible for all such slots to be occupied by such
peers.

Protect against this by always checking to see if a peer's best known
block has less work than our tip, and if so, set a 20 minute timeout --
if the peer is still not known to have caught up to a chain with as much
work as ours after 20 minutes, then send a single getheaders message,
wait 2 more minutes, and if a better header hasn't been received by then,
disconnect that peer.

Note:

- we do not require that our peer sync to the same tip as ours, just an
equal or greater work tip.  (Doing otherwise would risk partitioning the
network in the event of a chain split, and is also unnecessary.)

- we pick 4 of our outbound peers and do not subject them to this logic,
to be more conservative. We don't wish to permit temporary network
issues (or an attacker) to excessively disrupt network topology.
@jnewbery
Copy link
Contributor

utACK 453545f

@sdaftuar sdaftuar force-pushed the 2017-10-outbound-peers-good-chain branch from d7f1222 to e065249 Compare October 26, 2017 17:53
@sdaftuar
Copy link
Member Author

sdaftuar commented Oct 26, 2017

@jnewbery pointed out to that the functional test didn't actually need a change, due to protections given to manual connections (and every connection in regtest is a manual connection). So I reverted the change to minchainwork and just added a comment that explains the situation. Also fixed a comment typo that he pointed out in the unit test.

Squashed https://github.com/sdaftuar/bitcoin/commits/11490.6 -> e065249

@jnewbery
Copy link
Contributor

utACK e065249

@laanwj laanwj merged commit e065249 into bitcoin:master Oct 26, 2017
laanwj added a commit that referenced this pull request Oct 26, 2017
e065249 Add unit test for outbound peer eviction (Suhas Daftuar)
5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar)
c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar)

Pull request description:

  The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

  The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

  For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip.  If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes.  If after two minutes their best known block has insufficient work, we disconnect that peer.

  We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

  We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split.  Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect.  This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
@morcos morcos mentioned this pull request Nov 2, 2017
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Nov 2, 2017
When in IBD, we'd like to use all our outbound peers to help us
sync the chain.  Disconnect any outbound peers whose headers have
insufficient work.

Github-Pull: bitcoin#11490
Rebased-From: c60fd71
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Nov 2, 2017
Currently we have no rotation of outbound peers.  If an outbound peer
stops serving us blocks, or is on a consensus-incompatible chain with
less work than our tip (but otherwise valid headers), then we will never
disconnect that peer, even though that peer is using one of our 8
outbound connection slots.  Because we rely on our outbound peers to
find an honest node in order to reach consensus, allowing an
incompatible peer to occupy one of those slots is undesirable,
particularly if it is possible for all such slots to be occupied by such
peers.

Protect against this by always checking to see if a peer's best known
block has less work than our tip, and if so, set a 20 minute timeout --
if the peer is still not known to have caught up to a chain with as much
work as ours after 20 minutes, then send a single getheaders message,
wait 2 more minutes, and if a better header hasn't been received by then,
disconnect that peer.

Note:

- we do not require that our peer sync to the same tip as ours, just an
equal or greater work tip.  (Doing otherwise would risk partitioning the
network in the event of a chain split, and is also unnecessary.)

- we pick 4 of our outbound peers and do not subject them to this logic,
to be more conservative. We don't wish to permit temporary network
issues (or an attacker) to excessively disrupt network topology.

Github-Pull: bitcoin#11490
Rebased-From: 5a6d00c
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Nov 2, 2017
codablock pushed a commit to codablock/dash that referenced this pull request Sep 26, 2019
…chains

e065249 Add unit test for outbound peer eviction (Suhas Daftuar)
5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar)
c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar)

Pull request description:

  The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

  The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

  For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip.  If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes.  If after two minutes their best known block has insufficient work, we disconnect that peer.

  We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

  We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split.  Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect.  This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
codablock pushed a commit to codablock/dash that referenced this pull request Sep 29, 2019
…chains

e065249 Add unit test for outbound peer eviction (Suhas Daftuar)
5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar)
c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar)

Pull request description:

  The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

  The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

  For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip.  If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes.  If after two minutes their best known block has insufficient work, we disconnect that peer.

  We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

  We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split.  Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect.  This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
barrystyle pushed a commit to PACGlobalOfficial/PAC that referenced this pull request Jan 22, 2020
…chains

e065249 Add unit test for outbound peer eviction (Suhas Daftuar)
5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar)
c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar)

Pull request description:

  The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.

  The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:

  For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip.  If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes.  If after two minutes their best known block has insufficient work, we disconnect that peer.

  We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.

  We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split.  Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect.  This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.

Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.