-
Notifications
You must be signed in to change notification settings - Fork 36.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disconnect from outbound peers with bad headers chains #11490
Disconnect from outbound peers with bad headers chains #11490
Conversation
src/net_processing.cpp
Outdated
@@ -502,7 +517,7 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) { | |||
NodeId nodeid = pnode->GetId(); | |||
{ | |||
LOCK(cs_main); | |||
mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName))); | |||
auto it = mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable
src/net_processing.cpp
Outdated
// we should sync to it, unless it's invalid, in which case we | ||
// should find that out and disconnect from them elsewhere). | ||
if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: same line
I've spent some time verifying the logic, and it looks very good. I briefly misunderstood it to mean that if a peer makes progress during the 20 minute window, it would be extended. That would have been a problem (an incompatible chain that has less work but does grow continuously), but it's not what happens (it requires reaching the same amount of work as we had at the beginning of the window). A test for the second commit would be great. |
src/net_processing.cpp
Outdated
// whitelisted). | ||
if (!(pfrom->fInbound || pfrom->fWhitelisted || pfrom->fAddnode)) { | ||
pfrom->fDisconnect = true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be backed off a bit: Consider-- imagine we start and are fully synced up, but we are still in IsInitialBlockDownload because the network has been slow and so the tip's timestamp is a ways back. Then we have a peer we get headers from who is a single block behind us. This would disconnect them, and I think that is probably undesirable.
It's probably a good idea to review all code as if IsInitialBlockDownload() were just return true; and if there would be misbehavior in that case, then the usage is perhaps suboptimal.
An obvious conservative thing to do would be to just use the nMinimumChainWork ... though perhaps too conservative. It might also be reasonable to use a GetBlockProofEquivalentTime like test, or just the work several blocks back from chainactive tip. I think if it was conservative enough it could apply at all times, not just IsIninitialBlockDownload().
Maybe the existence of the second commit means that it would be okay for this one to be less aggressive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess with the second commit, we will eventually disconnect some peers if they stay on chains with less work than our tip. So I think we could just leave this as nMinimumChainWork, to protect against the case that all our peers only have incomplete/bogus chains (which wouldn't be prevented by the second commit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to avoid the push above:
bitcoin/src/net_processing.cpp
Line 2332 in e9d5802
connman->PushMessage(pfrom, msgMaker.Make(NetMsgType::GETHEADERS, chainActive.GetLocator(pindexLast), uint256())); |
now that the node will disconnect?
Edit: nevermind, since this is protected with nCount != MAX_HEADERS_RESULTS
.
Thanks for the review. I've pushed up a couple quick fixes for the nits. Will work on a test, as well as trying to implement a suggestion @gmaxwell gave me offline for improving the behavior in the situation where none of our peers are giving us any block announcements. |
12d7bcd
to
4b8a4bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. This is begging for a clearer node-sync-state model, but its fine for 0.15.0.2. Does want a test.
src/net_processing.cpp
Outdated
// If we're in IBD, we want outbound peers that will serve us a useful | ||
// chain. Disconnect peers that are behind us or on chains with | ||
// insufficient work. | ||
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.
Eventually I really think we need to have peers in a "mode" which describes were we are 1. connect-handshake (version/verack/sendcmpct/sendheaders/feefilter/etc) 2. header-chain-sync (slightly complicated by the fact that we want to block all peers but one during ibd until we've finished one) 3. normal operation and then we can simplify and document strange things like this significantly instead of making everything yet more of a mess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.
I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.
In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap conditions as the second is faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this really matters? In the steady state (ie after we've caught up from initial sync), the second condition will generally be true, while the first condition will be false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.
I'm confused, then. If we randomly decide to request the header at height 1 in the middle of syncing our peer's header chain, we'd do fine in the rest of the ::HEADERS handling here, just (correctly) not requesting any further headers built on top of it. However, we'd immediately disconnect them if we're not in IBD by this new check. Previously this wouldn't break anything, as the next HEADERS they send us will result in us requesting further headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right about that case -- I was thinking of the case where we send a random getheaders after we've already synced the chain, and therefore updated pindexBestKnownBlock to be some recent-ish thing.
I guess it is true that if you just added a random getheaders message in the middle of initial sync, you might not break the sync logic, but this new logic would cause a disconnect... I can add a comment indicating that we shouldn't do anything funky until after we've synced a peer's headers chain?
src/net_processing.cpp
Outdated
@@ -200,6 +203,14 @@ struct CNodeState { | |||
* otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns. | |||
*/ | |||
bool fSupportsDesiredCmpctVersion; | |||
//! Whether this peer is protected from disconnection due to a bad/slow chain | |||
bool fProtectFromDisconnect; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new code should have the m_ prefix here (and other places), and probably drop the hungarian type prefix.
src/net_processing.cpp
Outdated
pto->fDisconnect = true; | ||
} else { | ||
const CBlockIndex *pindexStart = state.header_with_required_work; | ||
assert (pindexStart != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, just for a print?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, chucking it
d9b0eee
to
1725fb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review -- pushed up some fixup commits, which I'll squash.
I switched to using GetTime() everywhere, and was then able to write up a unit test. This is currently next-to-impossible to test in the regtest environment, I believe, since we don't support non-manual connections in regtest.
src/net_processing.cpp
Outdated
// If we're in IBD, we want outbound peers that will serve us a useful | ||
// chain. Disconnect peers that are behind us or on chains with | ||
// insufficient work. | ||
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this adds a new invariant (currently true), that we shall never send a GETHEADERS or do something which expects, as a result, a HEADERS message, which is not MAX_HEADERS_RESULT if it doesn't end with the peer's tip (ie no GETHEADERS with hashStop of not-their-tip, or at least not when in IBD/before we've finished syncing the state of another peer). I don't think this is bad, but it should be documented somewhere explicitly.
I don't think this is precisely correct. What we are assuming is the same thing we use for headers sync (eg on initial connection) -- if we receive fewer than MAX_HEADERS_RESULTS, then we have no reason to request more headers in order to update pindexBestKnownBlock. Note that in the logic below, we are using pindexBestKnownBlock, not pindexLast.
In other words, if we did add a reason for requesting a single header to a random location in the chain, I think this logic would still be correct.
src/net_processing.cpp
Outdated
pto->fDisconnect = true; | ||
} else { | ||
const CBlockIndex *pindexStart = state.header_with_required_work; | ||
assert (pindexStart != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, chucking it
1725fb0
to
032dc53
Compare
Squashed (https://github.com/sdaftuar/bitcoin/commits/11490.1 -> 032dc531b911a0c0ea570aa74531508f54a36073) |
Travis found:
|
c1e129b
to
0a3d957
Compare
Unit test issue should be fixed now. |
src/net_processing.cpp
Outdated
@@ -534,6 +549,8 @@ void PeerLogicValidation::FinalizeNode(NodeId nodeid, bool& fUpdateConnectionTim | |||
nPreferredDownload -= state->fPreferredDownload; | |||
nPeersWithValidatedDownloads -= (state->nBlocksInFlightValidHeaders != 0); | |||
assert(nPeersWithValidatedDownloads >= 0); | |||
g_outbound_peers_with_protect_from_disconnect -= state->m_protect_from_disconnect; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int -= bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More verbose but IMO more clear:
if (state->m_protect_from_disconnect) {
assert(g_outbound_peers_with_protect_from_disconnect > 0);
--g_outbound_peers_with_protect_from_disconnect;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the pattern already in use in the lines above (nPreferredDownload -= state->fPreferredDownload
). But I can change to something like what you suggest if this is frowned upon for some reason.
src/net_processing.cpp
Outdated
@@ -124,6 +124,9 @@ namespace { | |||
/** Number of peers from which we're downloading blocks. */ | |||
int nPeersWithValidatedDownloads = 0; | |||
|
|||
/** Number of outbound peers with m_protect_from_disconnect. */ | |||
int g_outbound_peers_with_protect_from_disconnect = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove since this is easily calculated by counting nodes with m_protect_from_disconnect
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By looping over all the CNodeState objects when this value is needed? I don't think that makes sense. Note that tracking the value here is analogous to what we do with nPeersWithValidatedDownloads
and nPreferredDownload
.
src/net_processing.h
Outdated
/** Protect at least this many outbound peers from disconnection due to slow/ | ||
* behind headers chain. | ||
*/ | ||
static constexpr int32_t MAX_OUTBOUND_PEERS_TO_PROTECT = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_FROM_DISCONNECT
?
src/net_processing.cpp
Outdated
// If we're in IBD, we want outbound peers that will serve us a useful | ||
// chain. Disconnect peers that are behind us or on chains with | ||
// insufficient work. | ||
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap conditions as the second is faster?
src/net_processing.cpp
Outdated
@@ -2383,6 +2401,31 @@ bool static ProcessMessage(CNode* pfrom, const std::string& strCommand, CDataStr | |||
} | |||
} | |||
} | |||
// If we're in IBD, we want outbound peers that will serve us a useful | |||
// chain. Disconnect peers that are behind us or on chains with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra space.
src/net_processing.cpp
Outdated
state.m_sent_getheaders_to_check_chain_sync = false; | ||
} else if (state.m_headers_chain_timeout > 0 && time_in_seconds > state.m_headers_chain_timeout) { | ||
// No evidence yet that our peer has synced to a chain with work equal to that | ||
// of our tip, when we first detected it was behind. Send a single getheaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra space.
ea03103
to
39e0c8e
Compare
Addressed some of @promag's nits at https://github.com/sdaftuar/bitcoin/commits/11490.2, squashed -> 39e0c8e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 39e0c8e
src/net_processing.cpp
Outdated
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) { | ||
// When nCount < MAX_HEADERS_RESULTS, we know we have no more | ||
// headers to fetch from this peer. | ||
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Disconnecting from bad outbound peers in IBD"
Can you update comment to say why this is checking against nMinimumChainWork (and not chainActive.Tip()->nChainWork)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.
test/functional/minchainwork.py
Outdated
@@ -27,7 +27,7 @@ class MinimumChainWorkTest(BitcoinTestFramework): | |||
def set_test_params(self): | |||
self.setup_clean_chain = True | |||
self.num_nodes = 3 | |||
self.extra_args = [[], ["-minimumchainwork=0x65"], ["-minimumchainwork=0x65"]] | |||
self.extra_args = [['-whitelist=127.0.0.1'], ["-minimumchainwork=0x65", '-whitelist=127.0.0.1'], ["-minimumchainwork=0x65"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Disconnecting from bad outbound peers in IBD"
A comment saying why -whitelist is needed would be helfpul I think
src/net_processing.cpp
Outdated
@@ -200,6 +203,14 @@ struct CNodeState { | |||
* otherwise: whether this peer sends non-witnesses in cmpctblocks/blocktxns. | |||
*/ | |||
bool fSupportsDesiredCmpctVersion; | |||
//! Whether this peer is protected from disconnection due to a bad/slow chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Permit disconnection of outbound peers on bad/slow chains"
I think you should do something to more clearly group these variables together. You could use doxygen grouping syntax:
//! State used to enforce CHAIN_SYNC_TIMEOUT.
//! @{
... variable declarations here ...
//! @}
You could also give them a common prefix like m_sync_timeout_enable
/ m_sync_timeout_deadline
/ m_sync_timeout_min_work
/ m_sync_timeout_sent_getheaders
or move them to a nested struct, though that's probably asking a lot this far into the review.
src/net_processing.cpp
Outdated
// Check that outbound peers have reasonable chains | ||
// GetTime() is used by this anti-DoS logic so we can test this using mocktime | ||
int64_t time_in_seconds = GetTime(); | ||
if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Permit disconnection of outbound peers on bad/slow chains"
It seems like it could be good to have an pto->isOutbound()
helper returning !pto->fInbound && !pto->m_manual_connection && !pto->fFeeler && !pto->fOneShot
so this long string of checks doesn't need to be repeated here and above in getheaders.
src/net_processing.cpp
Outdated
int64_t time_in_seconds = GetTime(); | ||
if (!(state.m_protect_from_disconnect || pto->fInbound || pto->m_manual_connection || pto->fFeeler || pto->fOneShot) && state.fSyncStarted) { | ||
// This is an outbound peer subject to disconnection if their chain | ||
// lags behind ours (note: if their chain has more work than ours, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Permit disconnection of outbound peers on bad/slow chains"
Instead of "if their chain lags behind ours" maybe be more specific and say "if they don't announce a block with as much work as the current tip within CHAIN_SYNC_TIMEOUT+HEADERS_RESPONSE_TIME seconds."
src/net_processing.cpp
Outdated
// lags behind ours (note: if their chain has more work than ours, | ||
// we should sync to it, unless it's invalid, in which case we | ||
// should find that out and disconnect from them elsewhere). | ||
if (state.pindexBestKnownBlock != nullptr && state.pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Permit disconnection of outbound peers on bad/slow chains":
I don't see why the first and second branches of this if/else are distinct. It seems like you could unite them and drop unneeded checks with:
if (!state.m_header_with_required_work || (state.pindexBestKnownBlock && state.pindexBestKnownBlock->nChainWork >= state.m_header_with_required_work->nChainWork) {
// Reset timeout and min required work if peer caught up to previous requirement.
state.m_headers_chain_timeout = time_in_seconds + CHAIN_SYNC_TIMEOUT;
state.m_header_with_required_work = chainActive.Tip();
state.m_sent_getheaders_to_check_chain_sync = false;
} else if (time_in_seconds > state.m_headers_chain_timeout) {
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'll think about this. I haven't quite convinced myself yet that there are no edge-case differences between your simplification and my original patch but you may be right.
src/test/DoS_tests.cpp
Outdated
int64_t nStartTime = GetTime(); | ||
// Wait 21 minutes, to | ||
SetMockTime(nStartTime+21*60); | ||
peerLogic->SendMessages(&dummyNode1, interruptDummy); // should result in getheaders |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "Add unit test for outbound peer eviction"
Maybe check dummyNode1.vSendMsg size or contents to see if there actually was a getheaders.
@ryanofsky Thanks for the review; I've addressed all but one of your comments, which I'm still thinking about. |
eafe053
to
9dafb45
Compare
Squashed https://github.com/sdaftuar/bitcoin/commits/11490.3 -> 9dafb45 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! utACK 9dafb45
src/net_processing.cpp
Outdated
/** State used to enforce CHAIN_SYNC_TIMEOUT | ||
* Only in effect for outbound, non-manual connections, with | ||
* m_protect == false | ||
* Algorithm: if a peer's best known block has less work than our tip, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could maybe be more specific about the timeouts in the description. Instead of "T in the future", could write "CHAIN_SYNC_TIMEOUT seconds in the future", instead of "bump the timeout" could write "reset the timeout", instead of "new shorter timeout" could refer to HEADERS_RESPONSE_TIME.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/net_processing.cpp
Outdated
if (IsInitialBlockDownload() && nCount != MAX_HEADERS_RESULTS) { | ||
// When nCount < MAX_HEADERS_RESULTS, we know we have no more | ||
// headers to fetch from this peer. | ||
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. New "compare their tip" text might make a little more sense as part of the "peer has too little work" comment below rather than the "no more headers to fetch from this peer" comment up here.
src/net_processing.cpp
Outdated
// won't start block download until we have a headers chain that | ||
// has at least nMinimumChainWork, even if a peer has a chain past | ||
// our tip, as an anti-DoS measure. | ||
if (nodestate->pindexBestKnownBlock->nChainWork < nMinimumChainWork) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure if it would be possible to get here without setting a pindexBestKnownBlock, but please double-check to be safe: if (nodestate->pindexBestKnownBlock && ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK d9a0506 (just adds two suggested comment updates).
src/net_processing.cpp
Outdated
if (!pfrom->fDisconnect && IsOutboundPeer(pfrom)) { | ||
// If this is an outbound peer, check to see if we should protect | ||
// it from the bad/lagging chain logic. | ||
if (g_outbound_peers_with_protect_from_disconnect < MAX_OUTBOUND_PEERS_TO_PROTECT_FROM_DISCONNECT && nodestate->pindexBestKnownBlock->nChainWork >= chainActive.Tip()->nChainWork && !nodestate->m_chain_sync.m_protect) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nodestate->pindexBestKnownBlock
needs a nullptr check here too. Though after a quick look, it doesn't look like we can get here without a pindexBestKnownBlock. If that's the case (I haven't traced it fully) an assert may make more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it should be impossible due to the call to UpdateBlockAvailability() above, but I'll add the nullptr check anyway.
src/net_processing.cpp
Outdated
@@ -3265,6 +3315,54 @@ bool PeerLogicValidation::SendMessages(CNode* pto, std::atomic<bool>& interruptM | |||
} | |||
} | |||
|
|||
// Check that outbound peers have reasonable chains | |||
// GetTime() is used by this anti-DoS logic so we can test this using mocktime | |||
int64_t time_in_seconds = GetTime(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be factored out into a function? It seems we really only need to call it when we get new headers, or when a timer expires. Moving it out will make it easier to break up the SendMessages monster down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Updated to address latest comments from @theuni. |
utACK 0114230 Needs squash |
0114230
to
d6c4ad7
Compare
ACK. Looks good and I've been running it on a node for a couple days, seen some getheaders but no disconnections yet, though I've started disconnecting healthy outbound peers with the hope of finding some that fail naturally. :) |
After a kicking all my outbounds every couple hours, I eventually hit on this disconnecting a peer... logs show that it was obviously broken. Good job patch. |
When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work.
Currently we have no rotation of outbound peers. If an outbound peer stops serving us blocks, or is on a consensus-incompatible chain with less work than our tip (but otherwise valid headers), then we will never disconnect that peer, even though that peer is using one of our 8 outbound connection slots. Because we rely on our outbound peers to find an honest node in order to reach consensus, allowing an incompatible peer to occupy one of those slots is undesirable, particularly if it is possible for all such slots to be occupied by such peers. Protect against this by always checking to see if a peer's best known block has less work than our tip, and if so, set a 20 minute timeout -- if the peer is still not known to have caught up to a chain with as much work as ours after 20 minutes, then send a single getheaders message, wait 2 more minutes, and if a better header hasn't been received by then, disconnect that peer. Note: - we do not require that our peer sync to the same tip as ours, just an equal or greater work tip. (Doing otherwise would risk partitioning the network in the event of a chain split, and is also unnecessary.) - we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology.
utACK 453545f |
d7f1222
to
e065249
Compare
@jnewbery pointed out to that the functional test didn't actually need a change, due to protections given to manual connections (and every connection in regtest is a manual connection). So I reverted the change to minchainwork and just added a comment that explains the situation. Also fixed a comment typo that he pointed out in the unit test. Squashed https://github.com/sdaftuar/bitcoin/commits/11490.6 -> e065249 |
utACK e065249 |
e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
When in IBD, we'd like to use all our outbound peers to help us sync the chain. Disconnect any outbound peers whose headers have insufficient work. Github-Pull: bitcoin#11490 Rebased-From: c60fd71
Currently we have no rotation of outbound peers. If an outbound peer stops serving us blocks, or is on a consensus-incompatible chain with less work than our tip (but otherwise valid headers), then we will never disconnect that peer, even though that peer is using one of our 8 outbound connection slots. Because we rely on our outbound peers to find an honest node in order to reach consensus, allowing an incompatible peer to occupy one of those slots is undesirable, particularly if it is possible for all such slots to be occupied by such peers. Protect against this by always checking to see if a peer's best known block has less work than our tip, and if so, set a 20 minute timeout -- if the peer is still not known to have caught up to a chain with as much work as ours after 20 minutes, then send a single getheaders message, wait 2 more minutes, and if a better header hasn't been received by then, disconnect that peer. Note: - we do not require that our peer sync to the same tip as ours, just an equal or greater work tip. (Doing otherwise would risk partitioning the network in the event of a chain split, and is also unnecessary.) - we pick 4 of our outbound peers and do not subject them to this logic, to be more conservative. We don't wish to permit temporary network issues (or an attacker) to excessively disrupt network topology. Github-Pull: bitcoin#11490 Rebased-From: 5a6d00c
Github-Pull: bitcoin#11490 Rebased-From: e065249
…chains e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
…chains e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
…chains e065249 Add unit test for outbound peer eviction (Suhas Daftuar) 5a6d00c Permit disconnection of outbound peers on bad/slow chains (Suhas Daftuar) c60fd71 Disconnecting from bad outbound peers in IBD (Suhas Daftuar) Pull request description: The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD. The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours: For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer. We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains. We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate. Tree-SHA512: 2e0169a1dd8a7fb95980573ac4a201924bffdd724c19afcab5efcef076fdbe1f2cec7dc5f5d7e0a6327216f56d3828884f73642e00c8534b56ec2bb4c854a656
The first commit will disconnect an outbound peer that serves us a headers chain with insufficient work while we're in IBD.
The second commit introduces a way to disconnect outbound peers whose chains fall out of sync with ours:
For a given outbound peer, we check whether their best known block (which is known from the blocks they announce to us) has at least as much work as our tip. If it doesn't, we set a 20 minute timeout, and if we still haven't heard about a block with as much work as our tip had when we set the timeout, then we send a single getheaders message, and wait 2 more minutes. If after two minutes their best known block has insufficient work, we disconnect that peer.
We protect 4 of our outbound peers (who provide some "good" headers chains, ie a chain with at least as much work as our tip at some point) from being subject to this logic, to prevent excessive network topology changes as a result of this algorithm, while still ensuring that we have a reasonable number of nodes not known to be on bogus chains.
We also don't require our peers to be on the same chain as us, to prevent accidental partitioning of the network in the event of a chain split. Note that if our peers are ever on a more work chain than our tip, then we will download and validate it, and then either reorg to it, or learn of a consensus incompatibility with that peer and disconnect. This PR is designed to protect against peers that are on a less work chain which we may never try to download and validate.