[ETCM-186] Fix strict pick with too few blocks #723

Merged
mirkoAlic merged 3 commits into develop from etcm-186-fix-strict-pick on Oct 7, 2020

Conversation


@ntallar commented Oct 5, 2020

Description

Fixes branch resolution when it is triggered at the beginning of the chain, which previously caused sync to end up in an irreparable state.

Issue

Given 2 nodes

  1. If both have chains of the same weight but forked (top block with number X), when node 1 requests the last blocks from node 2 it won't import any
    This step results in node 1 logging: Imported no blocks

  2. Node 2 extends its chain with several mined blocks

  3. Node 1 asks for blocks starting from block X+1, but as node 2's chain forked, they won't be concatenable with node 1's chain
    This step results in node 1 logging:

     Unknown branch, going back to block nr -2 in order to resolve branches
     Invalidate blocks from -2
    
  4. The next request from node 1 will start from an earlier block than X, ideally before the fork. Node 1 currently gets stuck here, logging the following (see the sketch below):

     Strict Pick blocks from -2 to 10
     Lowest available block is 0
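
A minimal sketch of the arithmetic behind those two log lines (the resolution window size below is an assumption, not a value taken from this PR): with a chain this short, going back a fixed number of blocks to resolve the branch yields a negative block number, which then reaches the strict pick as its from value.

     // Illustrative values only: the branch-resolution window size is assumed, not read from syncConfig
     val currentBest: BigInt = 10       // node 1's best block number in this scenario
     val resolutionWindow: BigInt = 12  // assumed depth the fetcher goes back to resolve branches
     val goBackTo = currentBest - resolutionWindow
     // goBackTo == -2, matching "going back to block nr -2" and "Strict Pick blocks from -2 to 10"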
    

How to reproduce

It's very hard to reproduce this automatically at the integration-test level (after ETCM-127 it may become easier to include a test)

  1. Change code to:
    a. Delay requests for 2 minutes to allow time for mining blocks in between. Replace the fetchHeaders function from BlockFetcher with:

     private def fetchHeaders(state: BlockFetcherState): Unit = {
       val blockNr = state.nextToLastBlock
       val amount = syncConfig.blockHeadersPerRequest
    
       log.info("Sleeping for 2 minutes before fetching headers")
       Thread.sleep(2 * 60 * 1000)
       fetchHeadersFrom(blockNr, amount) pipeTo self
     }
    

    b. Broadcast only some of the blocks, to simplify the whole process; the last blocks should still be broadcast so as to trigger a new fetch. Replace the broadcastBlock function from BlockBroadcast with:

     def broadcastBlock(newBlock: NewBlock, handshakedPeers: Map[Peer, PeerInfo]): Unit = {
       val peersWithoutBlock = handshakedPeers.collect {
         case (peer, peerInfo) if shouldSendNewBlock(newBlock, peerInfo) => peer
       }.toSet

       // Only broadcast blocks with a number above 14, so the blocks mined during the initial sleep are not announced
       if (newBlock.block.number > 14) {
         broadcastNewBlock(newBlock, peersWithoutBlock)

         if (syncConfig.broadcastNewBlockHashes) {
           // NOTE: the usefulness of this message is debatable, especially in private networks
           broadcastNewBlockHash(newBlock, peersWithoutBlock)
         }
       }
     }
    
  2. Start 2 nodes connected to each other at the same time

  3. While both are blocked in their sleep, mine 10 blocks on each

  4. Wait for node 1 to fetch the 10 blocks from node 2 and fail to import them (logging Imported no blocks)

  5. Mine 10 more blocks on node 2; broadcasting them should trigger a re-fetch from node 1

  6. That halts node 1's progress, with these logs repeating indefinitely:

    2020-10-05 10:07:59,343 [i.i.e.b.sync.regular.BlockFetcher] - Strict Pick blocks from -2 to 10
    2020-10-05 10:07:59,343 [i.i.e.b.sync.regular.BlockFetcher] - Lowest available block is 0
    

Solution

The strict pick from value is capped at 1 whenever it would otherwise be lower than that.

Our current code wasn't working because of the condition .filter(_.headOption.exists(block => block.number <= lower)) on strictPickBlocks, which is never true when lower is a negative number. After capping the from value, lower can no longer be negative.
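
As a worked example of that fix (the values come from the failing logs above; the surrounding code is simplified and is not the exact PR diff): clamping the lower bound to 1 keeps lower non-negative, so the concatenability check on strictPickBlocks can match again.

     // Values from the failing logs: from = -2, atLeastWith = 10
     val from: BigInt = -2
     val atLeastWith: BigInt = 10

     // Before the fix, the strict pick used `from` directly, so the filter
     // .filter(_.headOption.exists(block => block.number <= lower))
     // compared block numbers against a negative `lower` and never matched.

     // After the fix, the lower bound is capped at 1:
     val minBlock = from.min(atLeastWith).max(1)  // == 1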

Testing

Attempting to reproduce the issue after the fix should result in node 1 not halting itself and instead importing the last 10 blocks from node 2

Up for discussion

I'm not sure how the node will handle resolving branches up to 1000 blocks deep, as configured on the testnet; maybe we should change that to 100 so a single message is enough for the resolve (see the rough calculation below)? If we don't, further analysis of the sync process should be done
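
A rough, illustrative calculation of that concern (both values below are assumptions; blockHeadersPerRequest appears in the snippets above, but its actual value is not stated in this PR): if branch resolution may go back up to 1000 blocks while a single header request returns at most blockHeadersPerRequest headers, the resolve takes several round trips, whereas a depth of 100 would fit in one message.

     val branchResolutionDepth = 1000  // assumed testnet setting discussed above
     val blockHeadersPerRequest = 200  // assumed value of syncConfig.blockHeadersPerRequest
     val requestsNeeded = math.ceil(branchResolutionDepth.toDouble / blockHeadersPerRequest).toInt
     // requestsNeeded == 5; with a depth of 100 (<= blockHeadersPerRequest) a single request would suffice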

@ntallar added the bug label Oct 5, 2020
@ntallar force-pushed the etcm-186-fix-strict-pick branch from 3c6e02f to cb179e9 on October 5, 2020 15:47
@ntallar added the WIP label Oct 5, 2020
@ntallar force-pushed the etcm-186-fix-strict-pick branch 4 times, most recently from 80e62ce to 2c3352f on October 5, 2020 18:34
@ntallar changed the title from [ETCM-186] Fix strict pick to [ETCM-186] Fix strict pick with too few blocks Oct 5, 2020
@ntallar removed the WIP label Oct 6, 2020
@ntallar force-pushed the etcm-186-fix-strict-pick branch from 2c3352f to 7c8dd37 on October 6, 2020 15:34
@@ -68,14 +68,17 @@ class BlockFetcher(
  private def handleCommands(state: BlockFetcherState): Receive = {
    case PickBlocks(amount) => state.pickBlocks(amount) |> handlePickedBlocks(state) |> fetchBlocks
    case StrictPickBlocks(from, atLeastWith) =>
      val minBlock = from.min(atLeastWith).max(1)
      log.debug("Strict Pick blocks from {} to {}", from, atLeastWith)
      // FIXME: Consider having StrictPickBlocks calls guaranteeing this
@ntallar (Author) commented:
I finally decided not to touch this in this PR, to minimize the impact it has on our syncing process

@kapke (Contributor) commented Oct 7, 2020

It all makes sense to me.

Do you think it would be too hard to write a test exposing that issue, in a similar fashion to "go back to earlier block in order to find a common parent with new branch"?

@mirkoAlic (Contributor) commented:
This is needed as soon as possible for experimental testnet purposes, so I will merge it as it is. Any future improvements can be done later. cc @jmendiola222, @kapke

@mirkoAlic merged commit 327d295 into develop Oct 7, 2020
@mirkoAlic deleted the etcm-186-fix-strict-pick branch October 7, 2020 18:34