Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETCM-313] and [ETCM-316]: Header skeleton using new branch resolver #892

Merged
merged 38 commits into from
Mar 16, 2021

Conversation

enriquerodbe
Copy link
Contributor

@enriquerodbe enriquerodbe commented Jan 13, 2021

Description

See ETCM-313 and ethereum/go-ethereum#2315

Proposed Solution

  1. From one peer, called the master peer: Download a skeleton of N headers1, spaced by N-1 headers in between, starting at bestBlockHeaderNumber + N (i.e. block headers bestBlockHeaderNumber + kN where 1 <= k <= N). To achieve this, use the skip parameter in the GetBlockHeaders request.
  2. When the skeleton is received, validate that it corresponds to the requested headers and also check the PoW of the headers.
  3. If something is wrong, blacklist the peer and restart.
  4. Enqueue the numbers of the headers starting each batch in the skeleton. This means, enqueue for downloading block numbers: bestBlockHeaderNumber + 1 + kN where 0 <= k <= N-1.
  5. Download the batches of N sequential headers, in parallel from multiple peers (potentially the master peer as well) by using the enqueued block numbers. Notice there are N batches of size N to download in parallel, so we are requesting N*N block headers.
  6. For each downloaded batch, validate if:
    1. It is a valid sequence of headers, with valid parent-child hashes
    2. It fits the skeleton
  7. To "fit the skeleton" means:
    1. The first header of the batch must have number equal to any batch starting number (those requested in 4.).
    2. The last header of the batch must have number equal to any skeleton header (those requested in 1.).
    3. The last header's hash must be equal to the hash of the skeleton header found in ii.
    4. The penultimate header's hash must be equal to the parent hash of the skeleton header found in ii.
  8. If all validations pass, save this batch (for now, in memory) and continue downloading the rest.
  9. If validations 6.i, 7.i, or 7.ii fail, blacklist the peer and enqueue again the batch for downloading.
  10. If validations 7.iii or 7.iv fail, increment the counter batchFailuresCount. This is because these validation failures might mean that the master peer is sending us wrong data and that we couldn't find a matching batch from another peer.
  11. If on previous step fastSyncMaxBatchRetries is reached, start branch resolution.
  12. When all batches are downloaded, process the whole chain of headers as always (chain numbers & hashes with parent hashes).
  13. Enqueue block bodies and receipts as always.
  14. Start with a new skeleton once all header + body + receipts are downloaded and validated.

1 N is determined by blockHeadersPerRequest. Currently set exactly to its value but this is open for discussion.

A class called HeaderSkeleton handles the logic of the skeleton size, numbers, batches, gap size, and all that is needed from the FastSync actor. It is kept in memory so if the node goes down everything is recalculated.

Pending work

  • Before calling for the branch resolution, the FastSync actor has to rewind a configured number of blocks so the resolution actor starts from there.
  • Once we choose a master peer, it's better to stick to it and continue getting the skeletons from that same master peer unless it starts sending us wrong data. Currently on each round of assignBlockchainWork we just take the peer with the highest block number.
  • If a peer sends us an incomplete batch (one with less than N headers but valid nonetheless): We could check against that peer's best block so we know if it's malicious, and therefore should be blacklisted, or it may be just the highest block that the peer hast to offer, in which case just retry same batch from another peer.
  • We need to integrate with the new actors BlacklistSupport and PeerListSupport mentioned in the task ETCM-531.

@enriquerodbe enriquerodbe changed the base branch from develop to ETCM-311-fast_sync_improvements January 15, 2021 17:56
@robinraju robinraju self-assigned this Jan 20, 2021
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch from bdc0dfb to e2618f6 Compare January 20, 2021 08:57
@1015bit 1015bit self-requested a review January 23, 2021 06:38
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch 3 times, most recently from 88128d9 to 5c8e0f2 Compare February 1, 2021 07:44
@robinraju robinraju marked this pull request as ready for review February 1, 2021 12:24
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch from 5c8e0f2 to 7608c20 Compare February 1, 2021 16:08
@robinraju robinraju marked this pull request as draft February 2, 2021 03:56
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch from 7608c20 to d293268 Compare February 2, 2021 08:47
@robinraju robinraju force-pushed the ETCM-311-fast_sync_improvements branch from d293268 to 4c7150e Compare February 4, 2021 07:38
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PullRequest Breakdown

Reviewable lines of change

+ 1,784
- 210

56% Scala
43% Scala (tests)
1% Other (tests)
<1% Other
<1% XML (tests)

Type of change

Feature - These changes are adding a new feature or improvement to existing code.
1 Message
⚠️ Due to its size, this pull request will likely have a little longer turnaround time and will probably require multiple passes from our reviewers.

@1015bit 1015bit changed the title [ETCM-313] Download skeleton and then batch headers in parallel [!pr] [ETCM-313] and [ETCM-316]: Header skeleton using new branch resolver Mar 4, 2021
@1015bit 1015bit changed the base branch from ETCM-311-fast_sync_improvements to develop March 4, 2021 11:12
@1015bit
Copy link
Contributor

1015bit commented Mar 4, 2021

This PR now also includes the changes from #887

@1015bit
Copy link
Contributor

1015bit commented Mar 11, 2021

Wrapping up the work described in the initial PR description. Further work to extract the header skeleton download into a separate actor and get rid of most mutable state variables will be addressed in another PR.

@1015bit
Copy link
Contributor

1015bit commented Mar 11, 2021

Successfully synced with Sagano testnet.

@1015bit 1015bit marked this pull request as ready for review March 11, 2021 21:48
@1015bit 1015bit removed their request for review March 11, 2021 21:58
@1015bit
Copy link
Contributor

1015bit commented Mar 12, 2021

sagano.conf used for testing: enables fast-sync and debug logging

include "testnet-internal-nomad.conf"

mantis {
  sync {
    do-fast-sync = true
  }
}

akka {
  loglevel = "DEBUG"
}

and add the following line in logback.xml

<logger name="io.iohk.ethereum.blockchain.sync.fast.FastSync" level="DEBUG" />

…t be higher than the default max number of headers returned
@robinraju robinraju removed their assignment Mar 16, 2021
@1015bit 1015bit merged commit 04208b0 into develop Mar 16, 2021
@1015bit 1015bit deleted the ETCM-313-skeleton branch March 16, 2021 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants