Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ETCM-77] checkpoint syncing #728

Merged
merged 3 commits into from
Oct 21, 2020
Merged

[ETCM-77] checkpoint syncing #728

merged 3 commits into from
Oct 21, 2020

Conversation

rtkaczyk
Copy link
Contributor

@rtkaczyk rtkaczyk commented Oct 9, 2020

Description

  • Handling new checkpoint messages to RegularSync/BlockImporter
  • Checkpoint block generation
  • Branch comparison logic in Ledger/BranchResolution.

Relevant unit tests added.

@pslaski pslaski force-pushed the etcm-75-blockchain-support-checkpoint branch from 643d0c9 to a2edd59 Compare October 9, 2020 14:18
@rtkaczyk rtkaczyk force-pushed the etcm-77-checkpoint-syncing branch from 5da21d9 to a922376 Compare October 9, 2020 14:23
@pslaski pslaski force-pushed the etcm-75-blockchain-support-checkpoint branch 3 times, most recently from 913a042 to 28fa411 Compare October 13, 2020 11:14
Base automatically changed from etcm-75-blockchain-support-checkpoint to develop October 13, 2020 14:07
@rtkaczyk rtkaczyk force-pushed the etcm-77-checkpoint-syncing branch 2 times, most recently from 4cbeed0 to fbd1980 Compare October 14, 2020 08:05
@rtkaczyk rtkaczyk marked this pull request as ready for review October 14, 2020 08:06
@rtkaczyk rtkaczyk requested review from ntallar and pslaski October 14, 2020 08:35
Copy link
Contributor

@pslaski pslaski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you port CheckpointBlockGeneratorSpec too ?

case nc @ NewCheckpoint(parentHash, signatures) =>
if (state.importing) {
//TODO: is this ok? What delay?
context.system.scheduler.scheduleOnce(100.millis, self, nc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO we can treat it like other block types and don't try to re-import it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all the federation nodes happen to be importing other blocks when a new checkpoint appears? Wouldn't it be lost forever?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming that we know OBFT nodes and federation nodes, they shouldn't start producing checkpoints before being on top of the chain, so I'm ok with your restart logic. The only thing worth to consider is higher restart time, maybe something ~1 s, to give to node more time to re-sync

import ImportMessages._
override def preImport(): LogEntry = (DebugLevel, s"Importing new checkpoint block (${block.idTag})")
override def importedToTheTop(): LogEntry =
(DebugLevel, s"Added new checkpont block $number to top of the chain")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo -> checkpont


def generate(parent: Block, checkpoint: Checkpoint): Block = {
val blockNumber = parent.number + 1
// we are using a predictable value for timestamp so that each federation node generates identical block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add information about ETCM-173 here

val header = BlockHeader(
parentHash = parent.hash,
ommersHash = BlockHeader.EmptyOmmers,
beneficiary = Address(0).bytes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockHeader.EmptyBeneficiary

}

private def isBetterBranch(block: Block, bestBlock: Block, newTd: BigInt, currentTd: BigInt): Boolean =
newTd > currentTd || (blockchainConfig.gasTieBreaker && newTd == currentTd && block.header.gasUsed > bestBlock.header.gasUsed)
private def isBetterBranch(block: Block, bestBlock: Block, newTd: BigInt, currentTd: BigInt): Boolean = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this ? what about BranchResolution.compareByCheckpoints / compareByDifficulty ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's what it is. Unlike in the reference implementation, we have the branch comparison logic in 2 places. I'm a bit hesitant to changing anything with this regard, as syncing on the whole could use a bit of revamping

val blockPromise: Promise[BlockImportResult] = Promise()
ledger.setImportResult(block, () => blockPromise.future)

// FIXME: Why the hell is this needed???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to the core changes, have you done any manual tests with mainnet or mordor?

import io.iohk.ethereum.domain._
import io.iohk.ethereum.ledger.BloomFilter

class CheckpointBlockGenerator(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this duplicated with the CheckpointingTestHelpers?

blockchainConfig.gasTieBreaker && newTd == currentTd && block.header.gasUsed > bestBlock.header.gasUsed

(block.hasCheckpoint, bestBlock.hasCheckpoint) match {
case (true, true) => false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we untie here by betterTd || tieBreaker? For the case that we are comparing checkpoint of block 5 and checkpoint of block 10 for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this way we are opening the door to reverting checkpoints. I'd rather not do that until we have a clear plan on how to deal with that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm and probably it can never be the case that we are comparing blocks from the same branch? That is C5 and C10 from B1<-...<-C5<-...<-C10

Should we at least add a comment then?

I also find it odd that we are allowing reverting checkpoints in all our codebase (i.e. blockchain object and https://github.com/input-output-hk/mantis/pull/728/files#diff-be955b26fb84ed57e0ea600e45c78fb94d45dad720746529236701d67b14e843R49) but here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, there is a door for a revert. What if we changed this expression to something like:

newCheckpoint.number > oldCheckpoint.number && newBranch.contains(oldCheckpoint)

and compare checkpoints by hashes (rather than by numbers) a few lines above.

WDYT?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me

However wasn't a rule like this what originally caused the issue on our testnet some time ago? Was that fixed on the OBFT side?

@rtkaczyk rtkaczyk force-pushed the etcm-77-checkpoint-syncing branch 2 times, most recently from e7200e4 to 414f1cc Compare October 19, 2020 20:33
@rtkaczyk rtkaczyk force-pushed the etcm-77-checkpoint-syncing branch from 414f1cc to 70672f2 Compare October 20, 2020 09:16
@rtkaczyk
Copy link
Contributor Author

rtkaczyk commented Oct 20, 2020

SyncControllerSpec is failing non-deterministically. Waiting for #742 which fixes the issue (cc @KonradStaniec )

Edit: Fixed!

@rtkaczyk rtkaczyk force-pushed the etcm-77-checkpoint-syncing branch from 70672f2 to f0416e0 Compare October 20, 2020 13:25
Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comments and LGTM!

@@ -170,10 +171,12 @@ class BlockValidatorSpec extends AnyFlatSpec with Matchers with SecureRandomBuil
val validCheckpoint = Checkpoint(CheckpointingTestHelpers.createCheckpointSignatures(keys, validBlockHeader.hash))

val validBlockHeaderWithCheckpoint =
CheckpointingTestHelpers.createBlockHeaderWithCheckpoint(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we are still using the CheckpointingTestHelpers.createBlockHeaderWithCheckpoint on the BlockchainSpec, should we remove it from there as well?

lazy val tieBreaker =
blockchainConfig.gasTieBreaker && newTd == currentTd && block.header.gasUsed > bestBlock.header.gasUsed

(block.hasCheckpoint, bestBlock.hasCheckpoint) match {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we check for the presence of checkpoints in the both branches and not just the top block of them?

Maybe there's something that I'm missing here, this component is difficult to follow

Copy link
Contributor Author

@rtkaczyk rtkaczyk Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! Let's consider 2 branches:

  • B0 -> C -> B1
  • B0 -> B2 -> B3

If we're considering B3 for import and it has a greater TD than B1, then checkpoint C would be reverted by a regular block.

This simple method cannot be adapted to handling checkpoints. What I think we could do is obtain the branch for the newly enqueued block (BlockQueue#getBranch: here) and then compare it using BranchResolution#resolveBranch, which as a bonus would DRY the code.

WDYT?

Copy link

@ntallar ntallar Oct 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to remove this code duplication, my only comments are:

  • I feel like we are doing something unnecesary by having both the importing and resolving process do this sort of comparisons. However at this stage is probably something we'll have to live with
  • For comparison shouldn't we use BranchResolution#compareByCheckpoints instead? There a several thing surrounding BranchResolution#resolveBranch that are maybe not necessary

But the above are just technical questions, totally agree with re-using code here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However at this stage is probably something we'll have to live with

The only way I see around is to take advantage of the improvements in our testnet

There a several thing surrounding BranchResolution#resolveBranch that are maybe not necessary

Not that much really:

  • checking the parent
  • checking that headers from queue form a chain

Avoiding that shouldn't be worth refactoring

Copy link

@ntallar ntallar Oct 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only way I see around is to take advantage of the improvements in our testnet

What improvements do you mean?

Btw yesterday I thought of an alternative to your idea, which is closer to what is currently in place on the BlockQueue; if our leaves would be as Leaf(hash: ByteString, totalDifficulty: BigInt, lastCheckpointBlockNumber) can't this function be adapted to compare those lastCheckpointBlockNumber? Obviously, if we decide that mantis doesn't allow reverts then that information is not enough... but maybe we could rely on OBFT being the one that doesn't produce forked checkpoints.

I'm always in favour of reducing replicated codebase but your proposed change probably should be associated with more refactoring (for example, is there any purpose for tracking total difficulty of the leaves if it's not used here?) and maybe impact to the performance (just guessing, it might not be the case).

Wdyt?

blockchainConfig.gasTieBreaker && newTd == currentTd && block.header.gasUsed > bestBlock.header.gasUsed

(block.hasCheckpoint, bestBlock.hasCheckpoint) match {
case (true, true) => false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm and probably it can never be the case that we are comparing blocks from the same branch? That is C5 and C10 from B1<-...<-C5<-...<-C10

Should we at least add a comment then?

I also find it odd that we are allowing reverting checkpoints in all our codebase (i.e. blockchain object and https://github.com/input-output-hk/mantis/pull/728/files#diff-be955b26fb84ed57e0ea600e45c78fb94d45dad720746529236701d67b14e843R49) but here

Copy link

@ntallar ntallar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this PR is blocking #729, #738 and the creation of new PRs for other issues, we agreed on merging it as is, further comments will be addressed on a new PR.

Only syncing with checkpointing will remain buggy, but it won't be used on the short term

 .\\            //.
. \ \          / /.
.\  ,\     /` /,.-
 -.   \  /'/ /  .
 ` -   `-'  \  -
   '.       /.\`
      -    .-
      :`//.'
      .`.'
      .' BP 

@ntallar ntallar merged commit 18c25a4 into develop Oct 21, 2020
@ntallar ntallar deleted the etcm-77-checkpoint-syncing branch October 21, 2020 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants