dBFT 2.0 #547

vncoelho · 2019-01-14T19:01:03Z

PR with the key changes needed to achieve dBFT 2.0 (#426, #422, #534, among others), an improved and optimized version of the pioneering dBFT 1.0.

Key features will be merged here before the final merge into master.

Main contributors: @erikzhang, @shargon, @igormcoelho, @vncoelho, @belane, @wanglongfei88, @edwardz246003, @jsolman

* Add commit phase to consensus algorithm
* fix tests
* Prevent repeated sending of `Commit` messages
* RPC call gettransactionheight (#541)
* getrawtransactionheight: Currently two calls are needed to get a transaction height: `getrawtransaction` with `verbose`, and then a lookup using the `blockhash`. Another option is to use `confirmations`, but it can be misleading.
* Minor fix
* Shargon's tip
* modified
* Allow to use the wallet inside a RPC plugin (#536)
* Clean code
* Clean code

vncoelho · 2019-01-14T19:03:01Z

From #534, the only open discussion was related to the Regeneration phase.

vncoelho · 2019-01-22T20:18:54Z

@erikzhang, do you have an idea for the template for the Regeneration?
If you want, we can try to adjust the ideas of #426 to the current context. I think the main difference will be how to track and share the signatures of the payloads, which we will need to get from the Witness in order to regenerate correctly.

erikzhang · 2019-01-23T08:04:57Z

I think regeneration should have two levels. The first is to achieve state recovery by reading the log recorded by the node, and the second is to achieve state synchronization through the replay of network messages.

vncoelho · 2019-01-23T13:28:16Z

Sounds good, Erik.
Replaying network messages is a little different from the previous approach.
I just wonder whether we should also pursue a single Payload that can regenerate the node because it contains the signatures of the previous agreements. Do you think that messages carrying many signature hashes (imagine 1000 nodes) would be a bad design?

jsolman · 2019-01-25T02:57:28Z

@erikzhang I also added saving the consensus context upon changing view; see #575.

vncoelho · 2019-01-25T03:39:01Z

Nice to see this as NEO 3.0, @erikzhang; it is deserved. If we think about it deeply, the set of changes in this PR summarizes the knowledge, background, scientific studies, insights and discussions of the several people involved.

vncoelho · 2019-01-28T19:49:02Z

Perfect, @erikzhang, it seems to be working as expected.

I have not yet tested the Regeneration intensively.
I think we also need a Regeneration strategy for when a node asks for a Change View while the other nodes are already in the commit phase waiting for final signatures, as done in the other PR.
With this last feature we will probably have code ready for an initial merge into master.

vncoelho · 2019-02-09T18:59:43Z

After #579 is merged, this PR might be ready to be integrated into master.

Considering the recent discussions in that thread, we feel that the Regeneration/Recovery mechanism, for all its complexity, is reaching another level of quality.
After this PR is merged, we might have an open field for improving the Reason attached to each ChangeView request. With such a description, we might be able to create additional mechanisms that filter those Reasons and recover each node accordingly (leaving view changes only for cases with inconsistencies).

Combined with a redundant mechanism for replacing Speakers (in case of delays), we might reach another level of consensus in the next phase of improvements that will follow this PR.

Thanks to everyone who has been contributing to this PR.

jsolman · 2019-02-10T10:46:20Z

@vncoelho I think we will also want to merge #575 into this along with #579 before merging it to master.

vncoelho

It all looks correct. In my tests everything is running almost 100%.

In some tests I saw delays in recovering from change views; however, that might be due to my iptables configuration and the way I disable the network.

I will check again as soon as possible.

vncoelho · 2019-02-24T08:46:32Z

neo/Consensus/ConsensusService.cs

+                {
+                    var eligibleResponders = context.Validators.Length - 1;
+                    var chosenIndex = (payload.ValidatorIndex + i + message.NewViewNumber) % eligibleResponders;
+                    if (chosenIndex >= payload.ValidatorIndex) chosenIndex++;


I did not get this point. Why skip one here? The next iteration of the loop would play this role.

jsolman · 2019-02-24

Because `ValidatorIndex` is the requester's index. The requester does not respond to its own ChangeView message with a recovery.

If it instead moved on to the next iteration of the loop, there would be one fewer potential responder. It is preferable to increment `chosenIndex` so that the correct number of nodes can respond with recovery.
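The index arithmetic in the three quoted lines can be sketched in Python (the PR itself is C#). This is a minimal illustration rather than the PR's code: the function name and the `allowed` loop bound are hypothetical, since the excerpt does not show the enclosing loop.

```python
def recovery_responders(requester_index: int, new_view_number: int,
                        validator_count: int, allowed: int) -> list:
    """Return the validator indices chosen to answer a ChangeView with recovery.

    Mirrors the three quoted lines. `allowed` (the number of loop
    iterations) is an assumed parameter, not taken from the PR.
    """
    eligible = validator_count - 1  # everyone except the requester
    chosen = []
    for i in range(allowed):
        index = (requester_index + i + new_view_number) % eligible
        # Indices at or above the requester shift up by one, so the
        # requester's own slot is skipped without losing a responder.
        if index >= requester_index:
            index += 1
        chosen.append(index)
    return chosen


print(recovery_responders(requester_index=2, new_view_number=1,
                          validator_count=7, allowed=2))  # [4, 5]
```

Because the modulo runs over `validator_count - 1` slots and the shift maps them onto every index except the requester's, each iteration yields a distinct, valid responder as long as `allowed` does not exceed the number of eligible nodes.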

erikzhang · 2019-02-24T14:24:12Z

> One more optimization we could make: if a node sends ChangeView with a NewViewNumber lower than the expected view for that node, a recovery message could be used to send that node its previously sent ChangeView message so it could immediately jump its expected view forward. Thus, if nodes crash and come back before nodes agree on changing to a higher view, we can recover faster.

I am afraid that even if the node has previously sent a ChangeView message, it cannot be changed directly to the view. It must have 2f+1 messages.
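The 2f+1 requirement can be made concrete with a small Python sketch. It assumes the usual BFT convention that a network of n validators tolerates f = floor((n - 1) / 3) faulty nodes; the function names are hypothetical and are not taken from the PR.

```python
def max_faulty(n: int) -> int:
    """Largest number of faulty validators tolerated among n (assumed convention)."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Number of matching messages needed before acting: 2f + 1."""
    return 2 * max_faulty(n) + 1


def can_change_view(change_view_count: int, n: int) -> bool:
    """A node may move to the new view only once a quorum of ChangeView messages exists."""
    return change_view_count >= quorum(n)


print(max_faulty(7), quorum(7))   # 2 5
print(can_change_view(1, 7))      # False: the node's own old message is not enough
print(can_change_view(5, 7))      # True
```

With seven validators, a restarted node that only recovers its own earlier ChangeView holds one message out of the five required, which is why it cannot jump to the view on that basis alone.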

vncoelho · 2019-02-24T17:15:29Z

Hi @erikzhang, I agree with you.
I think the case that @jsolman mentioned is not about changing the view itself; it is just about speeding up the change view request.
From my perspective, it can be done without problems, since the node would have already signed the payload requesting the higher view.
Even if such a case is rare, I believe it happens from time to time (for example, once per day), so everything we can cover will benefit us in the future.

But maybe, as Jeff said, we could make these additional improvements in other PRs in the near future and try to merge this one when it is ready.

jsolman · 2019-02-24T17:41:01Z

> I am afraid that even if the node has previously sent a ChangeView message, it cannot be changed directly to the view. It must have 2f+1 messages.

I am not suggesting it change the view directly if it had sent one in the past; I was just suggesting it start its expectedView where it left off after it was restarted. It is an optimization that can wait for later if we decide to implement it.

erikzhang · 2019-02-25T02:42:51Z

The tests NGD is doing are expected to be completed on March 1.

jsolman · 2019-02-25T03:30:02Z

> The tests NGD is doing are expected to be completed on March 1

Great! I will do some additional testing between now and then as well.

jsolman · 2019-02-25T20:58:27Z

Testing with #608 looks good so far.

jsolman

These changes introduce a networking incompatibility with the existing old clients. See my included comment.

neo/Network/P2P/Payloads/ConsensusPayload.cs

#609 resolves my last review comments

vncoelho · 2019-02-26T19:15:09Z

[19:12:54.896] chain sync: expected=379 current=354 nodes=7
[19:12:55.175] timeout: height=355 view=0 state=Backup, RequestReceived, ResponseSent, CommitSent
[19:12:55.176] send recovery to resend commit

When a CN is lagging behind and times out, is this message expected?
Maybe it should stop the consensus layer until it is synced, or something like that.

vncoelho · 2019-02-26T20:14:59Z

The current version is online at: https://neocompiler.io/#/ecolab

Click on Consensus Info for following consensus logs.
The consensus behavior is quite impressive given the limited computational resources we have there and the number of plugins and tools running on them.

@erikzhang, do you think that we can update the variable SecondsPerBlock to ms in this PR? This is important for our tests with low latency IoT devices, which is a promising field of application for private chains.

jsolman · 2019-02-27T00:39:53Z

> When a CN is lagging behind and times out, is this message expected?
> Maybe it should stop the consensus layer until it is synced, or something like that.

Yes, it is expected. From its perspective, it has not received newer blocks yet and it has not received enough commits for the block it is on, so it will periodically send recovery until it either receives enough commits to persist the block it is on or receives the next block that others have already persisted.
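The behavior described in this reply can be summarized as a small decision function. This is only a Python sketch of the reasoning in the comment; the function name, parameters and returned labels are hypothetical and do not correspond to identifiers in the PR.

```python
def next_action(commits_received: int, quorum: int,
                newer_block_received: bool) -> str:
    """Decide what a node that already sent its commit does on a timeout."""
    if newer_block_received:
        # Others already persisted the block; take it and move on.
        return "accept_block"
    if commits_received >= quorum:
        # Enough commits arrived to persist the block locally.
        return "persist_block"
    # Neither condition holds yet, so ask peers to resend what is missing.
    return "send_recovery"


# A lagging node with too few commits keeps sending recovery.
print(next_action(commits_received=2, quorum=5, newer_block_received=False))
```

A node in the logged state (`CommitSent`, not enough commits, no newer block) falls through to the last branch on every timeout, which matches the repeated "send recovery to resend commit" line.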

erikzhang · 2019-02-27T07:50:20Z

> do you think that we can update the variable SecondsPerBlock to ms in this PR?

Can we reach consensus within 1 second?

vncoelho · 2019-02-27T08:04:22Z

I think so, Erik, aehuaheauah... even less nowadays.
Before the refactoring you did with Akka, this was something we saw as far away. That move to Akka was incredible: a huge step achieved with "quite simple" modifications to the consensus communication (we also believe the use of Akka can be improved further, as we are discussing in that other thread with @lightszero, but it was a precise, quick and great move).

We are talking about local blockchain networks, for example a bank, supply chain enterprises, or a partially centralized database running CNs. There are some useful applications for low-latency systems, for example:

* a drone reporting data during a flight to inspect an energy transmission grid;
* sensors used for city surveillance;
* tracking the production of goods in a supply chain.

Such systems might need only a few tx's per second in a private blockchain, while requiring them to be published as soon as possible.
In this sense, for the sake of publishing certificates, we need something that ensures the information was confirmed inside the network as quickly as possible.
In addition, this fast block persistence has great potential when combined with timestamps provided by hardware-based trusted execution environments (TEEs).

It would be great if we could change it now.

We need to be sure that the exponential view_timeout formula is not affected by the change.
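The thread does not spell out the formula. As a hedged illustration in Python, assume the timeout doubles with each view, for instance `block_time << (view_number + 1)`; both that shape and the 15-second base value are assumptions made for the example, not quotes from the PR. Under that assumption, switching the unit from seconds to milliseconds leaves the schedule unchanged as long as the base value is converted consistently.

```python
def view_timeout(block_time: int, view_number: int) -> int:
    """Assumed exponential schedule: the timeout doubles on every view change."""
    return block_time << (view_number + 1)


# Same schedule expressed in seconds and in milliseconds.
in_seconds = [view_timeout(15, v) for v in range(3)]
in_millis = [view_timeout(15_000, v) for v in range(3)]

print(in_seconds)  # [30, 60, 120]
print(in_millis)   # [30000, 60000, 120000]
print([s * 1000 for s in in_seconds] == in_millis)  # True
```

The check that matters is the last line: the two schedules agree after unit conversion, so a sub-second base such as 500 ms would simply yield 1000, 2000, 4000 ms under the same assumed formula.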

erikzhang · 2019-02-28T08:26:18Z

neo/neo/Network/P2P/Payloads/BlockBase.cs

Line 19 in 91e006c

public uint Timestamp;

One difficulty is that Block.Timestamp is accurate to the second. If the consensus time is less than one second, it may cause some problems.
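One way to see the difficulty is to truncate two block times that are less than a second apart. The Python sketch below assumes a validation rule that requires each block's timestamp to be strictly greater than its predecessor's; the thread does not state that rule, and the helper names and sample values are hypothetical.

```python
def to_second_timestamp(unix_ms: int) -> int:
    """Truncate a millisecond Unix time to whole seconds, as a second-precision field would."""
    return unix_ms // 1000


def is_valid_successor(prev_ts: int, next_ts: int) -> bool:
    """Assumed rule: a block's timestamp must be strictly greater than the previous one."""
    return next_ts > prev_ts


block_a_ms = 1_551_340_800_100       # arbitrary example time
block_b_ms = block_a_ms + 400        # produced 400 ms later

a = to_second_timestamp(block_a_ms)
b = to_second_timestamp(block_b_ms)

print(a == b)                    # True: both blocks land in the same second
print(is_valid_successor(a, b))  # False under the assumed rule
```

With second precision the two blocks become indistinguishable in time, so either the field would need finer resolution or the ordering rule would need to change before sub-second blocks could work.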

vncoelho · 2019-02-28T11:15:54Z

Don't worry, Erik. Let's do it later, in a couple of weeks.
Thanks for the support as always.

jsolman

The changes look very nice, and it is working well in my testing. I'm looking forward to seeing the results from the NGD testing, which should be complete soon.

igormcoelho · 2019-03-04T20:14:33Z

A day to celebrate! Congratulations to everyone who worked hard to solve this issue, especially @vncoelho, @jsolman, @shargon and others, not forgetting @erikzhang, as always :)

erikzhang and others added 2 commits January 14, 2019 16:58

Merge branch 'master' into consensus/improved_dbft

0ae8d62

vncoelho mentioned this pull request Jan 14, 2019

Add commit phase to consensus algorithm #534

Merged

vncoelho and others added 3 commits January 14, 2019 17:08

Minor fix on mempoolVerified

c58cbfd

Add MemoryPool Unit tests. Fix bug on initital start of Persisting the Genesis block.

cdedb79

Merge branch 'master' into consensus/improved_dbft

c90dec3

jsolman mentioned this pull request Jan 22, 2019

block sync stopped at 3260162 neo-project/neo-node#292

Closed

Merge branch 'master' into consensus/improved_dbft

485487f

erikzhang added the "Critical" label (issues (bugs) that need to be fixed ASAP) Jan 24, 2019

erikzhang added 2 commits January 24, 2019 18:43

Prevent ConsensusService from receiving messages before starting (#573)

b207227

* Prevent `ConsensusService` from receiving messages before starting
* fixed tests - calling OnStart now

Consensus recovery log (#572)

9aa6527

* Pass store to `ConsensusService`
* Implement `ISerializable` in `ConsensusContext`
* Start from recovery log
* Fix unit tests due to constructor taking the store.
* Add unit tests for serializing and deserializing the consensus context.

erikzhang added this to the NEO 3.0 milestone Jan 25, 2019

erikzhang added 3 commits January 25, 2019 13:07

Combine ConsensusContext.ChangeView() and ConsensusContext.Reset()

0855bdc

Add PreparationHash field to PrepareResponse to prevent replay attacks from malicious primary (#576)

0e87248

Fixed a problem where PrepareResponse.PreparationHash was not assigned.

bf0ca7c

jsolman reviewed Jan 29, 2019

neo/Consensus/ConsensusService.cs (outdated)

This was referenced Jan 30, 2019

block sync stopped at 3293298 neo-project/neo-node#294

Closed

No new blocks in neotracker.io #578

Closed

Load context from store only when height matches

7304274

jsolman mentioned this pull request Feb 11, 2019

Restore MemoryPool transactions from saved consensus context. #575

Closed

Recover nodes requesting ChangeView when possible (#579)

be37423

vncoelho commented Feb 24, 2019

Refactoring

eaeb0a5

erikzhang dismissed jsolman’s stale review via eaeb0a5 February 25, 2019 07:36

AggressiveInlining (#606)

06dc6b9

Reset Block reference when consensus context is initialized after block persist. (#608)

5d5046c

jsolman previously requested changes Feb 26, 2019

neo/Network/P2P/Payloads/ConsensusPayload.cs

erikzhang added 2 commits February 26, 2019 10:52

Merge branch 'master' into consensus/improved_dbft

b9b595e

Change ConsensusPayload for compatibility (#609)

91e006c

JustinR1 mentioned this pull request Feb 27, 2019

create tags to versions #612

Closed

Merge branch 'master' into consensus/improved_dbft

335d7b7

jsolman approved these changes Mar 2, 2019

erikzhang approved these changes Mar 3, 2019

erikzhang merged commit f88c427 into master Mar 3, 2019

erikzhang deleted the consensus/improved_dbft branch March 3, 2019 10:56

jsolman mentioned this pull request Mar 18, 2019

Minor additional flag when processing OnChangeViewReceived #641

Merged
