Commit phase - AKKA Model #422

Closed
wants to merge 10 commits into from

Conversation

shargon
Member

@shargon shargon commented Oct 23, 2018

This is the same implementation as #320.

TODO:

  • Ported to AKKA model
  • We need to lock the commit phase
  • Tested in private environment
  • Define if we must send the signature in commit message or not
  • We need to recover the consensus state when a CN is rebooted

Fixes #193
Fixes neo-project/neo-node#219

@shargon shargon added Enhancement Type - Changes that may affect performance, usability or add new features to existing modules. Critical Issues (bugs) that need to be fixed ASAP labels Oct 23, 2018
@shargon shargon mentioned this pull request Oct 23, 2018
3 tasks
@shargon
Member Author

shargon commented Oct 23, 2018

At the moment, the only line that differs is this one, the lock:

https://github.com/neo-project/neo/pull/422/files#diff-0285c1c12d1d492897a99ffe07f9fed9R77

I am working on recovering the consensus state.

Member

@vncoelho vncoelho left a comment

@shargon brodas,

I saw that change view is now being blocked with if (context.State.HasFlag(ConsensusState.CommitSent)) return in the CheckExpectedView method.

If the changes I mentioned are applied, we would send a partial signature and would then need two counters (context.SignaturesPartial and context.Signatures).

After verifying that context.SignaturesPartial is fulfilled at the line starting with if (!context.State.HasFlag(ConsensusState.CommitSent) &&, we should block change view (as you implemented).

Then the CN would enter a loop waiting for the real signatures, context.Signatures.
For me, the point is that the same "fork" problem can happen here: some CNs enter the BlockChangeView state (commit phase) and others do not.
However, there should be a mechanism that lets good nodes also enter the commit phase. For this purpose, the nodes in the commit phase (blocking the view) should re-send their partial signatures, so that good nodes (which possibly entered a loop due to connection losses) will eventually join the commit phase and wait for the real signatures.

@@ -82,8 +84,30 @@ private void CheckExpectedView(byte view_number)

private void CheckSignatures()
{
if (!context.State.HasFlag(ConsensusState.CommitSent) &&
Member

@shargon,
I believe that here we should check the partial signatures instead of the complete ones. Then the complete signature should not be sent at lines 353 and 52, which both read:

context.Signatures[context.MyIndex] = context.MakeHeader().Sign(context.KeyPair);

@shargon
Member Author

shargon commented Oct 23, 2018

I think the priority should be to fix the network errors. Obviously we need to find the right way to deal with a malicious CN that can produce a fork, but I think that if we send the complete signatures in the commit, we are doing the same as without this patch. I need to think about it 🤔

@vncoelho
Member

vncoelho commented Oct 23, 2018

I also agree that the priority is to fix the network errors.
But if so, we should remove the blocking of change view after the commit phase. That block only makes sense if we implement the partial signature (a feature that additionally tries to tackle malicious node behavior).

@belane
Member

belane commented Oct 23, 2018

good job @shargon, it's working on private net.

==> node1/neo-cli/Logs/2018-10-23.log <==
[12:54:19.743] initialize: height=27 view=0 index=2 role=Backup
[12:54:34.755] OnPrepareRequestReceived: height=27 view=0 index=3 tx=1
[12:54:34.756] send prepare response
[12:54:36.606] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:36.607] Commit sent: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 state=Backup, RequestReceived, SignatureSent, CommitSent
[12:54:36.610] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=3
[12:54:36.612] OnPrepareResponseReceived: height=27 view=0 index=1
[12:54:36.614] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=1
[12:54:36.618] relay block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:36.618] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0
[12:54:36.620] OnPrepareResponseReceived: height=27 view=0 index=1
[12:54:36.624] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=3
[12:54:36.626] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=1
[12:54:36.630] persist block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:36.631] initialize: height=28 view=0 index=2 role=Backup

==> node2/neo-cli/Logs/2018-10-23.log <==
[12:54:19.750] initialize: height=27 view=0 index=0 role=Backup
[12:54:34.755] OnPrepareRequestReceived: height=27 view=0 index=3 tx=1
[12:54:34.755] send prepare response
[12:54:34.759] OnPrepareResponseReceived: height=27 view=0 index=2
[12:54:34.761] Commit sent: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 state=Backup, RequestReceived, SignatureSent, CommitSent
[12:54:34.763] OnPrepareResponseReceived: height=27 view=0 index=1
[12:54:34.770] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=3
[12:54:34.779] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=1
[12:54:34.782] relay block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.787] persist block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.789] initialize: height=28 view=0 index=0 role=Primary

==> node3/neo-cli/Logs/2018-10-23.log <==
[12:54:19.748] initialize: height=27 view=0 index=3 role=Primary
[12:54:34.751] timeout: height=27 view=0 state=Primary
[12:54:34.751] send prepare request: height=27 view=0
[12:54:34.759] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:34.764] OnPrepareResponseReceived: height=27 view=0 index=1
[12:54:34.767] Commit sent: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 state=Primary, RequestSent, CommitSent
[12:54:34.768] OnPrepareResponseReceived: height=27 view=0 index=2
[12:54:34.770] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0
[12:54:34.787] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=1
[12:54:34.791] relay block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.796] persist block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.797] initialize: height=28 view=0 index=3 role=Backup

==> node4/neo-cli/Logs/2018-10-23.log <==
[12:54:19.755] initialize: height=27 view=0 index=1 role=Backup
[12:54:34.755] OnPrepareRequestReceived: height=27 view=0 index=3 tx=1
[12:54:34.756] send prepare response
[12:54:34.775] OnPrepareResponseReceived: height=27 view=0 index=2
[12:54:34.776] Commit sent: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 state=Backup, RequestReceived, SignatureSent, CommitSent
[12:54:34.777] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=3
[12:54:34.780] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:34.789] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:34.790] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0
[12:54:34.794] relay block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.794] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0
[12:54:34.802] persist block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.803] initialize: height=28 view=0 index=1 role=Backup

Now we have to decide whether it is better to lock change view in the commit phase.
There are two possible attacks if someone controls two CNs (and has the ability to arbitrarily distribute consensus messages to each CN).

  1. Without the lock, the attacker can produce a fork. Stage 3 of dBFT (Commit) #320 (comment)
  2. With the lock, the attacker can stop block generation. Stage 3 of dBFT (Commit) #320 (comment)

Also, as @vncoelho said, partial signature is important here.

@vncoelho
Member

vncoelho commented Oct 23, 2018

Hey, @belane,
Thanks for the mention, and congrats to you and Shargon.

Jaime, I prefer 2 over 1.
In particular, "stop block generation" is already natural to dBFT if f nodes are malicious nodes (MN). Otherwise, the stop would just be a delay until everyone enters the commit phase. This could be reinforced by a re-sending mechanism:

  • which should make the view of some nodes go back in some cases (decreasing to the point where agreement was previously reached).
  • I mean, in some cases, nodes in a higher view would return to the view in which a node is locked, because they would see that an agreement was already obtained. This could be verified when the node looping in the commit phase re-sends its state with M or more valid context.SignaturesPartial.
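
For readers following the "M or more valid signatures" argument, here is a minimal sketch of the quorum arithmetic. It assumes the usual dBFT convention that a network of N consensus nodes tolerates f = (N - 1) / 3 faulty nodes (integer division) and needs M = N - f signatures; the class and method names are hypothetical and not taken from this PR.

```csharp
// Sketch only: QuorumSketch, MaxFaulty and RequiredSignatures are
// hypothetical names, not identifiers from the neo code base.
internal static class QuorumSketch
{
    // Largest number of faulty nodes tolerated for n validators
    // (assumed convention: f = floor((n - 1) / 3)).
    public static int MaxFaulty(int n) => (n - 1) / 3;

    // Number of matching signatures needed for agreement (M = n - f).
    public static int RequiredSignatures(int n) => n - MaxFaulty(n);
}
```

Under that assumption, the four-node private net in the logs above would need three matching signatures before a block is relayed.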

aheuaheuahuea

Let's evaluate this and think about it carefully; maybe there can also be a problem... Crazy and smart minds are everywhere, and we never know where they are going to find a possible entrance 📦

@vncoelho
Member

vncoelho commented Oct 23, 2018

But another discussion is: merge the critical part and later think about this or try to add this additional important change now?
I believe that if we do not do it now, we could all get lazy and delay this notably innovative and important feature for the Neo Blockchain.

@shargon
Member Author

shargon commented Oct 24, 2018

In 5f3e99a and f0c2ada I locked the view change on timeout, and on every timeout the CN re-sends its signature.

What do you think?
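
To make the behavior described in this comment concrete, here is a rough sketch of a timeout handler that locks the view once the commit has been sent and re-sends the signature instead. It is not the code from 5f3e99a or f0c2ada: the class, its fields, and the string messages are hypothetical stand-ins for the consensus context and the relayed payloads.

```csharp
using System.Collections.Generic;

// Sketch only: every name here is hypothetical and stands in for the
// real consensus service and its context.
internal sealed class TimeoutSketch
{
    public bool CommitSent;     // stands in for ConsensusState.CommitSent
    public byte ExpectedView;   // view this node would ask to move to
    public readonly List<string> Sent = new List<string>();

    public void OnTimeout()
    {
        if (CommitSent)
        {
            // Locked: never ask for a new view after committing,
            // only repeat the signature in case it was dropped.
            Sent.Add("PrepareResponse(resend)");
            return;
        }

        // Not committed yet: fall back to the normal change view request.
        ExpectedView++;
        Sent.Add($"ChangeView(to={ExpectedView})");
    }
}
```

The design question discussed in this thread is visible in the early return: a committed node trades liveness (it can no longer follow a view change) for protection against signing two different blocks at the same height.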

Member

@vncoelho vncoelho left a comment

@shargon, fast as a bolt!

The first commit looks precise to me.
The second one (f0c2ada) I am not sure about, because I thought CheckExpectedView was already correctly implemented to block the change view of nodes that have already sent their commit. I think you do not need to re-send here; just keep that same return to block change view.

Now, the challenges are:

  • Create a mechanism for accepting the incoming re-send (at OnPrepareResponse). Perhaps by including an exception here:
    if (message.ViewNumber != context.ViewNumber && message.Type != ConsensusMessageType.ChangeView)
  • Change the complete signature at the line
    context.Signatures[context.MyIndex] = context.MakeHeader().Sign(context.KeyPair);
    Maybe we do not need to change the Primary's one. Perhaps, for the Primary, we could send both together:
    context.Signatures[context.MyIndex] = context.MakeHeader().Sign(context.KeyPair);
    context.SignaturesPartial[context.MyIndex] = context.SOMETHINGTOTHINGABOUT.Sign(context.KeyPair);

// If signature was sent, we send again

SignAndRelay(context.MakePrepareResponse(context.Signatures[context.MyIndex]));
CheckSignatures();
Member

@vncoelho vncoelho Oct 24, 2018

Maybe remove this CheckSignatures call here, because the node is just re-sending. It should keep checking in OnPrepareResponseReceived, which also filters out messages from nodes with the wrong view, etc.

@igormcoelho
Contributor

There are still some repeated log entries; we need to look into that:

[12:54:34.780] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:34.789] OnPrepareResponseReceived: height=27 view=0 index=0
[12:54:34.790] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0
[12:54:34.794] relay block: 0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92
[12:54:34.794] OnCommitAgreement: height=27 hash=0x12313680c434188bdc316d641958694efff7b0f656a9ec4b84880ae89ceb6e92 view=0 index=0

@@ -13,5 +13,6 @@ internal enum ConsensusState : byte
SignatureSent = 0x10,
BlockSent = 0x20,
ViewChanging = 0x40,
CommitSent = 0x80,
Contributor

Correct 0x80.
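
As background for why 0x80 is the right value, here is a small sketch of a byte-backed flags enum. Only the values 0x10 through 0x80 appear in the diff above; the lower values and the [Flags] attribute are assumptions added for illustration, and the demo class is hypothetical.

```csharp
using System;

// Sketch only: members marked "assumed" are not shown in the diff above.
[Flags]
internal enum ConsensusState : byte
{
    Initial = 0x00,          // assumed
    Primary = 0x01,          // assumed
    Backup = 0x02,           // assumed
    RequestSent = 0x04,      // assumed
    RequestReceived = 0x08,  // assumed
    SignatureSent = 0x10,
    BlockSent = 0x20,
    ViewChanging = 0x40,
    CommitSent = 0x80,       // the last unused bit of a byte
}

internal static class ConsensusStateDemo
{
    // Combines the flags a backup node reports in its "Commit sent" log line.
    public static ConsensusState BackupAfterCommit() =>
        ConsensusState.Backup | ConsensusState.RequestReceived |
        ConsensusState.SignatureSent | ConsensusState.CommitSent;
}
```

Each member occupies its own bit, so HasFlag(ConsensusState.CommitSent) can be tested independently of the other flags. With 0x80 taken, a byte-backed enum has no free bits left for further states.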


private void OnCommitAgreement(ConsensusPayload payload, CommitAgreement message)
{
Log($"{nameof(OnCommitAgreement)}: height={payload.BlockIndex} hash={message.BlockHash.ToString()} view={message.ViewNumber} index={payload.ValidatorIndex}");
Contributor

Can you put this message Log($"{nameof(OnCommitAgreement)}: ... after the if? So it won't be repeated.
Please, do that to Log($"{nameof(OnPrepareResponseReceived)}: ... too, because it's also repeated.
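
The suggestion above can be sketched as follows: run the duplicate check first and log only the messages that pass it. The class below is a hypothetical, simplified stand-in for the consensus service; it keeps log lines in a list so the effect is easy to see.

```csharp
using System.Collections.Generic;

// Sketch only: ResponseLogSketch is a hypothetical stand-in, not the PR's code.
internal sealed class ResponseLogSketch
{
    private readonly byte[][] signatures;
    public readonly List<string> LogLines = new List<string>();

    public ResponseLogSketch(int validators)
    {
        signatures = new byte[validators][];
    }

    public void OnPrepareResponseReceived(int validatorIndex, byte[] signature)
    {
        // Guard first: a repeated message from the same validator is dropped
        // before anything is logged.
        if (signatures[validatorIndex] != null) return;

        LogLines.Add($"OnPrepareResponseReceived: index={validatorIndex}");
        signatures[validatorIndex] = signature;
    }
}
```

With this ordering, a re-sent response from validator 0 produces a single log line instead of two. The trade-off is that dropped or rejected messages become invisible in the log.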

Contributor

@shargon This is why messages are repeated, easy to fix.

@shargon
Member Author

shargon commented Oct 24, 2018

@igormcoelho try again with 14a31e6 please :)

@vncoelho I removed the CheckSignatures call, as you suggested, in 6b23061

@@ -233,15 +275,22 @@ private void OnPersistCompleted(Block block)

private void OnPrepareRequestReceived(ConsensusPayload payload, PrepareRequest message)
{
if (context.State.HasFlag(ConsensusState.RequestReceived))
Contributor

Very good move! Avoiding this repeated message.

if (context.State.HasFlag(ConsensusState.BlockSent)) return;
if (context.Signatures[payload.ValidatorIndex] != null) return;

Log($"{nameof(OnPrepareResponseReceived)}: height={payload.BlockIndex} view={message.ViewNumber} index={payload.ValidatorIndex}");
Contributor

Very good!

Member

@vncoelho vncoelho left a comment

Try to filter before RequestView

@@ -360,6 +411,20 @@ public static Props Props(NeoSystem system, Wallet wallet)

private void RequestChangeView()
{
if (context.State.HasFlag(ConsensusState.CommitSent))
Member

Let's remove this and filter earlier, @shargon, using

else if ((context.State.HasFlag(ConsensusState.Primary) && context.State.HasFlag(ConsensusState.RequestSent)) || (context.State.HasFlag(ConsensusState.Backup) && !context.State.HasFlag(ConsensusState.CommitSent)))

in place of the line

else if ((context.State.HasFlag(ConsensusState.Primary) && context.State.HasFlag(ConsensusState.RequestSent)) || context.State.HasFlag(ConsensusState.Backup))

{
// If signature was sent, we send again

SignAndRelay(context.MakePrepareResponse(context.Signatures[context.MyIndex]));
Contributor

@igormcoelho igormcoelho Oct 24, 2018

Perhaps you need to Relay your CommitAgreement again too. Not sure. Or is it implicit by PrepareResponse message?

@igormcoelho
Contributor

Shargon, it's looking good to me! I'll try to simulate forking scenarios again, to make sure it's 100%... but congratulations in advance ;)

{
// If signature was sent, we send again

SignAndRelay(context.MakePrepareResponse(context.Signatures[context.MyIndex]));
Member

@shargon, if a node in the commit phase receives a change view request, it re-sends its prepareResponse. With the current code, the node that sent the changeView request will not accept that prepareResponse, because the viewNumber does not match. We may need to find a way to accept the re-sent prepareResponse.

Member Author

It is just for dropped packets (network errors): maybe there was a disconnection and the packet never arrived at your CN.

Member

Yes. Imagine that one CN (A) sends a ChangeView request to the other CNs because, due to a network error, it did not collect enough PrepareResponse messages in time. A then increases its view number. But when another CN (B) receives the ChangeView message, it re-sends its prepareResponse, because B is now in the commit phase.

In this situation, even if A reconnects with B, A will not accept B's re-sent prepareResponse message, because the view numbers no longer match. But we need to let A accept this message and move on to the commit phase like B, right?
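
The scenario above can be sketched as a pair of acceptance rules. The first mirrors the view check quoted earlier in this review (a message is dropped when its view differs and it is not a ChangeView). The second is one hypothetical relaxation for re-sent responses; the enum, the method names, and the senderCommitted parameter are illustrative assumptions, and the safety of such an exception is exactly what this thread leaves open.

```csharp
// Sketch only: all names below are hypothetical.
internal enum MessageKind { ChangeView, PrepareRequest, PrepareResponse }

internal static class ViewFilterSketch
{
    // Mirrors the quoted check: accept if the views match or the
    // message is a ChangeView.
    public static bool AcceptsToday(byte messageView, byte localView, MessageKind kind) =>
        messageView == localView || kind == MessageKind.ChangeView;

    // Hypothetical relaxation: also accept a PrepareResponse from an older
    // view when the sender is known to be locked in the commit phase.
    public static bool AcceptsWithResendException(
        byte messageView, byte localView, MessageKind kind, bool senderCommitted) =>
        AcceptsToday(messageView, localView, kind) ||
        (kind == MessageKind.PrepareResponse && senderCommitted && messageView < localView);
}
```

In the example, A is at view 1 and B re-sends a PrepareResponse for view 0: the first rule rejects it, while the relaxed rule would let A see it. How A learns that B has committed is not specified here and would need its own design.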

if (context.State.HasFlag(ConsensusState.BlockSent) ||
!context.TryToCommit(payload, message)) return;

Log($"{nameof(OnCommitAgreement)}: height={payload.BlockIndex} hash={message.BlockHash.ToString()} view={message.ViewNumber} index={payload.ValidatorIndex}");
Member

Could you please move Log() before the if()? From the log, it looks like the CN sends out the block after receiving only one commitAgreement message. This is very confusing.

commit
commit-2

@shargon
Member Author

shargon commented Nov 21, 2018

I am closing this to focus on #426.

@shargon shargon closed this Nov 21, 2018
@erikzhang
Member

I would prefer a smaller pr instead of a big pr with many changes.

@erikzhang erikzhang reopened this Nov 21, 2018
@vncoelho
Member

vncoelho commented Nov 21, 2018

Hey @erikzhang,
You are right about smaller changes. However, this is a special case and occasion.

I think that NGD is already testing the other PR and we also have been testing it for a couple of days.

  • Most of the changes there are quite solid and required only minor redesigns;
  • However, since the changes require splitting off the full signature before the commit phase (as we discussed here and in previous conversations), we needed to make those adjustments;
  • The new design itself brings new things that would be hard to implement separately, because they are all interlaced;
  • Currently, all changes are well documented in the code, and we are going to clean up after everyone agrees on every little aspect;
  • There are just minor improvements that could be separated, and it would be hard to decide whether they are improvements or a lack of efficiency in implementing the new design;
  • I do not believe it is 100% correct yet, but we are finally getting close. In the beginning that PR was kind of an exercise. It already contains the contributions of many of us, because every change was previously discussed and identified as a group in our previous issues and PRs.

The PR is a sum of several contributions that all of us have been discussing in the past months.
I think that we should really focus there.

@vncoelho vncoelho mentioned this pull request Jan 14, 2019
@erikzhang erikzhang closed this Jan 24, 2019
@shargon shargon deleted the commit-phase branch January 24, 2019 16:27
@erikzhang erikzhang added this to the NEO 3.0 milestone Jan 25, 2019
Labels
Critical Issues (bugs) that need to be fixed ASAP Enhancement Type - Changes that may affect performance, usability or add new features to existing modules.
6 participants