Node with smaller term can get stuck in election when PreVote is true #8243

Closed
distributed-sys opened this issue Jul 11, 2017 · 2 comments

@distributed-sys

Hi,

When playing with the etcd Raft library, I noticed the following issue when PreVote is enabled.

For a three-node Raft cluster with nodes A, B and C, let's say C is isolated from both A and B by a network partition, so it is expected to keep the initial Term value of 1. A and B go through a few elections and thus move their terms to a higher value, say 5. Let's also assume that B is the current leader and A is a follower.

Now if C recovers from the network partition and the leader B crashes at roughly the same time, we end up with A at term 5 and C at term 1. When they start the PreVote phase, C's MsgPreVote and MsgPreVoteResp messages carry term values lower than A's term, so A ignores both of them, code here. C, in turn, is expected to receive a MsgPreVote from A carrying a higher term value, but C's term will not change in response to a MsgPreVote, see comments here. There is currently no leader, so there is no MsgHeartbeat message in the system.
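The two term-handling rules described above can be sketched in isolation. This is a simplified model, not the etcd Raft code: the names `msgType`, `message`, `node`, `step` and `preVoteRound` are hypothetical, the responder is assumed to reply with its own unchanged term, and log comparison is ignored entirely.

```go
package main

// msgType is a hypothetical, simplified stand-in for the Raft message
// types named in this report.
type msgType int

const (
	msgPreVote msgType = iota
	msgPreVoteResp
	msgHeartbeat
)

// message carries only the fields this sketch needs.
type message struct {
	typ  msgType
	term uint64
}

// node models nothing but the current term of a peer.
type node struct {
	term uint64
}

// step applies the two rules from the report and returns whether the
// message was processed (true) or dropped (false).
func (n *node) step(m message) bool {
	switch {
	case m.term < n.term:
		// Rule 1: a message from a lower term is ignored.
		return false
	case m.term > n.term && m.typ == msgPreVote:
		// Rule 2: a MsgPreVote with a higher term is processed,
		// but it never changes the local term.
		return true
	case m.term > n.term:
		// Any other message with a higher term (for example a
		// heartbeat from a leader) brings the node up to date.
		n.term = m.term
	}
	return true
}

// preVoteRound lets `from` ask `to` for a pre-vote and returns the number
// of pre-votes `from` ends up with, including its own.
func preVoteRound(from, to *node) int {
	votes := 1 // the candidate always counts itself
	// The request carries the term the candidate would campaign in.
	req := message{typ: msgPreVote, term: from.term + 1}
	if !to.step(req) {
		return votes // request dropped because its term is too low
	}
	// Assumption: the responder answers with its own, unchanged term.
	resp := message{typ: msgPreVoteResp, term: to.term}
	if from.step(resp) {
		votes++
	}
	return votes
}
```

With `a := &node{term: 5}` and `c := &node{term: 1}`, both `preVoteRound(a, c)` and `preVoteRound(c, a)` return 1, which is short of the 2 votes a three-node cluster needs, and neither term moves. Only a message such as `msgHeartbeat` from a live leader would raise `c.term` in this model.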

It seems to me that the cluster thus gets stuck, as no node can complete the PreVote phase. @bdarnell and @xiang90, could you please have a look and shed some light on it? Did I miss something? Many thanks!

I hacked together a quick test for the scenario described above.

func TestNodeWithSmallerTermCanCompleteElection(t *testing.T) {
  a := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
  b := newTestRaft(2, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
  c := newTestRaft(3, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())

  a.becomeFollower(1, None)
  b.becomeFollower(1, None)
  c.becomeFollower(1, None)

  a.preVote = true
  b.preVote = true
  c.preVote = true

  // cause a network partition to isolate node 3
  nt := newNetwork(a, b, c)
  nt.cut(1, 3)
  nt.cut(2, 3)

  // start a few elections to bump the term values
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
  nt.send(pb.Message{From: 3, To: 3, Type: pb.MsgHup})

  sm := nt.peers[1].(*raft)
  if sm.state != StateLeader {
    t.Errorf("peer 1 state: %s, want %s", sm.state, StateLeader)
  }

  sm = nt.peers[2].(*raft)
  if sm.state != StateFollower {
    t.Errorf("peer 2 state: %s, want %s", sm.state, StateFollower)
  }
  sm = nt.peers[3].(*raft)
  if sm.state != StatePreCandidate {
    t.Errorf("peer 3 state: %s, want %s", sm.state, StatePreCandidate)
  }

  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgHup})
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgHup})
  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("some data")}}})

  // check whether the term values are expected
  // a.Term == 5
  // b.Term == 5
  // c.Term == 1
  sm = nt.peers[1].(*raft)
  if sm.Term != 5 {
    t.Errorf("peer 1 term: %d, want %d", sm.Term, 5)
  }

  sm = nt.peers[2].(*raft)
  if sm.Term != 5 {
    t.Errorf("peer 2 term: %d, want %d", sm.Term, 5)
  }

  sm = nt.peers[3].(*raft)
  if sm.Term != 1 {
    t.Errorf("peer 3 term: %d, want %d", sm.Term, 1)
  }

  // check state
  // a == follower
  // b == leader
  // c == pre-candidate
  sm = nt.peers[1].(*raft)
  if sm.state != StateFollower {
    t.Errorf("peer 1 state: %s, want %s", sm.state, StateFollower)
  }
  sm = nt.peers[2].(*raft)
  if sm.state != StateLeader {
    t.Errorf("peer 2 state: %s, want %s", sm.state, StateLeader)
  }
  sm = nt.peers[3].(*raft)
  if sm.state != StatePreCandidate {
    t.Errorf("peer 3 state: %s, want %s", sm.state, StatePreCandidate)
  }

  sm.logger.Infof("going to bring back peer 3 and kill peer 2")
  // recover the network then immediately isolate b which is currently
  // the leader, this is to emulate the crash of b.
  nt.recover()
  nt.cut(2, 1)
  nt.cut(2, 3)

  // call for election
  nt.send(pb.Message{From: 3, To: 3, Type: pb.MsgHup})
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})

  // do we have a leader among the two reachable peers (1 and 3)?
  sma := nt.peers[1].(*raft)
  smc := nt.peers[3].(*raft)
  if sma.state != StateLeader && smc.state != StateLeader {
    t.Errorf("no leader")
  }
}
@bdarnell
Contributor

We don't currently use PreVote in CockroachDB - we tried it briefly and ran into problems, which were probably similar to this one. Thank you for the test; it should help us get to the bottom of this problem quickly. CC @irfansharif, who just started looking into this.

@xiang90
Contributor

xiang90 commented Jul 14, 2017

I will take a look over the weekend.
