Node with smaller term can get stuck in election when PreVote is true #8243

distributed-sys · 2017-07-11T16:03:14Z

Hi,

When playing with the etcd Raft library, I noticed the following issue when PreVote is enabled.

Consider a three-node Raft cluster with nodes A, B and C. Suppose C is isolated from both A and B by a network partition. Because PreVote is enabled, C never wins a pre-vote while isolated, so it is expected to keep its initial term of 1. Meanwhile A and B hold a few elections, which moves their terms to a higher value, say 5. Assume also that B is the current leader and A is a follower.
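To make the point about C's term concrete, here is a minimal Go sketch comparing an isolated node with and without PreVote. The `node` type and the `campaignPreVote`/`campaignClassic` helpers are hypothetical simplifications, not etcd's raft API; they model only the rule that a pre-candidate asks for votes at term+1 but does not increment its own term until a quorum has granted them.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified Raft peer that tracks only
// the fields needed to show how PreVote affects the term.
type node struct {
	term  uint64
	state string
}

// campaignPreVote models one election timeout with PreVote enabled: the
// node requests pre-votes at term+1 but only increments its own term
// once a quorum of the cluster has granted them.
func (n *node) campaignPreVote(grants, clusterSize int) {
	n.state = "pre-candidate"
	if grants >= clusterSize/2+1 {
		n.term++
		n.state = "candidate"
	}
}

// campaignClassic models the same timeout without PreVote: the term is
// incremented unconditionally before votes are requested.
func (n *node) campaignClassic() {
	n.term++
	n.state = "candidate"
}

func main() {
	withPreVote := &node{term: 1, state: "follower"}
	classic := &node{term: 1, state: "follower"}

	// C is partitioned, so the only grant it ever sees is its own.
	for i := 0; i < 4; i++ {
		withPreVote.campaignPreVote(1, 3)
		classic.campaignClassic()
	}
	fmt.Println("PreVote:", withPreVote.term, withPreVote.state) // PreVote: 1 pre-candidate
	fmt.Println("classic:", classic.term, classic.state)         // classic: 5 candidate
}
```

With PreVote the isolated node stays at term 1 no matter how often its election timer fires, whereas a classic candidate would have raced ahead to term 5. That gap between C's term and A's term is what matters in the rest of the scenario.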

Now suppose C rejoins the cluster and the leader B crashes at roughly the same time. We end up with A at term 5 and C at term 1. When they start the pre-vote, both C's MsgPreVote and its MsgPreVoteResp carry terms lower than A's term, so A ignores both (code here). C does receive A's MsgPreVote, which carries a higher term, but C's term does not change in response to a MsgPreVote (see comments here). Since there is currently no leader, there are no MsgHeartbeat messages in the system either, so nothing ever raises C's term.
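The deadlock can be reproduced outside etcd with a small Go model of the two rules just described: a message whose term is lower than the receiver's is dropped, and a MsgPreVote never changes the receiver's term, so the response is stamped with the receiver's own (stale) term. The `peer` and `message` types, the `step` and `preVote` helpers, and the log indexes (7 for A, 1 for C) are illustrative assumptions, not etcd's implementation; the log up-to-date check is reduced to comparing last indexes.

```go
package main

import "fmt"

// message is a hypothetical, simplified stand-in for a Raft message.
type message struct {
	typ     string // "MsgPreVote" or "MsgPreVoteResp"
	term    uint64
	lastIdx uint64 // sender's last log index, for the up-to-date check
	reject  bool
}

// peer is a hypothetical, simplified Raft node (not etcd's raft struct).
type peer struct {
	term    uint64
	lastIdx uint64
	grants  int // pre-votes collected this round, including its own
}

// step applies the two rules from the issue: messages with a lower term
// are dropped, and a MsgPreVote never changes the receiver's term, so
// the response is stamped with the receiver's own term.
func (p *peer) step(m message) *message {
	if m.term < p.term {
		return nil // stale term: silently ignored
	}
	switch m.typ {
	case "MsgPreVote":
		granted := m.lastIdx >= p.lastIdx
		return &message{typ: "MsgPreVoteResp", term: p.term, reject: !granted}
	case "MsgPreVoteResp":
		if !m.reject {
			p.grants++
		}
	}
	return nil
}

// preVote runs one pre-vote round from cand to its only reachable voter
// and reports whether cand reached a quorum in a cluster of n nodes.
func preVote(cand, voter *peer, n int) bool {
	cand.grants = 1 // a node always grants its own pre-vote
	req := message{typ: "MsgPreVote", term: cand.term + 1, lastIdx: cand.lastIdx}
	if resp := voter.step(req); resp != nil {
		cand.step(*resp)
	}
	return cand.grants >= n/2+1
}

func main() {
	a := &peer{term: 5, lastIdx: 7} // survivor with the longer log (assumed)
	c := &peer{term: 1, lastIdx: 1} // rejoining node, still at term 1

	fmt.Println("C wins pre-vote:", preVote(c, a, 3)) // false: A drops the term-2 request
	fmt.Println("A wins pre-vote:", preVote(a, c, 3)) // false: A drops C's term-1 grant
}
```

Both rounds fail, and neither node's term ever changes, so repeating the rounds on every election timeout gives the same result.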

It seems to me that the cluster therefore gets stuck, because no node can complete the PreVote phase. @bdarnell and @xiang90, could you please have a look and shed some light on it? Did I miss something? Many thanks!

I hacked together a quick test for the scenario described above.

package raft

import (
  "testing"

  pb "github.com/coreos/etcd/raft/raftpb"
)

func TestNodeWithSmallerTermCanCompleteElection(t *testing.T) {
  a := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
  b := newTestRaft(2, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
  c := newTestRaft(3, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())

  a.becomeFollower(1, None)
  b.becomeFollower(1, None)
  c.becomeFollower(1, None)

  a.preVote = true
  b.preVote = true
  c.preVote = true

  // cause a network partition to isolate node 3
  nt := newNetwork(a, b, c)
  nt.cut(1, 3)
  nt.cut(2, 3)

  // start a few elections to bump the term values
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
  nt.send(pb.Message{From: 3, To: 3, Type: pb.MsgHup})

  sm := nt.peers[1].(*raft)
  if sm.state != StateLeader {
    t.Errorf("peer 1 state: %s, want %s", sm.state, StateLeader)
  }

  sm = nt.peers[2].(*raft)
  if sm.state != StateFollower {
    t.Errorf("peer 2 state: %s, want %s", sm.state, StateFollower)
  }
  sm = nt.peers[3].(*raft)
  if sm.state != StatePreCandidate {
    t.Errorf("peer 3 state: %s, want %s", sm.state, StatePreCandidate)
  }

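  // move leadership back and forth between a and b to raise their terms
  // (b should end up as leader at term 5), then propose an entry through b
  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgHup})
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgHup})
  nt.send(pb.Message{From: 2, To: 2, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("some data")}}})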

  // check whether the term values are expected
  // a.Term == 5
  // b.Term == 5
  // c.Term == 1
  sm = nt.peers[1].(*raft)
  if sm.Term != 5 {
    t.Errorf("peer 1 term: %d, want %d", sm.Term, 5)
  }

  sm = nt.peers[2].(*raft)
  if sm.Term != 5 {
    t.Errorf("peer 2 term: %d, want %d", sm.Term, 5)
  }

  sm = nt.peers[3].(*raft)
  if sm.Term != 1 {
    t.Errorf("peer 3 term: %d, want %d", sm.Term, 1)
  }

  // check state
  // a == follower
  // b == leader
  // c == pre-candidate
  sm = nt.peers[1].(*raft)
  if sm.state != StateFollower {
    t.Errorf("peer 1 state: %s, want %s", sm.state, StateFollower)
  }
  sm = nt.peers[2].(*raft)
  if sm.state != StateLeader {
    t.Errorf("peer 2 state: %s, want %s", sm.state, StateLeader)
  }
  sm = nt.peers[3].(*raft)
  if sm.state != StatePreCandidate {
    t.Errorf("peer 3 state: %s, want %s", sm.state, StatePreCandidate)
  }

  sm.logger.Infof("going to bring back peer 3 and kill peer 2")
  // recover the network, then immediately isolate b (the current leader)
  // to emulate b crashing
  nt.recover()
  nt.cut(2, 1)
  nt.cut(2, 3)

  // call for election
  nt.send(pb.Message{From: 3, To: 3, Type: pb.MsgHup})
  nt.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})

  // do we have a leader?
  sma := nt.peers[1].(*raft)
  smc := nt.peers[3].(*raft)
  if sma.state != StateLeader && smc.state != StateLeader {
    t.Errorf("no leader")
  }
}
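One way to break the deadlock, sketched below with the same kind of simplified model, is for a voter that grants a pre-vote to answer with the term taken from the request rather than its own local term, while a rejection still carries the local term. This is offered as an illustration of the idea, not as a description of the change merged in #8288; the `peer` and `message` types, the `step` and `preVote` helpers, and the log indexes are all hypothetical.

```go
package main

import "fmt"

// message and peer are hypothetical, simplified stand-ins for Raft types.
type message struct {
	typ     string // "MsgPreVote" or "MsgPreVoteResp"
	term    uint64
	lastIdx uint64 // sender's last log index, for the up-to-date check
	reject  bool
}

type peer struct {
	term    uint64
	lastIdx uint64
	grants  int // pre-votes collected this round, including its own
}

// step still drops stale messages and still never changes its term on a
// MsgPreVote, but a granted pre-vote now echoes the term from the request,
// so a pre-candidate whose term is ahead of the voter's does not drop it.
func (p *peer) step(m message) *message {
	if m.term < p.term {
		return nil
	}
	switch m.typ {
	case "MsgPreVote":
		if m.lastIdx >= p.lastIdx {
			return &message{typ: "MsgPreVoteResp", term: m.term}
		}
		// A rejection still carries the voter's local term.
		return &message{typ: "MsgPreVoteResp", term: p.term, reject: true}
	case "MsgPreVoteResp":
		if !m.reject {
			p.grants++
		}
	}
	return nil
}

// preVote runs one pre-vote round from cand to its only reachable voter
// and reports whether cand reached a quorum in a cluster of n nodes.
func preVote(cand, voter *peer, n int) bool {
	cand.grants = 1
	req := message{typ: "MsgPreVote", term: cand.term + 1, lastIdx: cand.lastIdx}
	if resp := voter.step(req); resp != nil {
		cand.step(*resp)
	}
	return cand.grants >= n/2+1
}

func main() {
	a := &peer{term: 5, lastIdx: 7} // survivor with the longer log (assumed)
	c := &peer{term: 1, lastIdx: 1} // rejoining node, still at term 1

	fmt.Println("A wins pre-vote:", preVote(a, c, 3)) // true
	fmt.Println("C wins pre-vote:", preVote(c, a, 3)) // false
}
```

Here A collects C's grant and can go on to a real election at term 6, at which point C adopts the higher term from A's MsgVote. C still loses its own pre-vote, which is the desired outcome because its log is behind A's.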


bdarnell · 2017-07-11T17:25:22Z

We don't currently use PreVote in CockroachDB - we tried it briefly and ran into problems, which were probably similar to this one. Thank you for the test, this should help us get to the bottom of this problem quickly. CC @irfansharif, who just started looking into this.

xiang90 · 2017-07-14T19:59:05Z

I will take a look over the weekend.

xiang90 added the area/raft label Jul 11, 2017

irfansharif mentioned this issue Jul 17, 2017

storage: re-enable Raft PreVote RPC cockroachdb/cockroach#16950

Closed

heyitsanthony assigned xiang90 Jul 18, 2017

irfansharif mentioned this issue Jul 20, 2017

raft: introduce/fix TestNodeWithSmallerTermCanCompleteElection #8288

Merged

xiang90 closed this as completed in #8288 Jul 25, 2017
