etcdserver: adjust election timeout on restart #9415

Merged (3 commits) on Mar 11, 2018

@gyuho gyuho commented Mar 9, 2018

Address #9333 with simpler logic.

A single-node restart with no snapshot does not need special handling, because the node elects itself leader before the wait for a peer connection report times out.


Fresh start 1-node cluster

15:45:34.940350 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
15:45:34.940911 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
15:45:35.233465 I | raft: 8e9e05c52164694d is starting a new election at term 1
15:45:35.233525 I | raft: 8e9e05c52164694d became candidate at term 2
15:45:35.233559 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
15:45:35.233596 I | raft: 8e9e05c52164694d became leader at term 2


Restart 1-node cluster from snapshot

15:47:29.182917 I | etcdserver: recovered store from snapshot at index 8
15:47:29.194451 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
15:47:29.487854 I | raft: 8e9e05c52164694d is starting a new election at term 2
15:47:29.487939 I | raft: 8e9e05c52164694d became candidate at term 3
15:47:29.487985 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
15:47:29.488032 I | raft: 8e9e05c52164694d became leader at term 3
15:47:29.488062 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3


Restart 1-node with no snapshot

15:49:43.412234 I | raft: 8e9e05c52164694d is starting a new election at term 2
15:49:43.412299 I | raft: 8e9e05c52164694d became candidate at term 3
15:49:43.412339 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
15:49:43.412380 I | raft: 8e9e05c52164694d became leader at term 3
15:49:43.412407 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3
15:49:47.020895 I | etcdserver: 8e9e05c52164694d waited 5s but no active peer found (or restarted 1-node cluster); currently, 1 member(s)

The leader is elected while the server is still waiting for the peer connection report to time out, so there is no side effect.


Fresh start 3-node

node A:

15:53:47.895306 I | etcdserver: 7339c4e5e833c029 waited 5s but no active peer found (or restarted 1-node cluster); currently, 3 member(s)
15:53:48.690716 I | raft: 7339c4e5e833c029 is starting a new election at term 4
15:53:49.991580 I | raft: 7339c4e5e833c029 became leader at term 6

node B:

15:54:02.194297 I | etcdserver: b548c2511513015 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
15:54:02.197587 I | raft: b548c2511513015 [term: 1] received a MsgHeartbeat message with higher term from 7339c4e5e833c029 [term: 6]

No side-effect.


Rejoining a 3-node cluster with snapshot

16:01:12.800882 I | rafthttp: peer 729934363faa4a24 became active
16:01:12.800894 I | etcdserver: 7339c4e5e833c029 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
16:01:12.857434 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 7

The peer connection is reported, and the node then advances by the adjusted number of ticks.
Previously it advanced 9 ticks, leaving only one tick before the election timeout. Now it advances 8 ticks.
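
The tick counts in the logs above can be summarized with a small sketch. This is not the code from the PR: `ticksToFastForward` and its parameters are hypothetical names, and it only assumes what the logs show (9 of 10 ticks for a single node, 8 of 10 once a peer is active, and no fast-forwarding when no peer is reported, which is also how the restarted single node without a snapshot behaves above).

```go
package main

import "fmt"

// ticksToFastForward is a hypothetical helper, not the function from the PR.
// membersKnown is the cluster size known at startup; activePeers is the
// number of peers reported active by the transport.
func ticksToFastForward(electionTicks, membersKnown, activePeers int) int {
	switch {
	case membersKnown == 1:
		// Single-node cluster: leave one tick before the election fires.
		return electionTicks - 1
	case activePeers > 0:
		// Multi-node with a connected peer: leave two ticks instead of one.
		return electionTicks - 2
	default:
		// No active peer reported within the wait: do not fast-forward.
		return 0
	}
}

func main() {
	fmt.Println(ticksToFastForward(10, 1, 0)) // single node: 9
	fmt.Println(ticksToFastForward(10, 3, 2)) // rejoining a 3-node cluster: 8
	fmt.Println(ticksToFastForward(10, 3, 0)) // no peer reachable: 0
}
```

With election ticks set to 10, this reproduces the "fast-forwarding 9 ticks" and "fast-forwarding 8 ticks" lines in the logs.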


Rejoining a 3-node cluster with no snapshot

16:05:48.368674 I | etcdserver: 7339c4e5e833c029 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
16:05:48.439695 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 6


/cc @xiang90 @jpbetz

codecov-io commented Mar 9, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@9e84f2d).
The diff coverage is 97.36%.

@@            Coverage Diff            @@
##             master    #9415   +/-   ##
=========================================
  Coverage          ?   72.48%           
=========================================
  Files             ?      362           
  Lines             ?    30827           
  Branches          ?        0           
=========================================
  Hits              ?    22344           
  Misses            ?     6854           
  Partials          ?     1629
Impacted Files           Coverage Δ
etcdserver/raft.go       89.47% <100%> (ø)
etcdserver/server.go     79.73% <100%> (ø)
rafthttp/transport.go    83.42% <87.5%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e84f2d...9680b8a.

// 1. all connections failed, or
// 2. no active peers, or
// 3. restarted single-node with no snapshot
plog.Infof("%s waited %s but no active peer found (or restarted 1-node cluster); currently, %d member(s)", srv.ID(), rafthttp.ConnReadTimeout, len(cl.Members()))

Contributor

just remove this logging? we only log if we do something special.

Contributor Author

Removed. PTAL. Thanks!


if s.once != nil {
	s.once.Do(func() {
		plog.Infof("notifying of active peer %q", s.id)

Contributor

this logging does not seem to be useful.

Contributor Author

Removed.

// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
InitialPeerNotify() <-chan struct{}

Contributor

hmm... can you try to find another way to do this without introducing a new method to the interface? this interface is too heavy already.

Contributor Author

Agree. Just removed.

// This can be used for fast-forwarding election
// ticks in multi data-center deployments, thus
// speeding up election process.
advanceRaftTicks func(ticks int)

Contributor

this should be a method on raft related struct.
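
A minimal sketch of what that suggestion could look like, assuming a `raftNode`-style struct that owns both the tick lock and the underlying node. The `ticker` interface and `countingTicker` type are hypothetical stand-ins for illustration; the real struct in etcdserver has many more fields.

```go
package main

import (
	"fmt"
	"sync"
)

// ticker is a hypothetical stand-in for the part of the raft node
// that advances its logical clock.
type ticker interface {
	Tick()
}

// countingTicker records how many times Tick was called.
type countingTicker struct{ n int }

func (c *countingTicker) Tick() { c.n++ }

// raftNode is a reduced sketch, not the real etcdserver type.
type raftNode struct {
	tickMu sync.Mutex // serializes regular ticks and fast-forwarded ticks
	node   ticker
}

// tick advances the clock by one tick under the lock.
func (r *raftNode) tick() {
	r.tickMu.Lock()
	r.node.Tick()
	r.tickMu.Unlock()
}

// advanceTicks fast-forwards the clock by the given number of ticks.
func (r *raftNode) advanceTicks(ticks int) {
	for i := 0; i < ticks; i++ {
		r.tick()
	}
}

func main() {
	ct := &countingTicker{}
	r := &raftNode{node: ct}
	r.advanceTicks(8)
	fmt.Println(ct.n) // 8
}
```

Keeping the method on the raft struct means callers do not need a separate function field such as `advanceRaftTicks` to reach the ticker.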

@@ -527,6 +539,32 @@ func NewServer(cfg ServerConfig) (srv *EtcdServer, err error) {
}
srv.r.transport = tr

srv.goAttach(func() {

Contributor

why does this need to be async? can we start the network first, then wait here, then decide whether to advance ticks or not, and then start the raft routine?

Contributor

also if it works like what i described, we do not need the lock to protect the tick.

Contributor Author

It needs to be async, since we start peer handler in embed package, after we create etcd server here.

Addressed others. PTAL.

@@ -32,11 +32,16 @@ type peerStatus struct {
	mu     sync.Mutex // protect variables below
	active bool
	since  time.Time

	once   *sync.Once
	notify chan struct{}

Contributor

activeNotify


if s.once != nil {
	s.once.Do(func() {
		close(s.notify)

Contributor

probably just do

select {
case s.notify <- struct{}{}:
default:
}

we do not need the once struct.
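
For reference, here is a self-contained sketch of the non-blocking send suggested above. The helper name `notifyActive` and the buffered channel are assumptions made for illustration, not code from the PR.

```go
package main

import "fmt"

// notifyActive does a non-blocking send. It reports whether the signal was
// accepted; if the channel cannot take a value right now, it returns at once.
func notifyActive(ch chan<- struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	// Capacity 1 so that a signal is kept until a receiver reads it.
	notify := make(chan struct{}, 1)

	fmt.Println(notifyActive(notify)) // true: buffer was empty
	fmt.Println(notifyActive(notify)) // false: buffer already holds a signal
	<-notify                          // a waiter consumes the signal
	fmt.Println(notifyActive(notify)) // true again
}
```

One difference from closing a channel under `sync.Once`: a send wakes at most one receiver, whereas a close releases every receiver. With an unbuffered channel, the send is dropped whenever no receiver is waiting at that moment.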

// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
func (t *Transport) InitialPeerNotify() <-chan struct{} { return t.initPeerNotifyCh }

Contributor

well... actually, an easy way to solve this problem is to have a for loop to loop over the peer status and check if active is true.

for _, p := range peers {
    if p.status.isActive() {
        // send to chan
    }
}

// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
func (t *Transport) InitialPeerNotify() <-chan struct{} { return t.initPeerNotifyCh }

Contributor

Even better:

InitialPeerNotify() -> ActivePeers()

ActivePeers simply calculates how many peers are active. we move the for loop to the raft node side in the etcdserver pkg, which waits for the first active peer.
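
A rough sketch of such a counting method, under the assumption that each peer can report whether it is active. The `peer` type and the map layout are hypothetical stand-ins for the transport's internal state.

```go
package main

import "fmt"

// peer is a hypothetical stand-in for a transport peer.
type peer struct {
	active bool
}

func (p peer) isActive() bool { return p.active }

// activePeers counts the peers whose connection is currently active.
func activePeers(peers map[string]peer) int {
	cnt := 0
	for _, p := range peers {
		if p.isActive() {
			cnt++
		}
	}
	return cnt
}

func main() {
	peers := map[string]peer{
		"729934363faa4a24": {active: true},
		"b548c2511513015":  {active: true},
		"7339c4e5e833c029": {active: false},
	}
	fmt.Println(activePeers(peers)) // 2
}
```

Returning a count keeps the transport interface free of channels; the caller decides how often to poll and when to give up.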

@gyuho gyuho force-pushed the adjust-advancing-ticks branch 2 times, most recently from 8c4a077 to e42dd2a Compare March 11, 2018 02:36
}
}

// 1. all connections failed, or

Contributor

move this comment to line 544

// retry up to "rafthttp.ConnReadTimeout", which is 5-sec
for i := 0; i < 5; i++ {
	select {
	case <-time.After(time.Second):

Contributor

reduce this to 50ms to be more responsive.

Contributor

define 50ms as wait time

}

// retry up to "rafthttp.ConnReadTimeout", which is 5-sec
for i := 0; i < 5; i++ {

Contributor

define waitTime. then the loop bound here can be 5 seconds / waitTime

@gyuho gyuho force-pushed the adjust-advancing-ticks branch from e42dd2a to 9f356c8 Compare March 11, 2018 02:43
rafthttp/peer.go Outdated
@@ -76,6 +76,9 @@ type Peer interface {
// activeSince returns the time that the connection with the
// peer becomes active.
activeSince() time.Time
// isActive returns true if the connection to this peer
// has been established
isActive() bool

Contributor

we can reuse activeSince. if it is smaller than current, it is active, right?
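
One way to read this suggestion, sketched under the assumption that `activeSince` returns the zero `time.Time` while the peer is inactive. The struct mirrors the `mu`, `active`, and `since` fields shown in the earlier hunk; the methods here are illustrative rather than the real rafthttp code.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// peerStatus is a reduced sketch of the struct shown in the diff.
type peerStatus struct {
	mu     sync.Mutex // protects the fields below
	active bool
	since  time.Time
}

// activate records the time at which the peer first became active.
func (s *peerStatus) activate() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.active {
		s.active = true
		s.since = time.Now()
	}
}

// deactivate resets the timestamp to the zero value (assumed convention).
func (s *peerStatus) deactivate() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.active = false
	s.since = time.Time{}
}

func (s *peerStatus) activeSince() time.Time {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.since
}

// isActive is derived from activeSince, so no extra interface method is needed.
func isActive(s *peerStatus) bool {
	return !s.activeSince().IsZero()
}

func main() {
	s := &peerStatus{}
	fmt.Println(isActive(s)) // false
	s.activate()
	fmt.Println(isActive(s)) // true
	s.deactivate()
	fmt.Println(isActive(s)) // false
}
```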

xiang90 commented Mar 11, 2018

lgtm.

gyuho added 2 commits March 10, 2018 18:50
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
@gyuho gyuho force-pushed the adjust-advancing-ticks branch from 9f356c8 to 33adce4 Compare March 11, 2018 02:50
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>

mborsz commented Mar 26, 2018

When do we expect this fix to be released in the 3.1 branch?

gyuho commented Mar 26, 2018

#9485 is merged.
So, we are ready to publish another set of patch releases.

@jpbetz Are you available to release 3.1 and 3.2 this week?
Let's coordinate at #9411.

Any day this week works for me.

jpbetz commented Mar 28, 2018

@gyuho Yes, I'm available. Thursday (tomorrow) work? I'm free Friday as well.

gyuho commented Mar 28, 2018

@jpbetz Tomorrow (Thursday) sounds good. Will ping you when the key is ready. Thanks.

@@ -97,6 +97,7 @@ type raftNode struct {
	term uint64
	lead uint64

	tickMu *sync.Mutex

Contributor

Out of curiosity, any reason for pointer here? My understanding is that pointers are not typically needed for sync.Mutex.

Contributor

I'm planning to backport this to 3.1 without a pointer since that removes the need to initialize the mutex, which simplifies the backport: https://github.com/coreos/etcd/pull/9500/files#diff-8c6a0ae3bb0763acd9c96a35d89131feR99

Contributor Author

@jpbetz govet would complain with something like this

passes lock by value: github.com/coreos/etcd/clientv3.Client contains sync.Mutex

jpbetz added a commit that referenced this pull request Mar 28, 2018
jpbetz added a commit that referenced this pull request Mar 28, 2018