raft learner issue #7

Closed
WIZARD-CXY opened this issue Mar 21, 2019 · 6 comments

WIZARD-CXY commented Mar 21, 2019

code: learner branch
os: Mac

Steps to reproduce:
1. Start a single main node with default settings, e.g.
etcd

2. Add a learner node with
bin/etcdctl member add infra2 --peer-urls=127.0.0.1:23800 --learner

3. Start the learner node, e.g.

export ETCD_NAME="infra2"
export ETCD_INITIAL_CLUSTER="default=http://localhost:2380,infra2=http://127.0.0.1:23800"
export ETCD_INITIAL_ADVERTISE_PEER_URLS="http://127.0.0.1:23800"
export ETCD_INITIAL_CLUSTER_STATE="existing"
bin/etcd --listen-client-urls 'http://localhost:23790' --advertise-client-urls 'http://localhost:23790' --listen-peer-urls 'http://localhost:23800'

4. Wait a while for the learner to receive the snapshot, then send a read request to the learner node. The result is:


➜  etcd git:(promote) ✗ bin/etcdctl --endpoints=127.0.0.1:23790 get "" --prefix
{"level":"warn","ts":"2019-03-21T15:33:55.183+0800","caller":"clientv3/retry_interceptor.go:60","msg":"retrying of unary invoker failed","target":"endpoint://client-480d3208-67b0-4bd5-a309-f649baf90e89/127.0.0.1:23790","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: context deadline exceeded

This does not match the design doc, which says the learner node can serve read requests.

Main server log:


2019-03-21 15:32:15.157285 W | etcdserver: server is likely overloaded
2019-03-21 15:32:23.099274 W | etcdserver: ignored out-of-date read index response; local node read indexes queueing up and waiting to be in sync with leader (request ID want 7587837078940101644, got 15930192439078804740)
2019-03-21 15:33:51.185246 W | etcdserver: timed out sending read state
2019-03-21 15:33:51.185342 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 842.440628ms)

Learner node log:

2019-03-21 15:33:57.188397 W | etcdserver: timed out waiting for read index response (local node might have slow network)
2019-03-21 15:34:02.526932 W | etcdserver: read-only range request "key:\"\\000\" range_end:\"\\000\" " with result "error:context deadline exceeded" took too long (4.999960974s) to execute

WIZARD-CXY (Author) commented:

@jingyih

jingyih (Owner) commented Mar 21, 2019

We do plan to have learner members serve ONLY serializable reads and endpoint status requests, but that restriction is not implemented yet. So I would expect etcdctl --endpoints=127.0.0.1:23790 get to work. I will take a closer look tomorrow.

jingyih (Owner) commented Mar 21, 2019

I was able to reproduce this. From the learner node's log, it looks like the learner is not able to get a read index from the leader, which is a required step in serving a linearizable read. I will take a closer look.

2019-03-21 11:24:47.439000 W | etcdserver: read-only range request "key:\"foo\" " with result "error:context canceled" took too long (4.999190886s) to execute
2019-03-21 11:24:49.439917 W | etcdserver: timed out waiting for read index response (local node might have slow network)

jingyih (Owner) commented Mar 22, 2019

Looks like we found a bug. When the leader receives a readIndex request, it assumes the request came from the local node (the leader itself) if quorum == 1. That is fine when there is no learner. But when there is a learner, the assumption is wrong, so the leader never sends a readIndex response to the learner. I will send a fix to the etcd-io repo. This bug does not block our learner implementation, though.

etcd/raft/raft.go

Lines 1025 to 1051 in a1408c5

	case pb.MsgReadIndex:
		if r.quorum() > 1 {
			if r.raftLog.zeroTermOnErrCompacted(r.raftLog.term(r.raftLog.committed)) != r.Term {
				// Reject read only request when this leader has not committed any log entry at its term.
				return nil
			}
			// thinking: use an interally defined context instead of the user given context.
			// We can express this in terms of the term and index instead of a user-supplied value.
			// This would allow multiple reads to piggyback on the same message.
			switch r.readOnly.option {
			case ReadOnlySafe:
				r.readOnly.addRequest(r.raftLog.committed, m)
				r.bcastHeartbeatWithCtx(m.Entries[0].Data)
			case ReadOnlyLeaseBased:
				ri := r.raftLog.committed
				if m.From == None || m.From == r.id { // from local member
					r.readStates = append(r.readStates, ReadState{Index: r.raftLog.committed, RequestCtx: m.Entries[0].Data})
				} else {
					r.send(pb.Message{To: m.From, Type: pb.MsgReadIndexResp, Index: ri, Entries: m.Entries})
				}
			}
		} else {
			r.readStates = append(r.readStates, ReadState{Index: r.raftLog.committed, RequestCtx: m.Entries[0].Data})
		}
		return nil
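
To make the failure mode concrete, here is a toy model of the single-voter branch. It is a sketch only, not the raft package and not the actual patch. The types readIndexReq, readIndexResp, and leader, and the methods handleBuggy and handleFixed, are hypothetical names invented for illustration. It assumes the fix is to apply the same m.From check in the single-voter branch that the ReadOnlyLeaseBased case already applies.

```go
package main

import "fmt"

// none marks a request with no sender, playing the role of raft's None.
const none uint64 = 0

// readIndexReq is a pared-down, hypothetical stand-in for a MsgReadIndex message.
type readIndexReq struct {
	From uint64
	Ctx  string
}

// readIndexResp is a pared-down, hypothetical stand-in for a MsgReadIndexResp message.
type readIndexResp struct {
	To    uint64
	Index uint64
	Ctx   string
}

// leader models a leader whose cluster has exactly one voting member,
// so only the quorum == 1 branch is represented here.
type leader struct {
	id         uint64
	committed  uint64
	readStates []string        // requests answered as local reads
	outbox     []readIndexResp // responses queued for other members
}

// handleBuggy mirrors the quoted else branch: every request is treated
// as local, so a learner's request never gets a response.
func (l *leader) handleBuggy(m readIndexReq) {
	l.readStates = append(l.readStates, m.Ctx)
}

// handleFixed checks the sender first (an assumed fix, see the lead-in):
// local requests become read states, remote ones get a response.
func (l *leader) handleFixed(m readIndexReq) {
	if m.From == none || m.From == l.id {
		l.readStates = append(l.readStates, m.Ctx)
		return
	}
	l.outbox = append(l.outbox, readIndexResp{To: m.From, Index: l.committed, Ctx: m.Ctx})
}

func main() {
	// Member 2 plays the learner; member 1 is the only voter and the leader.
	req := readIndexReq{From: 2, Ctx: "learner-read"}

	buggy := &leader{id: 1, committed: 5}
	buggy.handleBuggy(req)
	fmt.Printf("buggy: responses to learner = %d\n", len(buggy.outbox))

	fixed := &leader{id: 1, committed: 5}
	fixed.handleFixed(req)
	fmt.Printf("fixed: responses to learner = %d\n", len(fixed.outbox))
}
```

In the buggy handler the learner's request is recorded as if it were a local read and nothing is sent back, which is consistent with the "timed out waiting for read index response" warning in the learner log.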

jingyih (Owner) commented Mar 29, 2019

Fixed by etcd-io#10590.

@jingyih jingyih closed this as completed Mar 29, 2019

WIZARD-CXY (Author) commented:

great

jingyih pushed a commit that referenced this issue Nov 27, 2019
etcdserver: add v3 request type for cluster attr