etcd v3.5.0 with cherry-pick (pull/13348) still resurrects ancient members; restart fails with "member count is unequal"... #13697
Comments
The failing node's complete log: {"level":"warn","ts":"2022-02-10T17:37:34.168Z","logger":"raft","caller":"etcdserver/zap_raft.go:85","msg":"ccd3ecb34f88ff13 stepped down to follower since quorum is not active"}
{"level":"info","ts":"2022-02-10T17:37:34.168Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 became follower at term 5"}
{"level":"info","ts":"2022-02-10T17:37:34.168Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: ccd3ecb34f88ff13 lost leader ccd3ecb34f88ff13 at term 5"}
{"level":"warn","ts":"2022-02-10T17:37:34.168Z","caller":"etcdserver/server.go:1032","msg":"update leader as:","lead-id":"0000000000000000"}
{"level":"info","ts":"2022-02-10T17:37:34.278Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 no leader at term 5; dropping index reading msg"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 no leader at term 5; dropping index reading msg"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 5, index: 5195, vote: ccd3ecb34f88ff13] cast MsgPreVote for ed254075ccc97692 [logterm: 5, index: 5195] at term 5"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [term: 5] received a MsgVote message with higher term from ed254075ccc97692 [term: 6]"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 became follower at term 6"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 5, index: 5195, vote: 0] cast MsgVote for ed254075ccc97692 [logterm: 5, index: 5195] at term 6"}
{"level":"info","ts":"2022-02-10T17:37:34.341Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: ccd3ecb34f88ff13 elected leader ed254075ccc97692 at term 6"}
{"level":"warn","ts":"2022-02-10T17:37:34.341Z","caller":"etcdserver/server.go:1032","msg":"update leader as:","lead-id":"ed254075ccc97692"}
{"level":"warn","ts":"2022-02-14T01:42:11.286Z","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream Message","local-member-id":"ccd3ecb34f88ff13","remote-peer-id":"ed254075ccc97692","error":"unexpected EOF"}
{"level":"warn","ts":"2022-02-14T01:42:11.286Z","caller":"rafthttp/peer_status.go:66","msg":"peer became inactive (message send to peer failed)","peer-id":"ed254075ccc97692","error":"failed to read ed254075ccc97692 on stream Message (unexpected EOF)"}
{"level":"warn","ts":"2022-02-14T01:42:11.286Z","caller":"rafthttp/stream.go:421","msg":"lost TCP streaming connection with remote peer","stream-reader-type":"stream MsgApp v2","local-member-id":"ccd3ecb34f88ff13","remote-peer-id":"ed254075ccc97692","error":"unexpected EOF"}
{"level":"warn","ts":"2022-02-14T01:42:11.289Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"9127ba3ed067c444","rtt":"403.783µs","error":"dial tcp 176.228.1.1:2380: connect: connection refused"}
{"level":"warn","ts":"2022-02-14T01:42:11.289Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"9127ba3ed067c444","rtt":"700.936µs","error":"dial tcp 176.228.1.1:2380: connect: connection refused"}
{"level":"warn","ts":"2022-02-14T01:42:12.227Z","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":18380174024408422142,"retry-timeout":"500ms"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 is starting a new election at term 6"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 became pre-candidate at term 6"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 received MsgPreVoteResp from ccd3ecb34f88ff13 at term 6"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 6, index: 63539] sent MsgPreVote request to 9127ba3ed067c444 at term 6"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 6, index: 63539] sent MsgPreVote request to ed254075ccc97692 at term 6"}
{"level":"info","ts":"2022-02-14T01:42:12.654Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"raft.node: ccd3ecb34f88ff13 lost leader ed254075ccc97692 at term 6"}
{"level":"warn","ts":"2022-02-14T01:42:12.654Z","caller":"etcdserver/server.go:1032","msg":"update leader as:","lead-id":"0000000000000000"}
{"level":"warn","ts":"2022-02-14T01:42:12.655Z","caller":"rafthttp/stream.go:223","msg":"lost TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"ccd3ecb34f88ff13","remote-peer-id":"ed254075ccc97692"}
{"level":"warn","ts":"2022-02-14T01:42:12.727Z","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":18380174024408422142,"retry-timeout":"500ms"}
{"level":"info","ts":"2022-02-14T01:42:14.053Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 is starting a new election at term 6"}
{"level":"info","ts":"2022-02-14T01:42:14.053Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 became pre-candidate at term 6"}
{"level":"info","ts":"2022-02-14T01:42:14.054Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 received MsgPreVoteResp from ccd3ecb34f88ff13 at term 6"}
{"level":"info","ts":"2022-02-14T01:42:14.054Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 6, index: 63539] sent MsgPreVote request to 9127ba3ed067c444 at term 6"}
{"level":"info","ts":"2022-02-14T01:42:14.054Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"ccd3ecb34f88ff13 [logterm: 6, index: 63539] sent MsgPreVote request to ed254075ccc97692 at term 6"}
{"level":"warn","ts":"2022-02-14T01:42:14.229Z","caller":"etcdserver/v3_server.go:815","msg":"waiting for ReadIndex response took too long, retrying","sent-request-id":18380174024408422142,"retry-timeout":"500ms"} |
Please reproduce the problem on official etcd v3.5.2 binaries.
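For anyone trying that, a minimal sketch of pulling the official release binaries (assuming a linux-amd64 host; adjust the platform suffix as needed):

# Download and unpack the official etcd v3.5.2 release (linux-amd64 assumed)
curl -L https://github.com/etcd-io/etcd/releases/download/v3.5.2/etcd-v3.5.2-linux-amd64.tar.gz -o etcd-v3.5.2-linux-amd64.tar.gz
tar xzf etcd-v3.5.2-linux-amd64.tar.gz
# Confirm the binary version before swapping it into the cluster
./etcd-v3.5.2-linux-amd64/etcd --version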
As an etcd user, I think reporting a bug is an unselfish act.
I've compared the differences between official v3.5.2 and our enhanced/cherry-picked v3.5.0, which includes cherry-picks of the commits below:
So it seems there is no significant difference in v3.5.2 related to this issue...
cc @ptabor |
Closing this since it occurred only once and there has been no further response.
What happened?
One etcd node (etcd-node-0) restarts about every 3 seconds, with the log messages "closing etcd server" and "member count is unequal".
We deployed 3 etcd pods in Kubernetes. This Monday we performed a rolling update of the etcd pods (only the application layered above etcd was updated; the etcd version is v3.5.0 with fix #13348 both before and after), but after this update, one etcd pod failed to start.
The log is as shown above (the failing node's complete log).
What did you expect to happen?
We expected all 3 etcd pods to be updated successfully.
How can we reproduce it (as minimally and precisely as possible)?
We just performed a rolling update of the 3 etcd pods, operating them in the standard Kubernetes manner; a rough equivalent is sketched below.
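The namespace and StatefulSet name below are hypothetical; the actual update changed the application image rather than restarting in place, but the effect is the same one-pod-at-a-time restart:

# Trigger a rolling restart of the etcd StatefulSet and wait for it to complete
kubectl -n etcd-system rollout restart statefulset/etcd
kubectl -n etcd-system rollout status statefulset/etcd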
Anything else we need to know?
We've cherry-picked the fix into our etcd cluster.
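Since the failure is about membership, it may also help to compare the membership view each node reports; a sketch with etcdctl, where the endpoint addresses are placeholders:

# List the members as seen through every client endpoint (addresses are placeholders)
etcdctl --endpoints=http://etcd-node-0:2379,http://etcd-node-1:2379,http://etcd-node-2:2379 member list -w table
# Show ID, term and leader for every member discovered from the cluster
etcdctl --endpoints=http://etcd-node-1:2379 endpoint status --cluster -w table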
Etcd version (please run commands below)
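The output was not included here; the commands the template refers to are roughly the following, run on the affected node:

etcd --version
etcdctl version
etcdutl version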
Etcd configuration (command line flags or environment variables)
# node name
name: 'etcd0'
data-dir: "/opt/ETCDdata/"
listen-peer-urls: "http://0.0.0.0:2380"
listen-client-urls: "http://0.0.0.0:2379"
initial-advertise-peer-urls: "http://0.0.0.0:2380"
initial-cluster: "etcd0=http://0.0.0.0:2380"
initial-cluster-state: "new"
initial-cluster-token: "etcd-cluster"
advertise-client-urls: "http://0.0.0.0:2379"
auto-compaction-mode: "revision"
auto-compaction-retention: "1000"
strict-reconfig-check: false
pre-vote: true
enable-grpc-gateway: true
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
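The output is not attached; the kind of commands this section refers to looks roughly like the following (endpoint addresses are placeholders):

# Per-endpoint health and status across the cluster
etcdctl --endpoints=http://etcd-node-0:2379,http://etcd-node-1:2379,http://etcd-node-2:2379 endpoint health
etcdctl --endpoints=http://etcd-node-0:2379,http://etcd-node-1:2379,http://etcd-node-2:2379 endpoint status -w table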
Relevant log output
No response