etcd 3.5.3 panic: tocommit(587192) is out of range [lastIndex(587189)]. Was the raft log corrupted, truncated, or lost? #15699

Tejaswini5327 · 2023-04-11T09:49:30Z

What happened?

We have a deployment of 3 pods that had been running for the past 90 days. After 90 days, pod-0 suddenly started restarting. We applied a workaround (WA): deleting the pod-0 member ID, deleting pod-0, and then deleting the WAL file of pod-0. After this workaround, we found the following error in the pod-0 log:

{"level":"info","ts":"2023-03-06T13:07:12.235-0500","caller":"rafthttp/stream.go:274","msg":"established TCP streaming connection with remote peer","stream-writer-type":"stream Message","local-member-id":"1c192aed65dd7938","remote-peer-id":"f92d0a91a53265ea"}
panic: tocommit(587192) is out of range [lastIndex(587189)]. Was the raft log corrupted, truncated, or lost?

goroutine 77 [running]:
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000884000, 0x0, 0x0, 0x0)
/go/pkg/mod/go.uber.org/zap@v1.17.0/zapcore/entry.go:234 +0x58d
go.uber.org/zap.(*SugaredLogger).log(0xc000b120b8, 0x9696db4a832504, 0x124ecb9, 0x5d, 0xc0025a4080, 0x2, 0x2, 0x0, 0x0, 0x0)
/go/pkg/mod/go.uber.org/zap@v1.17.0/sugar.go:227 +0x111
go.uber.org/zap.(*SugaredLogger).Panicf(...)
/go/pkg/mod/go.uber.org/zap@v1.17.0/sugar.go:159
go.etcd.io/etcd/server/v3/etcdserver.(*zapRaftLogger).Panicf(0xc00003c110, 0x124ecb9, 0x5d, 0xc0025a4080, 0x2, 0x2)
/go/src/go.etcd.io/etcd/release/etcd/server/etcdserver/zap_raft.go:101 +0x7d
go.etcd.io/etcd/raft/v3.(*raftLog).commitTo(0xc000802c40, 0x8f5b8)
/go/src/go.etcd.io/etcd/release/etcd/raft/log.go:237 +0x135
go.etcd.io/etcd/raft/v3.(*raft).handleHeartbeat(0xc0004be2c0, 0x8, 0x1c192aed65dd7938, 0x9f7835abc7c54201, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/go.etcd.io/etcd/release/etcd/raft/raft.go:1508 +0x54
go.etcd.io/etcd/raft/v3.stepFollower(0xc0004be2c0, 0x8, 0x1c192aed65dd7938, 0x9f7835abc7c54201, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/go.etcd.io/etcd/release/etcd/raft/raft.go:1434 +0x478
go.etcd.io/etcd/raft/v3.(*raft).Step(0xc0004be2c0, 0x8, 0x1c192aed65dd7938, 0x9f7835abc7c54201, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/go.etcd.io/etcd/release/etcd/raft/raft.go:975 +0xa55
go.etcd.io/etcd/raft/v3.(*node).run(0xc00081b080)
/go/src/go.etcd.io/etcd/release/etcd/raft/node.go:356 +0x798
created by go.etcd.io/etcd/raft/v3.RestartNode
/go/src/go.etcd.io/etcd/release/etcd/raft/node.go:244 +0x330

What did you expect to happen?

The pod should run without restarting and without the panic: tocommit(587192) is out of range [lastIndex(587189)] error.

How can we reproduce it (as minimally and precisely as possible)?

1. Delete the member ID of pod-0.
2. Delete pod-0.
3. Delete the WAL file of pod-0.
After the above, the panic: tocommit(587192) is out of range [lastIndex(587189)] error appeared in the pod-0 logs.

Anything else we need to know?

We found a similar issue, #13509, but the workaround/solution there didn't work for us.

Etcd version (please run commands below)

bash-4.4$ etcd --version
etcd Version: 3.5.3
Git SHA: 0452fee
Go Version: go1.16.15
Go OS/Arch: linux/amd64
bash-4.4$ etcdctl version
etcdctl version: 3.5.3
API version: 3.5

Etcd configuration (command line flags or environment variables)

paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

serathius · 2023-04-11T10:06:28Z

You deleted files from database directory and expect it to run? It doesn't work like that.

Using v3.5.3 etcd version is super duper not recommended. https://groups.google.com/g/etcd-dev/c/8S7u6NqW6C4/m/_uy9Dv7XBwAJ
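
For readers wondering why the member panics rather than simply catching up: the stack trace shows the panic being raised from raftLog.commitTo, called from handleHeartbeat, with tocommit 587192 (0x8f5b8 in the trace). The following is a minimal, self-contained sketch of that kind of bounds check, not the actual etcd source. The struct, its fields, and the error return are simplifications assumed for illustration (the real code panics through its logger), and the starting committed value is hypothetical since the issue does not report it.

```go
package main

import "fmt"

// raftLog is a simplified stand-in for the follower's log bookkeeping.
// The real type in go.etcd.io/etcd/raft/v3 holds considerably more state.
type raftLog struct {
	lastIndex uint64 // index of the last entry present in the local log
	committed uint64 // highest index this member knows to be committed
}

// commitTo advances the commit index to tocommit.
// It returns an error in the case where, per the trace in this issue,
// etcd panics: the requested commit index is beyond the last local entry.
func (l *raftLog) commitTo(tocommit uint64) error {
	if l.committed >= tocommit {
		return nil // the commit index never moves backwards
	}
	if l.lastIndex < tocommit {
		return fmt.Errorf("tocommit(%d) is out of range [lastIndex(%d)]",
			tocommit, l.lastIndex)
	}
	l.committed = tocommit
	return nil
}

func main() {
	// Numbers taken from the panic message; committed is a hypothetical value.
	damaged := &raftLog{lastIndex: 587189, committed: 587189}
	fmt.Println(damaged.commitTo(587192))

	// A member whose log still contains every entry up to the commit index.
	healthy := &raftLog{lastIndex: 587192, committed: 587189}
	fmt.Println(healthy.commitTo(587192))
}
```

Under these assumptions the first call reports the out-of-range condition and the second succeeds. The point is that a heartbeat can carry a commit index for entries the member no longer has on disk, which is consistent with the WAL having been deleted while the rest of the cluster still expected those entries to be there.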

jmhbnz · 2023-04-22T20:00:47Z

Hey @Tejaswini5327 - Please refer to our etcd operations guide for guidance on disaster recovery for failing members: https://etcd.io/docs/v3.5/op-guide/recovery

Essentially you can use the etcdctl snapshot functionality from one of your two live members to then restore the third failing member with etcdutl snapshot restore.

I'm going to close this issue as I don't believe there is an etcd bug here, and the etcd operations guide I've linked should assist you to resolve the problematic member.

You're welcome to reply below with new information if you would like this to be re-opened. If you do run into any issues with the operations guide I linked, please let us know by raising an issue on the etcd-io/website repository.

Tejaswini5327 added the type/bug label Apr 11, 2023

serathius added type/support and removed type/bug labels Apr 11, 2023

jmhbnz closed this as completed Apr 22, 2023

qixiaoyang0 mentioned this issue Jul 11, 2023

tocommit(3730) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost #16220

Closed
