ETCD "wal sync: duration" error. #10414

Closed
ronakpandya7 opened this issue Jan 16, 2019 · 8 comments

@ronakpandya7

Hello,

We are using Kubernetes version 10.5 and etcd version 3.3.3.
We are facing a "wal sync duration" issue in etcd.

The status of our etcd cluster is continuously flipping between healthy and unhealthy.

member b3b41c7fa80c57 is unhealthy: got unhealthy result from https://test01.example.com:2379
member 68ab94f6d8198965 is healthy: got healthy result from https://test02.example.com:2379
member aff40dc30f6ce09c is healthy: got healthy result from https://test05.example.com:2379
member f647a273c0b05deb is healthy: got healthy result from https://test03.example.com:2379
member fd4aa6527f8e52bd is healthy: got healthy result from https://test04.example.com:2379
cluster is degraded

After some time...

member b3b41c7fa80c57 is healthy: got healthy result from https://test01.example.com:2379
member 68ab94f6d8198965 is healthy: got healthy result from https://test02.example.com:2379
member aff40dc30f6ce09c is healthy: got healthy result from https://test05.example.com:2379
member f647a273c0b05deb is healthy: got healthy result from https://test03.example.com:2379
member fd4aa6527f8e52bd is healthy: got healthy result from https://test04.example.com:2379
cluster is healthy
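
(For reference, the status above comes from etcdctl's v2 cluster-health check, something like the following; the cert paths are placeholders for our actual files.)

etcdctl --endpoints=https://test01.example.com:2379 \
  --ca-file=/etc/ssl/etcd/ca.pem --cert-file=/etc/ssl/etcd/client.pem --key-file=/etc/ssl/etcd/client-key.pem \
  cluster-health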

We checked the etcd service logs and the relevant entries are below.

etcd logs:

Jan 16 12:47:00 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:00.350963 W | wal: sync duration of 2.657129557s, expected less than 1s
Jan 16 12:47:32 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:32.413980 W | etcdserver: read-only range request "key:\"/registry/networkpolicies\" range_end:\"/registry/networkpoliciet\" count_only:true " took too long (850.292449ms) to execute
Jan 16 12:47:32 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:32.414472 W | etcdserver: read-only range request "key:\"/registry/statefulsets/ns-du/pod-nginx1-test\" " took too long (352.115645ms) to execute
Jan 16 12:49:15 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:49:15.863276 W | wal: sync duration of 1.008625349s, expected less than 1s
Jan 16 12:49:33 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:49:33.512991 W | wal: sync duration of 2.030885625s, expected less than 1s
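
(These warnings can be pulled on the node with journalctl; a sketch, assuming the systemd unit is named etcd-member, which may differ on your setup.)

journalctl -u etcd-member --since "2019-01-16 12:45" --until "2019-01-16 12:50" | grep -E 'wal|took too long'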

And below are the etcd metrics for "etcd_disk_wal_fsync_duration_seconds_bucket" (WAL fsync durations):

{instance="test01.example.com:2379",job="etcd",le="+Inf"}	1.8785714285714286
{instance="test01.example.com:2379",job="etcd",le="0.001"}	0
{instance="test01.example.com:2379",job="etcd",le="0.002"}	0
{instance="test01.example.com:2379",job="etcd",le="0.004"}	0
{instance="test01.example.com:2379",job="etcd",le="0.008"}	0.0035714285714285713
{instance="test01.example.com:2379",job="etcd",le="0.016"}	0.11785714285714284
{instance="test01.example.com:2379",job="etcd",le="0.032"}	0.475
{instance="test01.example.com:2379",job="etcd",le="0.064"}	1.0535714285714286
{instance="test01.example.com:2379",job="etcd",le="0.128"}	1.5678571428571428
{instance="test01.example.com:2379",job="etcd",le="0.256"}	1.714285714285714
{instance="test01.example.com:2379",job="etcd",le="0.512"}	1.764285714285714
{instance="test01.example.com:2379",job="etcd",le="1.024"}	1.832142857142857
{instance="test01.example.com:2379",job="etcd",le="2.048"}	1.864285714285714
{instance="test01.example.com:2379",job="etcd",le="4.096"}	1.8785714285714286
{instance="test01.example.com:2379",job="etcd",le="8.192"}	1.8785714285714286
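
(These samples can also be scraped directly from the etcd metrics endpoint; a sketch, with the cert paths as placeholders.)

curl -s --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem \
  https://test01.example.com:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds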

And below are the etcd metrics for "etcd_disk_backend_commit_duration_seconds_bucket" (backend commit durations):

{instance="test01.example.com:2379",job="etcd",le="+Inf"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="0.001"}	0
{instance="test01.example.com:2379",job="etcd",le="0.002"}	0
{instance="test01.example.com:2379",job="etcd",le="0.004"}	0
{instance="test01.example.com:2379",job="etcd",le="0.008"}	0
{instance="test01.example.com:2379",job="etcd",le="0.016"}	0
{instance="test01.example.com:2379",job="etcd",le="0.032"}	0.012244897959183675
{instance="test01.example.com:2379",job="etcd",le="0.064"}	0.35918367346938773
{instance="test01.example.com:2379",job="etcd",le="0.128"}	1.2612244897959184
{instance="test01.example.com:2379",job="etcd",le="0.256"}	1.510204081632653
{instance="test01.example.com:2379",job="etcd",le="0.512"}	1.5673469387755103
{instance="test01.example.com:2379",job="etcd",le="1.024"}	1.6448979591836734
{instance="test01.example.com:2379",job="etcd",le="2.048"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="4.096"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="8.192"}	1.6816326530612244
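
(To turn these buckets into a latency number, a 99th percentile query against Prometheus helps; a sketch, assuming a Prometheus server at prometheus.example.com:9090, which is a placeholder for our actual server.)

# p99 WAL fsync latency over the last 5 minutes; swap in
# etcd_disk_backend_commit_duration_seconds_bucket for the backend commit figure
curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{instance="test01.example.com:2379"}[5m]))'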

So what can we do here to make etcd stable and working fine again?

Thanks

@hexfusion
Contributor

So what can we do here to make etcd stable and working fine again?

From a distance, just looking at these metrics, my guess is that etcd is using HDD rather than SSD to store state? You need faster disks to overcome this I/O bottleneck.
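
A quick way to check whether the etcd data directory sits on spinning disks, as a rough sketch run on the etcd node:

# ROTA=1 means a rotational (spinning) disk, 0 usually means SSD/NVMe
lsblk -d -o NAME,ROTA,SIZE,MODEL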

@ronakpandya7
Author

Thanks for the reply @hexfusion.

Actually, this issue appeared suddenly in our cluster after it had been running for a long time without any problems.
So is there anything else we can check?

Thanks

@hexfusion
Contributor

hexfusion commented Jan 17, 2019

Hi @ronakpandya7, just because you were able to operate for some period of time with the current hardware does not mean it is not the problem, right? Increases in workload, the size of your db, etc. could have just caught up with you. The metrics and the logging are pointing to a slow disk. Also, I am pretty certain that k8s does not support etcd 3.3 for 10.5; it will be included in 11.4 (kubernetes/kubernetes#61326). Not saying that is the issue, but something to consider as well.
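
If you want to rule out db size, here is a quick check (a sketch, assuming the v3 API is enabled; the cert paths are placeholders):

ETCDCTL_API=3 etcdctl --endpoints=https://test01.example.com:2379 \
  --cacert=/etc/ssl/etcd/ca.pem --cert=/etc/ssl/etcd/client.pem --key=/etc/ssl/etcd/client-key.pem \
  endpoint status --write-out=table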

Can you answer my question about state storage HDD vs SSD?

If you can give me the full metrics as attachments, I can look for more issues and/or additional logging around the issue.

ref: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#disks

@hexfusion
Contributor

Here is a nice fio benchmark you can run to get an idea of your disks' write IOPS. I have been meaning to add this to the repo; curious what output you see.

Fair warning: the test will generate a 4 GB file.

fio --randrepeat=1 \
  --ioengine=libaio \
  --direct=1 \
  --gtod_reduce=1 \
  --name=etcd-disk-io-test \
  --filename=etcd_read_write.io \
  --bs=4k --iodepth=64 --size=4G \
  --readwrite=randrw --rwmixread=75
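
If you want something closer to what the WAL actually does (small sequential writes, each followed by fdatasync), a variant along these lines can be run from the etcd data disk; the directory and sizes here are only illustrative:

# sequential small writes with an fdatasync after every write, similar to WAL behaviour;
# the target directory must already exist
fio --rw=write --ioengine=sync --fdatasync=1 \
  --directory=/var/lib/etcd/fio-test \
  --size=22m --bs=2300 --name=etcd-wal-sim

Remember to remove etcd_read_write.io (and the test directory above) afterwards.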

@ronakpandya7
Author

ronakpandya7 commented Jan 18, 2019

Hello @hexfusion,

For state storage we are using 8 disks (HDD) of 300 GB each in RAID 1+0.

We do not have the fio command since we are using CoreOS, so we will have to use a fio container and that will take some time.
For the full metrics I have attached the file metrices.txt.

metrices.txt
Thanks.

@ronakpandya7
Author

Here is the output of the spew command, which shows the disk I/O performance, in case it helps you with debugging.

[root@test01 ~]# spew -i 20 -v -d --write  -r -b 4096 1M ./test-spew
Iteration:        1    Total runtime: 00:00:05
WTR:      178.09 KiB/s   Transfer time: 00:00:05    IOPS:       44.52

Iteration:        2    Total runtime: 00:00:11
WTR:      188.04 KiB/s   Transfer time: 00:00:05    IOPS:       47.01

Iteration:        3    Total runtime: 00:00:16
WTR:      208.54 KiB/s   Transfer time: 00:00:04    IOPS:       52.14

Iteration:        4    Total runtime: 00:00:19
WTR:      289.34 KiB/s   Transfer time: 00:00:03    IOPS:       72.34

Iteration:        5    Total runtime: 00:00:23
WTR:      275.87 KiB/s   Transfer time: 00:00:03    IOPS:       68.97

Iteration:        6    Total runtime: 00:00:26
WTR:      284.01 KiB/s   Transfer time: 00:00:03    IOPS:       71.00

Iteration:        7    Total runtime: 00:00:37
WTR:       96.48 KiB/s   Transfer time: 00:00:10    IOPS:       24.12

Iteration:        8    Total runtime: 00:00:43
WTR:      169.23 KiB/s   Transfer time: 00:00:06    IOPS:       42.31

Iteration:        9    Total runtime: 00:00:48
WTR:      194.98 KiB/s   Transfer time: 00:00:05    IOPS:       48.75

Iteration:       10    Total runtime: 00:00:52
WTR:      263.04 KiB/s   Transfer time: 00:00:03    IOPS:       65.76

Iteration:       11    Total runtime: 00:00:57
WTR:      209.48 KiB/s   Transfer time: 00:00:04    IOPS:       52.37

Iteration:       12    Total runtime: 00:01:03
WTR:      161.76 KiB/s   Transfer time: 00:00:06    IOPS:       40.44

Iteration:       13    Total runtime: 00:01:09
WTR:      204.20 KiB/s   Transfer time: 00:00:05    IOPS:       51.05

Iteration:       14    Total runtime: 00:01:13
WTR:      209.46 KiB/s   Transfer time: 00:00:04    IOPS:       52.36

Iteration:       15    Total runtime: 00:01:24
WTR:      100.02 KiB/s   Transfer time: 00:00:10    IOPS:       25.00

Iteration:       16    Total runtime: 00:01:35
WTR:       88.98 KiB/s   Transfer time: 00:00:11    IOPS:       22.25

Iteration:       17    Total runtime: 00:01:40
WTR:      233.06 KiB/s   Transfer time: 00:00:04    IOPS:       58.27

Iteration:       18    Total runtime: 00:01:44
WTR:      220.07 KiB/s   Transfer time: 00:00:04    IOPS:       55.02

Iteration:       19    Total runtime: 00:01:49
WTR:      198.23 KiB/s   Transfer time: 00:00:05    IOPS:       49.56

Iteration:       20    Total runtime: 00:01:55
WTR:      185.48 KiB/s   Transfer time: 00:00:05    IOPS:       46.37

Total iterations:                               20
Total runtime:                            00:01:55
Total write transfer time (WTT):          00:01:55
Total write transfer rate (WTR):      177.51 KiB/s
Total write IOPS:                      44.38 IOPS

@hexfusion
Contributor

hexfusion commented Jan 18, 2019

From the hardware guide linked above

etcd is very sensitive to disk write latency. Typically 50 sequential IOPS (e.g., a 7200 RPM disk) is required. For heavily loaded clusters, 500 sequential IOPS (e.g., a typical local SSD or a high performance virtualized block device) is recommended.

from your test

Iteration: 16 Total runtime: 00:01:35
WTR: 88.98 KiB/s Transfer time: 00:00:11 IOPS: 22.25

Do we need to research this more? Please make sure that your disks are not shared with anything else and are dedicated to etcd. If that is already the case, then your disks are too slow and you need to upgrade them, preferably to SSD.
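
One more sanity check worth running while the warnings are firing (a sketch; iostat is part of sysstat, so on Container Linux you may need to run it from a toolbox or container):

# extended per-device stats, one sample per second for 10 seconds;
# consistently high %util or long write await times point at the disk itself
iostat -x 1 10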

@xpcone

xpcone commented Apr 4, 2019

Why is my memory/tmpfs too slow?
