ETCD "wal sync: duration" error. #10414

Closed
ronakpandya7 opened this issue Jan 16, 2019 · 8 comments

@ronakpandya7

Hello,

We are using Kubernetes version 10.5 and etcd version 3.3.3.
We are facing a "wal sync duration" issue in etcd.

The status of our etcd cluster is continuously flipping between healthy and unhealthy.

member b3b41c7fa80c57 is unhealthy: got unhealthy result from https://test01.example.com:2379
member 68ab94f6d8198965 is healthy: got healthy result from https://test02.example.com:2379
member aff40dc30f6ce09c is healthy: got healthy result from https://test05.example.com:2379
member f647a273c0b05deb is healthy: got healthy result from https://test03.example.com:2379
member fd4aa6527f8e52bd is healthy: got healthy result from https://test04.example.com:2379
cluster is degraded

After some time...

member b3b41c7fa80c57 is healthy: got healthy result from https://test01.example.com:2379
member 68ab94f6d8198965 is healthy: got healthy result from https://test02.example.com:2379
member aff40dc30f6ce09c is healthy: got healthy result from https://test05.example.com:2379
member f647a273c0b05deb is healthy: got healthy result from https://test03.example.com:2379
member fd4aa6527f8e52bd is healthy: got healthy result from https://test04.example.com:2379
cluster is healthy
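
(For reference, the status above comes from etcdctl's v2 cluster-health check, something like the following; the cert paths are placeholders for our actual files.)

etcdctl --endpoints=https://test01.example.com:2379 \
  --ca-file=/etc/ssl/etcd/ca.pem --cert-file=/etc/ssl/etcd/client.pem --key-file=/etc/ssl/etcd/client-key.pem \
  cluster-health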

We checked the etcd service logs and the relevant entries are below.

etcd logs:

Jan 16 12:47:00 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:00.350963 W | wal: sync duration of 2.657129557s, expected less than 1s
Jan 16 12:47:32 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:32.413980 W | etcdserver: read-only range request "key:\"/registry/networkpolicies\" range_end:\"/registry/networkpoliciet\" count_only:true " took too long (850.292449ms) to execute
Jan 16 12:47:32 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:47:32.414472 W | etcdserver: read-only range request "key:\"/registry/statefulsets/ns-du/pod-nginx1-test\" " took too long (352.115645ms) to execute
Jan 16 12:49:15 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:49:15.863276 W | wal: sync duration of 1.008625349s, expected less than 1s
Jan 16 12:49:33 test01.example.com etcd-wrapper[15260]: 2019-01-16 12:49:33.512991 W | wal: sync duration of 2.030885625s, expected less than 1s
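
(These warnings can be pulled on the node with journalctl; a sketch, assuming the systemd unit is named etcd-member, which may differ on your setup.)

journalctl -u etcd-member --since "2019-01-16 12:45" --until "2019-01-16 12:50" | grep -E 'wal|took too long'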

And below are the etcd metrics for "etcd_disk_wal_fsync_duration_seconds_bucket" (WAL fsync durations):

{instance="test01.example.com:2379",job="etcd",le="+Inf"}	1.8785714285714286
{instance="test01.example.com:2379",job="etcd",le="0.001"}	0
{instance="test01.example.com:2379",job="etcd",le="0.002"}	0
{instance="test01.example.com:2379",job="etcd",le="0.004"}	0
{instance="test01.example.com:2379",job="etcd",le="0.008"}	0.0035714285714285713
{instance="test01.example.com:2379",job="etcd",le="0.016"}	0.11785714285714284
{instance="test01.example.com:2379",job="etcd",le="0.032"}	0.475
{instance="test01.example.com:2379",job="etcd",le="0.064"}	1.0535714285714286
{instance="test01.example.com:2379",job="etcd",le="0.128"}	1.5678571428571428
{instance="test01.example.com:2379",job="etcd",le="0.256"}	1.714285714285714
{instance="test01.example.com:2379",job="etcd",le="0.512"}	1.764285714285714
{instance="test01.example.com:2379",job="etcd",le="1.024"}	1.832142857142857
{instance="test01.example.com:2379",job="etcd",le="2.048"}	1.864285714285714
{instance="test01.example.com:2379",job="etcd",le="4.096"}	1.8785714285714286
{instance="test01.example.com:2379",job="etcd",le="8.192"}	1.8785714285714286
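
(These samples can also be scraped directly from the etcd metrics endpoint; a sketch, with the cert paths as placeholders.)

curl -s --cacert /etc/ssl/etcd/ca.pem --cert /etc/ssl/etcd/client.pem --key /etc/ssl/etcd/client-key.pem \
  https://test01.example.com:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds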

And below are the etcd metrics for "etcd_disk_backend_commit_duration_seconds_bucket" (backend commit durations):

{instance="test01.example.com:2379",job="etcd",le="+Inf"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="0.001"}	0
{instance="test01.example.com:2379",job="etcd",le="0.002"}	0
{instance="test01.example.com:2379",job="etcd",le="0.004"}	0
{instance="test01.example.com:2379",job="etcd",le="0.008"}	0
{instance="test01.example.com:2379",job="etcd",le="0.016"}	0
{instance="test01.example.com:2379",job="etcd",le="0.032"}	0.012244897959183675
{instance="test01.example.com:2379",job="etcd",le="0.064"}	0.35918367346938773
{instance="test01.example.com:2379",job="etcd",le="0.128"}	1.2612244897959184
{instance="test01.example.com:2379",job="etcd",le="0.256"}	1.510204081632653
{instance="test01.example.com:2379",job="etcd",le="0.512"}	1.5673469387755103
{instance="test01.example.com:2379",job="etcd",le="1.024"}	1.6448979591836734
{instance="test01.example.com:2379",job="etcd",le="2.048"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="4.096"}	1.6816326530612244
{instance="test01.example.com:2379",job="etcd",le="8.192"}	1.6816326530612244
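
(To turn these buckets into a latency number, a 99th percentile query against Prometheus helps; a sketch, assuming a Prometheus server at prometheus.example.com:9090, which is a placeholder for our actual server.)

# p99 WAL fsync latency over the last 5 minutes; swap in
# etcd_disk_backend_commit_duration_seconds_bucket for the backend commit figure
curl -sG 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{instance="test01.example.com:2379"}[5m]))'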

So what can we do here to make etcd stable and working fine again?

Thanks

@hexfusion
Contributor

So what can we do here to make etcd stable and working fine again?

From a distance, just looking at these metrics, my guess is that etcd is using HDD rather than SSD to store state? You need faster disks to overcome this I/O bottleneck.
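
A quick way to check whether the etcd data directory sits on spinning disks, as a rough sketch run on the etcd node:

# ROTA=1 means a rotational (spinning) disk, 0 usually means SSD/NVMe
lsblk -d -o NAME,ROTA,SIZE,MODEL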

@ronakpandya7
Author

Thanks for the reply @hexfusion.

Actually, this issue appeared suddenly in our cluster after it had been running for a long time without any problems.
So is there anything else we can check?

Thanks

@hexfusion
Contributor

hexfusion commented Jan 17, 2019

Hi @ronakpandya7, just because you were able to operate for some period of time with the current hardware does not mean it is not the problem, right? Increases in workload, the size of your db, etc. could have just caught up with you. The metrics and the logging are pointing to a slow disk. Also, I am pretty certain that k8s does not support etcd 3.3 for 10.5; it will be included in 11.4 (kubernetes/kubernetes#61326). Not saying that is the issue, but something to consider as well.
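
If you want to rule out db size, here is a quick check (a sketch, assuming the v3 API is enabled; the cert paths are placeholders):

ETCDCTL_API=3 etcdctl --endpoints=https://test01.example.com:2379 \
  --cacert=/etc/ssl/etcd/ca.pem --cert=/etc/ssl/etcd/client.pem --key=/etc/ssl/etcd/client-key.pem \
  endpoint status --write-out=table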

Can you answer my question about state storage HDD vs SSD?

If you can give me the full metrics as attachments, I can look for more issues and/or additional logging around the issue.

ref: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/hardware.md#disks

@hexfusion
Contributor

Here is a nice fio benchmark you can run to get an idea of your disks' write IOPS. I have been meaning to add this to the repo; curious what output you see.

Fair warning: the test will generate a 4 GB file.

fio --randrepeat=1 \
  --ioengine=libaio \
  --direct=1 \
  --gtod_reduce=1 \
  --name=etcd-disk-io-test \
  --filename=etcd_read_write.io \
  --bs=4k --iodepth=64 --size=4G \
  --readwrite=randrw --rwmixread=75
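
If you want something closer to what the WAL actually does (small sequential writes, each followed by fdatasync), a variant along these lines can be run from the etcd data disk; the directory and sizes here are only illustrative:

# sequential small writes with an fdatasync after every write, similar to WAL behaviour;
# the target directory must already exist
fio --rw=write --ioengine=sync --fdatasync=1 \
  --directory=/var/lib/etcd/fio-test \
  --size=22m --bs=2300 --name=etcd-wal-sim

Remember to remove etcd_read_write.io (and the test directory above) afterwards.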

@ronakpandya7
Author

ronakpandya7 commented Jan 18, 2019

Hello @hexfusion,

For state storage we are using 8 disks (HDD) of 300 GB each in RAID 1+0.

We do not have the fio command since we are using CoreOS, so we will have to use a fio container and that will take some time.
For the full metrics I have attached the file metrices.txt.

metrices.txt
Thanks.

@ronakpandya7
Author

Here is the output of the spew command, which shows the disk I/O performance, in case it helps you with debugging.

[root@test01 ~]# spew -i 20 -v -d --write  -r -b 4096 1M ./test-spew
Iteration:        1    Total runtime: 00:00:05
WTR:      178.09 KiB/s   Transfer time: 00:00:05    IOPS:       44.52

Iteration:        2    Total runtime: 00:00:11
WTR:      188.04 KiB/s   Transfer time: 00:00:05    IOPS:       47.01

Iteration:        3    Total runtime: 00:00:16
WTR:      208.54 KiB/s   Transfer time: 00:00:04    IOPS:       52.14

Iteration:        4    Total runtime: 00:00:19
WTR:      289.34 KiB/s   Transfer time: 00:00:03    IOPS:       72.34

Iteration:        5    Total runtime: 00:00:23
WTR:      275.87 KiB/s   Transfer time: 00:00:03    IOPS:       68.97

Iteration:        6    Total runtime: 00:00:26
WTR:      284.01 KiB/s   Transfer time: 00:00:03    IOPS:       71.00

Iteration:        7    Total runtime: 00:00:37
WTR:       96.48 KiB/s   Transfer time: 00:00:10    IOPS:       24.12

Iteration:        8    Total runtime: 00:00:43
WTR:      169.23 KiB/s   Transfer time: 00:00:06    IOPS:       42.31

Iteration:        9    Total runtime: 00:00:48
WTR:      194.98 KiB/s   Transfer time: 00:00:05    IOPS:       48.75

Iteration:       10    Total runtime: 00:00:52
WTR:      263.04 KiB/s   Transfer time: 00:00:03    IOPS:       65.76

Iteration:       11    Total runtime: 00:00:57
WTR:      209.48 KiB/s   Transfer time: 00:00:04    IOPS:       52.37

Iteration:       12    Total runtime: 00:01:03
WTR:      161.76 KiB/s   Transfer time: 00:00:06    IOPS:       40.44

Iteration:       13    Total runtime: 00:01:09
WTR:      204.20 KiB/s   Transfer time: 00:00:05    IOPS:       51.05

Iteration:       14    Total runtime: 00:01:13
WTR:      209.46 KiB/s   Transfer time: 00:00:04    IOPS:       52.36

Iteration:       15    Total runtime: 00:01:24
WTR:      100.02 KiB/s   Transfer time: 00:00:10    IOPS:       25.00

Iteration:       16    Total runtime: 00:01:35
WTR:       88.98 KiB/s   Transfer time: 00:00:11    IOPS:       22.25

Iteration:       17    Total runtime: 00:01:40
WTR:      233.06 KiB/s   Transfer time: 00:00:04    IOPS:       58.27

Iteration:       18    Total runtime: 00:01:44
WTR:      220.07 KiB/s   Transfer time: 00:00:04    IOPS:       55.02

Iteration:       19    Total runtime: 00:01:49
WTR:      198.23 KiB/s   Transfer time: 00:00:05    IOPS:       49.56

Iteration:       20    Total runtime: 00:01:55
WTR:      185.48 KiB/s   Transfer time: 00:00:05    IOPS:       46.37

Total iterations:                               20
Total runtime:                            00:01:55
Total write transfer time (WTT):          00:01:55
Total write transfer rate (WTR):      177.51 KiB/s
Total write IOPS:                      44.38 IOPS

@hexfusion
Contributor

hexfusion commented Jan 18, 2019

From the hardware guide linked above

etcd is very sensitive to disk write latency. Typically 50 sequential IOPS (e.g., a 7200 RPM disk) is required. For heavily loaded clusters, 500 sequential IOPS (e.g., a typical local SSD or a high performance virtualized block device) is recommended.

from your test

Iteration: 16 Total runtime: 00:01:35
WTR: 88.98 KiB/s Transfer time: 00:00:11 IOPS: 22.25

Do we need to research this more? Please make sure that your disks are not shared with anything else and are dedicated to etcd. If that is already the case, then your disks are too slow and you need to upgrade them, preferably to SSD.
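
One more sanity check worth running while the warnings are firing (a sketch; iostat is part of sysstat, so on Container Linux you may need to run it from a toolbox or container):

# extended per-device stats, one sample per second for 10 seconds;
# consistently high %util or long write await times point at the disk itself
iostat -x 1 10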

@xpcone

xpcone commented Apr 4, 2019

Why is my memory/tmpfs too slow?
