Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcdctl member add sometimes fails with 3.3.15 #11186

Closed
juliantaylor opened this issue Sep 27, 2019 · 24 comments · Fixed by #11194
Closed

etcdctl member add sometimes fails with 3.3.15 #11186

juliantaylor opened this issue Sep 27, 2019 · 24 comments · Fixed by #11194

Comments

@juliantaylor
Copy link

juliantaylor commented Sep 27, 2019

We recently updated our etcd from 3.3.13 to 3.3.15 and since then etcdctl member add sometimes fails with following error:

Member ca83b0fdcfd3f509 added to cluster  7f02dc1fca0457e
{"level":"warn","ts":"2019-09-27T16:56:16.191+0200","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-.../etcdmaster1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

Though it seems the member add actually worked fine as a etcdctl member list shows the newly added member as unstarted

# endpoints are three existing and running etcd nodes
etcdctl member add "newmember" --peer-urls=https://newmember:2380
# errors out with above message
etcdctl member list
...
ca83b0fdcfd3f509, unstarted, , https://newmember:2380,

Starting the etcd process then also works fine.

This behaviour seems to be new in 3.3.15, it did not happen in 3.3.13.

@jingyih
Copy link
Contributor

jingyih commented Sep 27, 2019

Member ca83b0fdcfd3f509 added to cluster 7f02dc1fca0457e

If this line is printed, the new member is added to the cluster.

{"level":"warn","ts":"2019-09-27T16:56:16.191+0200","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-.../etcdmaster1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}

Not sure what caused the warning message, but it is just a timeout and then the same request was retried and succeeded. This type of warning message is probably newly added. v3.3.14 and v3.3.15 have a new client side balancer.

@juliantaylor
Copy link
Author

The command exits with a failure though, if it is just a harmless warning it should not do that so you can distinguish it from a real error.

The member add does take a couple seconds when it prints this warning so it is probably a timeout.

@jingyih
Copy link
Contributor

jingyih commented Sep 27, 2019

The command exits with a failure though, if it is just a harmless warning it should not do that so you can distinguish it from a real error.

The member add command did not exit with error. IIUC, the warning is from the retry logic in the client side balancer. I agree it feels a little confusing, but it is not the output print of the member add command.

cc @gyuho

@juliantaylor
Copy link
Author

juliantaylor commented Sep 27, 2019

etcdctl does exits with an error (= nonzero exit code) in this case. The issue was found because the scripts starting an extra etcd members aborted because they are run using a shell in set -e mode.

@jingyih
Copy link
Contributor

jingyih commented Sep 28, 2019

Oh I see. The 'etcdctl member add' command is actually two parts.

  1. Add new member,
  2. get member list which includes the newly added member

From your output print it looks like 2) failed with a timeout.

@jingyih
Copy link
Contributor

jingyih commented Sep 28, 2019

Went through the changelog, did not find anything might lead to regression in member configuration.

@juliantaylor
Copy link
Author

juliantaylor commented Sep 28, 2019

I ran the command in debug mode, unfortunately no very interesting extra output:

ETCDCTL_CACERT=/etc/etcd/ca.pem
ETCDCTL_CERT=/etc/etcd/crt.pem
ETCDCTL_COMMAND_TIMEOUT=5s
ETCDCTL_DEBUG=true
ETCDCTL_DIAL_TIMEOUT=2s
ETCDCTL_DISCOVERY_SRV=
ETCDCTL_ENDPOINTS=[https://master001.zone:2379,https://master002.zone:2379,https://master003.zone:2379]
ETCDCTL_HEX=false
ETCDCTL_INSECURE_DISCOVERY=true
ETCDCTL_INSECURE_SKIP_TLS_VERIFY=false
ETCDCTL_INSECURE_TRANSPORT=true
ETCDCTL_KEEPALIVE_TIME=2s
ETCDCTL_KEEPALIVE_TIMEOUT=6s
ETCDCTL_KEY=/etc/etcd/key.pem
ETCDCTL_USER=
ETCDCTL_WRITE_OUT=simple
WARNING: 2019/09/28 11:02:33 Adjusting keepalive ping interval to minimum period of 10s 
WARNING: 2019/09/28 11:02:33 Adjusting keepalive ping interval to minimum period of 10s 
INFO: 2019/09/28 11:02:33 parsed scheme: "endpoint"
INFO: 2019/09/28 11:02:33 ccResolverWrapper: sending new addresses to cc: [{https://master001.zone:2379 0  <nil>} {https://master002.zone:2379 0  <nil>} {https://master003.zone:2379 0  <nil>}]
Member 1760eab222d27300 added to cluster  7f02dc1fca0457e
{"level":"warn","ts":"2019-09-28T11:02:38.435+0200","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-6b756513-08bd-4e0f-b13d-c7700769f001/master001.zone:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Error: <nil>

@juliantaylor
Copy link
Author

juliantaylor commented Sep 28, 2019

increasing the command and dial timeout to 15s does help either, the member add just hangs for 15 seconds then. Longer timeouts probably won't change anything as our current workaround is to do a member list 5 seconds after the failure and the member is in the list then and the etcd can startup.

@jgraettinger
Copy link

I'm seeing this also on v3.4.1, and have pinned it down to an active retry infinite loop that runs until the --command-timeout is exceeded. A SIGQUIT trace I collected from etcdctl follows. Possible the new client-side balancer has a case where the Get and MemberList RPCs are dispatched to different members, such that this case never evaluates true ?

goroutine 1 [runnable]:
google.golang.org/grpc/internal/transport.(*Stream).waitOnHeader(0xc0003a6f00, 0xc00031c790, 0xc000150db0)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/internal/transport/transport.go:312 +0xcc
google.golang.org/grpc/internal/transport.(*Stream).RecvCompress(...)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/internal/transport/transport.go:331
google.golang.org/grpc.(*csAttempt).recvMsg(0xc0004a6980, 0xce8480, 0xc00045ec00, 0x0, 0x0, 0x0)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/stream.go:871 +0x70c
google.golang.org/grpc.(*clientStream).RecvMsg.func1(0xc0004a6980, 0xc00031c789, 0x3)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/stream.go:736 +0x46
google.golang.org/grpc.(*clientStream).withRetry(0xc0000339e0, 0xc000150f50, 0xc000150f40, 0xc00031c789, 0xc00031c790)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/stream.go:594 +0x7c
google.golang.org/grpc.(*clientStream).RecvMsg(0xc0000339e0, 0xce8480, 0xc00045ec00, 0x0, 0x0)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/stream.go:735 +0x100
google.golang.org/grpc.invoke(0xe7f7e0, 0xc00013a240, 0xd4e780, 0x16, 0xd0c8c0, 0xc0001257a0, 0xce8480, 0xc00045ec00, 0xc00023f880, 0xc0002daec0, ...)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/call.go:73 +0x141
go.etcd.io/etcd/clientv3.(*Client).unaryClientInterceptor.func1(0xe7f7e0, 0xc00013a240, 0xd4e780, 0x16, 0xd0c8c0, 0xc0001257a0, 0xce8480, 0xc00045ec00, 0xc00023f880, 0xd82150, ...)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/clientv3/retry_interceptor.go:57 +0x458
google.golang.org/grpc.(*ClientConn).Invoke(0xc00023f880, 0xe7f7e0, 0xc00013a240, 0xd4e780, 0x16, 0xd0c8c0, 0xc0001257a0, 0xce8480, 0xc00045ec00, 0xc00022b740, ...)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/call.go:35 +0x10a
google.golang.org/grpc.Invoke(...)
        /Users/leegyuho/go/pkg/mod/google.golang.org/grpc@v1.23.1/call.go:60
go.etcd.io/etcd/etcdserver/etcdserverpb.(*kVClient).Range(0xc000230100, 0xe7f7e0, 0xc00013a240, 0xc0001257a0, 0xc00022b740, 0x4, 0x6, 0x6, 0x40c098, 0x70)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/etcdserver/etcdserverpb/rpc.pb.go:3468 +0xd3
go.etcd.io/etcd/clientv3.(*retryKVClient).Range(0xc000238bf0, 0xe7f7e0, 0xc00013a240, 0xc0001257a0, 0x14baa60, 0x3, 0x3, 0x8, 0x0, 0xc00061e1e0)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/clientv3/retry.go:105 +0xeb
go.etcd.io/etcd/clientv3.(*kv).Do(0xc00024cde0, 0xe7f7e0, 0xc00013a240, 0x1, 0xc00031c740, 0x1, 0x8, 0x0, 0x0, 0x0, ...)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/clientv3/kv.go:149 +0x1e5
go.etcd.io/etcd/clientv3.(*kv).Get(0xc00024cde0, 0xe7f7e0, 0xc00013a240, 0xd3fbfe, 0x1, 0x0, 0x0, 0x0, 0xc00045e840, 0x0, ...)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/clientv3/kv.go:119 +0xfc
go.etcd.io/etcd/etcdctl/ctlv3/command.memberAddCommandFunc(0xc000285900, 0xc00022e280, 0x1, 0x5)
        /tmp/etcd-release-3.4.1/etcd/release/etcd/etcdctl/ctlv3/command/member_command.go:170 +0x4fb
github.com/spf13/cobra.(*Command).execute(0xc000285900, 0xc00022e230, 0x5, 0x5, 0xc000285900, 0xc00022e230)
        /Users/leegyuho/go/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:766 +0x2ae
github.com/spf13/cobra.(*Command).ExecuteC(0x14af240, 0xb, 0x14d8ec0, 0xb7b200)
        /Users/leegyuho/go/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:852 +0x2ec
github.com/spf13/cobra.(*Command).Execute(...)
        /Users/leegyuho/go/pkg/mod/github.com/spf13/cobra@v0.0.3/command.go:800
go.etcd.io/etcd/etcdctl/ctlv3.Start()
        /tmp/etcd-release-3.4.1/etcd/release/etcd/etcdctl/ctlv3/ctl_nocov.go:25 +0x81
main.main()
        /tmp/etcd-release-3.4.1/etcd/release/etcd/etcdctl/main.go:35 +0x77

@jingyih
Copy link
Contributor

jingyih commented Oct 2, 2019

Are all etcd members healthy in the cluster? This can be verified by etcdctl endpoint health --cluster.

Reading the logic in the for loop, it keeps trying Get() and MemberList() until these two API calls are served by the same member. This logic can be optimized but it is correct (I cannot reproduce this issue with the new balancer). Reading from the warning message, maybe Get() timed out on a member which is causing the entire command to time out?

@jgraettinger
Copy link

Are all etcd members healthy in the cluster?

yes, all healthy.

it keeps trying Get() and MemberList() until these two API calls are served by the same member.

The new gRPC client balancer is using round-robin, no? In that case there's no guarantee that will ever happen. Each alternating Get and MemberList RPC rotates to the next member.

I suspect the only reason this appears to work much of the time is because the entire flow is racing gRPC SubConn establishment in the client balancer, such that, in practice, all RPCs happen to be dispatched over a single ready SubConn because SubConns to other members are still being stood up.

@jingyih
Copy link
Contributor

jingyih commented Oct 2, 2019

I see. I am not sure how client balance rotates endpoint for normal key value APIs and member APIs.

@jpbetz @gyuho Could you please take a look at this issue?

@jingyih
Copy link
Contributor

jingyih commented Oct 2, 2019

On the other hand, we can definitely optimize the logic. Every time we do a Get(). we add the endpoint who served that request to a "synced" list. Once MemberList is served from a "synced" endpoint, we are done.

@gyuho
Copy link
Contributor

gyuho commented Oct 2, 2019

Doesn't seem like a balancer issue. The failover should happen for all RPCs and it only happens when it receives the transient error (retriable). Maybe you issued a command to isolated member?

@jpbetz
Copy link
Contributor

jpbetz commented Oct 2, 2019

The way etcdctl is written to wait for a particular member to serve a MemberList response should probably be rethought. At first glance, I would expect etcdctl to retry the MemberList until it knows somehow that it has received fresh data. Either it waits until the response contains the member it added, or we prove we can use the response.Header.Revision to determine that we are observing fresh data. It shouldn't need to matter which member serves the MemberList request, and having to make the cli.Get call to sync the cluster list is not something clients should need to do.

@jpbetz
Copy link
Contributor

jpbetz commented Oct 2, 2019

Ideally, MemberList would always serve consistent data. Are there performance reasons that prevent us from ensuring it is synced before serving a response?

@jpbetz
Copy link
Contributor

jpbetz commented Oct 2, 2019

To root cause, we might need to modify the condition @jgraettinger found to print out the members being compared and see what is happening there:

if listResp.Header.MemberId == resp.Header.MemberId {

I suspect the best mitigation would be to do what @jingyih described in #11186 (comment), although without the root cause it's tough to be sure it will actually fix the problem.

@jgraettinger Do you know how to reproduce in a way we could replicate? Or alternatively,would you be willing to run a build of etcdctl with additional logging to help isolate the issue?

@jingyih
Copy link
Contributor

jingyih commented Oct 2, 2019

Here is my new theory. Originally the cluster has 1 member. Using MemberAdd() to add a new member. Now cluster has 2 members, and it needs 2 members to be started and healthy to serve a quorum read. So before the new member starts, Get() will always hang.

@jingyih
Copy link
Contributor

jingyih commented Oct 3, 2019

Oh but in this case logic " listResp.Header.MemberId == resp.Header.MemberId" should always pass. So Get() should not be invoked.

@jpbetz
Copy link
Contributor

jpbetz commented Oct 3, 2019

In our design we had originally planned to round robin on failover. But in our implementation, we do round robin across endpoints on a per client request basis:

https://github.com/etcd-io/etcd/blob/master/clientv3/balancer/picker/roundrobin_balanced.go#L67

I think @jgraettinger's assessment was correct and @jingyih's proposed fix will work. Let's still validate that, but I'm pretty confident this is the problem.

@jgraettinger
Copy link

Hi gang, I was able to reproduce this with pretty consistently while testing this kustomize manifest for deploying v3.4 as a StatefulSet, when scaling up the cluster beyond the initial 3 members to 4, 5, 6, etc: https://github.com/jgraettinger/gazette/tree/kustomize/kustomize/bases/etcd

I'm happy to run it again with an instrumented build; are you able to provide a docker image I can pull? Or, if this theory is right I'd also bet it's easy to reproduce by adding a time.Sleep between the RPCs (to allow all SubConns to finish connecting).

@jingyih
Copy link
Contributor

jingyih commented Oct 3, 2019

I tried this:

--- a/etcdctl/ctlv3/command/member_command.go
+++ b/etcdctl/ctlv3/command/member_command.go
@@ -164,8 +164,9 @@ func memberAddCommandFunc(cmd *cobra.Command, args []string) {
                                ExitWithError(ExitError, err)
                        }
                        if listResp.Header.MemberId == resp.Header.MemberId {
-                               break
+                               // break
                        }
+                       fmt.Printf("get served by %v, member list served by %v\n", resp.Header.MemberId, listResp.Header.MemberId)
                        // quorum get to sync cluster list
                        gresp, gerr := cli.Get(ctx, "_")
                        if gerr != nil {

output:

get served by 10501334649042878790, member list served by 18249187646912138824
get served by 9372538179322589801, member list served by 10501334649042878790
get served by 18249187646912138824, member list served by 9372538179322589801
get served by 10501334649042878790, member list served by 18249187646912138824
get served by 9372538179322589801, member list served by 10501334649042878790
get served by 18249187646912138824, member list served by 9372538179322589801
get served by 10501334649042878790, member list served by 18249187646912138824
get served by 9372538179322589801, member list served by 10501334649042878790
get served by 18249187646912138824, member list served by 9372538179322589801
get served by 10501334649042878790, member list served by 18249187646912138824
get served by 9372538179322589801, member list served by 10501334649042878790

So listResp.Header.MemberId == resp.Header.MemberId is always false due to the new client balancer.

@jingyih
Copy link
Contributor

jingyih commented Oct 3, 2019

Let me send out a quick fix on client side. Long term, I agree we should make MemberList consistent, so we can drop the logic on client side.

@jingyih
Copy link
Contributor

jingyih commented Oct 3, 2019

Quick fix: #11194.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants