Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proxy/grpcproxy: fix grpc proxy hang when broadcast failed to cancel a watcher #12030

Merged
merged 3 commits into from
Jun 22, 2020

Conversation

tangcong
Copy link
Contributor

@tangcong tangcong commented Jun 20, 2020

etcd version 3.4.9,recently we met a grpc proxy hang bug when broadcast failed to cancel a watcher.
watchProxyStream will hold watchRanges global mutex lock all the time if client failed to cancel etcd watchers and it will cause the watch proxy to not work.
it is very difficult to troubleshooting when grpc proxy does not work,so I also add a zap logger for watch proxy.

1327935-goroutine 701608 [chan receive, 11 minutes]:
1327936-go.etcd.io/etcd/proxy/grpcproxy.(*watchBroadcast).stop(0xc069bdd200)
1327937-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch_broadcast.go:151 +0x60
1327938-go.etcd.io/etcd/proxy/grpcproxy.(*watchBroadcasts).delete(0xc089d2f9b0, 0xc0d1534300, 0x0)
1327939-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch_broadcasts.go:114 +0x164
1327940-go.etcd.io/etcd/proxy/grpcproxy.(*watchRanges).delete(0xc000210080, 0xc0d1534300)
1327941:	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch_ranges.go:56 +0xb5
1327942-go.etcd.io/etcd/proxy/grpcproxy.(*watchProxyStream).delete(0xc093096540, 0x2)
1327943-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch.go:294 +0xba
1327944-go.etcd.io/etcd/proxy/grpcproxy.(*watchProxyStream).recvLoop(0xc093096540, 0x10e7098, 0xc0615e9a40)
1327945-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch.go:263 +0x4e1
1327946-go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch.func1(0xc0615e9a40, 0xc093096540)
1327947-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch.go:125 +0x51
1327948-created by go.etcd.io/etcd/proxy/grpcproxy.(*watchProxy).Watch
1327949-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch.go:123 +0x383

1034015-goroutine 740631 [select, 13 minutes]:
1034016-go.etcd.io/etcd/clientv3.(*watcher).Watch(0xc00054e140, 0x1287720, 0xc0d25d5900, 0xc0d25bb870, 0xd, 0xc0d24ed110, 0x5, 0x5, 0xc00022ba70)
1034017-	/tmp/etcd-release-3.4.9/etcd/release/etcd/clientv3/watch.go:346 +0x5d9
1034018-go.etcd.io/etcd/proxy/grpcproxy.newWatchBroadcast.func1(0xc0d25d5940, 0xc02dc6d280, 0xc0d25b7080, 0xc000214060, 0xc0d25b7070)
1034019:	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch_broadcast.go:63 +0x212
1034020-created by go.etcd.io/etcd/proxy/grpcproxy.newWatchBroadcast
1034021-	/tmp/etcd-release-3.4.9/etcd/release/etcd/proxy/grpcproxy/watch_broadcast.go:50 +0x16a

proxy/grpcproxy/watch.go Outdated Show resolved Hide resolved
@gyuho
Copy link
Contributor

gyuho commented Jun 22, 2020

Let's use lg as the first parameter to be consistent with other functions. Otherwise lgtm

@tangcong tangcong force-pushed the fix-grpc-proxy-hang branch from 593404e to 854c327 Compare June 22, 2020 06:57
@tangcong
Copy link
Contributor Author

updated. thanks. @gyuho i also added a error log for watcher when failed to put a watch response.
image

@tangcong
Copy link
Contributor Author

#11850 (clientv3: cancel watches proactively on client context cancellation) could we backport it to release-3.4? thanks. @gyuho

@gyuho gyuho merged commit 8f19fec into etcd-io:master Jun 22, 2020
@gyuho
Copy link
Contributor

gyuho commented Jun 22, 2020

@tangcong Can you help backport? Thanks!

@tangcong
Copy link
Contributor Author

@tangcong Can you help backport? Thanks!

done. please see pr #12055. thanks.

@tangcong tangcong deleted the fix-grpc-proxy-hang branch February 26, 2021 18:59
@zhanglei216
Copy link

When to fix this issue of grpc proxy in 3.4 branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants