Watch hang occasionally #884
Comments
What is the TCP connection state?
I didn't check it this time. The dump follows and is the same as in the hang issue from before TCP keep-alive was added, so I guess the state is also ESTAB, with the connection already dropped by the load balancer.
Sounds like the connection is still alive?
It has reproduced three times in total since 7.2.19 (released Apr 23, 2022). Another noticeable thing is that when it happens, all pods running the watch logic hang at the same time (or at least at a similar time). That makes me guess the connections were dropped somewhere underneath, as in #533 (comment) and #773 (comment). I can try tcpdump next time. But if this is an underlying connection problem, I wonder whether it would be good to make the watch timeout configurable instead of Infinite?
does |
Thanks for the idea, it works for me. It is a bit trickier than a globally configurable watch timeout, since I need to add CancelAfter to every piece of watch logic.
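For readers following along, here is a minimal sketch of the per-call CancelAfter approach mentioned above. It uses only the .NET base class library: `SimulatedHungWatchAsync` is a hypothetical stand-in for a watch read that never returns, not a call from the Kubernetes client, and the 200 ms timeout is chosen only so the sketch finishes quickly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WatchTimeoutSketch
{
    // Hypothetical stand-in for a watch read on a silently dropped connection:
    // it never completes on its own and only ends when the token is cancelled.
    public static async Task SimulatedHungWatchAsync(CancellationToken token)
    {
        await Task.Delay(Timeout.InfiniteTimeSpan, token);
    }

    // Runs one watch attempt and returns true if it was ended by the timeout.
    public static async Task<bool> WatchOnceAsync(TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(timeout); // cancels the token once the timeout elapses

        try
        {
            await SimulatedHungWatchAsync(cts.Token);
            return false; // the watch ended on its own
        }
        catch (OperationCanceledException)
        {
            return true; // the timeout fired; the caller can start a new watch
        }
    }

    public static async Task Main()
    {
        // Short timeout for demonstration; a real watch would use a far larger value.
        bool timedOut = await WatchOnceAsync(TimeSpan.FromMilliseconds(200));
        Console.WriteLine(timedOut ? "watch timed out, restarting" : "watch ended normally");
    }
}
```

The drawback described above is visible here: every place that starts a watch has to create its own token source and handle the cancellation itself.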
After a few fixes, the watch API basically works fine; however, watch hangs still occur occasionally.
The underlying connection was dropped silently even though TCP keep-alive is enabled, and the watch then hangs at WaitForSocketEvents because the timeout is set to Infinite.
In my observation, the API server always closes the watch connection after 30m~60m. I am thinking of setting the watch timeout to a reasonable value instead of Infinite to fix the hang, for example 2 hours, since under normal conditions the API server will already have closed the connection before then.
csharp/src/LibKubernetesGenerator/templates/Operations.cs.template
Lines 23 to 26 in 59bea22
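To illustrate the proposal, here is a rough sketch of a re-watch loop in which each session has a finite client-side limit. The idea is that the limit sits above the server's own watch timeout (the kube-apiserver documentation describes the watch handler as picking a randomized value above `--min-request-timeout`, which is consistent with the 30m~60m observed), so it only fires when the connection has died silently. The `watchSession` delegate and the millisecond durations are assumptions for demonstration; they are not part of the client library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RewatchLoopSketch
{
    // Runs `attempts` watch sessions back to back, each bounded by `clientTimeout`.
    // Returns how many sessions were ended by the client-side timeout.
    public static async Task<int> RunAsync(
        Func<int, CancellationToken, Task> watchSession, // hypothetical watch call
        TimeSpan clientTimeout,
        int attempts)
    {
        int timedOut = 0;
        for (int i = 0; i < attempts; i++)
        {
            // The token source cancels itself after clientTimeout.
            using var cts = new CancellationTokenSource(clientTimeout);
            try
            {
                // Completes normally when the server closes the watch.
                await watchSession(i, cts.Token);
            }
            catch (OperationCanceledException)
            {
                // No server close within the limit: treat the connection as dead.
                timedOut++;
            }
            // Either way, the next iteration establishes a fresh watch.
        }
        return timedOut;
    }

    public static async Task Main()
    {
        // Session 0 ends quickly (server closed it); session 1 never ends (silent drop).
        Func<int, CancellationToken, Task> fake = (i, token) =>
            i == 0 ? Task.Delay(TimeSpan.FromMilliseconds(20), token)
                   : Task.Delay(Timeout.InfiniteTimeSpan, token);

        int timedOut = await RunAsync(fake, TimeSpan.FromMilliseconds(300), attempts: 2);
        Console.WriteLine($"sessions ended by client timeout: {timedOut}");
    }
}
```

In a real deployment the limit would be something like the 2 hours suggested above rather than milliseconds. A production loop would also need to resume from the last seen resource version, which this sketch leaves out.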
kubernetes/kubernetes#67491 (comment)
#533 (comment)
https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/
--min-request-timeout int Default: 1800