pingresp not received, disconnecting #263
Comments
I'm pretty sure this is fixed in master (which I plan to cut a new release from shortly), but I can't find the specific fix right now. Is it possible for you to test with the master branch?
Thanks for the feedback. Yes, I can test with the master branch and get back to you as soon as I know more.
Hi, the disconnection issue is also occurring on the master branch. It also seems as if Paho is not sending a keepalive every 300 seconds as configured. I would expect a "keepalive sending ping" every five minutes, but sometimes it is skipped:

cat gost_ping.log | grep -E 'keepalive|ping' | grep -v 'ping check'
time="2018-11-30T07:26:02Z" level=info msg="[pinger] keepalive starting" package=gost.server.mqtt

Here is the relevant excerpt from Mosquitto.log:

cat mosquitto_ts.log | grep -i 'PING' | grep -i 'lumiere'
2018-11-30 07:36:02 Received PINGREQ from gost_lumiere

It is strange that the "keepalive sending ping" from "2018-11-30T08:22:02Z" is not seen in Mosquitto.log, yet Paho seems to receive an answer:

time="2018-11-30T08:22:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt

Then there seem to be no more "keepalive sending ping"s sent by Paho until the disconnection at "2018-11-30T08:54:07Z". Any ideas what might be going on here?

Thanks a lot and best regards
Jannis
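For reference, a minimal sketch of how a 300-second keepalive is typically configured with this library (the broker address and ping timeout below are placeholders, not values taken from the GOST deployment above):

```go
package main

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// KeepAlive of 300 s asks the pinger to send a PINGREQ roughly every
	// five minutes; PingTimeout bounds how long it waits for the PINGRESP
	// before logging "pingresp not received, disconnecting".
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://mosquitto:1883"). // placeholder broker address
		SetClientID("gost_lumiere").       // client ID as seen in the broker log
		SetKeepAlive(300 * time.Second).
		SetPingTimeout(60 * time.Second) // placeholder timeout

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
}
```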
Hi all, we had some very weird network issues with a particular Docker network. After deleting and re-creating this Docker network, the issue was gone.
Thanks for your help and best regards
Jannis
This is still happening to me on current master over an LTE connection when the connectivity drops (we have default options for pingTimeout and keepAlive). Could someone at least clarify if "disconnecting" means that the reconnect mechanism is stopped and we have to call
@janniswarnat please don't close issues that only surface on "weird" networks. We're in a distributed-systems world; if you run anything long enough, you'll run into weird issues. I just saw the same thing on v1.1.1 while connecting to VerneMQ, with both VerneMQ and my client in two containers in the same Kubernetes pod.

@peterdeka According to the code, "disconnecting" triggers an internal connection reset and reconnect as long as you set AutoReconnect = true in the options (which should be the case if you used NewClientOptions()). Client.IsConnected() will still return true, but only because the client is in the reconnecting state. In more detail: the message is printed in ping.go's keepalive(), which writes an error to c.errors that is picked up by net.go's errorWatch, which calls client.internalConnLost() in a goroutine. In my case, I'm certain that is getting called exactly once because this shows on the next log line:

The first place I'd check for a hang is client.internalConnLost()'s call to c.workers.Wait(), which requires the goroutines errorWatch(), alllogic(), outgoing(), and incoming() to all exit or crash; otherwise it blocks indefinitely.

@alsm Since this issue has come up quite a few times, I'd suggest bumping a couple of log lines in reconnect() up to INFO so you can at least tell the function is being entered.

This is mission-critical for us, so I'm going to build a liveness probe so Kubernetes will automatically restart the client if we lose MQTT connectivity. The trigger has to be external: I can't trust the MQTT client to close itself properly if it's in a bad state like this.
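As a rough illustration of the approach described above, here is a sketch of a client with AutoReconnect plus an external HTTP liveness endpoint that an orchestrator could probe; the broker address, probe topic, and port are hypothetical, and this is not the actual probe mentioned in the comment:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://vernemq:1883"). // hypothetical broker address
		SetClientID("liveness-demo").
		SetAutoReconnect(true). // already the default from NewClientOptions()
		SetConnectionLostHandler(func(_ mqtt.Client, err error) {
			// Fires when the client drops the connection, e.g. after
			// "pingresp not received, disconnecting".
			log.Printf("connection lost: %v", err)
		}).
		SetOnConnectHandler(func(_ mqtt.Client) {
			// Fires on the initial connect and on every successful reconnect,
			// so its absence after a drop suggests the reconnect loop is stuck.
			log.Print("(re)connected")
		})

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	// External liveness check: attempt a short QoS 1 publish to a probe topic
	// and report failure if the broker does not acknowledge it in time, so
	// Kubernetes can restart the pod instead of trusting the client's own state.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		token := client.Publish("probe/liveness", 1, false, "ping")
		if !token.WaitTimeout(5*time.Second) || token.Error() != nil {
			http.Error(w, "mqtt publish failed", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintln(w, "ok")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```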
Thanks @originsmike, we are experiencing the issue with NewClientOptions() and AutoReconnect = true, which is why I was asking about it. To be honest, the behaviour we are seeing is not consistent: sometimes the client reconnects and sometimes it won't. I can't confirm that the pingresp error was present in all cases, though the network problem was the same.
I experienced the same error with paho.mqtt.golang and emqx; can this be re-opened?
Hi @GopherJ - this code has been rewritten since this issue was raised (almost two years ago) so it's likely that your issue differs from the above. Please try with the latest code ( |
Hi all,
we have the same issue briefly discussed in #210. We run a server component that uses paho.mqtt.golang version 1.1.1 and connects to a Mosquitto MQTT broker. From time to time the client gets disconnected from the broker. The debug log output from the client looks like this:
time="2018-11-20T14:41:13Z" level=info msg="[pinger] keepalive sending ping"
time="2018-11-20T14:41:18Z" level=info msg="[pinger] ping check 5"
time="2018-11-20T14:41:23Z" level=info msg="[pinger] ping check 10"
time="2018-11-20T14:41:28Z" level=info msg="[pinger] ping check 15"
time="2018-11-20T14:41:33Z" level=info msg="[pinger] ping check 20"
time="2018-11-20T14:41:38Z" level=info msg="[pinger] ping check 25"
time="2018-11-20T14:41:43Z" level=info msg="[pinger] ping check 30"
time="2018-11-20T14:41:48Z" level=info msg="[pinger] ping check 35"
time="2018-11-20T14:41:53Z" level=info msg="[pinger] ping check 40"
time="2018-11-20T14:41:58Z" level=info msg="[pinger] ping check 45"
time="2018-11-20T14:42:03Z" level=info msg="[pinger] ping check 50"
time="2018-11-20T14:42:08Z" level=info msg="[pinger] ping check 55"
time="2018-11-20T14:42:13Z" level=info msg="[pinger] ping check 60"
time="2018-11-20T14:42:13Z" level=info msg="[pinger] pingresp not received, disconnecting"
Mosquitto gets the ping request and claims to answer immediately:
2018-11-20 14:41:13 Received PINGREQ from gost_dom
2018-11-20 14:41:13 Sending PINGRESP to gost_dom
Yet the client apparently never receives / acknowledges the ping response. We have tried setting the "Order" option to false, as Ashton suggests in #210, but this does not help with the issue.
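For anyone debugging the same symptom, here is a sketch of how the "Order" option and the library's package-level debug loggers can be enabled; the broker address and keepalive value are placeholders rather than the reporter's actual configuration:

```go
package main

import (
	"log"
	"os"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	// Route the library's package-level loggers to stdout/stderr; the
	// "[pinger]" and "[net]" lines quoted above come from these loggers.
	mqtt.DEBUG = log.New(os.Stdout, "[paho] ", log.LstdFlags)
	mqtt.WARN = log.New(os.Stdout, "[paho-warn] ", log.LstdFlags)
	mqtt.ERROR = log.New(os.Stderr, "[paho-error] ", log.LstdFlags)

	opts := mqtt.NewClientOptions().
		AddBroker("tcp://mosquitto:1883"). // placeholder broker address
		SetClientID("gost_dom").           // client ID as seen in the broker log
		SetKeepAlive(60 * time.Second).    // placeholder; match your deployment
		SetOrderMatters(false)             // the "Order" option mentioned above

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
}
```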
Any idea what might be happening here? Or what steps we could take to get to the bottom of this?
Thanks a lot for your support and best regards
Jannis