Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pingresp not received, disconnecting #263

Closed
janniswarnat opened this issue Nov 20, 2018 · 9 comments
Closed

pingresp not received, disconnecting #263

janniswarnat opened this issue Nov 20, 2018 · 9 comments

Comments

@janniswarnat
Copy link

Hi all,

we have the same issue briefly discussed in #210. We use a server component using paho.mqtt.golang version 1.1.1 connecting to a Mosquitto MQTT broker. From time to time the client gets disconnected from the server. The debug log output from the client looks like this:

time="2018-11-20T14:41:13Z" level=info msg="[pinger] keepalive sending ping"
time="2018-11-20T14:41:18Z" level=info msg="[pinger] ping check 5"
time="2018-11-20T14:41:23Z" level=info msg="[pinger] ping check 10"
time="2018-11-20T14:41:28Z" level=info msg="[pinger] ping check 15"
time="2018-11-20T14:41:33Z" level=info msg="[pinger] ping check 20"
time="2018-11-20T14:41:38Z" level=info msg="[pinger] ping check 25"
time="2018-11-20T14:41:43Z" level=info msg="[pinger] ping check 30"
time="2018-11-20T14:41:48Z" level=info msg="[pinger] ping check 35"
time="2018-11-20T14:41:53Z" level=info msg="[pinger] ping check 40"
time="2018-11-20T14:41:58Z" level=info msg="[pinger] ping check 45"
time="2018-11-20T14:42:03Z" level=info msg="[pinger] ping check 50"
time="2018-11-20T14:42:08Z" level=info msg="[pinger] ping check 55"
time="2018-11-20T14:42:13Z" level=info msg="[pinger] ping check 60"
time="2018-11-20T14:42:13Z" level=info msg="[pinger] pingresp not received, disconnecting"

Mosquitto gets the ping request and claims to answer immediately:

2018-11-20 14:41:13 Received PINGREQ from gost_dom
2018-11-20 14:41:13 Sending PINGRESP to gost_dom

Yet the client obviously never gets / acknowledges the ping response. We have tried to set option "Order" to false as Ashton suggests in #210 but this does not help with this issue.

Any idea what might be happening here? Or what steps we could take to get to the bottom of this?

Thanks a lot for your support and best regards

Jannis

@alsm
Copy link
Contributor

alsm commented Nov 29, 2018

I'm pretty sure this is fixed in master (which I plan to cut a new release from shortly) but I can't find the specific fix right now. Is it possible for you to test with the master branch?

@janniswarnat
Copy link
Author

Thanks for the feedback. Yes, I can test with the master branch and get back to you as soon as I know more.

@janniswarnat
Copy link
Author

Hi,

the disconnection issue is also occurring in master branch. It also seems as if Paho is not sending a keepalive every 300 seconds as it is configured. I would expect a "keepalive sending ping" every five minutes but sometimes it is skipped:

cat gost_ping.log | grep -E 'keepalive|ping' | grep -v 'ping check'

time="2018-11-30T07:26:02Z" level=info msg="[pinger] keepalive starting" package=gost.server.mqtt
time="2018-11-30T07:36:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T07:42:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T07:42:21Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T07:52:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T07:57:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:02:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:02:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T08:07:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:07:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T08:12:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:12:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T08:17:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:22:02Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T08:22:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T08:32:09Z" level=info msg="[net] received pingresp" package=gost.server.mqtt
time="2018-11-30T08:54:07Z" level=info msg="[pinger] pingresp not received, disconnecting" package=gost.server.mqtt
time="2018-11-30T08:54:07Z" level=info msg="[net] error triggered, stopping" package=gost.server.mqtt
time="2018-11-30T08:54:07Z" level=warning msg="MQTT client lost connection: pingresp not received, disconnecting" package=gost.server.mqtt
time="2018-11-30T09:04:07Z" level=info msg="[pinger] keepalive sending ping" package=gost.server.mqtt
time="2018-11-30T09:04:07Z" level=info msg="[net] received pingresp" package=gost.server.mqtt

Here is the relevant excerpt from Mosquitto.log:

cat mosquitto_ts.log | grep -i 'PING' | grep -i 'lumiere'

2018-11-30 07:36:02 Received PINGREQ from gost_lumiere
2018-11-30 07:36:02 Sending PINGRESP to gost_lumiere
2018-11-30 07:42:02 Received PINGREQ from gost_lumiere
2018-11-30 07:42:02 Sending PINGRESP to gost_lumiere
2018-11-30 07:52:02 Received PINGREQ from gost_lumiere
2018-11-30 07:52:02 Sending PINGRESP to gost_lumiere
2018-11-30 07:57:02 Received PINGREQ from gost_lumiere
2018-11-30 07:57:02 Sending PINGRESP to gost_lumiere
2018-11-30 08:02:02 Received PINGREQ from gost_lumiere
2018-11-30 08:02:02 Sending PINGRESP to gost_lumiere
2018-11-30 08:07:02 Received PINGREQ from gost_lumiere
2018-11-30 08:07:02 Sending PINGRESP to gost_lumiere
2018-11-30 08:12:02 Received PINGREQ from gost_lumiere
2018-11-30 08:12:02 Sending PINGRESP to gost_lumiere
2018-11-30 08:17:02 Received PINGREQ from gost_lumiere
2018-11-30 08:17:02 Sending PINGRESP to gost_lumiere
2018-11-30 09:04:07 Received PINGREQ from gost_lumiere
2018-11-30 09:04:07 Sending PINGRESP to gost_lumiere

It is strange that the "keepalive sending ping" from "2018-11-30T08:22:02Z" is not seen in Mosquitto.log, yet Paho seems to receive an answer:

time="2018-11-30T08:22:02Z" level=info msg="[net] received pingresp" package=gost.server.mqtt

Then there seem to be no more "keepalive sending ping"s sent by Paho until the disconnection at "2018-11-30T08:54:07Z".

Any ideas what might be going on here?

Thanks a lot and best regards

Jannis

@janniswarnat
Copy link
Author

janniswarnat commented Dec 6, 2018

Hi all,

we had some very weird network issues with a particular Docker network. After deleting and re-creating this Docker network the issue was gone.

Thanks for your help and best regards

Jannis

@peterdeka
Copy link

peterdeka commented Jan 29, 2019

This is still happening to me on current master over an LTE connection when the connectivity drops (we have default options fot pingTimeout and keepAlive). Could someone at least clarify if "disconnecting" means that the reconnect mechanism is stopped and we have to call Connect() again? Thanks @alsm

@kaiterramike
Copy link

@janniswarnat please don't close issues that only surface on "weird" networks. We're in a distributed systems world; if you run anything long enough, you'll run into weird issues.

I just saw the same thing on v1.1.1 while connecting to vernemq, with both vernemq and my client in two containers in the same kubernetes pod.

@peterdeka According to the code, "disconnecting" triggers an internal connection reset and reconnect as long as you set AutoReconnect = true in options (which should be the case if you used NewClientOptions()). Client.IsConnected() will still return true, but that's only because it's in the reconnecting state.

In more detail: the message is printed in ping.go's keepalive(), which writes an error to c.errors that's picked up in net.go's errorWatch, which calls client.internalConnLost() in a goroutine. In my case, I'm certain that's getting called exactly once because this shows on the next log line:

net.go:331: [net]      error triggered, stopping

The first place I'd check for a hang is at client.internalConnLost()'s call to c.workers.Wait(), which requires the goroutines errorWatch(), alllogic(), outgoing(), and incoming() to all exit or crash, otherwise it blocks indefinitely. @alsm Since this issue has come up quite a few times I'd suggest bumping up a couple log lines in reconnect() to INFO so you can at least tell the function is being entered.

This is mission-critical for us, so I'm going to build a liveness probe so kubernetes will automatically restart the client if we lose MQTT connectivity. The trigger has to be external -- I can't trust the MQTT client to close itself properly if it's in a bad state like this.

@peterdeka
Copy link

Thanks @originsmike we are experiencing the issue with NewClientOptions() and AutoReconnect = true, for this reason i was asking about it. I have to be honest: for what we are experiencing, it seems like the behavior is not consistent: sometimes the client reconnects, sometimes it wont reconnect, i can't confirm though that the pingresp error was present in all the cases. However the network problem was the same.

@0x8f701
Copy link

0x8f701 commented Sep 17, 2020

I experienced the same error with paho.mqtt.golang with emqx, can this be re-opened?

@MattBrittan
Copy link
Contributor

Hi @GopherJ - this code has been rewritten since this issue was raised (almost two years ago) so it's likely that your issue differs from the above. Please try with the latest code (go get github.com/eclipse/paho.mqtt.golang@master and if the issue persists raise a new issue (with relevant code, logs etc) and I'll see if I can identify the cause. Note: Often this kind of issue is caused by the users code blocking whilst in a handler (e.g. not returning from an message handler or publishing a new message from within a message handler) so please check for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants