paho disconnected silently without calling onConnLost handler #453
I've read #263 and #430. This error happens with a cloud broker (emqx 4.1 & emqx 4.2), and it only happens when I use paho.mqtt.golang to connect to the broker (mosquitto bridges to the same broker correctly). paho.mqtt.golang works correctly after a restart, but after some time it starts to lose connections. Details: …
I think the error is on the paho.mqtt.golang side, because if I use mosquitto as the client it works correctly and there is no loss of connection.
I also found that if I remove SSL, put the client code on the cloud server, and then connect to … I'm not sure if the error is caused by …
Hi @GopherJ, unfortunately your description and code do not provide much information to work with; for example, you make no mention of specific errors. Can you please: …

Note: one thing that would cause the symptoms you describe is two connections using the same client ID. Are you sure that …

Note 2: I doubt that your SSL certificate has anything to do with this (it will either work or not work). SSL does add some overhead, which may have an impact if this is a weird timing issue or a problem handling errors during the SSL handshake (there is a pull request regarding that, but I think that is more an issue of an incorrect error being returned rather than no error).

Thanks,
Hi, sorry, my fault; I'll do it today once I get some time. BTW, I checked emqx's dashboard and the client IDs aren't the same, but yes, I'll try improving that part first.
Hi @MattBrittan, I can always reproduce this by running a simple program for a morning. Could you help check why it happens and how to fix it? I don't have this problem with mosquitto; I'm 100% sure it's related to paho.mqtt.golang.
We switched from … However, starting from this change we began to have this issue.
BTW, I confirm that I'm using the latest …
You haven't provided the demo program yet, but looking at the code you posted initially, are you sure that this

```go
func onMessageReceived(client MQTT.Client, message MQTT.Message) {
	sharedChannel := shared.NewSharedChannel()
	sharedChannel.MqttSubscriber <- message
}
```

is working as you intend, and that something is reading the messages off the channel you create? If that function blocks and does not return, or gets into a state like that, it could block the receiving of packets, and ping responses would not be processed.
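For reference, here is a minimal self-contained sketch of the pattern being questioned above, with the consumer started before anything is published so the handler's send can never stall indefinitely. The channel name is adapted from the snippet, and the broker setup is elided; treat the names as assumptions, not the issue author's actual code.

```go
package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// Shared between the MQTT callback and a worker goroutine. If nothing ever
// receives from this channel, every send in the handler blocks, the handler
// never returns, and incoming packets (including ping responses) stop being
// processed — matching the silent-disconnect symptom in this issue.
var mqttSubscriber = make(chan mqtt.Message)

func onMessageReceived(_ mqtt.Client, message mqtt.Message) {
	mqttSubscriber <- message // blocks until the worker below receives
}

func main() {
	// Start the consumer before subscribing so the handler can never stall.
	go func() {
		for msg := range mqttSubscriber {
			log.Printf("processing %s: %s", msg.Topic(), msg.Payload())
		}
	}()
	// ... build the client, Connect, and Subscribe with onMessageReceived ...
}
```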
@alsm I don't think anything blocks; these are completely different goroutines... @alsm @MattBrittan please, there are already #429, #430, #263 ... and...
If all of this doesn't make you think there is a problem, I don't know what to say... As I said, I don't have other code to show you; it's really just a simple subscribe. What else can I do?... The version that I'm using is …
Updated to the latest and am now also experiencing this. Definitely something going on.
@GopherJ unfortunately, unless you can provide the information requested, it's unlikely that we will be able to help you; this package provides a lot of options and it's quite possible that you are using a combination that no one else is (so others may well not see what you are seeing). The issues you have referenced are either old (the code having been rewritten), most likely issues with the users' code, or have insufficient information to determine the cause. The issue that @alsm raised (and I had mentioned previously) is probably the thing that most frequently causes the symptoms you are experiencing (because the goroutines communicate through a channel, they can block each other). There have been some major changes since version 1.2.0 and I hope we will be able to make a new release shortly, so for now I recommend using …

@someburner please provide more information (preferably with logs). What version did you come from? There was a change some time back that removed the buffers from a number of channels and exposed issues in users' code that were previously hidden (e.g. blocking in a callback, possibly by calling …).
@MattBrittan just to update: this was happening with mosquitto at high load (10K devices and around 10K msg/sec). After switching to another broker that handles the throughput better, I no longer see this, so it was probably unrelated, as you say. I know this is outside the scope of this project, but in case others run across this, or someone has experience here: there seem to be issues with a QoS 1 subscriber receiving high message loads from mosquitto; it looks like mosquitto does not get the client's PUBACK. It could be that I'm doing something wrong, but emqx and jorramq seem to handle the high loads better without my code changing.
@someburner which commit are you using? PR #443 was committed a couple of weeks ago and prevents this package from reusing message IDs immediately. I had an issue where Mosquitto was dropping messages and originally implemented this as a way to trace the issue (tracing things in the Mosquitto logs is difficult when IDs are constantly reused), but the change resolved the issue. My theory, without any real evidence, was that Mosquitto may not be clearing message IDs from its store immediately and so drops some because they seemingly have duplicate IDs (the issue only occurred when sending multiple messages per second). There is a slight possibility that your issue is related (i.e. the ping messages get dropped); I did have a scan through the Mosquitto source but could not see anything obvious there (though my C skills are a bit rusty!), and as the problem had gone away I moved on to other things.
I have been habitually doing … For now the library seems to work fine with my high-load tests on other brokers, but I'll definitely let you know if anything comes up. Thanks for being a responsive maintainer! I have been using master without issue for most of these tests, so at least for my use case I'd say it's solid.
This error is happening on our production site, very dangerously, since the paho.mqtt.golang team doesn't accept any errors on their side and doesn't want to investigate any lines of code, and always thinks we are using old versions even though we are on master already.... I suggest people who use paho.mqtt.golang be cautious, because this error may happen to you. I'll rewrite this part with other libraries (paho.mqtt.c / paho.mqtt.rust / MQTT.js).
@GopherJ as far as I can see from a review of the comments, no one has said that there is not an issue; what we have said is that we need you to provide information that would enable us to identify/trace the problem (and we pointed out a possible issue with your code that may explain it, e.g. if …).

At this point we do not have enough information to investigate your issue further (and cannot even ascertain whether the problem is with this package or elsewhere). If you provide the logs (and ideally code that enables the issue to be duplicated) then the issue will be investigated; please bear in mind that work on this project is carried out by volunteers, so making an effort to respond to their questions is likely to lead to a faster resolution.
@someburner just thought it might be worth referencing issue #458, because your issue may have had the same cause. In summary, the mosquitto option …
Now I can see that paho.mqtt.golang cannot recover subscriptions after a disconnect; after running for half a day or a day I can always reproduce this. I have … Before rewriting everything, for now I need to kill the process once it disconnects.
What led you to believe it would? The broker should retain subscriptions as part of the session state in defined circumstances (see the clean session section of the spec). If your broker is not retaining the subscriptions, then I'd recommend you check the spec (to ensure that you meet the criteria under which the session state should be retained) and your broker's settings (not all brokers comply with the spec). If you want to re-establish subscriptions yourself, then you will need to do it in your code (example here). Note that if the broker is not retaining the session state, then you are likely to lose messages when you reconnect (because anything published while the connection is down becomes part of the session state). So if that is important to you, fixing the broker setup should be a priority. Matt
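A sketch of re-establishing subscriptions yourself via the connect handler, in the spirit of the example linked above. The broker address, client ID, and topic are placeholders, not values from this thread.

```go
package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func onMessageReceived(_ mqtt.Client, msg mqtt.Message) {
	log.Printf("received on %s: %s", msg.Topic(), msg.Payload())
}

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://broker.example.com:1883"). // placeholder address
		SetClientID("example-client")

	// Re-establish subscriptions every time a connection comes up (including
	// automatic reconnects), since the broker may not retain session state.
	opts.SetOnConnectHandler(func(c mqtt.Client) {
		if token := c.Subscribe("some/topic", 1, onMessageReceived); token.Wait() && token.Error() != nil {
			log.Printf("resubscribe failed: %v", token.Error())
		}
	})

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
	select {} // keep the example running
}
```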
@MattBrittan as you can see from my previous comments: …
So I don't know why I need to …

paho.mqtt.golang is the worst MQTT client library that I've used, with a maintainer who keeps suggesting … I can understand that it's difficult to maintain an open-source project in one's free time, but …
This error has started to happen on another of our company's Golang projects...
@GopherJ as discussed previously, debugging information will be required if we are to offer you any help (please see my previous comments or the readme for examples of the information that is useful). This issue is most commonly due to the callback passed to the 'Subscribe' function blocking (so within your control rather than part of this library). I'm not saying that this is your issue, but that is the first thing I would suggest checking (starting off a goroutine, as suggested in my earlier comment, is a quick way to eliminate this as a cause).
@GopherJ did you ever find a solution? I'm facing the same problem too, with mosquitto 2.0.0 and paho 1.3.0.
@yunfuyiren @MattBrittan Hi, I've gained some understanding of this issue. It's caused by a paho.mqtt.golang limitation plus a Golang channel issue. I have around 300 IoT devices broadcasting to the broker, and sometimes there is a massive message stream. I use a channel to send messages to other goroutines for handling. However, if there are too many messages, the channel will block and eventually cause paho.mqtt.golang to disconnect, because, as the FAQ says, paho.mqtt.golang doesn't want blocking code in the message handler.

I haven't solved this issue yet, and it's happening in many of our projects. It's hard to say under which conditions paho.mqtt.golang will lose the connection and never reconnect. What is the definition of blocking code? It even happens in a small project that does pretty simple work in the message handler. I feel this part is out of my control; paho.mqtt.golang should allow users to detect this issue, otherwise it can be disastrous. Losing the connection with no way to detect it drove me crazy during the last month; now I'm on other projects, so I've put it aside.
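One hedged way to make the overload described here both bounded and observable: a buffered channel plus a non-blocking send, so the handler always returns even when consumers fall behind. The buffer size and the drop-and-log policy are assumptions to tune, not advice from this thread.

```go
package main

import (
	"log"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

// Buffer absorbs bursts from the devices; the size is an assumption to tune.
var messages = make(chan mqtt.Message, 4096)

func onMessageReceived(_ mqtt.Client, msg mqtt.Message) {
	select {
	case messages <- msg:
		// Fast path: handed off, handler returns immediately.
	default:
		// Buffer full: drop and log instead of blocking the paho router.
		// This makes the overload condition detectable rather than silent.
		log.Printf("dropping message on %s: consumers too slow", msg.Topic())
	}
}
```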
Thanks for the extra detail @GopherJ.
By default (…). The issue you are seeing appears (with the limited info available) to be due to a handler blocking for around 10 seconds (a considerable period). Because the library does not process incoming messages whilst waiting on the handler, it does not see the ping response from the broker and ends up disconnecting because of that. So blocking really means any action that unduly delays your handler from returning. In the code you shared a while ago …
Please provide the code for this simple example; as mentioned previously, if I can duplicate the issue then I can probably fix it (or point out where the issue is in your code if it's not in the library). The library uses quite a few goroutines and their interactions can be complex, which makes it hard to spot an issue without logs to point you in the right direction (issue #468 is an example of such an issue: it occurred in a very rare set of circumstances, but due to the logs provided I believe a solution has been found).
Unfortunately, without access to your code, or the detailed logs that the library can produce, I cannot really know the cause of the issue you are encountering.

@yunfuyiren please review the common problems section of the readme and confirm that nothing in there applies. If that does not help, then please raise a new issue (noting the reporting-bugs section); the more information you can provide, the more likely it is that someone will be able to help. A specific note based on the limited info you provided: try enabling keep-alives (this is needed due to the way TCP works).
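A sketch pulling together the advice above: keep-alives enabled, a ping timeout, and a handler that returns immediately by handing slow work to a goroutine. The broker address, topic, and timings are placeholders, not values from this thread.

```go
package main

import (
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().
		AddBroker("tcp://broker.example.com:1883"). // placeholder address
		SetClientID("example-client").
		SetKeepAlive(30 * time.Second). // enable keep-alives so dead links are noticed
		SetPingTimeout(10 * time.Second)

	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}

	// The handler returns immediately; slow work runs in its own goroutine so
	// the library keeps processing incoming packets (including ping responses).
	handler := func(_ mqtt.Client, msg mqtt.Message) {
		go func(m mqtt.Message) {
			time.Sleep(10 * time.Second) // stand-in for the ~10s of work discussed above
			log.Printf("done with %s", m.Topic())
		}(msg)
	}

	if token := client.Subscribe("some/topic", 1, handler); token.Wait() && token.Error() != nil {
		log.Fatal(token.Error())
	}
	select {} // block forever for the example
}
```

Note that spawning a goroutine per message gives up in-order handling; if that is acceptable, recent versions of the library also expose `opts.SetOrderMatters(false)` so the library itself will not serialize handler calls (check the version you are on before relying on it).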
Closing this issue as it's been quiet for three months.
Still happening to me even with the latest …
Aha! It was a Golang gotcha:

```go
client.ClientOptions.SetPingTimeout(time.Duration(config.MQTT.Timeout) * time.Second)
client.ClientOptions.SetKeepAlive(time.Second * 30)
```

I had to multiply the duration by the unit I wanted; I thought I implicitly got seconds.
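The gotcha spelled out: `time.Duration` is an integer count of nanoseconds, so converting a bare int yields nanoseconds, not seconds. A minimal illustration (the `timeout` value stands in for the poster's config field):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	timeout := 10 // e.g. a value read from config, intended as seconds

	wrong := time.Duration(timeout)               // 10 nanoseconds!
	right := time.Duration(timeout) * time.Second // 10 seconds

	fmt.Println(wrong, right) // prints: 10ns 10s
}
```

A keep-alive of a few nanoseconds explains the immediate, silent-looking disconnects described above.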