Connection time out causes Segmentation fault in iothubtransport_mqtt_common module #446
Comments
I'm having the same segfault occasionally using MQTT and iot-edge v1 Azure/iot-edge-v1#520
Hi @villepalo,
If you generate a memory dump it will most likely contain PII, so we recommend sending it to us over a secure channel. We can provide one, but a support ticket will be necessary so our Support Team can reach you. If you prefer to do it that way, please open a ticket using the Azure portal.
I'm trying to get the core dump, but this segfault happens so seldom in our environment that it could take some time.
Here are the core dump details
Thanks, but that's a callstack only, and not from a debug build.
Our environment where this happens once a day:
Unfortunately, getting a debug crash dump is not possible.
Is there any way you could run this under a debugger (ideally with a debug build, but if there are timing issues a release build would be helpful too) and see if you can get a repro with some additional info? We really want to get to the bottom of this, but we're definitely going to need some additional info. Thanks
Correct me if I'm wrong, but isn't it the case that in iothubtransport_mqtt_common.c:DisconnectFromClient the xioTransport is destroyed and its memory freed, while in mqtt_client.c:mqtt_client_disconnect mqtt_client->xioHandle is set to NULL only if mqtt_client->clientConnected is true? That handle is then used later in the xio_dowork called from mqtt_client_dowork.
I modified mqtt_client_disconnect() in mqtt_client.c to set mqtt_client->xioHandle to NULL in the else branch of if (mqtt_client->clientConnected) as well, and that fixes the core dump problem.
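The dangling-handle scenario described in the two comments above can be modeled in a few lines. This is only a sketch under stated assumptions: FakeXio, FakeMqttClient, and all three functions are hypothetical stand-ins, not the real SDK types or the real mqtt_client.c code, and it assumes that the dowork path skips a NULL handle (which is what the reported fix implies).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SDK types; not the real definitions. */
typedef struct { int socket_fd; } FakeXio;

typedef struct {
    FakeXio *xioHandle;    /* borrowed pointer: the transport owns and frees it */
    bool clientConnected;  /* true only once CONNACK has been received */
} FakeMqttClient;

/* Models the behavior described in the thread: the handle is cleared
 * only when the client had reached the connected state. */
void disconnect_original(FakeMqttClient *client)
{
    assert(client != NULL);
    if (client->clientConnected) {
        client->clientConnected = false;
        client->xioHandle = NULL;
    }
    /* else: xioHandle still points at memory the transport will free */
}

/* Models the proposed change: clear the handle on both branches. */
void disconnect_patched(FakeMqttClient *client)
{
    assert(client != NULL);
    client->clientConnected = false;
    client->xioHandle = NULL;
}

/* Models the assumed guard in dowork: true means it would touch the handle. */
bool dowork_would_use_handle(const FakeMqttClient *client)
{
    assert(client != NULL);
    return client->xioHandle != NULL;
}
```

With a client that timed out before CONNACK (clientConnected is false), disconnect_original leaves the handle set, so a later dowork call would dereference it after the transport has freed it. disconnect_patched clears it, so dowork has nothing to touch.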
It seems my issue (#459) is identical to this one, and it's great that a solution seems to have been found. I can easily reproduce this issue on my platform, and I think I can even force it by setting up a blocking rule at the right moment. First I will run my program with valgrind; it should show the "use after free". Or is this a threading issue? Then one might see something with a thread sanitizer.
I'm pretty sure that this isn't a threading issue. I'm also pretty sure that this segfault happens every time the "mqtt_client timed out waiting for CONNACK" line gets logged.
Valgrind will tell the truth.
To trigger the problem I'm forcing the mqtt-client to think that it timed out on the connection:
// iothub_client/src/iothubtransport_mqtt_common.c:1845
else if (transport_data->mqttClientStatus == MQTT_CLIENT_STATUS_CONNECTING)
{
    //tickcounter_ms_t current_time;
    //if (tickcounter_get_current_ms(transport_data->msgTickCounter, &current_time) != 0)
    //{
    //    LogError("failed verifying MQTT_CLIENT_STATUS_CONNECTING timeout");
    //    result = __FAILURE__;
    //}
    //else if ((current_time - transport_data->mqtt_connect_time) / 1000 > transport_data->connect_timeout_in_sec)
    //{
        LogError("mqtt_client timed out waiting for CONNACK");
        DisconnectFromClient(transport_data);
        result = 0;
    //}
}
I'm getting a segfault systematically now.
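For reference, the timeout condition that was commented out above can be isolated as a small pure function. This is a sketch, not the SDK code: connect_timed_out is a hypothetical helper, and uint64_t is an assumption standing in for tickcounter_ms_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the commented-out condition:
 *   (current_time - mqtt_connect_time) / 1000 > connect_timeout_in_sec
 * Times are in milliseconds, the timeout is in whole seconds. */
bool connect_timed_out(uint64_t now_ms, uint64_t connect_start_ms,
                       uint64_t timeout_sec)
{
    /* The tick counter is expected to be monotonic. */
    assert(now_ms >= connect_start_ms);
    /* Integer division truncates, so the elapsed time is rounded down. */
    return (now_ms - connect_start_ms) / 1000 > timeout_sec;
}
```

Because of the truncating division and the strict comparison, a 30 second timeout in this sketch only fires once 31 full seconds have elapsed: connect_timed_out(30999, 0, 30) is false and connect_timed_out(31000, 0, 30) is true.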
And adding mqtt_client->xioHandle = NULL; to the else branch of mqtt_client_disconnect() stops it. @hhirvonen, did you issue a pull request?
No, I didn't. I just tried it on my own PC.
It would be nice to have a non-regression test along with the fix. Having said that, I wouldn't know how to write one here.
I created a pull request for the issue: Azure/azure-umqtt-c#18. One thing I do not understand is why this connection timeout is not considered a failure:
Setting result to __FAILURE__ would solve this segfault, wouldn't it?
This should be fixed now in the current master (with updated submodules). Could you verify this?
Yes, I'll verify it. I'll let you know the results, hopefully during this week.
Yes, it's working, and a connection timeout does not cause a segmentation fault anymore.
Hello,
Background:
My network firewall blocks the AMQP port, and that delays the edgeHub connection procedure. After a 1 minute connection timeout, edgeHub switches the connection protocol to AMQP over WebSocket.
Meanwhile, a connection timeout occurs in my edgeModule, and it crashes in the azure-iot-sdk-c library's iothubtransport_mqtt_common module.
edgeHub container logs:
2018-04-05 10:29:52.703 +00:00 [INF] - Attempting to connect to IoT Hub for client xxx-xxx via AMQP...
2018-04-05 10:30:52.794 +00:00 [INF] - Attempting to connect to IoT Hub for client xx-xxx via AMQP over WebSocket...
2018-04-05 10:30:52.990 +00:00 [INF] - New token requested by client xxx-xxx, but using existing token as it is usable.
2018-04-05 10:30:53.605 +00:00 [INF] - Connected to IoT Hub for client xxx-xxx via AMQP over WebSocket, with client operation timeout 60000.
2018-04-05 10:30:53.829 +00:00 [INF] - Set subscriptions from session state for xxx-xxx on cloud reconnect
edgeModule log:
-> 10:29:51 CONNECT | VER: 4 | KEEPALIVE: 240 | FLAGS: 192 | USERNAME: Edge-Test-Hub.azure-devices.net/xxx-xxx/api-version=2016-11-14&DeviceClientType=iothubclient%2f1.2.2%20(native%3b%20Linux%3b%20x86_64) | PWD: XXXX | CLEAN: 0
Error: Time:Thu Apr 5 10:30:19 2018 File:/build/azure-iot-sdk-c-MaKnm3/azure-iot-sdk-c-0.2.0.0/iothub_client/src/iothubtransport_mqtt_common.c Func:InitializeConnection Line:1865 mqtt_client timed out waiting for CONNACK
Segmentation fault (core dumped)
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
azure-iot-sdk-c release 2018-04-02
The same problem also exists in the previous release.
br Hannu