Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault when Network is Down #1275

Closed
FrankSHLi opened this issue Oct 18, 2019 · 15 comments
Closed

Segmentation Fault when Network is Down #1275

FrankSHLi opened this issue Oct 18, 2019 · 15 comments
Assignees
Labels

Comments

@FrankSHLi
Copy link
Contributor

FrankSHLi commented Oct 18, 2019

Development Machine, OS, Compiler (and Other Relevant Toolchain Info)

Debian 9

SDK Version (Please Give Commit SHA if Manually Compiling)

Release 2019-10-07

Protocol

MQTT

Describe the Bug

The sample code iothub_ll_telemetry_sample.c segmentation fault if the network is down.


**Console Logs**

Creating IoTHub Device handle
Sending message 1 to IoTHub
Info: Failed DNS lookup for XXX.azure-devices.net: -2
Segmentation fault
@Patrik-Berglund
Copy link

Patrik-Berglund commented Oct 20, 2019

Hi,

What we have seen so far is that there are a couple of bugs introduced in the 2019-10-07 release for mbedtls, mqtt and berkley sockets combination. Haven't had time to file all issues yet due to other development tasks, so we are sticking with the 2019-07-01 release.

Here's one of them though,
#1202

Have you tested the 2019-07-01 release, for us that is working as expected.

@FrankSHLi
Copy link
Contributor Author

Hi @Patrik-Berglund ,

I've already rolled back to 2019-07-01 release as soon as I found the reconnect issue, to make sure our applications won't crash.

However, OPTION_RETRY_INTERVAL_SEC isn't yet introduced in 2019-07-01, so I'd still like to know if there's any progress, thanks.

@aog2000a
Copy link

aog2000a commented Oct 30, 2019

Same problem here, running 1.3.5 using MQTT on Linux/Arm i hit this segfault when the network goes down. Same as the previous commenter, i switched to 1.3.5 due to OPTION_RETRY_INTERVAL_SEC.

@aog2000a
Copy link

aog2000a commented Oct 30, 2019

In case it helps, here a backtrace of the fault:

(gdb) bt
#0 initiate_socket_connection (socket_io_instance=socket_io_instance@entry=0xb5e00eb0)
at .../azure-c-shared-utility-git/adapters/socketio_berkeley.c:300
#1 0xb6f9b64a in lookup_address_and_initiate_socket_connection (
on_io_open_complete=0xb6f98b31 <on_underlying_io_open_complete>,
on_io_open_complete_context=0xb5e00798,
on_bytes_received=0xb6f98b91 <on_underlying_io_bytes_received>,
on_bytes_received_context=0xb5e00798, on_io_error=0xb6f984e1 <on_underlying_io_error>,
on_io_error_context=0xb5e00798)
at .../azure-c-shared-utility-git/adapters/socketio_berkeley.c:798
#3 0xb6f90a46 in xio_open (xio=, on_io_open_complete=,
on_io_open_complete_context=on_io_open_complete_context@entry=0xb5e00798,
on_bytes_received=,
on_bytes_received_context=on_bytes_received_context@entry=0xb5e00798,
on_io_error=0xb6f984e1 <on_underlying_io_error>,
on_io_error_context=on_io_error_context@entry=0xb5e00798)
at .../azure-c-shared-utility-git/src/xio.c:88
#4 0xb6f98f5c in tlsio_openssl_open (tls_io=0xb5e00798,
on_io_open_complete=, on_io_open_complete_context=,
on_bytes_received=, on_bytes_received_context=0xb5e00cf0,
on_io_error=0xb6f8c071 , on_io_error_context=0xb5e00cf0)
at .../azure-c-shared-utility-git/adapters/tlsio_openssl.c:1255
#5 0xb6f90a46 in xio_open (xio=, on_io_open_complete=,
on_io_open_complete_context=on_io_open_complete_context@entry=0xb5e00cf0,
on_bytes_received=,
on_bytes_received_context=on_bytes_received_context@entry=0xb5e00cf0,
on_io_error=0xb6f8c071 ,
on_io_error_context=on_io_error_context@entry=0xb5e00cf0)
at .../azure-c-shared-utility-git/src/xio.c:88
#6 0xb6f8cf02 in mqtt_client_connect (handle=0xb5e00cf0, xioHandle=,
mqttOptions=mqttOptions@entry=0xb2bfec48)
at .../azure-iot-mqtt-git/src/mqtt_client.c:1039
#7 0xb6f87e92 in SendMqttConnectMsg (transport_data=0xb5e00b88)
at .../azure-iot-sdk-c-git/iothub_client/src/iothubtransport_mqtt_common.c:2286
#8 InitializeConnection (transport_data=0xb5e00b88)
at .../azure-iot-sdk-c-git/iothub_client/src/iothubtransport_mqtt_common.c:2339
#9 IoTHubTransport_MQTT_Common_DoWork (handle=0xb5e00b88)
at .../azure-iot-sdk-c-git/iothub_client/src/iothubtransport_mqtt_common.c:3101
#10 0xb6f798ba in IoTHubClientCore_LL_DoWork (iotHubClientHandle=0xb5e00668)

@rajaggrawal
Copy link
Contributor

c-utility/adapters/socketio_berkeley.c
static int initiate_socket_connection(SOCKET_IO_INSTANCE* socket_io_instance)
{
if(!dns_resolver_is_lookup_complete(socket_io_instance->dns_resolver))
{
LogError("DNS did not resolve IP address");
result = MU_FAILURE;
}
else
{
addr = dns_resolver_get_addrInfo(socket_io_instance->dns_resolver);
connect_addr = addr->ai_addr;
connect_addr_len = sizeof(*addr->ai_addr);
result = 0;
}
}
addr is NULL when Network is down and DNS doesnt resolve.
crash due to below line of code.
connect_addr = addr->ai_addr

@ewertons
Copy link
Contributor

@FrankSHLi , @Patrik-Berglund, @aog2000a , @rajaggrawal , thank you for reporting this issue and for providing all the details!
A fix will be rolled out briefly.

@Patrik-Berglund
Copy link

Hi, there's a pull req for this in the azure-c-shared-utility. it's not submitted by me (or our company), Azure/azure-c-shared-utility#400

ewertons added a commit to Azure/azure-c-shared-utility that referenced this issue Nov 21, 2019
Relates to: Azure/azure-iot-sdk-c#1275

There were two instances of possible failures not properly handled on socketio related to dns_resolver:
- The result of dns_resolver_create() was not being checked for NULL (which could be caused by memory allocation failure);
- The result of dns_resolver_get_addrInfo() was not being checked for NULL, which could happen even if dns_resolver_is_lookup_complete returned true.

These failures affected socketio_berkeley.c and socketio_win32.c.
The currently remaining socketio implementations were not affected, since they do not use dns_resolver (socketio_mbed.c, socketio_mbed_os5.c).

Thanks @mlilien for the initial PR to fix this issue.
@ewertons
Copy link
Contributor

Thanks @Patrik-Berglund , we used that PR as part of the fix.
The new PR is out with the complete fix, and will be merged and integrated into azure-iot-sdk-c repo soon.

@rajaggrawal
Copy link
Contributor

when i know when this would be integrated . Any Approximate date ?.
this is a critical/blocker for us.

@ewertons
Copy link
Contributor

Hi @rajaggrawal , the fixed version of azure-c-shared-utility will be integrated to this repo today, 11/22/2019. Packages will be available at a later date.

@rajaggrawal
Copy link
Contributor

@ewertons let me know if I can pull the changes if its available.

@ewertons
Copy link
Contributor

@rajaggrawal , apologies for the delay. There was an additional change to complete the fix and we are working on it right now. We will update you briefly of the status.

@ewertons
Copy link
Contributor

Hi @rajaggrawal , the fix is in master now.
Could you please verify on your end that you no longer see the crash?

@ewertons
Copy link
Contributor

We will close this issue as a fix has been checked in.
Please feel free to reopen it if you need to follow up.
Thanks for contributing to the Azure IoT SDKs.

@az-iot-builder-01
Copy link
Collaborator

@Patrik-Berglund, @aog2000a, @rajaggrawal, @ewertons, thank you for your contribution to our open-sourced project! Please help us improve by filling out this 2-minute customer satisfaction survey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants