
C SDK crash when sending D2C messages continuously #1603

Closed
rajaggrawal opened this issue Jul 28, 2020 · 12 comments


@rajaggrawal
Contributor

I am trying to send 100,000 device-to-cloud messages continuously in a loop, but the SDK crashes intermittently.

The following SendD2CMessage function sends a device-to-cloud message:

void SendD2CMessage(const std::string& message, const std::string& tagName, const std::string& tagValue)
{
    if (nullptr == m_iotHubHandle) {
        Log::error(LOG_TAG, "IoT hub handle is not initialised");
        throw std::runtime_error("IoT hub handle is not initialised");
    }

    auto messageHandle = IoTHubMessage_CreateFromString(message.c_str());

    if (messageHandle == nullptr) {
        Log::error(LOG_TAG, "Unable to create a new IoTHubMessage");
        throw std::runtime_error("Unable to create a new IoTHubMessage");
    }

    if (!tagValue.empty() && !tagName.empty()) {
        (void)IoTHubMessage_SetProperty(messageHandle, tagName.c_str(), tagValue.c_str());
    }

    auto confirmationCallback = [](IOTHUB_CLIENT_CONFIRMATION_RESULT result, void* userContextCallback)
    {
        (void)userContextCallback;
        Log::debug(LOG_TAG, "updateMessage : %d \n", result);
    };

    if (IoTHubDeviceClient_SendEventAsync(m_iotHubHandle, messageHandle, confirmationCallback, nullptr) != IOTHUB_CLIENT_OK) {
        Log::error(LOG_TAG, "Failed to hand over the message to IoT Hub");
    }

    // As in the SDK samples, the message is destroyed after handing it over;
    // the SDK keeps its own copy.
    IoTHubMessage_Destroy(messageHandle);
}

If I call this method in a for loop, it crashes. For example:

for (int i = 0; i < 100000; i++)
{
    SendD2CMessage("Message", "Tag", "Value");
}

Azure SDK Release : 2019-12-11

Please find the backtrace below.

[New LWP 13599]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./iotagent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___libc_free (mem=0x10ec8348e5894855) at malloc.c:3103
3103 malloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7fc7ff310700 (LWP 13601))]
(gdb) bt
#0 __GI___libc_free (mem=0x10ec8348e5894855) at malloc.c:3103
#1 0x00007fc806c57922 in Map_Destroy (handle=0x55f6cce604b0) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/c-utility/src/map.c:51
#2 0x00007fc8076cf376 in DestroyMessageData (handleData=0x55f6ce836480) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_message.c:91
#3 0x00007fc8076d16cf in IoTHubMessage_Destroy (iotHubMessageHandle=0x55f6ce836480) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_message.c:1079
#4 0x00007fc8076c7c19 in IoTHubClientCore_LL_SendComplete (completed=0x7fc7ff30fc40, result=IOTHUB_CLIENT_CONFIRMATION_MESSAGE_TIMEOUT, ctx=0x55f6ccbae110)
at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core_ll.c:514
#5 0x00007fc8074a7be1 in sendMsgComplete (iothubMsgList=0x55f6cce60680, transport_data=0x55f6ccba0000, confirmResult=IOTHUB_CLIENT_CONFIRMATION_MESSAGE_TIMEOUT)
at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:599
#6 0x00007fc8074abcf3 in process_queued_ack_messages (transport_data=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:2059
#7 0x00007fc8074aea5e in IoTHubTransport_MQTT_Common_DoWork (handle=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:3181
#8 0x00007fc8074b009a in IoTHubTransportMqtt_WS_DoWork (handle=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransportmqtt_websockets.c:171
#9 0x00007fc8076cbe24 in IoTHubClientCore_LL_DoWork (iotHubClientHandle=0x55f6ccbae110) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core_ll.c:2084
#10 0x00007fc8076c2d92 in ScheduleWork_Thread (threadArgument=0x55f6ccba7590) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c:812
#11 0x00007fc806c65a92 in ThreadWrapper (threadInstanceArg=0x55f6ccbb12f0) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/c-utility/adapters/threadapi_pthreads.c:35
#12 0x00007fc804e6d6db in start_thread (arg=0x7fc7ff310700) at pthread_create.c:463
#13 0x00007fc807a05a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Could you please check this on priority?

@ewertons
Contributor

Hi @rajaggrawal,

The memory being freed is in a path owned by the SDK, so that is an issue we should investigate right away. However, we do have long-haul tests, and they have not failed with memory leaks or corruption.

To help us assess the issue, could you run your application under valgrind and share the details? It would be better to attach the output as a file, since you may get a lot of prints.

Build the SDK with debug data and run the application under valgrind as follows:

valgrind --leak-check=full --track-origins=yes ./your_application

At the end you should get a report from valgrind. Please share it with us so we can assist.

@rajaggrawal
Contributor Author

Hi @ewertons

Please find the attached report.

Thanks
Raj
Valgrind report.zip

@rajaggrawal
Contributor Author

Hi @ewertons ,

Any update on this?

Thanks

@markrad
Member

markrad commented Aug 26, 2020

Hi @rajaggrawal

When you send your messages, do you have any kind of delay in the loop? If you are sending them as fast as the CPU can process them, you will overwhelm your network connection. The TCP layer will not be able to send the messages fast enough, which will cause messages to be buffered inside the SDK until you run out of memory.

@rajaggrawal
Copy link
Contributor Author

Hi @markrad,

Thanks for your response.
Yes, there is a 1 ms delay between messages.
But in case the TCP layer is not able to send the messages, Azure should return a specific error to the application so that the application can slow down its sending rate.
A crash should not be the expected behaviour.

@markrad
Member

markrad commented Aug 27, 2020

Hi @rajaggrawal

Though I agree with you that a crash is not ideal, my point is that, even if the crash were fixed, you would still not be able to send a message every millisecond without overwhelming your network connection. I would try experimenting with longer delays to see what rate your network is able to keep up with. If your requirement is to deliver these messages within a certain time span, then you will likely need to look into batching multiple individual messages into a single message to the IoT hub and unbatching them at some later point. My calculations suggest that sending 1,000,000 messages with a 10 ms delay between each will take approximately 2 hours 45 minutes.

@ericwolz
Contributor

There is no limit on the memory usage of the internal queue, so memory exhaustion can occur. Message transmission can be impacted by several factors, including network speed, device memory size, and message QoS level. You can use the message confirmation callback to track when packets are removed from our queue (whether successfully transmitted or not).

@rajaggrawal
Contributor Author

We are already using the message confirmation callback and retrying the messages that failed.
How will an application know at what speed telemetry messages can be sent from the device?

@Ozaq

Ozaq commented Sep 18, 2020

I work on an application that sends messages in high volume, and I would like to know whether this issue is being worked on. Independently of the general validity of this case, i.e. whether this number of messages can actually be sent without getting into an OOM condition, there is still a segfault. The output reads "Segmentation fault (core dumped)"; unless the OOM killer has been explicitly disabled, an out-of-memory process should have received a SIGKILL and simply "disappeared". So even if the scenario is "not useful", as a user I am a bit dismayed by the seeming dismissal so far.

@rajaggrawal
Contributor Author

rajaggrawal commented Dec 14, 2020

I am still seeing the crash; I am now sending one message every 20 ms (instead of 1 ms) in an infinite loop.
@markrad, can you please let me know when this will be resolved, or whether there is any other way to get rid of this crash?

@markrad
Member

markrad commented Dec 14, 2020

@rajaggrawal, you will first need to reproduce the problem with the latest release of the C SDK. If you are able to do so, then please post the logs and the code you are using, allowing us to attempt a local reproduction of your problem.

@danewalton-msft
Member

I am going to go ahead and close this due to inactivity for now. Please let us know if you would like it reopened.
