
C SDK crash when sending D2C messages continuously #1603

Closed
rajaggrawal opened this issue Jul 28, 2020 · 12 comments


@rajaggrawal
Contributor

I am trying to send 100,000 device-to-cloud messages continuously in a loop, but the SDK crashes intermittently.

The following SendD2CMessage function sends a device-to-cloud message:

void SendD2CMessage(const std::string& message, const std::string& tagName, const std::string& tagValue)
{
    if (nullptr == m_iotHubHandle) {
        Log::error(LOG_TAG, "IoT hub handle is not initialised");
        throw std::runtime_error("IoT hub handle is not initialised");
    }

    auto messageHandle = IoTHubMessage_CreateFromString(message.c_str());

    if (messageHandle == nullptr) {
        Log::error(LOG_TAG, "Unable to create a new IoTHubMessage");
        throw std::runtime_error("Unable to create a new IoTHubMessage");
    }

    if (!tagValue.empty() && !tagName.empty()) {
        (void)IoTHubMessage_SetProperty(messageHandle, tagName.c_str(), tagValue.c_str());
    }

    auto confirmationCallback = [](IOTHUB_CLIENT_CONFIRMATION_RESULT result, void* userContextCallback)
    {
        (void)userContextCallback;
        Log::debug(LOG_TAG, "updateMessage : %d \n", result);
    };

    if (IoTHubDeviceClient_SendEventAsync(m_iotHubHandle, messageHandle, confirmationCallback, nullptr) != IOTHUB_CLIENT_OK) {
        Log::error(LOG_TAG, "Failed to hand over the message to IoT Hub");
    }

    // As in the SDK samples, the message is destroyed after handing it over;
    // the SDK keeps its own copy.
    IoTHubMessage_Destroy(messageHandle);
}

If I call this method in a for loop, it crashes. For example:

for (int i = 0; i < 100000; i++)
{
    SendD2CMessage("Message", "Tag", "Value");
}

Azure SDK Release : 2019-12-11

Please find the backtrace below.

[New LWP 13599]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./iotagent'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI___libc_free (mem=0x10ec8348e5894855) at malloc.c:3103
3103 malloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7fc7ff310700 (LWP 13601))]
(gdb) bt
#0 __GI___libc_free (mem=0x10ec8348e5894855) at malloc.c:3103
#1 0x00007fc806c57922 in Map_Destroy (handle=0x55f6cce604b0) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/c-utility/src/map.c:51
#2 0x00007fc8076cf376 in DestroyMessageData (handleData=0x55f6ce836480) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_message.c:91
#3 0x00007fc8076d16cf in IoTHubMessage_Destroy (iotHubMessageHandle=0x55f6ce836480) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_message.c:1079
#4 0x00007fc8076c7c19 in IoTHubClientCore_LL_SendComplete (completed=0x7fc7ff30fc40, result=IOTHUB_CLIENT_CONFIRMATION_MESSAGE_TIMEOUT, ctx=0x55f6ccbae110)
at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core_ll.c:514
#5 0x00007fc8074a7be1 in sendMsgComplete (iothubMsgList=0x55f6cce60680, transport_data=0x55f6ccba0000, confirmResult=IOTHUB_CLIENT_CONFIRMATION_MESSAGE_TIMEOUT)
at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:599
#6 0x00007fc8074abcf3 in process_queued_ack_messages (transport_data=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:2059
#7 0x00007fc8074aea5e in IoTHubTransport_MQTT_Common_DoWork (handle=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransport_mqtt_common.c:3181
#8 0x00007fc8074b009a in IoTHubTransportMqtt_WS_DoWork (handle=0x55f6ccba0000) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothubtransportmqtt_websockets.c:171
#9 0x00007fc8076cbe24 in IoTHubClientCore_LL_DoWork (iotHubClientHandle=0x55f6ccbae110) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core_ll.c:2084
#10 0x00007fc8076c2d92 in ScheduleWork_Thread (threadArgument=0x55f6ccba7590) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/iothub_client/src/iothub_client_core.c:812
#11 0x00007fc806c65a92 in ThreadWrapper (threadInstanceArg=0x55f6ccbb12f0) at /home/raj/GITBranches/CpuUsageDebugging/agent/azure-iot-sdk-c/c-utility/adapters/threadapi_pthreads.c:35
#12 0x00007fc804e6d6db in start_thread (arg=0x7fc7ff310700) at pthread_create.c:463
#13 0x00007fc807a05a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Could you please check this on priority?

@ewertons
Contributor

Hi @rajaggrawal,

The memory being freed is in a path owned by the SDK, so that is an issue we should investigate right away. However, we do have long-haul tests, and they have not failed with memory leaks or corruption.

To help us assess the issue, could you run your application under valgrind and share the details? It would be better to attach the output as a file, since you may get a lot of prints.

Build the SDK with debug data and run the application under valgrind as follows:

valgrind --leak-check=full --track-origins=yes ./your_application

At the end you should get a report from valgrind. Please share it with us so we can assist.

@rajaggrawal
Contributor Author

Hi @ewertons

Please find the attached report.

Thanks
Raj
Valgrind report.zip

@rajaggrawal
Contributor Author

Hi @ewertons ,

Any update on this?

Thanks

@markrad
Member

markrad commented Aug 26, 2020

Hi @rajaggrawal

When you send your messages, do you have any kind of delay in the loop? If you are sending them as fast as the CPU can process them, you will overwhelm your network connection. The TCP layer will not be able to send the messages fast enough, which will cause messages to be buffered inside the SDK until you run out of memory.

@rajaggrawal
Copy link
Contributor Author

Hi @markrad,

Thanks for your response.
Yes, there is a 1 ms delay between messages.
But in case the TCP layer is not able to send the messages, Azure should return a specific error to the application so that the application can slow down its sending rate.
A crash should not be the expected behaviour.

@markrad
Member

markrad commented Aug 27, 2020

Hi @rajaggrawal

Though I agree with you that a crash is not ideal, my point is that, even if the crash were fixed, you would still not be able to send a message every millisecond without overwhelming your network connection. I would try experimenting with longer delays to see what rate your network is able to keep up with. If your requirement is to deliver these messages within a certain time span, then you will likely need to look into batching multiple individual messages into a single message to the IoT hub and unbatching them at some later point. My calculations suggest that sending 1,000,000 messages with a 10 ms delay between each will take approximately 2 hours 45 minutes.

@ericwolz
Contributor

There is no limit on the memory usage of the internal queue, so memory exhaustion can occur. Message transmission can be impacted by several factors, including network speed, device memory size, and message QoS level. You can use the message confirmation callback to track when packets are removed from our queue (whether successfully transmitted or not).

@rajaggrawal
Contributor Author

We are already using the message confirmation callback and retrying the messages that failed.
How will an application know at what speed telemetry messages can be sent from the device?

@Ozaq

Ozaq commented Sep 18, 2020

I work on an application that sends messages in high volume, and I would like to know whether this issue is being worked on. Independently of the general validity of this case, i.e. whether this number of messages can actually be sent without getting into an OOM condition, there is still a segfault. The output reads "Segmentation fault (core dumped)"; unless the OOM killer has been explicitly disabled, an out-of-memory process should have received a SIGKILL and simply "disappeared". So even if the scenario is "not useful", as a user I am a bit dismayed by the seeming dismissal so far.

@rajaggrawal
Contributor Author

rajaggrawal commented Dec 14, 2020

I am still seeing the crash; I am now sending one message every 20 ms (instead of 1 ms) in an infinite loop.
@markrad, can you please let me know when this will be resolved, or whether there is any other way to get rid of this crash?

@markrad
Member

markrad commented Dec 14, 2020

@rajaggrawal, you will first need to reproduce the problem with the latest release of the C SDK. If you are able to do so, then please post the logs and the code you are using, allowing us to attempt a local reproduction of your problem.

@danewalton-msft
Member

I am going to go ahead and close this due to inactivity for now. Please let us know if you would like it reopened.
