OS and version used: Yocto Linux 2.1.3 (kernel version 3.14.52)
SDK version used: 1.1.30 (release_2018-01-12)
Description of the issue:
When using a custom remote server idle timeout of 25 minutes (currently available on a per-hub basis via a Support Request, as described in this comment Azure/azure-iot-sdk-csharp#46 (comment)), we have no visibility into, or control over, the actual frequency at which keep-alives (empty frames) are sent over the AMQP protocol. We only control part of the equation, namely the remote idle timeout ratio.
In some rare cases, following a disconnect, the server seems to return the default remote server timeout (4 minutes), which means the actual keep-alive frequency will be much higher than expected. This is problematic when devices connect to the IoT Hub over an expensive metered connection (e.g. cellular).
We already have a support request open for this specific issue. Could you still look into exposing the remote idle timeout at the SDK level, so we can take appropriate action (i.e. force a reconnection if it is not the expected value)? From a broader perspective, exposing detailed transport-level information and statistics (connection uptime, bytes sent, ...) would also be useful.
Code sample exhibiting the issue:
// Setting the keep-alive ratio is optional. If it is not set, the default ratio of 1/2 is used.
double cl2svc_keep_alive_send_ratio = 7.0 / 8.0;
// The client will send keep-alives to the service at a 210-second interval for a remote idle timeout of 4 minutes (240 s * 7/8).
// For a 25-minute remote timeout, they will be sent roughly every 21.9 minutes (1500 s * 7/8 = 1312.5 s).
if (IoTHubClient_LL_SetOption(iotHubClientHandle, OPTION_REMOTE_IDLE_TIMEOUT_RATIO, &cl2svc_keep_alive_send_ratio) != IOTHUB_CLIENT_OK)
{
    (void)printf("ERROR: IoTHubClient_LL_SetOption(OPTION_REMOTE_IDLE_TIMEOUT_RATIO)..........FAILED!\r\n");
}
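For context, a minimal sketch of how this option is wired up in our device code (the connection string is a placeholder and error handling is elided; assumes the AMQP transport from iothubtransportamqp.h):
#include "iothub_client_ll.h"
#include "iothub_client_options.h"
#include "iothubtransportamqp.h"

// Placeholder connection string; the real one comes from the device configuration.
static const char* connectionString = "HostName=...;DeviceId=...;SharedAccessKey=...";

IOTHUB_CLIENT_LL_HANDLE iotHubClientHandle =
    IoTHubClient_LL_CreateFromConnectionString(connectionString, AMQP_Protocol);
// ... set OPTION_REMOTE_IDLE_TIMEOUT_RATIO as above, then pump the client with IoTHubClient_LL_DoWork().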
Hi @BloodWine,
Could you elaborate on "In some rare cases, following a disconnect, the server seems to return the default remote server timeout (4 minutes)"?
Do you have logs showing that behavior?
We do not have plans to expose that setting at this point, since the client SDK design is geared towards abstracting the application protocol used.
If you do see the AMQP keep-alive timeout being negotiated with a value different from the one you set, we should investigate that.
A quick reconnection test here shows the AMQP negotiation is happening as expected.
In the traces from that test, 1234567000 (a) is the maximum number of milliseconds the client informs the service it will wait for keep-alives from it; 240000 (b) is the idle timeout the service advertises in return (the remote idle timeout from which the client's keep-alive interval is derived).
As shown in those traces, the values are negotiated the same way before and after the reconnection.
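For reference, the client's keep-alive send interval is the service-side idle timeout (b) multiplied by the configured ratio: with the default 4-minute timeout and the 7/8 ratio from the code sample above, that is 240000 ms * 7/8 = 210000 ms (210 s); with a 25-minute timeout it would be 1500000 ms * 7/8 = 1312500 ms, roughly 21.9 minutes.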
(a) was set in our sample run with:
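Presumably something along these lines (a sketch assuming the OPTION_C2D_KEEP_ALIVE_FREQ_SECS option, which informs the service, in seconds, of how long the client will wait for keep-alives; 1234567 seconds corresponds to the 1234567000 ms seen as (a)):
// Hypothetical reconstruction of the setting that produced value (a) in the traces.
size_t c2d_keep_alive_freq_secs = 1234567;
if (IoTHubClient_LL_SetOption(iotHubClientHandle, OPTION_C2D_KEEP_ALIVE_FREQ_SECS, &c2d_keep_alive_freq_secs) != IOTHUB_CLIENT_OK)
{
    (void)printf("ERROR: failed to set OPTION_C2D_KEEP_ALIVE_FREQ_SECS\r\n");
}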