Device Provisioning and IoT Hub service protocols require additional state management on top of the MQTT protocol. The Azure IoT Hub and Provisioning clients for C provide a common programming model. The clients must be layered on top of an MQTT client selected by the application developer.
The following aspects are handled by the IoT clients:
- Generate MQTT CONNECT credentials.
- Obtain SUBSCRIBE topic filters and PUBLISH topic strings required by various service features.
- Parse service errors and output a uniform error object model.
- Provide the correct sequence of events required to perform an operation.
- Provide suggested timing information when retrying operations.
The following aspects need to be handled by the application or convenience layers:
- Ensure secure TLS communication using either server or mutual X509 authentication.
- Perform MQTT transport-level operations.
- Delay execution for retry purposes.
- (Optional) Provide real-time clock information and perform HMAC-SHA256 operations for SAS token generation.
For more information about Azure IoT services using MQTT see this article.
In order to port the clients to a target platform the following items are required:
- Support for a C99 compiler.
- Types such as uint8_t must be defined.
- The target platform supports a stack of several kB (actual requirement depends on features being used and data sizes).
- An MQTT over TLS client supporting QoS 0 and 1 messages.
Optionally, the IoT services support MQTT tunneling over WebSocket Secure, which allows bypassing firewalls where port 8883 is not open. Using WebSockets also allows the use of devices that must go through a web proxy. Application developers are responsible for setting up the wss:// tunnel.
The application code is required to initialize the TLS and MQTT stacks. Detailed information about TLS over TCP/IP requirements can be found at https://docs.microsoft.com/azure/iot-hub/iot-hub-tls-support.
Two authentication schemes are currently supported: X509 Client Certificate Authentication and Shared Access Signature authentication.
When X509 client authentication is used, the MQTT password field should be an empty string.
If SAS tokens are used, the following APIs provide a way to create the token and to refresh its lifetime upon reconnect.
Example:
if (az_result_failed(az_iot_hub_client_sas_get_signature(client, unix_time + 3600, signature, &signature)))
{
  // error.
}
// Application will Base64Encode the HMAC256 of the az_span_ptr(signature) containing az_span_size(signature) bytes with the Shared Access Key.
if (az_result_failed(az_iot_hub_client_sas_get_password(client, NULL, base64_hmac_sha256_signature, password, password_size, &password_length)))
{
  // error.
}
Recommended defaults:
- MQTT Keep-Alive Interval: AZ_IOT_DEFAULT_MQTT_CONNECT_KEEPALIVE_SECONDS
- MQTT Clean Session: false.
We recommend always using Clean Session false when connecting to IoT Hub. Connecting with Clean Session true will remove all enqueued C2D messages.
Each service requiring a subscription implements a function similar to the following:
Example:
// AZ_IOT_HUB_CLIENT_METHODS_SUBSCRIBE_TOPIC contains the methods topic filter.
MQTTClient_subscribe(mqtt_client, AZ_IOT_HUB_CLIENT_METHODS_SUBSCRIBE_TOPIC, 1);
Note: If the MQTT stack allows, it is recommended to subscribe prior to connecting.
Each action (e.g. send telemetry, request twin) is represented by a separate public API. The application is responsible for filling in the MQTT payload with the format expected by the service.
Example:
if (az_result_failed(az_iot_hub_client_telemetry_get_publish_topic(client, NULL, topic, topic_size, NULL)))
{
  // error.
}
Note: To limit overhead when publishing, it is recommended to serialize as many MQTT messages as possible within the same TLS record. This feature may not be available on all MQTT/TLS/Sockets stacks.
We recommend that the handling of incoming MQTT PUB messages is implemented by a chain-of-responsibility architecture. Each handler is passed the topic and will either accept and return a response, or pass it to the next handler.
Example:
az_iot_hub_client_c2d_request c2d_request;
az_iot_hub_client_method_request method_request;
az_iot_hub_client_twin_response twin_response;
// az_span received_topic is filled by the application.
if (az_result_succeeded(az_iot_hub_client_c2d_parse_received_topic(client, received_topic, &c2d_request)))
{
  // This is a C2D message:
  // c2d_request.properties contains the properties of the message.
  // The MQTT message payload contains the data.
}
else if (az_result_succeeded(az_iot_hub_client_methods_parse_received_topic(client, received_topic, &method_request)))
{
  // This is a Method request:
  // method_request.name contains the method name.
  // method_request.request_id contains the request ID that must be used to submit the response using az_iot_hub_client_methods_response_get_publish_topic().
}
else if (az_result_succeeded(az_iot_hub_client_twin_parse_received_topic(client, received_topic, &twin_response)))
{
  // This is a Twin operation.
  switch (twin_response.response_type)
  {
    case AZ_IOT_CLIENT_TWIN_RESPONSE_TYPE_GET:
      // This is a response to an az_iot_hub_client_twin_document_get_publish_topic() request.
      break;
    case AZ_IOT_CLIENT_TWIN_RESPONSE_TYPE_DESIRED_PROPERTIES:
      // This is received when the Twin desired properties were changed using the service client.
      break;
    case AZ_IOT_CLIENT_TWIN_RESPONSE_TYPE_REPORTED_PROPERTIES:
      // This is a response received after patching the reported properties using az_iot_hub_client_twin_patch_get_publish_topic().
      break;
    default:
      // error.
      break;
  }
}
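The chain-of-responsibility idea can also be sketched as a table of handlers that are tried in order. The following is a minimal, standalone illustration and does not use the SDK: the handler type, the stand-in handlers and dispatch_topic are hypothetical names, and the simple prefix checks only approximate what the SDK parse functions do with the IoT Hub topic formats.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical handler signature: returns true when the handler accepts the topic.
typedef bool (*topic_handler)(const char* topic);

// Returns true when text begins with prefix.
bool starts_with(const char* text, const char* prefix)
{
  return strncmp(text, prefix, strlen(prefix)) == 0;
}

// Stand-ins for the SDK parse functions (assumption: simplified topic matching).
bool handle_c2d(const char* topic)
{
  return starts_with(topic, "devices/") && strstr(topic, "/messages/devicebound/") != NULL;
}

bool handle_method(const char* topic)
{
  return starts_with(topic, "$iothub/methods/POST/");
}

bool handle_twin(const char* topic)
{
  return starts_with(topic, "$iothub/twin/");
}

// The chain: handlers are tried in order until one accepts the topic.
const topic_handler handlers[] = { handle_c2d, handle_method, handle_twin };

// Returns the index of the handler that accepted the topic, or -1 if none did.
int dispatch_topic(const char* topic)
{
  for (size_t i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
  {
    if (handlers[i](topic))
    {
      return (int)i;
    }
  }
  return -1;
}
```

Adding a new feature then means appending one handler to the table, while the MQTT receive callback keeps calling a single dispatch function.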
Important: C2D messages are not enqueued until the device establishes the first MQTT session (connects for the first time to IoT Hub). The C2D message queue is preserved (according to the per-message time-to-live) as long as the device connects with Clean Session false.
Retrying operations requires understanding two aspects: error evaluation (did the operation fail, and should it be retried) and retry timing (how long to delay before retrying the operation). The IoT client library supplies optional APIs for error classification and retry timing.
The SDK will not handle protocol-level (WebSocket, MQTT, TLS or TCP) errors. The application developer is expected to classify and handle errors as follows:
- Operations failing due to authentication errors should not be retried.
- Operations failing due to communication-related errors, other than security-related ones (e.g. TLS Alert), may be retried.
Both IoT Hub and Provisioning services will use MQTT CONNACK as described in Section 3.2.2.3 of the MQTT v3.1.1 specification.
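As an illustration of the classification rules above, the sketch below maps the CONNACK return codes defined in Section 3.2.2.3 of the MQTT v3.1.1 specification to a retry decision. The enum and function names are hypothetical and not part of the SDK, and the mapping itself is an assumption derived from the two rules above; applications may need a different policy.

```c
#include <stdint.h>

// CONNACK return codes from MQTT v3.1.1, Section 3.2.2.3.
typedef enum
{
  CONNACK_ACCEPTED = 0,
  CONNACK_UNACCEPTABLE_PROTOCOL_VERSION = 1,
  CONNACK_IDENTIFIER_REJECTED = 2,
  CONNACK_SERVER_UNAVAILABLE = 3,
  CONNACK_BAD_USER_NAME_OR_PASSWORD = 4,
  CONNACK_NOT_AUTHORIZED = 5
} connack_code;

// Hypothetical outcome of classifying a connection attempt.
typedef enum
{
  CONNECT_OK,
  CONNECT_RETRY,
  CONNECT_DO_NOT_RETRY
} connect_action;

// Assumed policy: only "server unavailable" is treated as transient.
// Authentication failures (codes 4 and 5) are not retried with the same
// credentials; codes 1 and 2 point to a client configuration problem.
connect_action classify_connack(uint8_t return_code)
{
  switch (return_code)
  {
    case CONNACK_ACCEPTED:
      return CONNECT_OK;
    case CONNACK_SERVER_UNAVAILABLE:
      return CONNECT_RETRY;
    case CONNACK_BAD_USER_NAME_OR_PASSWORD:
    case CONNACK_NOT_AUTHORIZED:
    case CONNACK_UNACCEPTABLE_PROTOCOL_VERSION:
    case CONNACK_IDENTIFIER_REJECTED:
    default:
      return CONNECT_DO_NOT_RETRY;
  }
}
```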
Note: The Provisioning Service query polling operation may result in retriable errors. In some cases, the service response will not include an operation_id. In this case, the device may either reuse a cached operation_id or restart the flow from the register step.
APIs using az_iot_status report service-side errors to the client through the IoT protocols.
The following APIs may be used to determine if the status indicates an error and if the operation should be retried:
az_iot_status status = response.status;
if (az_iot_status_succeeded(status))
{
  // success case
}
else
{
  if (az_iot_status_retriable(status))
  {
    // retry
  }
  else
  {
    // fail
  }
}
Network timeouts and the MQTT keep-alive interval should be configured considering tradeoffs between how fast network issues are detected vs traffic overheads. This document describes the recommended keep-alive timeouts as well as the minimum idle timeout supported by Azure IoT services.
For connectivity issues at all layers (TCP, TLS, MQTT) as well as cases where there is no retry-after sent by the service, we suggest using an exponential back-off with random jitter function. az_iot_calculate_retry_delay is available in Azure IoT Common:
// The previous operation took operation_msec.
// The application calculates random_jitter_msec between 0 and max_random_jitter_msec.
int32_t delay_msec = az_iot_calculate_retry_delay(operation_msec, attempt, min_retry_delay_msec, max_retry_delay_msec, random_jitter_msec);
Note 1: The network stack may have used more time than the recommended delay before timing out (e.g. the operation timed out after 2 minutes while the delay between operations is 1 second). In this case there is no need to delay the next operation.
Note 2: To determine the parameters of the exponential back-off retry strategy, we recommend modeling the network characteristics (including failure modes). Compare the results with defined SLAs for device connectivity (e.g. 1M devices must be connected in under 30 minutes) and with the available Azure IoT Hub scale and Azure Provisioning Service scale (especially consider throttling, quotas and maximum requests/connects per second).
In the absence of modeling, we recommend the following default:
min_retry_delay_msec = 1000;
max_retry_delay_msec = 100000;
max_random_jitter_msec = 5000;
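To make the role of each parameter concrete, the following standalone sketch shows one possible exponential back-off with jitter. It is not the SDK's implementation of az_iot_calculate_retry_delay: the function names are hypothetical, and the doubling rule and the assumption that attempt 1 is the first retry are choices made for this illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: picks a jitter value in [0, max_random_jitter_msec].
// rand() is used for brevity; a platform random source may be preferable.
int32_t pick_random_jitter_msec(int32_t max_random_jitter_msec)
{
  if (max_random_jitter_msec <= 0)
  {
    return 0;
  }
  return (int32_t)(rand() % ((int64_t)max_random_jitter_msec + 1));
}

// Sketch: delay = min(min_delay * 2^(attempt - 1), max_delay) + jitter - operation time.
// The time already spent in the failed operation is subtracted (see Note 1),
// and the result is never negative.
int32_t sketch_retry_delay_msec(
    int32_t operation_msec,
    int16_t attempt,
    int32_t min_retry_delay_msec,
    int32_t max_retry_delay_msec,
    int32_t random_jitter_msec)
{
  int64_t delay = min_retry_delay_msec;

  // Double the delay for each attempt after the first, stopping at the cap.
  for (int16_t i = 1; i < attempt && delay < max_retry_delay_msec; i++)
  {
    delay *= 2;
  }
  if (delay > max_retry_delay_msec)
  {
    delay = max_retry_delay_msec;
  }

  delay += random_jitter_msec;
  delay -= operation_msec;

  if (delay > INT32_MAX)
  {
    delay = INT32_MAX;
  }
  return delay > 0 ? (int32_t)delay : 0;
}
```

With the defaults above and no jitter, this sketch yields 1000, 2000 and 4000 ms for attempts 1 to 3 and never exceeds 100000 ms before jitter is added.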
For service-level errors, the Provisioning Service provides a retry-after (in seconds) parameter:
// az_iot_provisioning_client_received_topic_payload_parse was successful and created an az_iot_provisioning_client_register_response response
int32_t delay_ms;
if (response.retry_after_seconds > 0)
{
  // retry_after_seconds is expressed in seconds; convert it to milliseconds.
  delay_ms = response.retry_after_seconds * 1000;
}
else
{
  delay_ms = az_iot_calculate_retry_delay(operation_msec, attempt, min_retry_delay_msec, max_retry_delay_msec, random_jitter_msec);
}
Combining the functions above we recommend the following flow:
When using Provisioning Service, we recommend using a MAX_HUB_RETRY (default 10) to handle cases where the Edge/Stack or IoT Hub changed endpoint information.
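One way to read the MAX_HUB_RETRY recommendation is sketched below: after MAX_HUB_RETRY failed IoT Hub connection attempts, the device runs provisioning again to obtain fresh endpoint information. This is an interpretation rather than a reproduction of the recommended flow. Every name except MAX_HUB_RETRY is hypothetical, the provisioning and connection steps are simulated with stubs, and the retry delays an application would insert between attempts are omitted.

```c
#include <stdbool.h>

#define MAX_HUB_RETRY 10

// Hypothetical step callback: returns true on success.
typedef bool (*step_fn)(void* context);

typedef enum
{
  FLOW_CONNECTED,
  FLOW_PROVISIONING_FAILED,
  FLOW_GAVE_UP
} flow_result;

// Provision, then try the hub up to MAX_HUB_RETRY times before provisioning again.
flow_result connect_with_reprovisioning(
    step_fn provision,
    step_fn connect_hub,
    void* context,
    int max_provisioning_rounds)
{
  for (int round = 0; round < max_provisioning_rounds; round++)
  {
    if (!provision(context))
    {
      return FLOW_PROVISIONING_FAILED;
    }
    for (int attempt = 0; attempt < MAX_HUB_RETRY; attempt++)
    {
      if (connect_hub(context))
      {
        return FLOW_CONNECTED;
      }
      // A real application would delay here using the computed retry delay.
    }
  }
  return FLOW_GAVE_UP;
}

// Simulation state used only to exercise the sketch.
typedef struct
{
  int provision_calls;      // number of times provisioning ran
  int hub_calls;            // number of hub connection attempts
  int hub_succeeds_on_call; // hub connect succeeds from this call on (0 = never)
} sim_state;

bool sim_provision(void* context)
{
  ((sim_state*)context)->provision_calls++;
  return true;
}

bool sim_connect_hub(void* context)
{
  sim_state* state = (sim_state*)context;
  state->hub_calls++;
  return state->hub_succeeds_on_call != 0 && state->hub_calls >= state->hub_succeeds_on_call;
}
```

In the simulation, a hub that only accepts the twelfth connection attempt causes exactly one extra provisioning round before the device connects.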
When devices are using IoT Hub without Provisioning Service, we recommend attempting to rotate the IoT Credentials (SAS Token or X509 Certificate) on authentication issues.
Note: Authentication issues observed in the following cases do not require credentials to be rotated:
- DNS issues (such as WiFi Captive Portal redirects)
- WebSockets Proxy server authentication