-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent TLS negotiation failure with (AP+STA) and max power save mode (IDFGH-4142) #6006
Comments
Thanks for the very detailed report and letting us know, we will look into. |
Thank you @Alvin1Zhang! |
In case it is helpful, attached is a network trace (in PCAP format... change .txt to .pcap). It contains a mixture of many HTTP request failures, co-mingled with I believe 2 successes (I wrote a for() to re-do the same HTTP request 100 times and took a 120 second capture of that traffic) I see many TCP RST and retransmissions within, but not sure why. Minimum heap space throughout was 108660. Free heap was typically between 150k and 180k throughout |
I did some more experimentation today, trying to narrow down the cause and noticed something interesting. After the device is initially programmed, it spins up a Wi-Fi AP for provisioning and uses a captive portal to do so (using a modified version of https://github.com/tonyp7/esp32-wifi-manager). It works as follows:
...HOWEVER... I noticed that if I reboot the device, and issue the same HTTP requests after rebooting (after obviously connecting to the in-range Wi-Fi network), the HTTP requests NEVER fail - I tried hundreds of requests and they all succeeded. The only difference I can think of here is that after rebooting, the device never spins up the internal AP - it goes right into STA mode (since it already knows which Wi-Fi network it needs to connect to). Could there be some edge case issuing HTTP requests that only presents itself when the device is in both AP and STA mode, but not when it jumps right into STA mode? |
Possibly the additional overhead of operating as STA+AP creates a timing issue and the server enforces a timeout |
If that was the case, wouldn't increasing the clock speed to 240 MHz improve things? It seems to occur equally as often at 80 MHz and 240 MHz. |
Hi @jgcjoec , As you are using a custom wifi-provisioning manager it is difficult to tell weather it properly closes the AP after provisiong. Can you please use the recommended wifi provisioning manager and see if the issue is still present ? The link is https://github.com/espressif/esp-idf/tree/master/examples/provisioning/wifi_prov_mgr. |
@AdityaHPatwardhan - the AP needs to remain active during and after provisioning completes to report success to a web-based UI. |
I did some more experimenting and when I add this single line of code (disabling the Soft AP) to my callback function for event SYSTEM_EVENT_STA_GOT_IP, the HTTP requests succeed - without having to reboot the device. esp_wifi_set_mode(WIFI_MODE_STA); This solves my HTTP request issue, but it unfortunately disconnects all stations that are connected, which prevents me from being able to send connection success/failure messages back to the web UI. Any suggestions for how to resolve this without shutting down the AP? |
On the softap side are you also using a tls connection? Is it websocket or polled? |
My web UI polls an API hosted by esp_http_server running on the AP. After credentials for the STA are supplied, the STA is started, and used to make an HTTPS (TLS 1.2) request to an AWS-hosted endpoint. I tried looking at the wifi_prov_mgr example as suggested, but it appears to have some sort built-in protocomm HTTP server that I unfortunately don't have the time to learn right now. I tried to bypass it by calling wifi_prov_scheme_softap_set_httpd_handle(), but I can't start my HTTP server until after the Wi-Fi AP starts (I get an "Invalid mbox" assertion in tcpip.c), but the documentation states I must call wifi_prov_scheme_softap_set_httpd_handle() before I call wifi_prov_mgr_start_provisioning(), which is before the Wi-Fi AP starts... seems like a catch 22. I saw no examples that used wifi_prov_scheme_softap_set_httpd_handle(). |
Ok, I think I figured out the problem.... Keep in mind my device works as follows:
To re-cap, things were falling apart on step #5... those HTTP requests were sometimes succeeding, sometimes failing (failing more often than succeeding. I observed that the HTTP requests ALWAYS succeed after step #7 (reboot) when the device is in STA mode only My device is battery powered so naturally I am taking steps to conserve battery life - including modem sleep. After commenting out the following line, HTTP requests succeed regardless of when they are issued: I was able to reproduce this behavior in a stand-alone application based on the "softAP" and "station" examples under "getting_started". I realize that power savings isn't free, but this seems like a bug to me... if the community agrees, I can post the sample code I wrote to reproduce this issue. |
Do you mean in either of examples (i.e. Anyhow, I think sample code can help us understand this better. Few additional points:
|
@mahavirj - I combined the "softAP" and "station" examples into one, added my HTTP posting code and was able to reproduce the problem at will by simply calling esp_wifi_set_ps(WIFI_PS_MAX_MODEM). I noticed that with WIFI_PS_MAX_MODEM enabled, AP power consumption is unchanged (which is what I expected from reading the docs), but STA power consumption is drastically reduced - which is perfect for a battery powered device. Regarding the provisioning examples, all of the examples I've looked at in the latest ESP-IDF release seem to focus on using a mobile app for provisioning and unfortunately, that won't work for my use-case. I need to be able to provision from a captive portal. Here is the code that I wrote to be able to reproduce this issue. To reproduce:
/* WiFi station Example
This example code is in the Public Domain (or CC0 licensed, at your option.)
Unless required by applicable law or agreed to in writing, this
software is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied.
*/
#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/event_groups.h"
#include "esp_http_client.h"
#include "esp_system.h"
#include "esp_wifi.h"
#include "esp_event.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "lwip/err.h"
#include "lwip/sys.h"
/* The examples use WiFi configuration that you can set via project configuration menu
If you'd rather not, just change the below entries to strings with
the config you want - ie #define EXAMPLE_WIFI_SSID "mywifissid"
*/
#define EXAMPLE_STA_ESP_WIFI_SSID "<YOUR WIFI SSID>"
#define EXAMPLE_STA_ESP_WIFI_PASS "<YOUR WIFI PASSWORD>"
#define EXAMPLE_STA_ESP_MAXIMUM_RETRY 5
#define EXAMPLE_AP_ESP_WIFI_SSID "SoftAP"
#define EXAMPLE_AP_ESP_WIFI_PASS "12345678"
#define EXAMPLE_AP_ESP_WIFI_CHANNEL 1
#define EXAMPLE_AP_MAX_STA_CONN 4
#define URL "https://httpbin.org/post"
#define JSON "{\"FirstName\":\"foo\",\"LastName\":\"bar\",\"PhoneNumber\":\"1112223333\",\"EmailAddress\":\"[email protected]\",\"NickName\":\"test\",\"DeviceAddress\":\"25:2f:2b:e6:1a:51\",\"DeviceName\":\"Temperature Sensor\",\"ModelName\":\"Temperature Sensor V1\"}"
/* FreeRTOS event group to signal when we are connected*/
static EventGroupHandle_t s_wifi_event_group;
/* The event group allows multiple bits for each event, but we only care about two events:
* - we are connected to the AP with an IP
* - we failed to connect after the maximum amount of retries */
#define WIFI_CONNECTED_BIT BIT0
#define WIFI_FAIL_BIT BIT1
static const char* TAG = "wifi station";
static int s_retry_num = 0;
static esp_http_client_handle_t http_client;
esp_err_t _http_event_handler(esp_http_client_event_t* evt)
{
switch (evt->event_id)
{
case HTTP_EVENT_ERROR:
ESP_LOGD(TAG, "HTTP_EVENT_ERROR");
break;
case HTTP_EVENT_ON_CONNECTED:
ESP_LOGD(TAG, "HTTP_EVENT_ON_CONNECTED");
break;
case HTTP_EVENT_HEADER_SENT:
ESP_LOGD(TAG, "HTTP_EVENT_HEADER_SENT");
break;
case HTTP_EVENT_ON_HEADER:
ESP_LOGD(TAG, "HTTP_EVENT_ON_HEADER, key=%s, value=%s", evt->header_key, evt->header_value);
break;
case HTTP_EVENT_ON_DATA:
ESP_LOGD(TAG, "HTTP_EVENT_ON_DATA, len=%d", evt->data_len);
break;
case HTTP_EVENT_ON_FINISH:
ESP_LOGD(TAG, "HTTP_EVENT_ON_FINISH");
break;
case HTTP_EVENT_DISCONNECTED:
ESP_LOGD(TAG, "HTTP_EVENT_DISCONNECTED");
break;
}
return ESP_OK;
}
void init_http(const char* url)
{
if (http_client == NULL)
{
ESP_LOGI(TAG, "Initializing HTTP client...");
esp_http_client_config_t config = {
.url = url,
.event_handler = _http_event_handler
};
http_client = esp_http_client_init(&config);
}
}
void shutdown_http_client()
{
if (http_client != NULL)
{
ESP_LOGI(TAG, "Shutting down HTTP client...");
esp_http_client_close(http_client);
esp_http_client_cleanup(http_client);
http_client = NULL;
}
}
void send_http_post_request(const char* url, char* json)
{
init_http(url);
int request_size = strlen(json);
esp_http_client_set_url(http_client, url);
esp_http_client_set_method(http_client, HTTP_METHOD_POST);
esp_http_client_set_post_field(http_client, json, request_size);
esp_http_client_set_header(http_client, "Content-Type", "application/json");
esp_http_client_set_header(http_client, "User-Agent", "kIoT");
esp_err_t err = esp_http_client_perform(http_client);
int status_code = esp_http_client_get_status_code(http_client);
if (err == ESP_OK)
{
if (status_code >= 100 && status_code <= 299)
{
ESP_LOGI(TAG, "HTTP POST to %s SUCCEEDED. HTTP status code: %d, %d bytes\nJSON: %s", url, status_code, request_size, json);
return;
}
else if ((status_code >= 400 && status_code <= 499) || (status_code == 500))
{
// These codes are NOT indicative of a connection problem. Bail immediately - don't waste power retrying.
ESP_LOGE(TAG, "HTTP POST to %s FAILED. HTTP status code: %d, %d bytes\nJSON: %s", url, status_code, request_size, json);
shutdown_http_client();
init_http(url);
return;
}
}
else
{
ESP_LOGE(TAG, "HTTP POST to %s FAILED with error %s. %d bytes\nJSON: %s", url, esp_err_to_name(err), request_size, json);
shutdown_http_client();
init_http(url);
return;
}
}
static void event_handler(void* arg, esp_event_base_t event_base,
int32_t event_id, void* event_data)
{
if (event_base == WIFI_EVENT && event_id == WIFI_EVENT_STA_START)
{
esp_wifi_connect();
}
else if (event_base == WIFI_EVENT && event_id == WIFI_EVENT_STA_DISCONNECTED)
{
if (s_retry_num < EXAMPLE_STA_ESP_MAXIMUM_RETRY)
{
esp_wifi_connect();
s_retry_num++;
ESP_LOGI(TAG, "retry to connect to the AP");
}
else
{
xEventGroupSetBits(s_wifi_event_group, WIFI_FAIL_BIT);
}
ESP_LOGI(TAG, "connect to the AP fail");
}
else if (event_id == WIFI_EVENT_AP_STACONNECTED)
{
wifi_event_ap_staconnected_t* event = (wifi_event_ap_staconnected_t*)event_data;
ESP_LOGI(TAG, "station "MACSTR" join, AID=%d", MAC2STR(event->mac), event->aid);
ESP_LOGI(TAG, "Station connected!");
esp_netif_create_default_wifi_sta();
wifi_config_t sta_config = {
.sta = {
.ssid = EXAMPLE_STA_ESP_WIFI_SSID,
.password = EXAMPLE_STA_ESP_WIFI_PASS,
/* Setting a password implies station will connect to all security modes including WEP/WPA.
* However these modes are deprecated and not advisable to be used. Incase your Access point
* doesn't support WPA2, these mode can be enabled by commenting below line */
.threshold.authmode = WIFI_AUTH_WPA2_PSK,
.pmf_cfg = {
.capable = true,
.required = false
},
}
};
esp_wifi_set_ps(WIFI_PS_MAX_MODEM);
ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_APSTA));
ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_STA, &sta_config));
}
else if (event_id == WIFI_EVENT_AP_STADISCONNECTED)
{
wifi_event_ap_stadisconnected_t* event = (wifi_event_ap_stadisconnected_t*)event_data;
ESP_LOGI(TAG, "station "MACSTR" leave, AID=%d",
MAC2STR(event->mac), event->aid);
}
else if (event_base == IP_EVENT && event_id == IP_EVENT_STA_GOT_IP)
{
ip_event_got_ip_t* event = (ip_event_got_ip_t*)event_data;
ESP_LOGI(TAG, "got ip:" IPSTR, IP2STR(&event->ip_info.ip));
s_retry_num = 0;
xEventGroupSetBits(s_wifi_event_group, WIFI_CONNECTED_BIT);
for (int i = 0; i < 200; i++)
{
send_http_post_request(URL, JSON);
ESP_LOGI(TAG, "Current free heap: %d, Minimum heap: %d", esp_get_free_heap_size(), esp_get_minimum_free_heap_size());
vTaskDelay(pdMS_TO_TICKS(1000));
}
}
}
void wifi_init_ap(void)
{
s_wifi_event_group = xEventGroupCreate();
ESP_ERROR_CHECK(esp_netif_init());
ESP_ERROR_CHECK(esp_event_loop_create_default());
esp_netif_create_default_wifi_ap();
wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();
ESP_ERROR_CHECK(esp_wifi_init(&cfg));
ESP_ERROR_CHECK(esp_event_handler_register(WIFI_EVENT, ESP_EVENT_ANY_ID, &event_handler, NULL));
ESP_ERROR_CHECK(esp_event_handler_register(IP_EVENT, IP_EVENT_STA_GOT_IP, &event_handler, NULL));
wifi_config_t ap_config = {
.ap = {
.ssid = EXAMPLE_AP_ESP_WIFI_SSID,
.ssid_len = 0,
.channel = EXAMPLE_AP_ESP_WIFI_CHANNEL,
.password = EXAMPLE_AP_ESP_WIFI_PASS,
.max_connection = EXAMPLE_AP_MAX_STA_CONN,
.beacon_interval = 100,
.authmode = WIFI_AUTH_WPA_WPA2_PSK
}
};
ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_AP));
ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_AP, &ap_config));
ESP_ERROR_CHECK(esp_wifi_start());
/* Waiting until either the connection is established (WIFI_CONNECTED_BIT) or connection failed for the maximum
* number of re-tries (WIFI_FAIL_BIT). The bits are set by event_handler() (see above) */
EventBits_t bits = xEventGroupWaitBits(s_wifi_event_group,
WIFI_CONNECTED_BIT | WIFI_FAIL_BIT,
pdFALSE,
pdFALSE,
portMAX_DELAY);
/* xEventGroupWaitBits() returns the bits before the call returned, hence we can test which event actually
* happened. */
if (bits & WIFI_CONNECTED_BIT)
{
ESP_LOGI(TAG, "connected to ap SSID:%s password:%s",
EXAMPLE_STA_ESP_WIFI_SSID, EXAMPLE_STA_ESP_WIFI_PASS);
}
else if (bits & WIFI_FAIL_BIT)
{
ESP_LOGI(TAG, "Failed to connect to SSID:%s, password:%s",
EXAMPLE_STA_ESP_WIFI_SSID, EXAMPLE_STA_ESP_WIFI_PASS);
}
else
{
ESP_LOGE(TAG, "UNEXPECTED EVENT");
}
}
void app_main(void)
{
//Initialize NVS
esp_err_t ret = nvs_flash_init();
if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND)
{
ESP_ERROR_CHECK(nvs_flash_erase());
ret = nvs_flash_init();
}
ESP_ERROR_CHECK(ret);
ESP_LOGI(TAG, "ESP_WIFI_MODE_STA");
wifi_init_ap();
} |
@jgcjoec Thanks for providing sample application, can confirm the issue for |
@mahavirj - excellent - thank you! |
Hi @jgcjoec, seems like this issue is related to lwip timeout. Its value is 5 sec, set it to 10 sec is a workaround. |
@YouDONG-ESP - would you mind indicating which specific setting in sdkconfig you're referring to? |
@jgcjoec It's not in sdkconfig, you can add |
@YouDONG-ESP - thank you, I will give that a try. Just to be clear, that is a work-around for this issue and not intended to be a permanent fix, correct? |
@jgcjoec yeah, just did a packet capture, it shows that the third step of SSL handshake will take more than 5 seconds when it's in AP STA coexist mode and max power save is enabled. Could you check the value of your AP's DTIM? You can do a packet capture and it's in the beacon frame. |
@YouDONG-ESP - I checked my wireless settings (I use Unifi AC Pro access points, if that is relevant) and DTIM is set to the default value of 3. |
@YouDONG-ESP - I just did some testing with various settings - here are the results: With timeout_ms set to 10,000 and WIFI_PS_MAX_MODEM enabled. I did 3 runs of 100, rebooting the device between each run, and examined the failure rate: With timeout_ms set to the default value and WIFI_PS_MAX_MODEM enabled. I did 3 runs of 100, rebooting the device between each run, and examined the failure rate: With timeout_ms set to the default value and WIFI_PS_MAX_MODEM disabled. I did 3 runs of 100, rebooting the device between each run, and examined the failure rate: |
Hi @jgcjoec, seems like this issue can't be reproduced again, is it the same on your side? |
@YouDONG-ESP - sorry for the late reply. I was out of town. I just re-ran the scenario above titled "timeout_ms set to the default value and WIFI_PS_MAX_MODEM enabled" and I still experienced a 10% failure rate. |
Please let me know if anyone needs me to do any further testing |
@jgcjoec can you provide me a packet capture for this issue? It didn't reproduce again on my side, Seems like the server replies faster than before. |
@YouDONG-ESP - please see attached trace where there were 2 failures. Please change the txt extension to pcap. Thanks,
|
@jgcjoec Thanks for your help, we have found the root casuse. Basically it's because our STA received beacon frames that shouldn't be received while sleeping. We are discusting solutions. |
@YouDONG-ESP - thanks, great news! I have a device that is briefly in AP & STA mode, then once provisioned, goes into STA only mode. esp_wifi_set_ps(WIFI_PS_MAX_MODEM) is used to save power when in STA mode only - power savings is fairly significant, too. |
@YouDONG-ESP wifi driver should be smart enough to know if ap is active then don't let sta sleep if there is no advantage. @jgcjoec i think he means call esp_wifi_set_ps(WIFI_PS_MAX_MODEM) after you switch to sta only mode |
@YouDONG-ESP, @negativekelvin - I could apply the work-around you suggest, but it would require me to modify a 3rd party component that I use for wifi provisioning (https://github.com/tonyp7/esp32-wifi-manager). |
#5638 3 Environment
Problem Description
I have an ESP-IDF application that has the following major components:
This application issues the following HTTP POST requests to an AWS-hosted application. The application is hosted using AWS ECS, with an application load balancer in front of ECS. SSL is terminated at the load balancer
For simplicity, let's just look at provisioning (all HTTP requests seem to be equally affected by this problem)
Sample provisioning request (NOTE: JSON has been purposefully obfuscated)
POST /api/Tenant/Register HTTP/1.1
User-Agent: kIoT
Host: www.jdiflowbot.com
Content-Type: application/json
Content-Length: 230
{"FirstName":"somename","LastName":"somename","PhoneNumber":"somenumber","EmailAddress":"[email protected]","NickName":"test","DeviceAddress":"1122334455","DeviceName":"Test","ModelName":"Test"}
Sometimes, these HTTP requests will succeed, but more often than not, they appear to fail during the TLS negotiation phase.
Things I know:
Things I've tried:
Expected Behavior
The HTTP requests should always succeed from ESP-IDF if they succeed from a PC on the same network
Actual Behavior
The HTTP requests from ESP-IDF do not always succeed
Steps to reproduce
I am hopeful that if someone sees my code snippet and the logs, someone more familiar with mbedTLS than I am can point me in the right direction
Code to reproduce this issue
Portions not relevent to HTTP requests have been omitted for simplicity. I also did not include the code for the http handler, but if necessary, I can provide that. It is fairly close to the sample from Espressif, though. This code is obviously designed to retry any failed HTTP request (up to 20 times currently)
Debug Logs
The debug logs are very, very long since logging is set to "Verbose". I copied first the "retry" from the while() loop above.
sdkconfig.txt
The text was updated successfully, but these errors were encountered: