Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

light sleep mode & WiFi & millis() overflow #8913

Open
Bruno-Muller opened this issue Apr 22, 2023 · 16 comments
Open

light sleep mode & WiFi & millis() overflow #8913

Bruno-Muller opened this issue Apr 22, 2023 · 16 comments

Comments

@Bruno-Muller
Copy link

Hi,

Issue:
WiFi stops working after a few hours when using the light sleep mode.

Description:
I am not sure if the Arduino library is doing well with the time drifting caused by the Light Sleep Mode because the clock is idle. And if the millis() overflow can cause any dead lock issue in the timeout condition to exit loops.

I'm not an expert, I suspect that the WiFi API does not renew and negotiate some WiFi token in a timely manner because millis() is not a good time indicator when we use the light sleep mode.

Steps to reproduce:
Periodically put the ESP8266 in light sleep mode.
Here in my tests, the wake time is around 5s, and sleep time is around 15s.

In setup():

WiFi.mode(WIFI_STA);

// I've seen that the library will override the value to MODEM_SLEEP later.
// but I will go around to avoid the situation with some custom sleep functions.
WiFi.setSleepMode(WiFiSleepType::WIFI_LIGHT_SLEEP);

WiFi.begin(WIFI_SSID, WIFI_PASSWORD);

Go to light sleep and wake up functions:

volatile bool wifi_sleeping = false;

// call this function to go to light sleep mode
void start_light_sleep(uint32_t time_ms) {

  // Not sure if this is needed, but I've read that all timer interrupts should be disabled to not interfere
  // So maybe it makes sense:
  extern os_timer_t *timer_list; 
  timer_list = nullptr; 

  wifi_station_disconnect();
  wifi_set_opmode(NULL_MODE);
  wifi_fpm_set_sleep_type(LIGHT_SLEEP_T);
  wifi_fpm_open(); 

  Serial.println("ENTER LIGHT SLEEP MODE");
  Serial.flush();
  Serial.end();

  wifi_sleeping = true;
  
  wifi_fpm_set_wakeup_cb(light_sleep_wakeup);
  wifi_fpm_do_sleep(time_ms*1000);  // in us, finally we are going to light sleep mode.
  
  // So I've read on the internet that the delay should be sleep_time_ms + 1 ms.
  // But I'm not sure if this is exact arithmancy.
  // I've measured that the ESP8266 needs around 460 ms to go to light sleep.
  // So I assume that the delay has to be at least 460ms.
  esp_delay(time_ms+1, [](){ return wifi_sleeping; }); 
}

// this function is called by the wake up callback interrupt. Do not call it yourself.
void light_sleep_wakeup(void) { 

 // I've read that some functions taking time should be called here. Not sure why.
 // Maybe removing it will break the wake up function.
 // I think the underlying function may call some esp yield or else.
  Serial.begin(BAUDRATE);
  Serial.println("EXIT LIGHT SLEEP MODE");
  Serial.flush();

  wifi_fpm_close();  
  wifi_set_opmode(STATION_MODE); 
  wifi_station_connect();   

  wifi_sleeping = false;
}

Analysis:

I notice that millis() function is badly understood on the internet, and a lot of people assume that it is 2^32 ms (i.e. 49.7 days) overflow.
I did some measurement and it is clear that millis() expires after 2^32/1000 ms (i.e. 2^32 us, or 71.6 min).

In addition to this, I notice in Arduino ESP8266 library source code that the code is compensating some clock drifting (maybe 826us per some amount of time, I don't recall exactly).

I am not sure that the clock drifting compensation or the overflow can cause side effects on the timeout while loops or exit condition used in the library.
For example, the following line is from the WiFi API, if millis() is less than last_sent, then it could lead to unexpected behavior.
if (millis() - last_sent > (uint32_t) max_wait_ms) {...}

I also noticed that the light sleep mode is badly understood as well. I've seen people measuring current with an oscilloscope and do some arithmancy, trying to guess if light sleep mode works or not, and doing assumption on how it has to be done, with no good reasons.......
Or maybe Google just does not help to find the reliable information lost somewhere on the Internet.
Well, by looking at the source code of Arduino ESP8266 Library source code and the official documentation of ESP 8266, I found out a workable solution to use the light sleep mode (mentioned above).

As a consequence, because the clock is idle during light sleep mode, the millis() function cannot be used to measure the time anymore.
I wonder if millis() is the time source in the WiFi API to trigger the renegotiation of some WiFi token.
That could explain why the wifi stops working.

@Bruno-Muller Bruno-Muller changed the title light sleep mode & WiFI & millis() overflow light sleep mode & WiFi & millis() overflow Apr 22, 2023
@JAndrassy
Copy link
Contributor

JAndrassy commented Apr 22, 2023

since millis() and last_sent are unsigned, the result of the subtraction is right even if millis() overflows and is less then last_sent

@Bruno-Muller
Copy link
Author

Bruno-Muller commented Apr 22, 2023

since millis() and last_sent are unsigned, the result of the subtraction is right even if millis() overflows and is less then last_sent

I agree, this is not a dead lock. It will timeout too early.
Maybe it is a bad example without the whole context.

I've reproduced the issue 3 times already but I'm still trying to understand what happen, I'm trying to collect more information.

I don't know if the library is doing something like below, I write this just as an example to highlight the concerns:

void refresh_token() {
  ...
  token_expiry_time = millis() + token_life_time;
}

...

// this will end in a dead lock, because millis overflow happens at 4 294 967 and not at 4 294 967 296
// Even if millis overflow is properly managed by the state of the art, the token will not be refreshed in a timely fashion because the clock is idle during sleep mode
if (millis() > token_expiry_time ) refresh_token(); 

Or the issue is maybe because I call the native ESP SDK. And it is preventing the Arduino ESP8266 library WiFi class to work properly.

Edit:

// in the Arduino setup()
WiFi.mode(WIFI_STA);
WiFi.setSleepMode(WiFiSleepType::WIFI_LIGHT_SLEEP);
WiFi.begin(WIFI_SSID, WIFI_PASSWORD);

....

// in a task called from the Arduino loop()
  wifi_station_disconnect();
  wifi_set_opmode(NULL_MODE);     
  wifi_fpm_set_sleep_type(LIGHT_SLEEP_T);
  wifi_fpm_open(); 
  wifi_fpm_set_wakeup_cb(wakeup_callback);
  wifi_fpm_do_sleep(15e3*1000); 
  
  esp_delay(15e3+1, [](){ return wifi_sleeping; });
....

// in the callback
wifi_fpm_close(); 
wifi_set_opmode(STATION_MODE); 
wifi_station_connect();  

.... 

// in the task called from the Arduino loop()
if (WiFi.status() != WL_CONNECTED) return; // do no run the task if wifi is not connected
// This works but after some time, Wifi is not able to reconnect anymore

Issue occurs after approximately 3h55 real up time, corresponding to 1h24 ESP up time (remember clock is idle in light sleep mode)
DHCP lease time is 12 hours

@JAndrassy
Copy link
Contributor

It will timeout too early.

no. it will timeout exactly as it should

@Bruno-Muller
Copy link
Author

It will timeout too early.

no. it will timeout exactly as it should

You can try this code,
It is configured to timeout after 30 ms, and it will timeout too early during the millis() overflow;
It uses _millis to fake the tick counter and not wait 71min real time.

    const uint32_t MILLIS_OVERFLOW = 0xFFFFFFFF/1000;

    char str[50];
    snprintf(str, 50, "MILLIS_OVERFLOW :%i", MILLIS_OVERFLOW);
    Serial.println(str);

    // We want to timeout after 30 millis() ticks
    uint32_t timeout = 30; 

    // We are 10 ticks before overflow
    uint32_t _millis = MILLIS_OVERFLOW - 10; // = millis(); 

    // last_sent happend 5 ticks earlier
    uint32_t last_sent = _millis - 5;

    // We expect that has_timed_out will be true in 25 ticks.
    for (int tick=0; tick<35; tick++) {

      // this will time out at tick 10, and not tick 25
      bool has_timed_out = (_millis - last_sent) > timeout;
      
      snprintf(str, 50, "tick:%i millis():%i has_timed_out:%s", tick, _millis, has_timed_out?"true":"false");
      Serial.println(str);

      // increment the millis() tick counter
      _millis = (_millis+1)%MILLIS_OVERFLOW;      
    }

The output:

tick:0 millis():4294957 has_timed_out:false
...
tick:9 millis():4294966 has_timed_out:false
tick:10 millis():0 has_timed_out:true

@JAndrassy
Copy link
Contributor

the test is not right because _millis - last_sent is not in done in %MILLIS_OVERFLOW

@Bruno-Muller
Copy link
Author

I don't see what you mean, _millis - last_sent is exactly the kind of test done by the Library in the WiFi API.

Any idea regarding the Wifi that does not connect after 4 hours ?
Could it be the WPA key rotation ?

@JAndrassy
Copy link
Contributor

sorry, now I get it. If millis() don't use the full bit width of the data type, then the subtraction 'trick' doesn't work. This is bad.

CC: @d-a-v

@Bruno-Muller
Copy link
Author

Yes :) Exactly, millis() doesn't use the full bit width of the data type.
Not sure if it can cause any problem, or be the root cause of mine.

And also, millis() does not count time when the chip is in light sleep mode.
This may cause some WPA key rotation issue (I really have no idea) or other issues.

@mcspr
Copy link
Collaborator

mcspr commented Apr 22, 2023

extern os_timer_t *timer_list;
timer_list = nullptr;

You stop the overflow handler, so works as expected? :)
System timer does that job, and does indeed correct the time so it does not go back to 0 on the limit you mentioned

os_timer_setfn(&micros_overflow_timer, (os_timer_func_t*) &micros_overflow_tick, 0);

From memory, forced sleep just does not work properly? It does something like this

  • fpm enable attaches specific func as an idle task in our system context
  • yield() / delay() returns to system scheduler, eventually running this idle task
  • idle task does some prep work like disabling interrupts and various system tasks. we have to disable wifi manually, though.
  • then, cpu sleep happens (which is just waiti INTERRUPT_MASK). after that point, either timer or gpio wakes us up
  • we wake up from light sleep, idle task exits and scheduler runs our loop. fpm disable removes idle task, everything back to normal

What does not seem to work is the 'run cpu sleep' part. esp_delay above just yields to system for the given duration, function would block and won't return until that time arrives; not really a full-fledged sleep like deep sleep or light sleep without timer
(ref. #6318)

edit2: there is also #7979 that helps out with sleep funcs. does the same thing with timers list, though. but, suppose restoring it right ater may somehow make it tick again

@dok-net
Copy link
Contributor

dok-net commented Apr 23, 2023

Did anyone check out #7979 and test this issue again based on it?

@Bruno-Muller
Copy link
Author

extern os_timer_t *timer_list;
timer_list = nullptr;

You stop the overflow handler, so works as expected? :)

Hum, that's one Voodoo thing someone mentioned to do on the Internet, so I did...

System timer does that job, and does indeed correct the time so it does not go back to 0 on the limit you mentioned

Do you mean that if I restore the pointer then I will get the full 32 bits in millis() ? I'll give it a try.

I'm not sure yet how #6318 can help, it talks a lot about Deep Sleep.
I'll have to found out what they are referring to when the say:

If anything, nikolz0 has created a working "light sleep'

They say also that the clock is still running for a few ms or us (and someone even suspected that the clock is still running but at a very lower frequency).

Looking at the documentation;
The clock stops, and is not running at a lower frequency.
The chip does not enter light sleep mode until some "idle task" is executed. This is from ESP NON-OS SDK documentation.
I measure around 460 ms to enter light sleep mode, so it makes sense.

But I suspect that the ESP SDK documentation is not accurate when they talk about the "idle task" in wifi_fpm_do_sleep
ESP NON-OS SDK uses the same sentence than ESP RTOS SDK and I think they mistakenly copy pasted the the "idle task" thing.

RTOS SDK has an idle task indeed but, I am not aware that NON-OS SDK has an idle task.
Arduino ESP 8266 Library is made upon the NON-OS SDK, so with no "idle task" to my knowledge.

So maybe some yield or I don't know what is triggering the final go_to_sleep.

What does not seem to work is the 'run cpu sleep' part. esp_delay above just yields to system for the given duration, function would block and won't return until that time arrives; not really a full-fledged sleep like deep sleep or light sleep without timer.

I'm not sure if you mean this :
I've seen people simply doing wifi_fpm_do_sleep(wait_time_us); esp_delay(wait_time_ms);
This initiate the ESP light sleep mode for wait_time_us.
It starts to wait in esp_delay,
Then after a few ms, it enters into light sleep, for wait_time_us duration.
Then after wake up, it continues to wait for esp_delay remaining wait_time_ms duration.

I fixed it like this, so esp_delay does not yields to system for the given duration,

volatile bool is_light_sleep = false;

void start_light_sleep(uint32_t wait_time_ms) {
  ...
  is_light_sleep = true;
  wifi_fpm_set_wakeup_cb(light_sleep_wakeup);
  wifi_fpm_do_sleep(wait_time_ms*1000);
  esp_delay(wait_time_ms+1, [](){ return is_light_sleep ; }); 
}


void light_sleep_wakeup(void) { 
  ...
  is_light_sleep = false;
}

Regarding #7979,
The current library does not have forceLightSleepBegin().
It has forceSleepBegin() however, but it does not take into account WiFi.setSleepMode(WiFiSleepType::WIFI_LIGHT_SLEEP); and forces wifi_fpm_set_sleep_type(MODEM_SLEEP_T); instead.

@dok-net
Copy link
Contributor

dok-net commented Apr 23, 2023

Regarding #7979, The current library does not have forceLightSleepBegin(). It has forceSleepBegin() however, but it does not take into account WiFi.setSleepMode(WiFiSleepType::WIFI_LIGHT_SLEEP); and forces wifi_fpm_set_sleep_type(MODEM_SLEEP_T); instead.

What do you mean by "the current" library? #7979 gives you ESP.forcedLightSleepBegin(microseconds, callback). If you like, maybe comment on this in terms of libraries\esp8266\examples\ForcedLightSleep\ForcedLightSleep.ino, that should make it easier to find some common ground.

@Bruno-Muller
Copy link
Author

What do you mean by "the current" library?

The library we get by adding the Board Manager, https://arduino.esp8266.com/stable/package_esp8266com_index.json version 3.1.2.
Hum, libraries\esp8266\examples\ does not contain ForcedLightSleep, the branch is not pulled yet or I don't know.

@dok-net
Copy link
Contributor

dok-net commented Apr 23, 2023

@Bruno-Muller The Board Manager selects BSPs, a.k.a. Arduino cores - not libraries :-)
The aforementioned #7979 is a pending PR, as such, not available via the Board Manager. Explaining all the steps is a little out of scope here. If you know or find out how to use Git master of the core, you should be able to fetch and checkout #7979 as well, for testing. Google your way to it, please.

Possible approach: clone https://github.com/dok-net/arduino-esp8266.git to your /hardware/esp8266com/esp8266 directory, checkout the wifioff branch. Delete later, you wouldn't always want to get my fork's main (master) later ;-) Remember to download the SDK by calling py get.py from the tools directory.

@mcspr
Copy link
Collaborator

mcspr commented Apr 23, 2023

But I suspect that the ESP SDK documentation is not accurate when they talk about the "idle task" in wifi_fpm_do_sleep
ESP NON-OS SDK uses the same sentence than ESP RTOS SDK and I think they mistakenly copy pasted the the "idle task" thing.

RTOS SDK has an idle task indeed but, I am not aware that NON-OS SDK has an idle task.
Arduino ESP 8266 Library is made upon the NON-OS SDK, so with no "idle task" to my knowledge.

Our questions about SDK currently can only be answered by reading disassembly ourselves or using some prior developments. No need to guess or believe without question any random voodoo from the internet :)
ROM has this minimal scheduling implementation, plus idle task. Running fpm sleep and wrapping around those funcs confirms they are actually used. The rest may be different depending on versions. As you mentioned, RTOS mostly ignores it in favour of its own scheduling. NONOS still uses it + some re-implemented parts.

My point is light sleep as implemented in NONOS does not really look that useful. Disabling existing timers still adds yet another one through the use of our delay() call, which returns us to the user task after expiring. System time is still ticking forward, timer callbacks can and will be executed, we can even re-enter user task unless explicitly blocking via something like esp_delay (notice it checks millis() counter when woken up early)

@Bruno-Muller
Copy link
Author

Hi, I need to update the status on this,

(I haven't touched my project during 1 year, waiting for library updates and bug fixes)

Wifi disconnection

Having updated all the libraries, and changed my WiFi access point.
The WiFi disconnection issue after 3h55 uptime is resolved.
I'm sorry, I cannot tell if it has been solved by the library update, or by the new WiFi access point.

Light Sleep Mode

My workaround to use Light Sleep Mode is functional.
I still see people struggling to implement Light Sleep Mode, because it is not natively supported by the library.
It is not an "issue", but could be a "feature request" instead.

However, offering the option to set the sleep mode to WIFI_LIGHT_SLEEP is another "issue".
It should be clear for the developers that the mode is not supported by the library and the function call WiFi.setSleepMode(WiFiSleepType::WIFI_LIGHT_SLEEP); should return false at least.

millis() function

And there is a lack of shared knowledge about millis() not using the full 32 bits.
In fact, millis() gets time_us / 1000 and maximum value is like 2^32 / 1000
While not being an "issue", there is a potential risk of bug if badly used, and developers must take extra precaution to avoid the situation.

However, the ESP8266 Arduino Library implementation does not meet the official Arduino library reference. regarding the overflow of millis().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants