Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic esp_timer variation with BLE (IDFGH-9053) #10457

Closed
3 tasks done
spanky411 opened this issue Dec 28, 2022 · 11 comments
Closed
3 tasks done

Periodic esp_timer variation with BLE (IDFGH-9053) #10457

spanky411 opened this issue Dec 28, 2022 · 11 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@spanky411
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

With BLE (bluedroid) enabled, I'm seeing some uneven execution of the hardware timer. I'm trying to achieve 100Hz sample rate on a sensor and have setup an esp_timer to fire every 10000us (10ms). This works perfectly with BLE disabled, but when BLE is advertising, I get some periodic irregularities in the timing. This is a plot of the difference in esp_timer_get_time() between adjacent timer executions. You can see something is causing a periodic delay on the timer execution:
logger_event_variation_5 0
The subsequent timer event fires early to balance it out, making it look like a Rorschach test. On average its firing at 100Hz, but this variation is too significant for my application.

The same behavior is present on idf 4.4 and 5.0. I don't think this is a bug exactly, but I'm having a rough time finding the source of this periodic delay, or a way to work around it. Its a pretty big delay for a high-resolution hardware timer. Is it possible to restrict the variation in timer events (I thought the minimum esp_timer resolution is 50us, which is way below the +/-1ms I'm seeing)?

Here's the same plot with BLE off (which shows it executing pretty close to 10ms with minimal variation). Same code as above, just not starting BLE:
logger_event_variation_5 0-no-ble

Some more background:
Here's a vTaskList() dump. I'm using core1 to do this timer, and have pushed basically everything else to core0:

Task Name       Status  Prio    HWM     Task    Affinity
SensorTask      X       24      272     13      1
LoggerTask      R       3       3864    8       0
IDLE            R       0       952     6       1
IDLE            R       0       944     5       0
BTC_TASK        B       19      912     10      0
ipc0            B       24      520     1       0
Tmr Svc         B       1       1552    7       0
ipc1            B       24      480     2       1
esp_timer       S       22      3600    3       0
btController    B       23      2116    9       0
hciT            B       22      1552    11      0
BTU_TASK        B       20      2920    12      0

I'm using the timer to unblock the high priority SensorTask. I am reading esp_timer_get_time() in the timer ISR and then unblocking the task via xSemaphoreGiveFromISR and then portYIELD_FROM_ISR() (I've also tried esp_timer_isr_dispatch_need_yield() with the same results). Its about 13-14us from the ISR to the unblocked task. That is not the source of the delay in the above plot. The timer ISR itself is what appears to be delayed.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Dec 28, 2022
@github-actions github-actions bot changed the title Periodic esp_timer variation with BLE Periodic esp_timer variation with BLE (IDFGH-9053) Dec 28, 2022
@spanky411
Copy link
Author

Looks like I might have figured this one out (✊🪵). There's a "general purpose" gptimer under peripherals and the esp_timer under system. Both are called high resolution timers. The documentation for esp_timer is a bit more streamlined and made it sound like it was suitable, but that does not seem to be the case. I swapped out some code to replace the esp_timer with the gptimer and now the events are firing as expected, with minimal variation, even with BLE enabled.

It might be good to clarify or crosslink these two pages in the documentation.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/esp_timer.html
https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/gptimer.html

@KonstantinKondrashov
Copy link
Collaborator

Hi @spanky411 !
Have you tried to change the CONFIG_ESP_TIMER_INTERRUPT_LEVEL?
If you think that the issue is - "The timer ISR itself is what appears to be delayed" then this option might help.
It sets the interrupt level for esp_timer ISR in range 1..3. A higher level (3) helps to decrease the ISR esp_timer latency.

@spanky411
Copy link
Author

I tried setting that to 3 (when using esp_timer), but it didn't resolve the timer irregularities when BLE was running.

@KonstantinKondrashov
Copy link
Collaborator

@spanky411
esp_timer ISR is configured to be run only on the 0 core (it can not be shared with other CPUs). So when the 0 core is busy with a BT task then such behavior might happen, and handling of esp_timer ISR is postponed a little bit. The critical sections that are used for tread safety reasons disable interrupts up to 3 levels. It means that even raising the esp_timer ISR to 3 does not help here (as you confirmed). It is the general behavior of the system.

Using GPtimer helped your case, probably for some reasons: gptimer ISR is configured to share with any CPUs, or gptimer ISR is allocated on the 1 core.

Maybe as another solution move the BT task to another core (see menuconfig).

@KonstantinKondrashov
Copy link
Collaborator

@spanky411 Could you try to enable CONFIG_ESP_TIMER_PROFILING and run esp_timer_dump? It allows for measuring the time taken by each of the callbacks (Cb_exec_time).
Maybe such behavior is a sign that we have a bug in another component, ISR might be delayed because a callback function is overloaded.
It will give a log like this:

Timer stats:
Name                  Period      Alarm         Times_armed   Times_trigg   Times_skip    Cb_exec_time
periodic              500000      2803009        1              4             0             882  

Thanks.

@spanky411
Copy link
Author

@KonstantinKondrashov thank you for the insight. My vTaskList() dump showed esp_timer on core0 at priority 22, while btcontroller was higher at 23, so that falls inline with your statements.
I didn't check to see the core and priority allocation since I switched to gptimer. I'm away from my test setup this week, but I can revert and try esp_timer_dump if that would help.

@KonstantinKondrashov
Copy link
Collaborator

KonstantinKondrashov commented Jan 5, 2023

@spanky411
One more question, you mention that you used esp_timer_isr_dispatch_need_yield() function. It is available when CONFIG_ESP_TIMER_SUPPORTS_ISR_DISPATCH_METHOD=y. Does it mean that your timer is configured with ESP_TIMER_ISR, is that right?

    const esp_timer_create_args_t timer_args = {
        .callback = &isr_callback_func,
        .dispatch_method = ESP_TIMER_ISR,
    };

And you see that the isr_callback_func is postponed about 1ms, right?
If so then a higher-priority ISR runs long on CPU0, or because interrupts are disabled by a critical section on CPU0.
Then it would be useful to get esp_timer_dump anyway. It can reveal that something is wrong with BT.

If your case is above then I can provide a feature for esp_timer that solve it (but only for timers with ESP_TIMER_ISR).
Thanks.

EDIT:
Please share the sdkconfig file, and some description of what is BLE app is running. It will help to reproduce it from our side.

@spanky411
Copy link
Author

Here is the esp_timer_dump(). I had to revert some code, but I'm 99% sure this is the same esp_timer as before (I updated to idf5.0 in the middle of all of this and switched to gptimer).

Timer stats:
Name                  Period      Alarm         Times_armed   Times_trigg   Times_skip    Cb_exec_time
cmd_rsp_to            0           0             27            0             0             0           
adv_fc_mon            0           0             0             0             0             0           
sensor_loop           10000       146556974     1             14282         0             127357      

The dump appears to impact the timing. The one above is printed "only" when there is a delay from the previous execution (I've got some timestamps to check that. see currTSx below) I think this makes it the most "fair" reading of the timing. Also, this is printed from the SensorTask following the unblocking of the semaphore... see below.

Here is the callback (to your other question, I set it up to use ESP_TIMER_ISR if CONFIG_ESP_TIMER_SUPPORTS_ISR_DISPATCH_METHOD is set), so I tried it with both. The dump above is with that enabled:

static IRAM_ATTR void periodic_timer_callback(void * arg) {
    BaseType_t mustYield = false;
    currTSx = esp_timer_get_time();
    xSemaphoreGiveFromISR(rdySem, &mustYield);
    if(mustYield) {
#ifdef CONFIG_ESP_TIMER_SUPPORTS_ISR_DISPATCH_METHOD
        esp_timer_isr_dispatch_need_yield();
#else
        portYIELD_FROM_ISR();
#endif
    }
}

The sensortask running on core1 is pretty straightforward - timer calls the isr/callback above, this unblocks the semaphore, the sensortask loop reads some sensors and waits on the semaphore again.
core0 has a lot going on. In addition to the bluetooth, I have a logger task that writes to an SD card every 250ms. The timing here is much less critical (assuming everything gets logged, which appears to be the case).

It could be an interaction between the ble and sd card causing the delays. I only notice disappear it when I disable BLE (the plot above is derived from the logged timestamps from the sensortask, so tracking this variation over time would be difficult without that logging). It can take minutes for the timing delays to become noticeable, and even then, "on average" its fine and even when there are delays, they're interspersed with "well behaved" timings.

@KonstantinKondrashov
Copy link
Collaborator

Hi @spanky411!
Thanks for more information about it.
In short, you set up the sensor_loop esp_timer with ESP_TIMER_ISR option and 10ms period, periodic_timer_callback() is called from ISR, and you get time when it was actually called (currTSx = esp_timer_get_time()). If we see diff time for currTSx as on the plot above then it definitely happens due to a few reasons: esp_timer ISR is called only from CPU0, esp_timer ISR doesn't run because a higher-priority ISR runs on CPU0, or because interrupts are disabled by a critical section on CPU0.

There is a solution to make esp_timer unsusceptible to such cases. This patch makes esp_timer callable from any core.
Could I ask you to try this patch 21923.zip (it is for the master but I think not many changes are required for other versions)? You need to have enabled the CONFIG_ESP_TIMER_SUPPORTS_ISR_DISPATCH_METHOD, use esp_timer with ESP_TIMER_ISR, and select a new option - CONFIG_ESP_TIMER_INTERRUPT_SHARED.

(It is not the final version, I want to get confirmation that it helps for your case).

@spanky411
Copy link
Author

I applied the patch and it looks good. I let it run for a while and here's the resulting plot. It's advertising BLE the whole time and I connected a few times to transfer data.
Patched esp_timer:
esp_timer-withpatch

I also ran it with gptimer, and it looks like esp_timer is much tighter. Here's the gptimer plot and its a bit noisier (fine for my purposes, but the patched esp_timer above is much better).
gptimer:
gptimer

I didn't dig very deep into this, so it seems to work, but that's only based on running it for a few minutes.

I couldn't determine the critical differences between the 2 timers from the documentation. Based on the above plot, it looks like esp_timer is better. Is there any reason to stick with my switch to gptimer (I'll probably switch to the patched esp_timer since it looks to be much less variable than gptimer).

Thanks for looking into this, by the way.

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new Status: Reviewing Issue is being reviewed Resolution: NA Issue resolution is unavailable labels Feb 28, 2023
espressif-bot pushed a commit to espressif/esp-hal-components that referenced this issue Mar 7, 2023
These new settings allow you to balance the load on cores.
Closes: espressif/esp-idf#10457
espressif-bot pushed a commit to espressif/esp-hal-components that referenced this issue Mar 7, 2023
These new settings allow you to balance the load on cores.
Closes: espressif/esp-idf#10457
@riwalker
Copy link

riwalker commented Sep 26, 2023

I need 4 separate one-shot timers, with callbacks. timing 100us to 10ms each. confused by GPTimer vs High-Resolution (ESP-Timer)
based on above, either ?
and quantity of each is still max 4 on ESP32-C6 ?
and is this patch released now for 5.1.1 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

4 participants