Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] xTaskNotifyWait & ulTaskNotifyTake perform a non-deterministic operation in a critical section. #612

Closed
karver8 opened this issue Jan 15, 2023 · 4 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@karver8
Copy link

karver8 commented Jan 15, 2023

Describe the bug
The API functions xTaskNotifyWait(), ulTaskNotifyTake() and their indexed variants both make calls to prvAddCurrentTaskToDelayedList() from the inside of a critical section, which in turn calls vListInsert() on pxOverflowDelayedTaskList or pxDelayedTaskList depending on the current tick count(unless xTicksToWait equals portMAX_DELAY). This results in the kernel walking the list from inside the critical section, which does not happen anywhere else in the library.

This appears to break the rule that "FreeRTOS never performs a non-deterministic operation, such as walking a linked list, from inside a critical section or interrupt. ". I found it a little hard to believe something like this would be missed, due to how many eyes are on this code base, so I am half expecting there to be a reason that this rule does not apply. If there is a reason that this rule does not apply here, I think it should be documented.

I searched around quite a bit for any information on this, and I finally came across this post from 2018 with the same concern, which gave me the confidence to make this bug report.

If this is in fact an error, there is one more related issue with a comment inside the function vTaskPlaceOnEventList() :

/* THIS FUNCTION MUST BE CALLED WITH EITHER INTERRUPTS DISABLED OR THE
 * SCHEDULER SUSPENDED AND THE QUEUE BEING ACCESSED LOCKED. */

However this function makes a call to vListInsert() and prvAddCurrentTaskToDelayedList(), making this function non-deterministic, although there does not appear to be anything that calls this function with interrupts disabled. The comment should be changed to:

/* THIS FUNCTION MUST BE CALLED WITH THE SCHEDULER SUSPENDED AND THE QUEUE BEING ACCESSED LOCKED. */

This potentially impacts all systems that use task notifications with block times other than 0 and portMAX_DELAY, and were relying on bounded interrupt latency. This issue may also be present in SafeRTOS.

Solution
It seems simple to fix this by just suspending the scheduler instead, many other portions of the kernel add tasks to the delayed tasks list doing exactly this.

@karver8 karver8 added the bug Something isn't working label Jan 15, 2023
@RichardBarry
Copy link
Contributor

Thanks for taking the time to report this. I just added configASSERT( ulCriticalNesting == 0UL ) into vListInsert() and sure enough it triggered with this line in the call stack.

@someburner
Copy link

Will this have a perf impact if using portMAX_DELAY? Sorry not super versed on this codebase but curious.

@karver8
Copy link
Author

karver8 commented Apr 2, 2023

If by performance you mean determinism then no, the delayed task list will not be walked if a time out equal portMAX_DELAY or 0 is used.

Or are you talking about the performance impact of the fix, in terms of CPU utilization?

aggarg pushed a commit that referenced this issue Oct 17, 2023
This PR fixes the bug described in the following issue:
#612.
This was originally contributed in the following PR:
#625.

The implementation suspends the scheduler before exiting the
critical section (i.e. before enabling interrupts). If we do not do
so, a notification sent from an ISR, which happens after exiting
the critical section and before suspending the scheduler, will
get lost. The sequence of events will be:

1. Exit critical section.
2. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the Ready list.
3. Suspend scheduler.
4. prvAddCurrentTaskToDelayedList moves the task to the delayed
    or suspended list.
5. Resume scheduler does not touch the task (because it is not on
    the pendingReady list), effectively losing the notification from
    the ISR.

The same does not happen when we suspend the scheduler before
exiting the critical section. The sequence of events in this case will
be:

1. Suspend scheduler.
2. Exit critical section.
3. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the pendingReady list as the scheduler is suspended.
4. prvAddCurrentTaskToDelayedList adds the task to delayed or
    suspended list. Note that this operation does not nullify the add
    to pendingReady list done in the above step because a different
    list item, namely xEventListItem, is used for adding the task to the
    pendingReady list. In other words, the task still remains on the
    pendingReady list.
5. Resume scheduler moves the task from pendingReady list to the Ready list.

------------

Co-authored-by: Jacob Carver <[email protected]>
@kar-rahul-aws
Copy link
Member

Closing this issue, since the fix has been merged in the PR.

n9wxu pushed a commit to n9wxu/FreeRTOS-Kernel that referenced this issue Oct 26, 2023
This PR fixes the bug described in the following issue:
FreeRTOS#612.
This was originally contributed in the following PR:
FreeRTOS#625.

The implementation suspends the scheduler before exiting the
critical section (i.e. before enabling interrupts). If we do not do
so, a notification sent from an ISR, which happens after exiting
the critical section and before suspending the scheduler, will
get lost. The sequence of events will be:

1. Exit critical section.
2. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the Ready list.
3. Suspend scheduler.
4. prvAddCurrentTaskToDelayedList moves the task to the delayed
    or suspended list.
5. Resume scheduler does not touch the task (because it is not on
    the pendingReady list), effectively losing the notification from
    the ISR.

The same does not happen when we suspend the scheduler before
exiting the critical section. The sequence of events in this case will
be:

1. Suspend scheduler.
2. Exit critical section.
3. Interrupt - ISR calls xTaskNotifyFromISR which adds the task to
    the pendingReady list as the scheduler is suspended.
4. prvAddCurrentTaskToDelayedList adds the task to delayed or
    suspended list. Note that this operation does not nullify the add
    to pendingReady list done in the above step because a different
    list item, namely xEventListItem, is used for adding the task to the
    pendingReady list. In other words, the task still remains on the
    pendingReady list.
5. Resume scheduler moves the task from pendingReady list to the Ready list.

------------

Co-authored-by: Jacob Carver <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants