Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support reset kernel state for restarting scheduler #944

Merged
merged 73 commits into from
Feb 19, 2024

Conversation

chinglee-iot
Copy link
Member

@chinglee-iot chinglee-iot commented Jan 4, 2024

Support reset kernel state

Description

This PR is originated by cmorganBE in #914.

Application may restart the scheduler for each test case when it is running with a test framework.
This requirement is useful for simulator port like posix. Some of dynamic analysis tools are difficult to run on an embedded target. valgrind is one of the example. With this feature, application writers can better utilize the test framework and valgrind to help diagnosis memory and thread bugs.

FreeRTOS kernel initialize some of the internal variables at declaration time. Therefore, init function is not required in kernel.
To restart the scheduler, kernel state needs to be reset by setting internal variables to its default value before next time starting the scheduler.

In this PR:

  • Add functions to reset kernel state. These functions are only required for source file with initialized internal variables, including tasks.c, timer.c, croutine.c and heap_x.c.
    • void vTaskResetState( void )
    • void vTimerResetState( void )
    • void vPortHeapResetState( void )
    • void vCoRoutineResetState( void )

The change in the PR only re-initialize internal variables. Memory or other resource allocated during the time scheduler is running should still be freed by the port or application.

Test Steps section provides an example usage of these functions.

Test Steps

Testing with the following application on posix port.

#include <stdio.h>

#include "FreeRTOS.h"
#include "task.h"

void prvTestTask( void * pvParameters )
{
    vTaskDelay( pdMS_TO_TICKS( 1000 ) );
    printf( "End scheduler\r\n" );
    vTaskEndScheduler();
}

void main_test( void )
{
    for(;;)
   {
        xTaskCreate( prvTestTask, "Test", configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY + 2, NULL );
        vTaskStartScheduler();
        vTaskResetState();   /* Without this function call, there will be exception. */
    }
}

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

#914

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

cmorganBE and others added 30 commits December 1, 2023 08:19
…ort to avoid shutdown deadlock

Fixes an issue where a pthread_cancel() of the idle thread will hang, blocking FreeRTOS shutdown, due to a lack of
cancellation points.

Introduce a POSIX port specific cancellation point in vPortIdle() through a call to pthread_testcancel().
…ling with non-FreeRTOS pthreads

Improve upon the elegant approach of using signals to cause task/pthreads
suspension and scheduler execution by using directed signals.

This fixes:
- Deadlocks in non-FreeRTOS pthreads
- Multiple FreeRTOS tasks(pthreads) incorrectly running at the same time

By directing the signals using pthread_kill() the signal handler in the presently running
FreeRTOS task/pthread will be called, ensuring that the scheduler runs both in the context
of a FreeRTOS task/pthread and from the presently executing FreeRTOS task/pthread.

Details
==============

The POSIX port uses signals to preempt FreeRTOS tasks (implemented as pthreads), a very neat and elegant
approach to forcing tasks/pthreads to suspend and run the scheduler.

Signal handlers are process global.

Posix timers generate signals when the timer expires, and the signal is sent to the currently
running pthread.

In systems where there are pthreads that are NOT a result of creating FreeRTOS tasks, such as the
entry point thread that calls main(), or user created pthreads, this poses a serious issue.

While the POSIX port only allows a single FreeRTOS pthread to run at once, by causing all suspended
threads to not be scheduled due to their waiting on a pthread condition variable,
this isn't the case with non-FreeRTOS pthreads.

Thus it is possible that a non-FreeRTOS pthread is running when the timer expires and the signal
is generated. This results in the signal handler running in the non-FreeRTOS thread.

The sequence of events results in these events from signal handler context:
- vPortSystemTickHandler() being called
- The scheduler running
- Selecting another FreeRTOS task to run and switching the active task
- The newly selected task released from suspension by pthread_cond_signal()
- The presently active thread calling event_wait()
- The pthread calling pthread_cond_wait(), suspending the thread and allowing the host OS scheduler
  to schedule another thread to run.

If this occurs from a non-FreeRTOS thread this results in:
- The active FreeRTOS pthread (Task A/Thread A) continuing to run (as the signal handler that calls
  event_wait() ran instead in a non-FreeRTOS pthread.
- The pthread where the signal handler did run (Thread B) will call event_wait() and pthread_cond_wait(),
  but on the condition variable of the previously active FreeRTOS task, oops. This causes the
  non-FreeRTOS pthread to block unexpectedly relative to what the developer might have expected.
- The newly selected FreeRTOS Task (Task C/Thread C) will resume and start running.

At this point Task A/Thread A is running concurrently with Task C/Thread C. While this may not
necessarily be an issue, it does not replicate the expected behavior of a single Task running at
once.

Note that Thread B will resume if/when Task A/ThreadA is switched to. However, this could be delayed
by an arbitrary amount of time, or could never occur.

Also note that if there are multiple non-FreeRTOS pthreads that Thread D, E, F...etc could suffer the
same fate as Thread B, if the scheduler were to suspend Task C/Thread C and resume Task E/Thread E.

Implementation
==============

Timer details
-------------
A standalone pthread for the signal generation thread was chosen, rather than using
a posix timer_settime() handler function because the latter creates a temporary
pthread for each handler callback. This makes debugging much more difficult due to
gdb detecting the creation and destruction of these temporary threads.

Signal delivery
--------------
While signal handlers are per-thread, it is possible for pthreads to selectively block
signals, rather than using thread directed signals. However, the approach of blocking
signals in non-FreeRTOS pthreads adds complexity to each of these non-FreeRTOS pthreads
including ensuring that these signals are blocked at thread creation, prior to the thread
starting up. Directed signals removes the requirement for non-FreeRTOS pthreads to be aware
of and take action to protect against these signals, reducing complexity.
For a clean shutdown where memory is freed, it is necessary for all pthreads to be joined
at shutdown.

Previously there was explicit cancellation of the idle task and timer daemon task, however
there may be a number of other tasks in the system, both system created and user created,
and those tasks/threads were being left at shutdown.

This change calls pthread_cancel()/pthread_join() on all FreeRTOS managed pthreads upon
shutdown.
…StartScheduler() to be called again

In the posix use case it is possible to repeatedly start and stop the scheduler.

FreeRTOS has no routine for initializing internal state as internal state variables are
initialized at application start up.

For subsequent scheduler starts there were variables left with state that was preventing
the scheduler from starting up.

Re-initialize these before vTaskStartScheduler() exits.
…d in xPortStartScheduler

Otherwise subsequent calls to xPortStartScheduler will incorrectly return immediately as
xSchedulerEnd was still set from the previous scheduler run.
…llowing them to specify the stack size

Change from pthread_attr_setstack() to pthread_attr_setstacksize(), and automatically adjust the stack size
to be at least PTHREAD_STACK_MIN if it wasn't already, removing the size warning.

This permits the user to increase the pthread stack size beyond the PTHREAD_STACK_MIN default of 16384 if
desired, without producing a warning in the typical case where stacks are minimized for RAM limited targets.

Continue to store thread paramters on the provided stack, for consistency with the MCU targets.

Previously pthread_attr_setstack() was used to enable user defined stacks.

Allowing user stacks has a few issues:

1. Stack sizes are limited to preserve available memory. Memory is not limited on systems running the POSIX port
   so there is no reason to attempt to limit the size of stack memory.

2. pxPortInitialiseStack() would print out warnings, and pthread_addr_setstack() would fail on stacks
   smaller than PTHREAD_STACK_MIN (16384) bytes. However PTHREAD_STACK_MIN may be larger than many task
   stacks so several warnings may be printed out. But, given #1 there is nothing really to worry about here.

3. Apparently it isn't possible to reuse stack memory once its been used in a pthread via pthread_attr_setstack(),
   see https://stackoverflow.com/a/5422134
   Reuse of stack memory was also resulting in valgrind 'invalid write' errors to what was demonstrably
   valid memory. Root cause not determined.
…use of it in POSIX port to avoid shutdown deadlock

Fixes an issue where a pthread_cancel() of the idle thread will hang, blocking FreeRTOS shutdown, due to a lack of
cancellation points.

Introduce a POSIX port specific cancellation point in vPortIdle() through a call to pthread_testcancel().
…ling with non-FreeRTOS pthreads

Improve upon the elegant approach of using signals to cause task/pthreads
suspension and scheduler execution by using directed signals.

This fixes:
- Deadlocks in non-FreeRTOS pthreads
- Multiple FreeRTOS tasks(pthreads) incorrectly running at the same time

By directing the signals using pthread_kill() the signal handler in the presently running
FreeRTOS task/pthread will be called, ensuring that the scheduler runs both in the context
of a FreeRTOS task/pthread and from the presently executing FreeRTOS task/pthread.

Details
==============

The POSIX port uses signals to preempt FreeRTOS tasks (implemented as pthreads), a very neat and elegant
approach to forcing tasks/pthreads to suspend and run the scheduler.

Signal handlers are process global.

Posix timers generate signals when the timer expires, and the signal is sent to the currently
running pthread.

In systems where there are pthreads that are NOT a result of creating FreeRTOS tasks, such as the
entry point thread that calls main(), or user created pthreads, this poses a serious issue.

While the POSIX port only allows a single FreeRTOS pthread to run at once, by causing all suspended
threads to not be scheduled due to their waiting on a pthread condition variable,
this isn't the case with non-FreeRTOS pthreads.

Thus it is possible that a non-FreeRTOS pthread is running when the timer expires and the signal
is generated. This results in the signal handler running in the non-FreeRTOS thread.

The sequence of events results in these events from signal handler context:
- vPortSystemTickHandler() being called
- The scheduler running
- Selecting another FreeRTOS task to run and switching the active task
- The newly selected task released from suspension by pthread_cond_signal()
- The presently active thread calling event_wait()
- The pthread calling pthread_cond_wait(), suspending the thread and allowing the host OS scheduler
  to schedule another thread to run.

If this occurs from a non-FreeRTOS thread this results in:
- The active FreeRTOS pthread (Task A/Thread A) continuing to run (as the signal handler that calls
  event_wait() ran instead in a non-FreeRTOS pthread.
- The pthread where the signal handler did run (Thread B) will call event_wait() and pthread_cond_wait(),
  but on the condition variable of the previously active FreeRTOS task, oops. This causes the
  non-FreeRTOS pthread to block unexpectedly relative to what the developer might have expected.
- The newly selected FreeRTOS Task (Task C/Thread C) will resume and start running.

At this point Task A/Thread A is running concurrently with Task C/Thread C. While this may not
necessarily be an issue, it does not replicate the expected behavior of a single Task running at
once.

Note that Thread B will resume if/when Task A/ThreadA is switched to. However, this could be delayed
by an arbitrary amount of time, or could never occur.

Also note that if there are multiple non-FreeRTOS pthreads that Thread D, E, F...etc could suffer the
same fate as Thread B, if the scheduler were to suspend Task C/Thread C and resume Task E/Thread E.

Implementation
==============

Timer details
-------------
A standalone pthread for the signal generation thread was chosen, rather than using
a posix timer_settime() handler function because the latter creates a temporary
pthread for each handler callback. This makes debugging much more difficult due to
gdb detecting the creation and destruction of these temporary threads.

Signal delivery
--------------
While signal handlers are per-thread, it is possible for pthreads to selectively block
signals, rather than using thread directed signals. However, the approach of blocking
signals in non-FreeRTOS pthreads adds complexity to each of these non-FreeRTOS pthreads
including ensuring that these signals are blocked at thread creation, prior to the thread
starting up. Directed signals removes the requirement for non-FreeRTOS pthreads to be aware
of and take action to protect against these signals, reducing complexity.
For a clean shutdown where memory is freed, it is necessary for all pthreads to be joined
at shutdown.

Previously there was explicit cancellation of the idle task and timer daemon task, however
there may be a number of other tasks in the system, both system created and user created,
and those tasks/threads were being left at shutdown.

This change calls pthread_cancel()/pthread_join() on all FreeRTOS managed pthreads upon
shutdown.
…StartScheduler() to be called again

In the posix use case it is possible to repeatedly start and stop the scheduler.

FreeRTOS has no routine for initializing internal state as internal state variables are
initialized at application start up.

For subsequent scheduler starts there were variables left with state that was preventing
the scheduler from starting up.

Re-initialize these before vTaskStartScheduler() exits.
…d in xPortStartScheduler

Otherwise subsequent calls to xPortStartScheduler will incorrectly return immediately as
xSchedulerEnd was still set from the previous scheduler run.
…llowing them to specify the stack size

Change from pthread_attr_setstack() to pthread_attr_setstacksize(), and automatically adjust the stack size
to be at least PTHREAD_STACK_MIN if it wasn't already, removing the size warning.

This permits the user to increase the pthread stack size beyond the PTHREAD_STACK_MIN default of 16384 if
desired, without producing a warning in the typical case where stacks are minimized for RAM limited targets.

Continue to store thread paramters on the provided stack, for consistency with the MCU targets.

Previously pthread_attr_setstack() was used to enable user defined stacks.

Allowing user stacks has a few issues:

1. Stack sizes are limited to preserve available memory. Memory is not limited on systems running the POSIX port
   so there is no reason to attempt to limit the size of stack memory.

2. pxPortInitialiseStack() would print out warnings, and pthread_addr_setstack() would fail on stacks
   smaller than PTHREAD_STACK_MIN (16384) bytes. However PTHREAD_STACK_MIN may be larger than many task
   stacks so several warnings may be printed out. But, given #1 there is nothing really to worry about here.

3. Apparently it isn't possible to reuse stack memory once its been used in a pthread via pthread_attr_setstack(),
   see https://stackoverflow.com/a/5422134
   Reuse of stack memory was also resulting in valgrind 'invalid write' errors to what was demonstrably
   valid memory. Root cause not determined.
* Prevent a running task can't be cancelled
@chinglee-iot chinglee-iot requested a review from aggarg January 25, 2024 11:08
croutine.c Outdated Show resolved Hide resolved
portable/MemMang/heap_1.c Outdated Show resolved Hide resolved
portable/MemMang/heap_4.c Outdated Show resolved Hide resolved
portable/MemMang/heap_5.c Outdated Show resolved Hide resolved
chinglee-iot and others added 2 commits January 26, 2024 18:03
Apply suggestions to prevent implicit type conversion.

Co-authored-by: Soren Ptak <[email protected]>
@chinglee-iot chinglee-iot requested a review from Skptak January 26, 2024 10:27
tasks.c Outdated Show resolved Hide resolved
@chinglee-iot chinglee-iot requested a review from hoxi January 31, 2024 03:50
@cmorganBE
Copy link
Contributor

Does it look like this one might be merged in the near future?

@chinglee-iot I'll go back at some point and update my examples repo based on yours and what I've been doing here for thread shutdown. Otherwise the fixes to the posix port and shutdown have been fantastic at improving the usabiilty of FreeRTOS on linux for testing and testing under valgrind.

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@chinglee-iot chinglee-iot merged commit 1a500f1 into FreeRTOS:main Feb 19, 2024
16 of 18 checks passed
@chinglee-iot
Copy link
Member Author

@cmorganBE
This PR is merged. Special thanks to you for creating issues and PRs for this feature and discussing with us about the implementation detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants