Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve POSIX port functionality #914

Merged
merged 14 commits into from
Jan 11, 2024
Merged

Conversation

cmorganBE
Copy link
Contributor

Posix port improvements.

I'm interested in feedback on these changes. You can see the test applications here, https://github.com/cmorganBE/freertos_posix

Description

The posix port had a handful of issues that were making it difficult to use, including:

  • Deadlocks in non-freertos pthreads
  • Unable for the scheduler to cleanly shut down
  • Unable for the scheduler to be restarted
  • Valgrind errors (leaks etc)

These commits build upon the posix port to resolve these issues.

Test Steps

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

Related Issue

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@cmorganBE cmorganBE requested a review from a team as a code owner November 29, 2023 20:04

if( iRet != 0 )
{
fprintf( stderr, "[WARN] pthread_attr_setstack failed with return value: %d. Default stack will be used.\n", iRet );
fprintf( stderr, "[WARN] Increase the stack size to PTHREAD_STACK_MIN.\n" );
fprintf( stderr, "[WARN] pthread_attr_setstacksize failed with return value: %d. Default stack size will be used.\n", iRet );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this log message unnecessary now that the stack size is set to be PTHREAD_STACK_MIN if ulStackSize is less than PTHREAD_STACK_SIZE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pthreaad_attr_setstacksize() could still fail and in those cases I thought it was useful to report it to the user. It should be very rare but there are a number of conditions that could result in it failing and given that this is an OS it seemed like we should handle all error conditions in some manner, even if that's a warning printout since the consequences are usually minimal if this does fail.

@cmorganBE cmorganBE force-pushed the cmm_posix branch 2 times, most recently from 844cd8a to c4ffc40 Compare November 30, 2023 15:26
@cmorganBE
Copy link
Contributor Author

@Skptak can you re-initiate the workflows? I was waiting for them to complete and just realized they don't run without approval....

tasks.c Outdated Show resolved Hide resolved
@Skptak
Copy link
Member

Skptak commented Dec 1, 2023

Hey @cmorganBE, thanks for moving that idle hook call to work the way you have it set up now!
And thanks for responding to the comments in your PR 😄

There are a few FreeRTOS Style Guide guide differences in your PR than what we use. For example the static pthread_t timer_tick_thread; would be called something closer to static Thread_t pxTimerTask. Then these files should be run using uncrustify version .69.

Where I was wondering if you'd be fine with me adding a few commits to your PR to address these problems? If you want to do it yourself that's fine as well!

I appreciate the effort you've put into this PR so far, and thanks again for your contribution!

@cmorganBE
Copy link
Contributor Author

cmorganBE commented Dec 1, 2023

Hi @Skptak . It's no trouble. We've got a ton of code that would be helped by having this working better and its been interesting learning how the port works and more about freertos internals.

If you want to add commits and feel they align with the spirit of the PR feel free to do so.

On the timer tick thread, this isn't a FreeRTOS managed thread per-se, that's why I didn't use Thread_t there, there isn't much use for most of the stuff in that struct. Renaming it to pxTimerTask might make sense but again, it's not a freeRTOS task at the moment. I think it wouldn't make sense to make it a FreeRTOS task either, in that case the task would end up being suspended and I think the signals that serve as interrupts in this approach wouldn't be generated correctly.

Hey @cmorganBE, thanks for moving that idle hook call to work the way you have it set up now! And thanks for responding to the comments in your PR 😄

There are a few FreeRTOS Style Guide guide differences in your PR than what we use. For example the static pthread_t timer_tick_thread; would be called something closer to static Thread_t pxTimerTask. Then these files should be run using uncrustify version .69.

Where I was wondering if you'd be fine with me adding a few commits to your PR to address these problems? If you want to do it yourself that's fine as well!

I appreciate the effort you've put into this PR so far, and thanks again for your contribution!

@chinglee-iot
Copy link
Member

@cmorganBE
I am also running the test you shared in this repository https://github.com/cmorganBE/freertos_posix.
It probably take some time for me to review some of the changes.
Once I finish, I will update in this PR again.

@cmorganBE cmorganBE force-pushed the cmm_posix branch 2 times, most recently from fc37d5f to 45a6885 Compare December 5, 2023 21:03
@chinglee-iot
Copy link
Member

@chinglee-iot I saw some mention of interrupts being disabled and the timer tick causing issues but maybe that went away during an edit? If that is an issue we could have the port use an atomic variable to determine if a tick should be dispatched, and disabling interrupts call could flip that atomic, timer thread checks that and won't dispatch it if it isn't set to true.

Yes. I edited the reply cause this information is wrong and no need to be discussed. In this PR, SIGALARM will only be sent to the thread which is running the FreeRTOS current task.

@chinglee-iot
Copy link
Member

@cmorganBE
Thank you for your reply. Timer tick thread makes sense to me. Just valgrind signal handling doesn't work with it very well. I intend to approve timer tick thread as it has more benefit. Some format NIT suggestion:

  • Variable naming convention : this file use camel case. We can rename timer_tick_thread to hTimerTickThread, timer_tick_thread_should_run to xTimerTickThreadShouldRun and timer_tick_handler to prvTimerTickHandler
  • Not using cpp style comment //. Using /* and */ instead.

At the moment we are only calling vTaskEndScheduler from a freertos thread. I this it is a reasonable requirement to say that "vTaskEndScheduler must be called from a freertos thread".

I'm not following the vApplicationIdleHook. Are you saying that the application idle hook could be used to call vTaskEndScheduler to aid in not requiring a separate task from which to call it? I can see that as working and that makes sense.

Yes, this is one of the way how you can use vApplicationIdleHook.

From your previous description. The idle hook introduced in this PR tries to create a cancellation point for the idle task. Otherwise, the idle task will be running in a loop and can't be cancelled by main thread.

I would like to discuss the following scenarios where the vTaskEndScheduler is called:

  • In other FreeRTOS task : The idle task must be waiting for event event_wait( thread->ev ); and can later be cancelled by main thread.
  • In the idle task : You may do this with vApplicationIdleHook. The task calling vTaskEndScheduler can be cancelled without problem.
  • In non-FreeRTOS task : It is still possible that the idle task is running in a loop without cancellation point. We may create a cancellation point in vApplicationIdleHook for idle task instead of portHAS_IDLE_HOOK.

Since vTaskEndScheduler is only called from a FreeRTOS task, I don't think portHAS_IDLE_HOOK is necessary to cancel the idle task.

@cmorganBE
Copy link
Contributor Author

@chinglee-iot I also don't see any test failures if I disable the cancellation point. I'll drop the port idle change off in the next push of this branch. I can't help but think that if the idle thread isn't waiting on an event that we'd have an issue where it never hit a cancellation point. Perhaps that behavior was changed by the directed signals. In any case I'll drop it and keep a copy of it here locally just in case.

I'll also update the comments to use c style comments, and adjust various names as you've mentioned and across the changes made.

…ling with non-FreeRTOS pthreads

Improve upon the elegant approach of using signals to cause task/pthreads
suspension and scheduler execution by using directed signals.

This fixes:
- Deadlocks in non-FreeRTOS pthreads
- Multiple FreeRTOS tasks(pthreads) incorrectly running at the same time

By directing the signals using pthread_kill() the signal handler in the presently running
FreeRTOS task/pthread will be called, ensuring that the scheduler runs both in the context
of a FreeRTOS task/pthread and from the presently executing FreeRTOS task/pthread.

Details
==============

The POSIX port uses signals to preempt FreeRTOS tasks (implemented as pthreads), a very neat and elegant
approach to forcing tasks/pthreads to suspend and run the scheduler.

Signal handlers are process global.

Posix timers generate signals when the timer expires, and the signal is sent to the currently
running pthread.

In systems where there are pthreads that are NOT a result of creating FreeRTOS tasks, such as the
entry point thread that calls main(), or user created pthreads, this poses a serious issue.

While the POSIX port only allows a single FreeRTOS pthread to run at once, by causing all suspended
threads to not be scheduled due to their waiting on a pthread condition variable,
this isn't the case with non-FreeRTOS pthreads.

Thus it is possible that a non-FreeRTOS pthread is running when the timer expires and the signal
is generated. This results in the signal handler running in the non-FreeRTOS thread.

The sequence of events results in these events from signal handler context:
- vPortSystemTickHandler() being called
- The scheduler running
- Selecting another FreeRTOS task to run and switching the active task
- The newly selected task released from suspension by pthread_cond_signal()
- The presently active thread calling event_wait()
- The pthread calling pthread_cond_wait(), suspending the thread and allowing the host OS scheduler
  to schedule another thread to run.

If this occurs from a non-FreeRTOS thread this results in:
- The active FreeRTOS pthread (Task A/Thread A) continuing to run (as the signal handler that calls
  event_wait() ran instead in a non-FreeRTOS pthread.
- The pthread where the signal handler did run (Thread B) will call event_wait() and pthread_cond_wait(),
  but on the condition variable of the previously active FreeRTOS task, oops. This causes the
  non-FreeRTOS pthread to block unexpectedly relative to what the developer might have expected.
- The newly selected FreeRTOS Task (Task C/Thread C) will resume and start running.

At this point Task A/Thread A is running concurrently with Task C/Thread C. While this may not
necessarily be an issue, it does not replicate the expected behavior of a single Task running at
once.

Note that Thread B will resume if/when Task A/ThreadA is switched to. However, this could be delayed
by an arbitrary amount of time, or could never occur.

Also note that if there are multiple non-FreeRTOS pthreads that Thread D, E, F...etc could suffer the
same fate as Thread B, if the scheduler were to suspend Task C/Thread C and resume Task E/Thread E.

Implementation
==============

Timer details
-------------
A standalone pthread for the signal generation thread was chosen, rather than using
a posix timer_settime() handler function because the latter creates a temporary
pthread for each handler callback. This makes debugging much more difficult due to
gdb detecting the creation and destruction of these temporary threads.

Signal delivery
--------------
While signal handlers are per-thread, it is possible for pthreads to selectively block
signals, rather than using thread directed signals. However, the approach of blocking
signals in non-FreeRTOS pthreads adds complexity to each of these non-FreeRTOS pthreads
including ensuring that these signals are blocked at thread creation, prior to the thread
starting up. Directed signals removes the requirement for non-FreeRTOS pthreads to be aware
of and take action to protect against these signals, reducing complexity.
For a clean shutdown where memory is freed, it is necessary for all pthreads to be joined
at shutdown.

Previously there was explicit cancellation of the idle task and timer daemon task, however
there may be a number of other tasks in the system, both system created and user created,
and those tasks/threads were being left at shutdown.

This change calls pthread_cancel()/pthread_join() on all FreeRTOS managed pthreads upon
shutdown.
@cmorganBE
Copy link
Contributor Author

@chinglee-iot addressed naming, the port idle change, comments, and a few other minor formatting things with this latest push.

@chinglee-iot
Copy link
Member

chinglee-iot commented Jan 10, 2024

@cmorganBE
I created a PR to your branch to fix the following:

  • Add back pthread_attr_setstacksize in your PR
  • Stop timer in vPortEndScheduler
  • Move list insert and remove function call to a critical section
  • Create cancellation point in prvSuspendSelf function

Please help to take a look. I will also revert the posix change in #944.

Some update about re-initialize PR.
We intend to public the re-initialization ( will be rename to reset state ) API in this PR. User will need to call these APIs before next time restart scheduler.

Pseudo example code :

/* First time start scheduler. */
vTaskStartScheduler();

/* End scheduler in a FreeRTOS task. */
vTaskEndScheduler();

/* Calling FreeRTOS reset state API before next time restart scheduler. */
#if ( configUSE_CO_ROUTINES == 1 )
{
    vCoRoutineResetState();
}
#endif
#if ( configUSE_TIMERS == 1 )
{
    vTimerResetState();
}
#endif /* #if ( configUSE_TIMERS == 1 ) */
#if ( configSUPPORT_DYNAMIC_ALLOCATION == 1 )
    vPortHeapResetState();
#endif
vTaskResetState();

/* Restart scheduler. */
vTaskStartScheduler();

/* End scheduler in a FreeRTOS task. */
vTaskEndScheduler();

@cmorganBE
Copy link
Contributor Author

@chinglee-iot only had one comment on the pthread stack stuff on the other PR.

chinglee-iot and others added 3 commits January 10, 2024 20:32
…y itself, to allowing them to specify the stack size

Change from pthread_attr_setstack() to pthread_attr_setstacksize(), and automatically adjust the stack size
to be at least PTHREAD_STACK_MIN if it wasn't already, removing the size warning.

This permits the user to increase the pthread stack size beyond the PTHREAD_STACK_MIN default of 16384 if
desired, without producing a warning in the typical case where stacks are minimized for RAM limited targets.

Continue to store thread paramters on the provided stack, for consistency with the MCU targets.

Previously pthread_attr_setstack() was used to enable user defined stacks.

Note that:

1. The stack size can still be specified by the user.

2. pxPortInitialiseStack(), and pthread_addr_setstack() was failing on stacks of typical size, as
   these are smaller than PTHREAD_STACK_MIN (16384) bytes, and printing out a series of warnings.
   Improve usability by having the posix port automatically increase the stack size to be
   at least PTHREAD_STACK_MIN as posix platforms have enough memory for this not to be a concern.

3. Reuse of stack memory will also result in valgrind 'invalid write' errors to what is demonstrably
   valid memory. Root cause is that Valgrind is tracking a stack pointer as the stack is used.
   Reuse of a stack buffer results in the stack being used at its start, in an area that Valgrind thinks
   is far away from the start of the stack. There are ways to notify Valgrind of these changes
   however this would require linking against and calling Valgrind functions from the FreeRTOS application using
   the posix port, https://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq.

   Also, apparently it isn't permitted by posix to reuse stack memory once its been used in a pthread via pthread_attr_setstack(),
   see https://stackoverflow.com/a/5422134
@cmorganBE
Copy link
Contributor Author

@chinglee-iot merged and re-added the stack size commit, let me know if it looks ok. Rewrote the commit log there to improve the arguments for the change.

@cmorganBE
Copy link
Contributor Author

@chinglee-iot all tests except the restart ones pass from [email protected]:cmorganBE/freertos_posix.git

Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@paulbartell paulbartell merged commit 3baa3dd into FreeRTOS:main Jan 11, 2024
16 checks passed
@Skptak Skptak mentioned this pull request Jan 18, 2024
2 tasks
@cmorganBE cmorganBE deleted the cmm_posix branch January 25, 2024 13:20
aggarg added a commit that referenced this pull request Jan 28, 2024
PR #914 caused Posix Port to fail to build on MacOS. This PR fixes
teh build failure.

This PR also adds a Matrix configuration to the GitHub kernel-demo
workflow to build the Posix Demos on MacOS.
---------

Co-authored-by: chinglee-iot <[email protected]>
Co-authored-by: Gaurav-Aggarwal-AWS <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants