Scheduler: Unexpected behaviour on 'sched_yield' when task priority is 0 (minimum) and 'configIDLE_SHOULD_YIELD = 1' (IDFGH-5528) #7256
Hello again,

I registered an idle hook on both cores:

esp_err_t err = esp_register_freertos_idle_hook_for_cpu(my_idle_hook, 0);
if (err != ESP_OK)
{
    printf("'esp_register_freertos_idle_hook_for_cpu' on coreID 0 failed\r\n");
}
err = esp_register_freertos_idle_hook_for_cpu(my_idle_hook, 1);
if (err != ESP_OK)
{
    printf("'esp_register_freertos_idle_hook_for_cpu' on coreID 1 failed\r\n");
}

bool my_idle_hook(void)
{
    sched_yield();
    return true;
}

Doing that, the behaviour is the same as setting configIDLE_SHOULD_YIELD = 1. Reading the documentation, I see this on "esp_register_freertos_idle_hook_for_cpu":

 * @warning Idle callbacks MUST NOT, UNDER ANY CIRCUMSTANCES, CALL
 *          A FUNCTION THAT MIGHT BLOCK.

So maybe ESP-IDF is not prepared to make IDLE tasks YIELD? Thanks
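As a host-only illustration of the registration pattern used above, a per-core idle-hook registry can be modeled as a small function-pointer table. This is a toy sketch, not ESP-IDF internals; the names, slot count, and the counting hook are illustrative (the real hook above called sched_yield()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCORES 2
#define MAX_HOOKS 4

typedef bool (*idle_hook_t)(void);

/* Per-core hook table; all slots start out NULL. */
static idle_hook_t hooks[NCORES][MAX_HOOKS];

/* Register fn in the first free slot for the given core; 0 on success. */
static int register_idle_hook_for_cpu(idle_hook_t fn, int core)
{
    if (fn == NULL || core < 0 || core >= NCORES) return -1;
    for (int i = 0; i < MAX_HOOKS; i++) {
        if (hooks[core][i] == NULL) {
            hooks[core][i] = fn;
            return 0;
        }
    }
    return -1; /* all slots taken */
}

/* What the idle task on 'core' would do each time around its loop. */
static void run_idle_hooks(int core)
{
    for (int i = 0; i < MAX_HOOKS; i++) {
        if (hooks[core][i] != NULL) {
            (void)hooks[core][i]();
        }
    }
}

/* Stand-in hook: counts invocations instead of calling sched_yield(). */
static int hook_calls;
static bool my_idle_hook(void)
{
    hook_calls++;
    return true;
}
```

The warning quoted above still applies to the real API: whatever the hook does, it runs in idle-task context and must never block.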
We don't support modification of configIDLE_SHOULD_YIELD. But I'll double check the scheduling behavior. Is the code you posted above run with both cores enabled?
Thanks @Dazza0, Carles
@carlessole We've located the bug: under certain conditions the same task can be repeatedly scheduled. We have an MR in review to fix the "skipping" behavior of the round robin scheduling.
@Dazza0
@AxelLin Regarding temporary code for testing, based off the current master, replace the task selection logic with the following: static void taskSelectHighestPriorityTaskSMP( void )
{
/* This function is called from a critical section. So some optimizations are made */
BaseType_t uxCurPriority;
BaseType_t xTaskScheduled = pdFALSE;
BaseType_t xNewTopPrioritySet = pdFALSE;
BaseType_t xCoreID = xPortGetCoreID(); /* Optimization: Read once */
/* Search for tasks, starting from the highest ready priority. If nothing is
* found, we eventually default to the IDLE tasks at priority 0 */
for ( uxCurPriority = uxTopReadyPriority; uxCurPriority >= 0 && xTaskScheduled == pdFALSE; uxCurPriority-- )
{
/* Check if current priority has one or more ready tasks. Skip if none */
if( listLIST_IS_EMPTY( &( pxReadyTasksLists[ uxCurPriority ] ) ) )
{
continue;
}
/* Save a copy of highest priority that has a ready state task */
if( xNewTopPrioritySet == pdFALSE )
{
xNewTopPrioritySet = pdTRUE;
uxTopReadyPriority = uxCurPriority;
}
/* We now search this priority's ready task list for a runnable task */
/* We always start searching from the head of the list, so we reset
* pxIndex to point to the tail so that we start walking the list from
* the first item */
pxReadyTasksLists[ uxCurPriority ].pxIndex = ( ListItem_t * ) &( pxReadyTasksLists[ uxCurPriority ].xListEnd );
/* Get the first item on the list */
TCB_t * pxTCBCur;
TCB_t * pxTCBFirst;
listGET_OWNER_OF_NEXT_ENTRY( pxTCBCur, &( pxReadyTasksLists[ uxCurPriority ] ) );
pxTCBFirst = pxTCBCur;
do
{
/* Check if the current task is currently being executed. However, if
* it's being executed by the current core, we can still schedule it.
* Todo: Each task can store a xTaskRunState, instead of needing to
* check each core */
UBaseType_t ux;
for( ux = 0; ux < ( UBaseType_t )configNUM_CORES; ux++ )
{
if ( ux == xCoreID )
{
continue;
}
else if ( pxCurrentTCB[ux] == pxTCBCur )
{
/* Current task is already being executed. Get the next task */
goto get_next_task;
}
}
/* Check if the current task has a compatible affinity */
if ( ( pxTCBCur->xCoreID != xCoreID ) && ( pxTCBCur->xCoreID != tskNO_AFFINITY ) )
{
goto get_next_task;
}
/* The current task is runnable. Schedule it */
pxCurrentTCB[ xCoreID ] = pxTCBCur;
xTaskScheduled = pdTRUE;
/* Move the current tasks list item to the back of the list in order
* to implement best effort round robin. To do this, we need to reset
* the pxIndex to point to the tail again. */
pxReadyTasksLists[ uxCurPriority ].pxIndex = ( ListItem_t * ) &( pxReadyTasksLists[ uxCurPriority ].xListEnd );
uxListRemove( &( pxTCBCur->xStateListItem ) );
vListInsertEnd( &( pxReadyTasksLists[ uxCurPriority ] ), &( pxTCBCur->xStateListItem ) );
break;
get_next_task:
/* The current task cannot be scheduled. Get the next task in the list */
listGET_OWNER_OF_NEXT_ENTRY( pxTCBCur, &( pxReadyTasksLists[ uxCurPriority ] ) );
} while( pxTCBCur != pxTCBFirst); /* Check to see if we've walked the entire list */
}
assert( xTaskScheduled == pdTRUE ); /* At this point, a task MUST have been scheduled */
}
void vTaskSwitchContext( void )
{
//Theoretically, this is only called from either the tick interrupt or the crosscore interrupt, so disabling
//interrupts shouldn't be necessary anymore. Still, for safety we'll leave it in for now.
int irqstate=portENTER_CRITICAL_NESTED();
if( uxSchedulerSuspended[ xPortGetCoreID() ] != ( UBaseType_t ) pdFALSE )
{
/* The scheduler is currently suspended - do not allow a context
* switch. */
xYieldPending[ xPortGetCoreID() ] = pdTRUE;
}
else
{
xYieldPending[ xPortGetCoreID() ] = pdFALSE;
xSwitchingContext[ xPortGetCoreID() ] = pdTRUE;
traceTASK_SWITCHED_OUT();
#if ( configGENERATE_RUN_TIME_STATS == 1 )
{
#ifdef portALT_GET_RUN_TIME_COUNTER_VALUE
portALT_GET_RUN_TIME_COUNTER_VALUE( ulTotalRunTime );
#else
ulTotalRunTime = portGET_RUN_TIME_COUNTER_VALUE();
#endif
/* Add the amount of time the task has been running to the
* accumulated time so far. The time the task started running was
* stored in ulTaskSwitchedInTime. Note that there is no overflow
* protection here so count values are only valid until the timer
* overflows. The guard against negative values is to protect
* against suspect run time stat counter implementations - which
* are provided by the application, not the kernel. */
taskENTER_CRITICAL_ISR();
if( ulTotalRunTime > ulTaskSwitchedInTime[ xPortGetCoreID() ] )
{
pxCurrentTCB[ xPortGetCoreID() ]->ulRunTimeCounter += ( ulTotalRunTime - ulTaskSwitchedInTime[ xPortGetCoreID() ] );
}
else
{
mtCOVERAGE_TEST_MARKER();
}
taskEXIT_CRITICAL_ISR();
ulTaskSwitchedInTime[ xPortGetCoreID() ] = ulTotalRunTime;
}
#endif /* configGENERATE_RUN_TIME_STATS */
/* Check for stack overflow, if configured. */
taskFIRST_CHECK_FOR_STACK_OVERFLOW();
taskSECOND_CHECK_FOR_STACK_OVERFLOW();
/* Select a new task to run */
/*
We cannot do taskENTER_CRITICAL_ISR(); here because it saves the interrupt context to the task tcb, and we're
swapping that out here. Instead, we're going to do the work here ourselves. Because interrupts are already disabled, we only
need to acquire the mutex.
*/
vPortCPUAcquireMutex( &xTaskQueueMutex );
#if !configUSE_PORT_OPTIMISED_TASK_SELECTION
taskSelectHighestPriorityTaskSMP();
#else
//For Unicore targets we can keep the current FreeRTOS O(1)
//Scheduler. I hope to optimize better the scheduler for
//Multicore settings -- This will involve to create a per
//affinity ready task list which will impact hugely on
//tasks module
taskSELECT_HIGHEST_PRIORITY_TASK();
#endif
traceTASK_SWITCHED_IN();
xSwitchingContext[ xPortGetCoreID() ] = pdFALSE;
#ifdef ESP_PLATFORM
//Exit critical region manually as well: release the mux now, interrupts will be re-enabled when we
//exit the function.
vPortCPUReleaseMutex( &xTaskQueueMutex );
#endif // ESP_PLATFORM
#if CONFIG_FREERTOS_WATCHPOINT_END_OF_STACK
vPortSetStackWatchpoint(pxCurrentTCB[xPortGetCoreID()]->pxStack);
#endif
}
portEXIT_CRITICAL_NESTED(irqstate);
}

The new algorithm basically inserts a scheduled task at the back of the list, thus making sure that unscheduled tasks are selected first on the next tick.
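The move-to-back idea can be illustrated with a host-only toy model of a single ready list, in which all tasks stay runnable. This is a simplified sketch of the policy described above, not the kernel code: without rotation, resetting the list index before every search means the head task is picked every time; with rotation, the picks are evenly distributed.

```c
#include <string.h>

#define NTASKS 3

/* Old behaviour (simplified): the search always restarts from the head of
 * the list, so when every task is runnable the same head task is picked
 * again and again. */
static int pick_without_rotation(const int order[NTASKS])
{
    return order[0];
}

/* New behaviour: still take the first runnable task from the head, but then
 * rotate it to the back so the others are considered first next time. */
static int pick_move_to_back(int order[NTASKS])
{
    int head = order[0];
    memmove(order, order + 1, (NTASKS - 1) * sizeof order[0]);
    order[NTASKS - 1] = head;
    return head;
}
```

Over nine selections the non-rotating policy schedules only task 0, while the move-to-back policy schedules each of the three tasks exactly three times, which is the "best effort round robin" fairness the MR aims for.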
@AxelLin Yes, that is also a solution. Edit: Had another look. It looks like it does the same thing as the current algorithm.
@Dazza0 (I do have multiple tasks with the same priority in my own project, so I'm nervous about the impact of this issue.)
Edit: In summary, most application scenarios won't run into this issue, as we don't expect applications to have scenarios where two tasks of the same priority run forever (the task watchdog will trigger). So as long as those tasks block, the pattern will eventually break.
I notice your commit removed taskENTER_CRITICAL_ISR() and taskEXIT_CRITICAL_ISR() around the code calculating ulRunTimeCounter. Also FYI, ulTotalRunTime is now volatile in upstream code.
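For reference, the accumulation pattern around ulRunTimeCounter in vTaskSwitchContext() above can be sketched in isolation. This is a simplified host-only model (names shortened, per-core state omitted): elapsed time is added only while the free-running counter has not wrapped, which is why the kernel comment says the stats are valid only until the timer overflows.

```c
#include <stdint.h>

static uint32_t run_time_counter;  /* per-task accumulated run time */
static uint32_t switched_in_time;  /* counter value when the task was switched in */

/* Called when the task is switched out, with the current total counter value. */
static void account_switch_out(uint32_t total_run_time)
{
    /* Guard against a wrapped (or suspect) counter: only add positive deltas. */
    if (total_run_time > switched_in_time) {
        run_time_counter += total_run_time - switched_in_time;
    }
    /* Either way, the next slice is measured from here. */
    switched_in_time = total_run_time;
}
```

Note how a wrap simply drops one slice of accounting rather than adding a huge bogus delta, matching the "no overflow protection" caveat in the kernel comment.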
The previous SMP freertos round robin would skip over tasks when time slicing. This commit implements a Best Effort Round Robin where selected tasks are put to the back of the list, thus making the time slicing fairer.
- Documentation has been updated accordingly.
- Tidy up vTaskSwitchContext() to match v10.4.3 more closely
- Increased esp_ipc task stack size to avoid overflow
Closes #7256
@Dazza0 |
v4.3 branch still does not include this fix. |
@AxelLin Thanks for the attention. We have backported this fix to 4.3, but it is still under internal review; we will sync up here once the fix is available on GitHub.
I have found these 3 configuration parameters that I consider very important in deciding the scheduler behaviour. Their default values in ESP-IDF are:
With that default configuration, if we create 3 tasks with minimum priority (0) that just perform a sched_yield (yes, I have a lot of code duplication in 'test_scheduler_task_X', but I want it this way to make debugging easier):
With the freeRTOS configuration above and this code, these are the results:
'test_scheduler_task_A' has been called 1000 times in 9999992us. Period = 9999us
'test_scheduler_task_B' has been called 1000 times in 10000000us. Period = 10000us
'test_scheduler_task_C' has been called 1000 times in 10000000us. Period = 10000us
'test_scheduler_task_A' has been called 1000 times in 9999345us. Period = 9999us
'test_scheduler_task_B' has been called 1000 times in 9999865us. Period = 9999us
'test_scheduler_task_C' has been called 1000 times in 9999869us. Period = 9999us
'test_scheduler_task_A' has been called 1000 times in 9999872us. Period = 9999us
'test_scheduler_task_B' has been called 1000 times in 9999872us. Period = 9999us
'test_scheduler_task_C' has been called 1000 times in 9999872us. Period = 9999us
As can be seen, our tasks are executed every 10 ms. We don't like this because we need to return to the tasks faster. I assume this is happening because
configIDLE_SHOULD_YIELD = 0
and so the IDLE tasks are not YIELDING until a tick happens. And as the freeRTOS tick frequency is 100 Hz, here are the 10 ms. I know I can configure the freeRTOS tick up to 1000 Hz (period = 1 ms) but I would like to avoid being dependent on the freeRTOS tick.
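The arithmetic behind the observed 10 ms period can be made explicit. A minimal sketch (the function name is illustrative): with configIDLE_SHOULD_YIELD = 0, an equal-priority task queued behind the idle task only runs again on the next tick, so its period is bounded below by the tick period.

```c
/* Convert a FreeRTOS tick rate (configTICK_RATE_HZ) into the tick period in
 * microseconds, which lower-bounds how often a task behind a non-yielding
 * idle task can run. */
static long tick_period_us(long tick_rate_hz)
{
    return 1000000L / tick_rate_hz;
}
```

At the default 100 Hz this gives the 10 000 us (10 ms) periods measured above, and even the maximum 1000 Hz still leaves a 1 ms floor, which is why raising the tick rate alone cannot reach the desired sub-millisecond turnaround.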
So, as explained here (https://www.freertos.org/a00110.html#configIDLE_SHOULD_YIELD), I tried to set
configIDLE_SHOULD_YIELD = 1
(in esp-idf\components\freertos\port\xtensa\include\freertos\FreeRTOSConfig.h). So here are my current configuration parameters:
Just doing this change on
configIDLE_SHOULD_YIELD
I see that task A is active every 20 us (or less), that's great! But it seems like only task A is being executed with this lower latency (20 us); the other two tasks are still running every 10 ms. With the help of the debugger and this plugin (https://mcuoneclipse.com/2016/07/06/freertos-kernel-awareness-for-eclipse-from-nxp/) I can also get this information:
Is this the normal behaviour?
Am I missing something?
Thanks