Skip to content

Commit

Permalink
sched/core: Fix forceidle balancing
Browse files Browse the repository at this point in the history
Steve reported that ChromeOS encounters the forceidle balancer being
ran from rt_mutex_setprio()'s balance_callback() invocation and
explodes.

Now, the forceidle balancer gets queued every time the idle task gets
selected, set_next_task(), which is strictly too often.
rt_mutex_setprio() also uses set_next_task() in the 'change' pattern:

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

However, rt_mutex_setprio() will explicitly not run this pattern on
the idle task (since priority boosting the idle task is quite insane).
Most other 'change' pattern users are pidhash based and would also not
apply to idle.

Also, the change pattern doesn't contain a __balance_callback()
invocation and hence we could have an out-of-band balance-callback,
which *should* trigger the WARN in rq_pin_lock() (which guards against
this exact anti-pattern).

So while none of that explains how this happens, it does indicate that
having it in set_next_task() might not be the most robust option.

Instead, explicitly queue the forceidle balancer from pick_next_task()
when it does indeed result in forceidle selection. Having it here,
ensures it can only be triggered under the __schedule() rq->lock
instance, and hence must be ran from that context.

This also happens to clean up the code a little, so win-win.

Fixes: d2dfa17 ("sched: Trivial forced-newidle balancer")
Reported-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: T.J. Alumbaugh <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
  • Loading branch information
Peter Zijlstra committed Apr 5, 2022
1 parent 3123109 commit 5b6547e
Show file tree
Hide file tree
Showing 3 changed files with 10 additions and 11 deletions.
14 changes: 10 additions & 4 deletions kernel/sched/core.c
Original file line number Diff line number Diff line change
Expand Up @@ -5752,6 +5752,8 @@ static inline struct task_struct *pick_task(struct rq *rq)

extern void task_vruntime_update(struct rq *rq, struct task_struct *p, bool in_fi);

static void queue_core_balance(struct rq *rq);

static struct task_struct *
pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
Expand Down Expand Up @@ -5801,7 +5803,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
}

rq->core_pick = NULL;
return next;
goto out;
}

put_prev_task_balance(rq, prev, rf);
Expand Down Expand Up @@ -5851,7 +5853,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
*/
WARN_ON_ONCE(fi_before);
task_vruntime_update(rq, next, false);
goto done;
goto out_set_next;
}
}

Expand Down Expand Up @@ -5970,8 +5972,12 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
resched_curr(rq_i);
}

done:
out_set_next:
set_next_task(rq, next);
out:
if (rq->core->core_forceidle_count && next == rq->idle)
queue_core_balance(rq);

return next;
}

Expand Down Expand Up @@ -6066,7 +6072,7 @@ static void sched_core_balance(struct rq *rq)

static DEFINE_PER_CPU(struct callback_head, core_balance_head);

void queue_core_balance(struct rq *rq)
static void queue_core_balance(struct rq *rq)
{
if (!sched_core_enabled(rq))
return;
Expand Down
1 change: 0 additions & 1 deletion kernel/sched/idle.c
Original file line number Diff line number Diff line change
Expand Up @@ -434,7 +434,6 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
{
update_idle_core(rq);
schedstat_inc(rq->sched_goidle);
queue_core_balance(rq);
}

#ifdef CONFIG_SMP
Expand Down
6 changes: 0 additions & 6 deletions kernel/sched/sched.h
Original file line number Diff line number Diff line change
Expand Up @@ -1232,8 +1232,6 @@ static inline bool sched_group_cookie_match(struct rq *rq,
return false;
}

extern void queue_core_balance(struct rq *rq);

static inline bool sched_core_enqueued(struct task_struct *p)
{
return !RB_EMPTY_NODE(&p->core_node);
Expand Down Expand Up @@ -1267,10 +1265,6 @@ static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
return &rq->__lock;
}

static inline void queue_core_balance(struct rq *rq)
{
}

static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p)
{
return true;
Expand Down

0 comments on commit 5b6547e

Please sign in to comment.