[PATCH] sched_rt: the task in irq context can be migrated during context switching

January 05th, 2012 - 06:10 am ET by Chanho Min | Report spam
This issue happens under the following conditions
1. preemption is off
2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined.
3. RT scheduling class
4. SMP system

Sequence is as follows:
1.suppose current task is A. start schedule()
2.task A is enqueued pushable task at the entry of schedule()
__schedule
prev = rq->curr;
...
put_prev_task
put_prev_task_rt
enqueue_pushable_task
4.pick the task B as next task.
next = pick_next_task(rq);
3.rq->curr set to task B and context_switch is started.
rq->curr = next;
4.At the entry of context_swtich, release this cpu's rq->lock.
context_switch
prepare_task_switch
prepare_lock_switch
raw_spin_unlock_irq(&rq->lock);
5.Shortly after rq->lock is released, interrupt is occurred and start
IRQ context
6.try_to_wake_up() which called by ISR acquires rq->lock
try_to_wake_up
ttwu_remote
rq = __task_rq_lock(p)
ttwu_do_wakeup(rq, p, wake_flags);
task_woken_rt
7.push_rt_task picks the task A which is enqueued before.
task_woken_rt
push_rt_tasks(rq)
next_task = pick_next_pushable_task(rq)
8.At find_lock_lowest_rq(), If double_lock_balance() returns 0,
lowest_rq can be the remote rq.
(But,If preemption is on, double_lock_balance always return 1 and it
does't happen.)
push_rt_task
find_lock_lowest_rq
if (double_lock_balance(rq, lowest_rq))..
9.find_lock_lowest_rq return the available rq. task A is migrated to
the remote cpu/rq.
push_rt_task
...
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
10. But, task A is on irq context at this cpu.
So, task A is scheduled by two cpus at the same time until restore from IRQ.
Task A's stack is corrupted. Unexpected problem is occurred.

In recent ARM, I saw the patch to remove
__ARCH_WANT_INTERRUPTS_ON_CTXSW is posted.
But, if that feature is adopted by the others architecture or to
remain in the release version,,this can occur.
This is my patch to fix it. Any opinions will be appreciated.

Signed-off-by: Chanho Min <chanho.min@lge.com>

kernel/sched_rt.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 583a136..59e66e3 100644
a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1388,6 +1388,11 @@ static int push_rt_task(struct rq *rq)
if (!next_task)
return 0;

+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+ if (unlikely(task_running(rq, next_task)))
+ return 0;
+#endif
+
retry:
if (unlikely(next_task == rq->curr)) {
WARN_ON(1);
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 9 repliesReplies Make a reply

Similar topics

Replies

#1 Peter Zijlstra
January 05th, 2012 - 09:30 am ET | Report spam
On Thu, 2012-01-05 at 20:00 +0900, Chanho Min wrote:
In recent ARM, I saw the patch to remove
__ARCH_WANT_INTERRUPTS_ON_CTXSW is posted.
But, if that feature is adopted by the others architecture



No, the moment ARM removes the need, the whole feature will be removed
never to be brought back, its a horrid complication to the scheduler we
really don't ever want to bring back (like you found out).

I'll go and digest your scenario now. At the very least your patch might
want to go into -stable.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#2 Peter Zijlstra
January 05th, 2012 - 01:00 pm ET | Report spam
On Thu, 2012-01-05 at 20:00 +0900, Chanho Min wrote:
This issue happens under the following conditions
1. preemption is off
2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined.
3. RT scheduling class
4. SMP system

Sequence is as follows:
1.suppose current task is A. start schedule()
2.task A is enqueued pushable task at the entry of schedule()
__schedule
prev = rq->curr;
...
put_prev_task
put_prev_task_rt
enqueue_pushable_task
4.pick the task B as next task.
next = pick_next_task(rq);
3.rq->curr set to task B and context_switch is started.
rq->curr = next;
4.At the entry of context_swtich, release this cpu's rq->lock.
context_switch
prepare_task_switch
prepare_lock_switch
raw_spin_unlock_irq(&rq->lock);
5.Shortly after rq->lock is released, interrupt is occurred and start
IRQ context
6.try_to_wake_up() which called by ISR acquires rq->lock
try_to_wake_up
ttwu_remote
rq = __task_rq_lock(p)
ttwu_do_wakeup(rq, p, wake_flags);
task_woken_rt
7.push_rt_task picks the task A which is enqueued before.
task_woken_rt
push_rt_tasks(rq)
next_task = pick_next_pushable_task(rq)
8.At find_lock_lowest_rq(), If double_lock_balance() returns 0,
lowest_rq can be the remote rq.
(But,If preemption is on, double_lock_balance always return 1 and it
does't happen.)
push_rt_task
find_lock_lowest_rq
if (double_lock_balance(rq, lowest_rq))..
9.find_lock_lowest_rq return the available rq. task A is migrated to
the remote cpu/rq.
push_rt_task
...
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
10. But, task A is on irq context at this cpu.
So, task A is scheduled by two cpus at the same time until restore from IRQ.
Task A's stack is corrupted. Unexpected problem is occurred.

In recent ARM, I saw the patch to remove
__ARCH_WANT_INTERRUPTS_ON_CTXSW is posted.
But, if that feature is adopted by the others architecture or to
remain in the release version,,this can occur.
This is my patch to fix it. Any opinions will be appreciated.



So the problem is quite real, as already said we don't need to worry
about the future, but we might want to fix this in previous kernels.
What I'm not entirely sure of is the proposed solution, Steven don't we
get in trouble by simply bailing out on the push?

Signed-off-by: Chanho Min

kernel/sched_rt.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 583a136..59e66e3 100644
a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1388,6 +1388,11 @@ static int push_rt_task(struct rq *rq)
if (!next_task)
return 0;

+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+ if (unlikely(task_running(rq, next_task)))
+ return 0;
+#endif
+
retry:
if (unlikely(next_task == rq->curr)) {
WARN_ON(1);



To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#3 Steven Rostedt
January 05th, 2012 - 01:20 pm ET | Report spam
On Thu, 2012-01-05 at 18:55 +0100, Peter Zijlstra wrote:

So the problem is quite real, as already said we don't need to worry
about the future, but we might want to fix this in previous kernels.
What I'm not entirely sure of is the proposed solution, Steven don't we
get in trouble by simply bailing out on the push?



It shouldn't break anything. We shouldn't be pushing tasks that are
running on a rq anyway. I don't see any harm here. As this scenario can
only happen if we get an interrupt after letting go of the rq lock and
before doing the switch_to(). The schedule_tail() calls
post_schedule_rt() which does the push again, and will push task A at
that time.

That said, I'm not sure this patch is enough. I'm worried about a pull
happening. As task A is running, we could possible possibly pick it on
another CPU to do a pull.

Hmm, looking at the code, the pull already does a task_running() test,
so I guess we should be fine.

Acked-by: Steven Rostedt



To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#4 Peter Zijlstra
January 05th, 2012 - 04:50 pm ET | Report spam
On Thu, 2012-01-05 at 13:15 -0500, Steven Rostedt wrote:
On Thu, 2012-01-05 at 18:55 +0100, Peter Zijlstra wrote:

> So the problem is quite real, as already said we don't need to worry
> about the future, but we might want to fix this in previous kernels.
> What I'm not entirely sure of is the proposed solution, Steven don't we
> get in trouble by simply bailing out on the push?

It shouldn't break anything. We shouldn't be pushing tasks that are
running on a rq anyway.



Its not running, but its in the middle of getting scheduled out.

I don't see any harm here. As this scenario can
only happen if we get an interrupt after letting go of the rq lock and
before doing the switch_to(). The schedule_tail() calls
post_schedule_rt() which does the push again, and will push task A at
that time.



Right, so the post_schedule() hook will try again.

That said, I'm not sure this patch is enough. I'm worried about a pull
happening. As task A is running, we could possible possibly pick it on
another CPU to do a pull.

Hmm, looking at the code, the pull already does a task_running() test,
so I guess we should be fine.



Yeah, I'm not sure all those task_running() things make sense though,
when !->on_rq && ->on_cpu we should busy wait for tasks, not skip them.

Then again, with this WANT_INTERRUPTS_ON_CTXSW the busy wait crap is
tricky. Luckily its going the way of the Dodo very soon.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#5 Chanho Min
January 10th, 2012 - 03:20 am ET | Report spam
On Fri, Jan 6, 2012 at 6:45 AM, Peter Zijlstra wrote:
On Thu, 2012-01-05 at 13:15 -0500, Steven Rostedt wrote:
On Thu, 2012-01-05 at 18:55 +0100, Peter Zijlstra wrote:

> So the problem is quite real, as already said we don't need to worry
> about the future, but we might want to fix this in previous kernels.
> What I'm not entirely sure of is the proposed solution, Steven don't we
> get in trouble by simply bailing out on the push?

It shouldn't break anything. We shouldn't be pushing tasks that are
running on a rq anyway.



Its not running, but its in the middle of getting scheduled out.

I don't see any harm here. As this scenario can
only happen if we get an interrupt after letting go of the rq lock and
before doing the switch_to(). The schedule_tail() calls
post_schedule_rt() which does the push again, and will push task A at
that time.



Right, so the post_schedule() hook will try again.

That said, I'm not sure this patch is enough. I'm worried about a pull
happening. As task A is running, we could possible possibly pick it on
another CPU to do a pull.

Hmm, looking at the code, the pull already does a task_running() test,
so I guess we should be fine.



Yeah, I'm not sure all those task_running() things make sense though,
when !->on_rq && ->on_cpu we should busy wait for tasks, not skip them.

Then again, with this WANT_INTERRUPTS_ON_CTXSW the busy wait crap is
tricky. Luckily its going the way of the Dodo very soon.



So Will this be applied? It will be critical and very hard to debug.
( pretty rare, sometimes system hang without any message )
It has been well tested on our ARM platform.

Thanks
Chanho
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Replies Make a reply
Search Make your own search