load balancing regression since commit 367456c7

April 10th, 2012 - 09:10 pm ET by Tim Chen
Peter,

We noticed in a hackbench test (./hackbench 100 process 2000)
on a 2-socket Sandy Bridge server that there has been a slowdown
by a factor of 4 since commit 367456c7 was applied
(sched: Ditch per cgroup task lists for load-balancing).

The commit 5d6523e (sched: Fix load-balance wreckage) did
not fix the regression.

In the profile for 3.4-rc2, there is heavy spinlock contention in the
load_balance path, where it was less than 0.003% of CPU before commit 367456c7.

When we looked at /proc/schedstat for 3.4-rc2 over the duration of the run,
schedule() was called 13x more often on cpu0, and schedule() calls that
left the processor idle were 530x as frequent.
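As a rough illustration of how such ratios can be derived, the sketch below parses one "cpuN" line of /proc/schedstat and compares counter growth between two runs. It assumes the version 15 layout, in which the third counter on a cpu line is the number of schedule() calls and the fourth is the number of schedule() calls that left the processor idle. The names cpu_sched_counts, parse_cpu_line and growth_ratio are hypothetical helpers, not part of any kernel or tool.

```c
#include <stdio.h>

/* Counters from one "cpuN" line of /proc/schedstat.
 * Assumption: version 15 field layout. */
struct cpu_sched_counts {
	unsigned long long sched_count;   /* 3rd counter: schedule() calls */
	unsigned long long sched_goidle;  /* 4th counter: schedule() left the CPU idle */
};

/* Parse a "cpuN ..." line. Returns 0 on success, -1 if the line does not match. */
static int parse_cpu_line(const char *line, int *cpu, struct cpu_sched_counts *out)
{
	unsigned long long yld, legacy;

	if (sscanf(line, "cpu%d %llu %llu %llu %llu",
		   cpu, &yld, &legacy, &out->sched_count, &out->sched_goidle) != 5)
		return -1;
	return 0;
}

/* Ratio of counter growth during the run on the tested kernel versus the
 * baseline kernel. Returns 0.0 if the baseline counter did not grow. */
static double growth_ratio(unsigned long long base_before, unsigned long long base_after,
			   unsigned long long test_before, unsigned long long test_after)
{
	unsigned long long base = base_after - base_before;
	unsigned long long test = test_after - test_before;

	return base ? (double)test / (double)base : 0.0;
}
```

The idea is to snapshot /proc/schedstat before and after the hackbench run on each kernel, parse the cpu0 line from each snapshot, and feed the four values of a given counter to growth_ratio().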

There was also a big increase in the try-to-wake-up remote count (sd->ttwu_wake_remote).

Increase in sd->ttwu_wake_remote for cpu0:

  domain 0     540%
  domain 1    7570%
  domain 2    4426%
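For readers who want to reproduce this kind of per-domain comparison, here is a minimal sketch that pulls the ttwu_wake_remote counter out of a "domainN" line of /proc/schedstat and computes a percentage increase. It assumes the version 15 layout, where ttwu_wake_remote is the 34th counter after the cpumask; that index, and the helper names parse_domain_ttwu_remote and percent_increase, are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: version 15 layout, where ttwu_wake_remote is the
 * 34th counter following the cpumask on a "domainN" line. */
#define TTWU_WAKE_REMOTE_FIELD 34

/* Extract ttwu_wake_remote from a "domainN <cpumask> <counters...>" line.
 * Returns 0 on success, -1 if the line is malformed or too short. */
static int parse_domain_ttwu_remote(const char *line, int *domain,
				    unsigned long long *value)
{
	char mask[128];
	int consumed = 0;
	const char *p;
	int i;

	if (sscanf(line, "domain%d %127s%n", domain, mask, &consumed) != 2)
		return -1;

	p = line + consumed;
	for (i = 1; i <= TTWU_WAKE_REMOTE_FIELD; i++) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)
			return -1;	/* ran out of counters */
		if (i == TTWU_WAKE_REMOTE_FIELD)
			*value = v;
		p = end;
	}
	return 0;
}

/* Percentage increase of test over base; 0.0 if base is zero. */
static double percent_increase(unsigned long long base, unsigned long long test)
{
	return base ? 100.0 * ((double)test - (double)base) / (double)base : 0.0;
}
```

Each argument to percent_increase() would be the growth of the counter over the run (after minus before), measured once on the baseline kernel and once on 3.4-rc2.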

I wonder whether there is unnecessary load balancing to remote CPUs?

Tim


profile for 3.4-rc2

     7.16%  hackbench  [kernel.kallsyms]  [k] _raw_spin_lock
            |
            --- _raw_spin_lock
               |
               |--56.52%-- load_balance
               |          idle_balance
               |          __schedule
               |          schedule
               |          |
               |          |--98.73%-- schedule_timeout
               |          |          |
               |          |          |--97.80%-- unix_stream_recvmsg
               |          |          |          sock_aio_read.part.7
               |          |          |          sock_aio_read
               |          |          |          do_sync_read
               |          |          |          vfs_read
               |          |          |          sys_read
               |          |          |          system_call
               |          |          |          __read_nocancel
               |          |          |          create_worker
               |          |          |          group
               |          |          |          main
               |          |          |          __libc_start_main
               |          |          |






To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Replies

#1 Peter Zijlstra
April 17th, 2012 - 07:50 am ET
On Tue, 2012-04-10 at 18:06 -0700, Tim Chen wrote:
> Peter,
>
> We noticed in a hackbench test (./hackbench 100 process 2000)
> on a 2-socket Sandy Bridge server that there has been a slowdown
> by a factor of 4 since commit 367456c7 was applied
> (sched: Ditch per cgroup task lists for load-balancing).
>
> The commit 5d6523e (sched: Fix load-balance wreckage) did
> not fix the regression.
>
> In the profile, there is heavy spin lock contention in the load_balance path of 3.4-rc2
> where it was less than .003% of cpu before commit 367456c7.



I can't actually reproduce but does the below help?

If not, can you shoot your .config over?


 kernel/sched/fair.c |    8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0d97ebd..e1da5c6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3215,6 +3215,8 @@ static int move_one_task(struct lb_env *env)
 
 static unsigned long task_h_load(struct task_struct *p);
 
+static const unsigned int sched_nr_migrate_break = IS_ENABLED(CONFIG_PREEMPT) ? 8 : 32;
+
 /*
  * move_tasks tries to move up to load_move weighted load from busiest to
  * this_rq, as part of a balancing operation within domain "sd".
@@ -3242,7 +3244,7 @@ static int move_tasks(struct lb_env *env)
 
 		/* take a breather every nr_migrate tasks */
 		if (env->loop > env->loop_break) {
-			env->loop_break += sysctl_sched_nr_migrate;
+			env->loop_break += sched_nr_migrate_break;
 			env->flags |= LBF_NEED_BREAK;
 			break;
 		}
@@ -4407,7 +4409,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		.dst_cpu = this_cpu,
 		.dst_rq = this_rq,
 		.idle = idle,
-		.loop_break = sysctl_sched_nr_migrate,
+		.loop_break = sched_nr_migrate_break,
+		.loop_max = sysctl_sched_nr_migrate,
 	};
 
 	cpumask_copy(cpus, cpu_active_mask);
@@ -4448,7 +4451,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		env.load_move = imbalance;
 		env.src_cpu = busiest->cpu;
 		env.src_rq = busiest;
-		env.loop_max = busiest->nr_running;
 
 more_balance:
 		local_irq_save(flags);
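
To make the effect of the two limits easier to see, here is a small user-space model of the scan loop. It is only a sketch and not kernel code: it assumes that the loop increments env->loop once per task, stops for good once loop exceeds loop_max, and that the caller restarts the scan whenever a breather was requested. Only the breather check itself appears in the patch above; the rest of the loop shape, and the names lb_model, scan_pass and run_balance, are assumptions made for illustration.

```c
/* User-space model of the move_tasks() scan limits. Not kernel code. */
struct lb_model {
	unsigned int loop;        /* tasks visited so far */
	unsigned int loop_break;  /* next point at which to take a breather */
	unsigned int loop_max;    /* hard cap on tasks visited */
};

/* One pass of the scan. Returns the number of tasks examined in this pass
 * and sets *need_break when the pass stopped only to take a breather. */
static unsigned int scan_pass(struct lb_model *env, unsigned int break_step,
			      int *need_break)
{
	unsigned int examined = 0;

	*need_break = 0;
	for (;;) {
		env->loop++;
		if (env->loop > env->loop_max)
			break;			/* cap reached, give up */
		if (env->loop > env->loop_break) {
			env->loop_break += break_step;
			*need_break = 1;	/* models LBF_NEED_BREAK */
			break;
		}
		examined++;
	}
	return examined;
}

/* Repeat passes while a breather was requested, as the caller is assumed
 * to do. Returns the total number of tasks examined. */
static unsigned int run_balance(unsigned int break_step, unsigned int loop_max,
				unsigned int *passes)
{
	struct lb_model env = { .loop = 0, .loop_break = break_step, .loop_max = loop_max };
	unsigned int total = 0;
	int need_break;

	*passes = 0;
	do {
		total += scan_pass(&env, break_step, &need_break);
		(*passes)++;
	} while (need_break);
	return total;
}
```

In this model, break_step plays the role of sched_nr_migrate_break (8 or 32 in the patch) and loop_max the role of sysctl_sched_nr_migrate, so the total work per balance attempt no longer scales with busiest->nr_running.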

