[RFC PATCH 00/32] Nohz cpusets (was: Nohz Tasks)

August 15th, 2011 - 12:00 pm ET by Frederic Weisbecker
So it's still in draft stage. It's far from covering everything
the periodic timer does, but it has made some progress since the
last posting, so I think it's time for another early release.

= What's that? =

On the mainline kernel we have a feature (CONFIG_NO_HZ) that is
able to turn off the periodic scheduler tick when the CPU has
nothing to do, namely when it's running the idle task.

The scheduler tick handles many things like RCU and scheduler
internal state, jiffies accounting, wall time accounting, load
accounting, cputime accounting, timer wheel, posix cpu timers,
etc...

However, by the time we run idle and the CPU is going to sleep,
none of these things are useful for the CPU. We can then shut the
tick down.

The benefit is energy saving: we avoid waking up the CPU
needlessly with these useless interrupts.

What this patchset does is extend that feature to non-idle
cases, implementing a new kind of "adaptive nohz". But the
purpose is different, and so is the implementation.

= How does that work? =

It tries to handle all the things that the timer tick usually
handles, but using different tricks. Sometimes we can't really
afford to avoid the periodic tick, but sometimes we can, and if
we do, we need to take some special care.

- We can't shut down the tick if we have more than one task
running, because the tick is needed for preemption. But I believe
that one day we can avoid the periodic tick for that and rather
anticipate when the scheduler really needs the tick.

- We can't shut down the tick if RCU needs to complete a grace
period from the current CPU, or if it has callbacks to handle.

- We can't shut down the tick if we have a posix cpu timer queued. Similarly
to the preemption case, we should be able to anticipate that with a
precise timer and avoid a periodic check based on HZ.

- Restart the tick when more than one non-idle task is on the runqueue.

- We need to handle process accounting, RCU, rq clock, task tick, etc...

For now this patchset only handles part of these needs.
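
The restrictions listed above can be read as a single predicate. What
follows is only a userspace sketch of that reading, not code from the
patchset: struct cpu_state, its fields and can_stop_tick() are
hypothetical names made up for illustration, and the idle case (which
keeps using the existing idle nohz path) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; the field names are illustrative only. */
struct cpu_state {
	unsigned int nr_running;     /* runnable non-idle tasks on this CPU */
	bool rcu_needs_cpu;          /* grace period pending or callbacks queued */
	bool posix_cpu_timer_queued; /* a posix cpu timer is armed */
	bool in_nohz_cpuset;         /* CPU belongs to at least one nohz cpuset */
};

/* Return true if none of the restrictions above forbids stopping the tick. */
static bool can_stop_tick(const struct cpu_state *s)
{
	assert(s != NULL);

	if (!s->in_nohz_cpuset)
		return false;	/* adaptive nohz was not requested for this CPU */
	if (s->nr_running > 1)
		return false;	/* the tick is still needed for preemption */
	if (s->rcu_needs_cpu)
		return false;	/* RCU still relies on the tick here */
	if (s->posix_cpu_timer_queued)
		return false;	/* posix cpu timers are checked from the tick */
	return true;
}
```

In this model, any event that flips one of the inputs (a second task
being enqueued, an RCU callback being queued, a posix cpu timer being
armed) is a point where the tick would have to be restarted.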

= What's the interface? =

We use the cpuset interface by adding a nohz flag to it.
As long as a CPU is part of a nohz cpuset, it will try to enter
adaptive nohz mode when it can, even if it is also part of
another cpuset that is not nohz.
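
To illustrate the rule above (one nohz cpuset is enough, whatever the
other cpusets say), here is a small userspace sketch. It is not the
kernel/cpuset.c implementation: struct toy_cpuset and cpu_is_nohz()
are hypothetical, and a 64-bit mask stands in for a real cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cpuset: a CPU bitmask plus the nohz flag. */
struct toy_cpuset {
	uint64_t cpus;	/* bit n set => CPU n belongs to this cpuset */
	bool nohz;	/* the nohz flag described above */
};

/* A CPU is an adaptive nohz candidate if ANY cpuset containing it is nohz. */
static bool cpu_is_nohz(const struct toy_cpuset *sets, size_t nsets,
			unsigned int cpu)
{
	assert(cpu < 64);

	for (size_t i = 0; i < nsets; i++) {
		if (sets[i].nohz && (sets[i].cpus & (UINT64_C(1) << cpu)))
			return true;
	}
	return false;
}
```

With a non-nohz cpuset covering CPUs 0-3 and a nohz cpuset covering
CPUs 2-3, cpu_is_nohz() is true for CPUs 2 and 3 and false for CPUs 0
and 1.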

= Why do we need that? =

There are at least two potential users of this feature:

* High performance computing: To optimize throughput, some
workloads run one task per CPU that mostly runs in userspace.
These tasks don't want and don't need to suffer from the
overhead of the timer interrupt. It consumes CPU time and it trashes
the CPU cache.

* Real time: Minimizing timer interrupts means fewer interrupts and thus
fewer critical sections that usually induce latency.

= What's missing? =

Many things like handling of perf events, irq work, sched clock tick,
runqueue clock, sched_class::task_tick(), cpu load, ...

The handling of cputimes is also incomplete as there are other places
that use the utime/stime. Process time accounting is globally incomplete.

But anyway, the thing is moving forward. An early posting was really
needed at this stage.

For those who want to play:

git://git.kernel.org/pub/scm/linux/...racing.git
nohz/cpuset-v1

Frederic Weisbecker (32):
nohz: Drop useless call in tick_nohz_start_idle()
nohz: Drop ts->idle_active
nohz: Drop useless ts->inidle check before rearming the tick
nohz: Separate idle sleeping time accounting from nohz switching
nohz: Move rcu dynticks idle mode handling to idle enter/exit APIs
nohz: Move idle ticks stats tracking out of nohz handlers
nohz: Rename ts->idle_tick to ts->last_tick
nohz: Move nohz load balancer selection into idle logic
nohz: Move ts->idle_calls into strict idle logic
nohz: Move next idle expiring time record into idle logic area
cpuset: Set up interface for nohz flag
nohz: Try not to give the timekeeping duty to a cpuset nohz cpu
nohz: Adaptive tick stop and restart on nohz cpuset
nohz/cpuset: Don't turn off the tick if rcu needs it
nohz/cpuset: Restart tick when switching to idle task
nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued
x86: New cpuset nohz irq vector
nohz/cpuset: Don't stop the tick if posix cpu timers are running
nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
nohz/cpuset: Restart the tick if printk needs it
rcu: Restart the tick on non-responding adaptive nohz CPUs
rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU
nohz/cpuset: Account user and system times in adaptive nohz mode
nohz/cpuset: Handle kernel entry/exit to account cputime
nohz/cpuset: New API to flush cputimes on nohz cpusets
nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader
nohz/cpuset: Flush cputimes on procfs stat file read
nohz/cpuset: Flush cputimes for getrusage() and times() syscalls
x86: Syscall hooks for nohz cpusets
x86: Exception hooks for nohz cpusets
rcu: Switch to extended quiescent state in userspace from nohz cpuset
nohz/cpuset: Disable under some configs

arch/Kconfig | 3 +
arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +-
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/Kconfig | 1 +
arch/x86/include/asm/entry_arch.h | 3 +
arch/x86/include/asm/hw_irq.h | 6 +
arch/x86/include/asm/irq_vectors.h | 2 +
arch/x86/include/asm/smp.h | 11 +
arch/x86/include/asm/thread_info.h | 10 +-
arch/x86/kernel/entry_64.S | 4 +
arch/x86/kernel/irqinit.c | 4 +
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 5 +-
arch/x86/kernel/ptrace.c | 10 +
arch/x86/kernel/smp.c | 26 ++
arch/x86/kernel/traps.c | 22 +-
arch/x86/mm/fault.c | 13 +-
fs/proc/array.c | 2 +
include/linux/cpuset.h | 29 ++
include/linux/kernel_stat.h | 2 +
include/linux/posix-timers.h | 1 +
include/linux/rcupdate.h | 1 +
include/linux/sched.h | 10 +-
include/linux/tick.h | 50 +++-
init/Kconfig | 8 +
kernel/cpuset.c | 105 +++++++
kernel/exit.c | 2 +
kernel/posix-cpu-timers.c | 12 +
kernel/printk.c | 17 +-
kernel/rcutree.c | 28 ++-
kernel/sched.c | 132 +++++++++-
kernel/softirq.c | 6 +-
kernel/sys.c | 6 +
kernel/time/tick-sched.c | 479 ++++++++++++++++++++++++--
kernel/time/timer_list.c | 4 +-
kernel/timer.c | 8 +-
47 files changed, 897 insertions(+), 185 deletions(-)

1.7.5.4

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Replies

#1 Frederic Weisbecker
August 15th, 2011 - 12:00 pm ET
To prepare for having nohz mode switching independent from idle,
pull the idle sleeping time accounting out of the tick stop API.

This requires implementing new APIs to call when we
enter/exit idle.
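
As a reading aid, the split between idle accounting and tick stopping
can be modelled in userspace roughly as below. This is a toy sketch
under stated assumptions, not the kernel code: struct toy_tick_sched
and the toy_*() helpers are hypothetical, the can_stop argument stands
in for all the checks done before the tick is really stopped, and
interrupt masking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct tick_sched; only the fields the model needs. */
struct toy_tick_sched {
	bool inidle;		/* set between enter idle and exit idle */
	bool idle_accounting;	/* idle sleep time accounting is running */
	bool tick_stopped;	/* the periodic tick is currently off */
};

/* Common path: start idle accounting first, then try to stop the tick. */
static void toy_enter_idle_common(struct toy_tick_sched *ts, bool can_stop)
{
	ts->idle_accounting = true;
	if (can_stop)
		ts->tick_stopped = true;
}

/* Called when the idle loop is entered. */
static void toy_enter_idle(struct toy_tick_sched *ts, bool can_stop)
{
	assert(ts != NULL);
	ts->inidle = true;
	toy_enter_idle_common(ts, can_stop);
}

/* Called on irq exit: only re-evaluate if we are inside the idle loop. */
static void toy_irq_exit(struct toy_tick_sched *ts, bool can_stop)
{
	assert(ts != NULL);
	if (ts->inidle)
		toy_enter_idle_common(ts, can_stop);
}

/* Called when the idle loop is left: stop accounting, restart the tick. */
static void toy_exit_idle(struct toy_tick_sched *ts)
{
	assert(ts != NULL);
	if (!ts->inidle)
		return;
	ts->idle_accounting = false;
	ts->inidle = false;
	if (ts->tick_stopped)
		ts->tick_stopped = false;	/* i.e. restart the tick */
}
```

The point of the model is that the inidle flag and the idle accounting
now live in the enter/exit idle helpers, so the tick stop and restart
paths no longer need to know whether they were called from idle.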

Signed-off-by: Frederic Weisbecker
Cc: Andrew Morton
Cc: Anton Blanchard
Cc: Avi Kivity
Cc: Ingo Molnar
Cc: Lai Jiangshan
Cc: Paul E . McKenney
Cc: Paul Menage
Cc: Peter Zijlstra
Cc: Stephen Hemminger
Cc: Thomas Gleixner
Cc: Tim Pepper

arch/arm/kernel/process.c | 4 +-
arch/avr32/kernel/process.c | 4 +-
arch/blackfin/kernel/process.c | 4 +-
arch/microblaze/kernel/process.c | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/powerpc/kernel/idle.c | 4 +-
arch/powerpc/platforms/iseries/setup.c | 8 +-
arch/s390/kernel/process.c | 4 +-
arch/sh/kernel/idle.c | 4 +-
arch/sparc/kernel/process_64.c | 4 +-
arch/tile/kernel/process.c | 4 +-
arch/um/kernel/process.c | 4 +-
arch/unicore32/kernel/process.c | 4 +-
arch/x86/kernel/process_32.c | 4 +-
arch/x86/kernel/process_64.c | 5 +-
include/linux/tick.h | 10 ++-
kernel/softirq.c | 2 +-
kernel/time/tick-sched.c | 102 ++++++++++++++++++-
18 files changed, 98 insertions(+), 81 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 5e1e541..27b68b0 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -182,7 +182,7 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
leds_event(led_idle_start);
while (!need_resched()) {
#ifdef CONFIG_HOTPLUG_CPU
@@ -208,7 +208,7 @@ void cpu_idle(void)
}
}
leds_event(led_idle_end);
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index ef5a2a0..e683a34 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched())
cpu_idle_sleep();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 6a660fa..8082a8f 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@ void cpu_idle(void)
#endif
if (!idle)
idle = default_idle;
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 968648a..1b295b2 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@ void cpu_idle(void)
if (!idle)
idle = default_idle;

- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched())
idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();

preempt_enable_no_resched();
schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index c28fbe6..3aa4020 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@ void __noreturn cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched() && cpu_online(cpu)) {
#ifdef CONFIG_MIPS_MT_SMTC
extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@ void __noreturn cpu_idle(void)
system_state == SYSTEM_BOOTING))
play_dead();
#endif
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 39a2baa..1108260 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@ void cpu_idle(void)

set_thread_flag(TIF_POLLING_NRFLAG);
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched() && !cpu_should_die()) {
ppc64_runlatch_off();

@@ -93,7 +93,7 @@ void cpu_idle(void)

HMT_medium();
ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index c25a081..d40dcd9 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@ static void yield_shared_processor(void)
static void iseries_shared_idle(void)
{
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched() && !hvlpevent_is_pending()) {
local_irq_disable();
ppc64_runlatch_off();
@@ -576,7 +576,7 @@ static void iseries_shared_idle(void)
}

ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();

if (hvlpevent_is_pending())
process_iSeries_events();
@@ -592,7 +592,7 @@ static void iseries_dedicated_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);

while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
if (!need_resched()) {
while (!need_resched()) {
ppc64_runlatch_off();
@@ -609,7 +609,7 @@ static void iseries_dedicated_idle(void)
}

ppc64_runlatch_on();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 541a750..560cd94 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@ static void default_idle(void)
void cpu_idle(void)
{
for (;;) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched())
default_idle();
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 425d604..b7ea6ff 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -88,7 +88,7 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();

while (!need_resched()) {
check_pgt_cache();
@@ -109,7 +109,7 @@ void cpu_idle(void)
start_critical_timings();
}

- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index c158a95..5c36632 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@ void cpu_idle(void)
set_thread_flag(TIF_POLLING_NRFLAG);

while(1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();

while (!need_resched() && !cpu_is_offline(cpu))
sparc64_yield(cpu);

- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();

preempt_enable_no_resched();

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 9c45d8b..cc1bd4f 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched()) {
if (cpu_is_offline(cpu))
BUG(); /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@ void cpu_idle(void)
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index fab4371..f1b3864 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@ void default_idle(void)
if (need_resched())
schedule();

- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
nsecs = disable_timer();
idle_sleep(nsecs);
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
}
}

diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index ba401df..e2df91a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@ void cpu_idle(void)
{
/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched()) {
local_irq_disable();
stop_critical_timings();
@@ -63,7 +63,7 @@ void cpu_idle(void)
local_irq_enable();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index a3d0dc5..1d7e26c 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -97,7 +97,7 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched()) {

check_pgt_cache();
@@ -112,7 +112,7 @@ void cpu_idle(void)
pm_idle();
start_critical_timings();
}
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ca6f7ab..5fce49b 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -120,7 +120,7 @@ void cpu_idle(void)

/* endless idle loop with no priority at all */
while (1) {
- tick_nohz_stop_sched_tick(1);
+ tick_nohz_enter_idle();
while (!need_resched()) {

rmb();
@@ -144,8 +144,7 @@ void cpu_idle(void)
loops can be woken up without interrupt. */
__exit_idle();
}
-
- tick_nohz_restart_sched_tick();
+ tick_nohz_exit_idle();
preempt_enable_no_resched();
schedule();
preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 532e650..04f6418 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -120,14 +120,16 @@ static inline int tick_oneshot_mode_active(void) { return 0; }
#endif /* !CONFIG_GENERIC_CLOCKEVENTS */

# ifdef CONFIG_NO_HZ
-extern void tick_nohz_stop_sched_tick(int inidle);
-extern void tick_nohz_restart_sched_tick(void);
+extern void tick_nohz_enter_idle(void);
+extern void tick_nohz_exit_idle(void);
+extern void tick_nohz_irq_exit(void);
extern ktime_t tick_nohz_get_sleep_length(void);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
# else
-static inline void tick_nohz_stop_sched_tick(int inidle) { }
-static inline void tick_nohz_restart_sched_tick(void) { }
+static inline void tick_nohz_enter_idle(void) { }
+static inline void tick_nohz_exit_idle(void) { }
+
static inline ktime_t tick_nohz_get_sleep_length(void)
{
ktime_t len = { .tv64 = NSEC_PER_SEC/HZ };
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 40cf63d..67a1401 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -343,7 +343,7 @@ void irq_exit(void)
#ifdef CONFIG_NO_HZ
/* Make sure that timer wheel updates are propagated */
if (idle_cpu(smp_processor_id()) && !in_interrupt() && !need_resched())
- tick_nohz_stop_sched_tick(0);
+ tick_nohz_irq_exit();
#endif
preempt_enable_no_resched();
}
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5934aee..df6bb4c 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -249,38 +249,19 @@ EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
* Called either from the idle loop or from irq_exit() when an idle period was
* just interrupted by an interrupt which did not cause a reschedule.
*/
-void tick_nohz_stop_sched_tick(int inidle)
+static void tick_nohz_stop_sched_tick(ktime_t now)
{
- unsigned long seq, last_jiffies, next_jiffies, delta_jiffies, flags;
+ unsigned long seq, last_jiffies, next_jiffies, delta_jiffies;
struct tick_sched *ts;
- ktime_t last_update, expires, now;
+ ktime_t last_update, expires;
struct clock_event_device *dev = __get_cpu_var(tick_cpu_device).evtdev;
u64 time_delta;
int cpu;

- local_irq_save(flags);
-
cpu = smp_processor_id();
ts = &per_cpu(tick_cpu_sched, cpu);

/*
- * Call to tick_nohz_start_idle stops the last_update_time from being
- * updated. Thus, it must not be called in the event we are called from
- * irq_exit() with the prior state different than idle.
- */
- if (!inidle && !ts->inidle)
- goto end;
-
- /*
- * Set ts->inidle unconditionally. Even if the system did not
- * switch to NOHZ mode the cpu frequency governers rely on the
- * update of the idle time accounting in tick_nohz_start_idle().
- */
- ts->inidle = 1;
-
- now = tick_nohz_start_idle(cpu, ts);
-
- /*
* If this cpu is offline and it is the one which updates
* jiffies, then give up the assignment and let it be taken by
* the cpu which runs the tick timer next. If we don't drop
@@ -293,10 +274,10 @@ void tick_nohz_stop_sched_tick(int inidle)
}

if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
- goto end;
+ return;

if (need_resched())
- goto end;
+ return;

if (unlikely(local_softirq_pending() && cpu_online(cpu))) {
static int ratelimit;
@@ -306,7 +287,7 @@ void tick_nohz_stop_sched_tick(int inidle)
(unsigned int) local_softirq_pending());
ratelimit++;
}
- goto end;
+ return;
}

ts->idle_calls++;
@@ -443,10 +424,31 @@ out:
ts->next_jiffies = next_jiffies;
ts->last_jiffies = last_jiffies;
ts->sleep_length = ktime_sub(dev->next_event, now);
-end:
- local_irq_restore(flags);
}

+static void __tick_nohz_enter_idle(struct tick_sched *ts, int cpu)
+{
+ ktime_t now;
+
+ now = tick_nohz_start_idle(cpu, ts);
+ tick_nohz_stop_sched_tick(now);
+}
+
+void tick_nohz_enter_idle(void)
+{
+ struct tick_sched *ts;
+ int cpu;
+
+ local_irq_disable();
+
+ ts = &__get_cpu_var(tick_cpu_sched);
+ ts->inidle = 1;
+ cpu = smp_processor_id();
+ __tick_nohz_enter_idle(ts, cpu);
+
+ local_irq_enable();
+}
+
/**
* tick_nohz_get_sleep_length - return the length of the current sleep
*
@@ -490,27 +492,12 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
*
* Restart the idle tick when the CPU is woken up from idle
*/
-void tick_nohz_restart_sched_tick(void)
+static void tick_nohz_restart_sched_tick(ktime_t now, struct tick_sched *ts)
{
int cpu = smp_processor_id();
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
unsigned long ticks;
#endif
- ktime_t now;
-
- local_irq_disable();
-
- if (ts->inidle) {
- now = ktime_get();
- tick_nohz_stop_idle(cpu, now);
- ts->inidle = 0;
- }
-
- if (!ts->tick_stopped) {
- local_irq_enable();
- return;
- }

rcu_exit_nohz();

@@ -541,10 +528,39 @@ void tick_nohz_restart_sched_tick(void)
ts->idle_exittime = now;

tick_nohz_restart(ts, now);
+}
+
+void tick_nohz_exit_idle(void)
+{
+ struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+ ktime_t now;
+
+ local_irq_disable();
+
+ if (!ts->inidle) {
+ local_irq_enable();
+ return;
+ }
+
+ now = ktime_get();
+
+ tick_nohz_stop_idle(smp_processor_id(), now);
+ ts->inidle = 0;
+
+ if (ts->tick_stopped)
+ tick_nohz_restart_sched_tick(now, ts);

local_irq_enable();
}

+void tick_nohz_irq_exit(void)
+{
+ struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
+
+ if (ts->inidle)
+ __tick_nohz_enter_idle(ts, smp_processor_id());
+}
+
static int tick_nohz_reprogram(struct tick_sched *ts, ktime_t now)
{
hrtimer_forward(&ts->sched_timer, now, tick_period);
1.7.5.4
