[RFC PATCH v2 0/4] cpuidle: (POWER) cpuidle driver for pSeries

November 17th, 2011 - 06:40 am ET by Deepthi Dharwar | Report spam
This patch series ports the cpuidle framework for ppc64 platform and
implements a cpuidle back-end driver for ppc64 (pSeries) platform.
Currently idle states are managed by pseries_{dedicated,shared}_idle_sleep()
routines in arch/powerpc/platforms/pseries/setup.c. There are
two idle states (snooze and cede) that are exploited by
these routines based on simple heuristics.

Moving the idle states over to cpuidle framework can take advantage of
the advanced heuristics, tunables, and features provided by cpuidle
framework. Additional idle states like extended cede with hints would be
included and exploited using the cpuidle framework. The statistics and
tracing infrastructure provided by the cpuidle framework also helps in
enabling power management related tools and help tune the system and
applications.

This series aims to maintain compatibility and functionality to
existing pSeries idle cpu management code. There are no new functions
or idle states added as part of this series.

The previous version of this patch can be found at
https://lkml.org/lkml/2011/6/7/375

Changes from the previous version (v1):

1] Rebased to latest 3.2-rc2

2] Incorporated the changes from the feedback provided by Ben
in the previous version of this series.

3] The first three patches in this series posted in v1 are not
re-posted as they were taken from "idle cleanup - v3" posted by
Len Brown (https://lkml.org/lkml/2011/4/2/8) which have made
to the mainline in 3.1.
This cleanup was a community requirement and major dependency
before cpuidle framework can be ported over to powerpc
architecture.

This patch series includes:

[1/4] - Provides arch specific cpu_idle_wait() function required by cpuidle
subsystem.
[2/4] - pseries_idle cpuidle driver
[3/4] - Enables cpuidle for pSeries and directly calls cpuidle_idle_call()
[4/4] - Handles powersave=off kernel boot parameter and disables registration
of pseries_idle cpuidle driver.

This series has been tested on ppc64 pSeries POWER7 system with the snooze
and cede states

arch/powerpc/Kconfig | 4
arch/powerpc/include/asm/processor.h | 3
arch/powerpc/include/asm/system.h | 9 +
arch/powerpc/kernel/idle.c | 26 ++
arch/powerpc/kernel/sysfs.c | 2
arch/powerpc/platforms/Kconfig | 6
arch/powerpc/platforms/pseries/Kconfig | 9 +
arch/powerpc/platforms/pseries/Makefile | 1
arch/powerpc/platforms/pseries/processor_idle.c | 321 +++++++++++++++++++++++
arch/powerpc/platforms/pseries/pseries.h | 3
arch/powerpc/platforms/pseries/setup.c | 89
arch/powerpc/platforms/pseries/smp.c | 1
include/linux/cpuidle.h | 2
13 files changed, 387 insertions(+), 89 deletions(-)
create mode 100644 arch/powerpc/platforms/pseries/processor_idle.c



-Deepthi

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
email Follow the discussionReplies 16 repliesReplies Make a reply

Similar topics

Replies

#11 Deepthi Dharwar
November 29th, 2011 - 01:50 am ET | Report spam
On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:

On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:

On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
cpuidle_idle_call by default here. This would result in calling of
cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
default idle is never executed.
This would be a major design flaw. No fallback idle routine.

We propose to fix this by checking the return value of
ppc_md.power_save() call from void to int.
Right now return value is void, but if we change this to int, this
would solve two problems. One being removing the cast to a function
pointer in the prev patch and this design flaw stated above.

So by checking the return value of ppc_md.power_save(), we can invoke
the default idle on failure. But my only concern is about the effects of
changing the ppc_md.power_save() to return int on other powerpc
architectures. Would it be a good idea to change the return type to int
which would help us flag an error and fallback to default idle?



I would have preferred an approach where the cpuidle module sets
ppc_md.power_save when loaded and restores it when unloaded ... but that
would have to go into the cpuidle core as a powerpc specific tweak and
might not be generally well received.

So go for it, add the return value, but you'll have to update all the
idle functions (grep for power_save in arch/powerpc to find them).





Thanks Ben. Yes, I will update all the idle functions under powerpc.
I will re-work these patches with the discussed changes.

Regards,
Deepthi

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#12 Deepthi Dharwar
November 29th, 2011 - 01:50 am ET | Report spam
On 11/29/2011 02:05 AM, Benjamin Herrenschmidt wrote:

On Mon, 2011-11-28 at 16:32 +0530, Deepthi Dharwar wrote:

Additionally, I'm a bit worried (but maybe we already discussed that a
while back, I don't know) but cpu_idle_wait() has "wait" in the name,
which makes me think it might need to actually -wait- for all cpus to
have come out of the function.



cpu_idle_wait is used to ensure that all the CPUs discard old idle
handler and update to new one. Required while changing idle
handler on SMP systems.

Now your implementation doesn't provide that guarantee. It might be
fine, I don't know, but if it is, you'd better document it well in the
comments surrounding the code, because as it is, all you do is shoot an
interrupt which will cause the target CPU to eventually come out of idle
some time in the future.




I was hoping that sending an explicit reschedule to the cpus would
do the trick but sure we can add some documentation around the code.



Well, the question is what guarantee do you expect. Sending a reschedule
IPI will take the other CPUs out of the actual sleep mode, but it will
be some time from there back to getting out of the handler function
(first back out of hypervisor etc...).

The code as you implemented it doesn't wait for that to happen. It might
be fine ... or not. I don't know what semantics you are after precisely.

Cheers,
Ben.






Yes, this could be problematic as there is small window for the
race condition to occur . Otherwise we need to manually schedule
it by running a kernel thread but this would definitely have a
overhead and would be an overkill.

Regards,
Deepthi


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#13 Benjamin Herrenschmidt
November 29th, 2011 - 02:10 am ET | Report spam
On Tue, 2011-11-29 at 12:12 +0530, Deepthi Dharwar wrote:

Yes, this could be problematic as there is small window for the
race condition to occur . Otherwise we need to manually schedule
it by running a kernel thread but this would definitely have a
overhead and would be an overkill.



Depends what this "window" is. IE. What are you trying to protect
yourself against ? What's the risk ?

If it's just module unload, then stop_machine is probably your
friend :-)

Cheers,
Ben.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#14 Deepthi Dharwar
November 29th, 2011 - 02:20 am ET | Report spam
On 11/29/2011 12:31 PM, Benjamin Herrenschmidt wrote:

On Tue, 2011-11-29 at 12:12 +0530, Deepthi Dharwar wrote:

Yes, this could be problematic as there is small window for the
race condition to occur . Otherwise we need to manually schedule
it by running a kernel thread but this would definitely have a
overhead and would be an overkill.



Depends what this "window" is. IE. What are you trying to protect
yourself against ? What's the risk ?

If it's just module unload, then stop_machine is probably your
friend :-)

Cheers,
Ben.






Yup, it is the module unload that I am worried about. Otherwise
manually doing it using kernel thread would be an overkill -:(

Regards,
Deepthi

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
#15 Deepthi Dharwar
November 29th, 2011 - 08:30 pm ET | Report spam
On 11/29/2011 12:14 PM, Deepthi Dharwar wrote:

On 11/29/2011 02:09 AM, Benjamin Herrenschmidt wrote:

On Mon, 2011-11-28 at 16:33 +0530, Deepthi Dharwar wrote:

On an LPAR if cpuidle is disabled, ppc_md.power_save is still set to
cpuidle_idle_call by default here. This would result in calling of
cpuidle_idle_call repeatedly, only for the call to return -ENODEV. The
default idle is never executed.
This would be a major design flaw. No fallback idle routine.

We propose to fix this by checking the return value of
ppc_md.power_save() call from void to int.
Right now return value is void, but if we change this to int, this
would solve two problems. One being removing the cast to a function
pointer in the prev patch and this design flaw stated above.






kernel/idle.c: ppc_md.power_save = NULL;

So by checking the return value of ppc_md.power_save(), we can invoke
the default idle on failure. But my only concern is about the effects of
changing the ppc_md.power_save() to return int on other powerpc
architectures. Would it be a good idea to change the return type to int
which would help us flag an error and fallback to default idle?



I would have preferred an approach where the cpuidle module sets
ppc_md.power_save when loaded and restores it when unloaded ... but that
would have to go into the cpuidle core as a powerpc specific tweak and
might not be generally well received.

So go for it, add the return value, but you'll have to update all the
idle functions (grep for power_save in arch/powerpc to find them).





Thanks Ben. Yes, I will update all the idle functions under powerpc.
I will re-work these patches with the discussed changes.

Regards,
Deepthi

_______________________________________________
linux-pm mailing list

https://lists.linuxfoundation.org/m...o/linux-pm





Hi Ben,

I was trying to add a return value for power_save for all arch/powepc
idle functions but a few of them directly call *.S routines, as they
are asm.

What would be a good way to change the return value for asm routines ?
Do we make a change in asm only, put the return value in r3 or write a
wrapper function which would call these asm routines and return an int ?

Regards,
Deepthi

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Replies Reply to this message
Help Create a new topicNext page Previous pageReplies Make a reply
Search Make your own search